Ajinkya Bobade

Posted on Jan 15, 2025

📝✨ClearText

#devchallenge #githubchallenge #webdev #ai

This is a submission for the GitHub Copilot Challenge: Transitions and Transformations

What I Built

I built ClearText, an AI-powered text detection and enhancement tool that makes the text in images cleaner.

It's Perfect For 🎯

📄 Document Digitization
📚 Book Scanning
📱 Mobile Photos of Text
🖨️ Improving Scanned Documents
📑 Text Enhancement in Images

Demo

Repo

GitHub Repository - ClearText

Here's an example of what ClearText can do:

ClearText takes an input image (left), removes the background noise, and outputs only the text (right).

ClearText has potential applications in the following fields:

Document Processing 📄

Banking & Finance
- 🏦 Check processing
- 📊 Financial statement digitization

Healthcare 🏥

Medical Records
- 📋 Patient records digitization
- 🔬 Lab report enhancement

Legal Industry ⚖️

Document Management
- 📜 Contract digitization
- 🗄️ Case file processing

Academic Use Cases 📚

📖 Textbook scanning
📑 Research paper digitization

Copilot Experience 🤖

I used Copilot extensively to complete this project. Here are the ways in which it helped me:

Code Completion 📝

Auto-completed common OpenCV operations
Suggested image processing parameters
Completed function signatures for Streamlit components

Chat Assistance 💬

Debugged ONNX model loading issues
Explained image processing pipeline
Suggested optimizations for image transformations

Inline Suggestions ⚡

Recommended error handling patterns
Suggested variable names and types

Model Switching 🔄

Used different models for specific tasks:

Code Completion: GitHub Copilot
Documentation: Claude
Debugging: GPT-4

Common Prompts Used 🎯

# Function implementation
/explain image processing pipeline
/suggest error handling
/optimize performance

Code Edits ✏️

Refactored image processing functions
Added blur/no-blur options
Improved error messages
Enhanced documentation

Project Evolution & Contributions

Building on Open Source

This project builds upon the excellent CRAFT text detection model by CLOVA AI Research, while making significant architectural and functional improvements:

1. Production-Ready Architecture 🏗️

Converted the research-focused PyTorch model to the production-ready ONNX format
Leveraged ONNX Runtime for optimized inference across different hardware
Added complete Docker containerization for reliable deployment
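
To make the ONNX Runtime step more concrete, here is a minimal sketch of what inference with an exported detector can look like. It is not the actual ClearText code: the function names (`load_session`, `to_model_input`, `run_detector`), the CPU-only provider, and the assumed 1 x 3 x H x W float32 input layout are illustrative assumptions. Any mean/variance normalization the model expects would have to happen before this step.

```python
import numpy as np


def load_session(model_path):
    """Create an ONNX Runtime session (hypothetical helper).

    onnxruntime is imported lazily so the rest of the sketch can be
    exercised without the package installed.
    """
    import onnxruntime as ort

    # CPU provider is an assumption; other providers depend on the hardware.
    return ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])


def to_model_input(image_rgb):
    """Convert an H x W x 3 image to a 1 x 3 x H x W float32 tensor."""
    if image_rgb.ndim != 3 or image_rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")
    # Channels-last to channels-first, then add a batch dimension.
    chw = np.transpose(image_rgb.astype(np.float32), (2, 0, 1))
    return chw[np.newaxis, ...]


def run_detector(session, image_rgb):
    """Run one forward pass and return every model output as a list."""
    # Look the input name up instead of hard-coding it, because the name
    # depends on how the model was exported.
    input_name = session.get_inputs()[0].name
    return session.run(None, {input_name: to_model_input(image_rgb)})
```

Reading the input name from the session keeps the wrapper independent of the names chosen at export time, and passing `None` as the first argument to `run` asks for all outputs.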

2. Enhanced Text Processing Pipeline 🔄

The original CRAFT model provides basic text detection. ClearText significantly expands on this by:

Adding custom image preprocessing for better text clarity
Implementing new post-processing transforms for enhanced output quality
Creating an entirely new text enhancement pipeline
Developing a user-friendly web interface for easy interaction
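
One preprocessing detail worth illustrating is image sizing. Fully convolutional detectors such as CRAFT generally want input sides that are multiples of the network stride, so images are usually scaled and then padded onto a slightly larger canvas. The sketch below only computes those sizes. The function name and the defaults (`max_side=1280`, `mag_ratio=1.5`, `stride=32`) are assumptions modeled on common CRAFT settings, not values confirmed from ClearText.

```python
import math


def craft_canvas_size(height, width, max_side=1280, mag_ratio=1.5, stride=32):
    """Return (new_h, new_w, canvas_h, canvas_w) for a CRAFT-style resize.

    new_h/new_w are the scaled image size; canvas_h/canvas_w are that size
    rounded up to the next multiple of `stride` (the padded canvas).
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")

    # Magnify the longer side, but never beyond max_side.
    longer = max(height, width)
    target = min(mag_ratio * longer, max_side)
    ratio = target / longer
    new_h, new_w = int(height * ratio), int(width * ratio)

    # Round each side up to a multiple of the stride.
    canvas_h = math.ceil(new_h / stride) * stride
    canvas_w = math.ceil(new_w / stride) * stride
    return new_h, new_w, canvas_h, canvas_w
```

For example, a 480 x 640 photo would be scaled to 720 x 960 and placed on a 736 x 960 canvas, with the extra rows left as padding.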

3. Major Output Improvements 📈

ClearText transforms the basic text detection output into a comprehensive text enhancement solution:

Original CRAFT: Basic text region detection
ClearText Additions:
- Text clarity enhancement
- Document digitization capabilities
- Support for various document types (books, mobile photos, scanned documents)
- Complete image processing pipeline
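
To show how a detection output can become a cleaned image, here is one possible approach, sketched under stated assumptions rather than taken from ClearText itself: threshold a per-pixel text score map, scale it up to the image size, and paint everything outside the text regions with a plain background. The function name, the `0.4` threshold, and the white background are all hypothetical choices. The score map is assumed to be a 2-D array at a lower resolution than the image, as is typical for CRAFT-style outputs.

```python
import numpy as np


def keep_text_regions(image, region_score, threshold=0.4, background=255):
    """Blank out every pixel that the score map does not mark as text.

    image:        H x W (grayscale) or H x W x C array
    region_score: h x w array of text scores, any resolution
    """
    h, w = image.shape[:2]
    sh, sw = region_score.shape

    # Nearest-neighbour upsampling: map each image row/column to the
    # score-map row/column that covers it.
    rows = np.arange(h) * sh // h
    cols = np.arange(w) * sw // w
    mask = region_score[rows[:, None], cols[None, :]] >= threshold

    # Start from a plain background and copy back only the text pixels.
    cleaned = np.full_like(image, background)
    cleaned[mask] = image[mask]
    return cleaned
```

A lower threshold keeps more of the area around each character, while a higher one removes more background at the risk of clipping thin strokes.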

Transparency Statement

While this project builds upon CRAFT's foundational text detection capabilities, ClearText represents a significant evolution with entirely new functionality, architecture, and use cases. All original CRAFT code is properly credited and licensed under the MIT License.

Conclusion

Developing ClearText during the GitHub Copilot 1-Day Build Challenge has been an amazing journey. Without Copilot, transforming a complex text detection model into an accessible, user-friendly web application would have been tremendously difficult. The project showcases how AI can bridge the gap between computer vision and practical, everyday use cases.

DEV Community