This Ruby on Rails application demonstrates file uploads with Active Storage and Optical Character Recognition (OCR) with the rtesseract gem.
- Upload files (e.g., images, PDFs) using Active Storage
- Perform OCR on uploaded files to extract text
- Display the extracted text on the document show page
- Ruby 3.0.0 or later
- Rails 6.0 or later
- Tesseract OCR
- Tesseract language pack (optional, for additional languages)
Ensure you have the following installed:
- Ruby
- Rails
- Tesseract OCR
macOS:
brew install tesseract
brew install tesseract-lang
Ubuntu:
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-<lang-code>
Windows:
Download the installer from the Tesseract at UB Mannheim page and follow the installation instructions. For additional languages, download the appropriate language pack files.
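Once Tesseract is installed, it can help to confirm that the binary is reachable from Ruby before starting the app. The following is a minimal sketch, not part of the repository: the helper name `find_executable_on_path` is hypothetical, and it assumes the executable is named `tesseract` and is found through the `PATH` environment variable.

```ruby
# Hypothetical helper (not part of this repository): returns the full path of
# an executable found on PATH, or nil when it cannot be found.
def find_executable_on_path(name, path_env = ENV.fetch("PATH", ""))
  # On Windows, PATHEXT lists executable extensions (for example .EXE);
  # elsewhere we only try the bare name.
  extensions = ENV["PATHEXT"] ? ENV["PATHEXT"].split(";") : [""]

  path_env.split(File::PATH_SEPARATOR).each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, "#{name}#{ext}")
      return candidate if File.executable?(candidate) && !File.directory?(candidate)
    end
  end
  nil
end

if __FILE__ == $PROGRAM_NAME
  if (path = find_executable_on_path("tesseract"))
    puts "tesseract found at #{path}"
    # --list-langs prints the language packs Tesseract can currently use.
    system(path, "--list-langs")
  else
    puts "tesseract was not found on PATH; see the installation steps above."
  end
end
```

Running the script directly prints the location of the binary and the installed language packs, which is a quick way to check that an optional language pack was picked up.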
- Clone the repository:
git clone https://github.com/your-username/rails-file-upload-ocr.git
cd rails-file-upload-ocr
- Install dependencies:
bundle install
- Set up the database:
rails db:create
rails db:migrate
- Start the Rails server:
./bin/dev
- Navigate to http://localhost:3000 in your web browser.
- Upload a new document by visiting http://localhost:3000/documents/new.
- After uploading, the OCR text will be displayed on the document show page.
- app/models/document.rb: Model representing the Document with file attachment and OCR method.
- app/controllers/documents_controller.rb: Controller handling document upload and OCR.
- app/views/documents/new.html.erb: Form for uploading a new document.
- app/views/documents/show.html.erb: View displaying the uploaded document and OCR text.
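To illustrate how the model's OCR method might hand an attachment to rtesseract, here is a hedged sketch written as a plain Ruby object so it can be read and run outside Rails. The class name `OcrExtractor`, the `engine:` and `lang:` options, and the attachment name `file` are assumptions for illustration and may not match the actual code in app/models/document.rb. It relies on the attachment responding to `open` and yielding a file with a `path`, which is how Active Storage exposes a blob's contents, and on `RTesseract.new(path, lang: ...).to_s` from the rtesseract gem.

```ruby
# Hypothetical helper (names are illustrative, not taken from the repository).
class OcrExtractor
  # Default engine: loads the rtesseract gem lazily, so this file can be
  # required even where the gem is not installed (for example in unit tests).
  DEFAULT_ENGINE = lambda do |path, lang|
    require "rtesseract"
    RTesseract.new(path, lang: lang).to_s
  end

  # engine: any callable taking (path, lang) and returning the recognized text.
  # lang:   Tesseract language code; "eng" is assumed as the default here.
  def initialize(engine: DEFAULT_ENGINE, lang: "eng")
    @engine = engine
    @lang = lang
  end

  # attachment: anything that responds to #open and yields an object with
  # #path, such as an Active Storage attachment.
  def call(attachment)
    text = nil
    attachment.open { |file| text = @engine.call(file.path, @lang) }
    text.to_s.strip
  end
end

# In the Document model, this might be wired up roughly as follows
# (assumed attachment name :file):
#
#   class Document < ApplicationRecord
#     has_one_attached :file
#
#     def ocr_text
#       OcrExtractor.new.call(file)
#     end
#   end
```

Passing the engine in as a callable keeps the Tesseract dependency at the edge, so the extraction logic can be exercised with a stub instead of a real image.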
- Fork the repository
- Create your feature branch (git checkout -b feature/new-feature)
- Commit your changes (git commit -m 'Add some feature')
- Push to the branch (git push origin feature/new-feature)
- Open a pull request
This project is licensed under the MIT License.