This application is a component of a larger backend system for industrial parts evaluation. It uses the LLaVA model to perform visual inspections of factory equipment, identify objects, materials, and potential issues, and generate a detailed inspection report in PDF format.
The application is organized into the following modules:
- app.py: The main entry point for the application.
- model_loader.py: Handles loading the LLaVA model and processor.
- helper_functions.py: Contains utility functions for image processing, QR code generation, and interacting with the LLaVA model.
- report_generator.py: Generates the inspection report in Markdown and PDF format.
- gradio_ui.py: Defines the Gradio user interface.
- requirements.txt: Lists the Python dependencies for the project.
- style.css: Contains the CSS for the PDF report.
The application uses the llava-hf/llava-1.5-7b-hf model, a 7-billion-parameter Large Language and Vision Assistant (LLaVA) model. It accepts both text and images as input, which makes it suitable for visual inspection tasks.
- Multi-modal Understanding: The LLaVA model can process and reason about both visual and textual information, enabling it to analyze images and generate descriptive text.
- 4-bit Quantization: To optimize performance and reduce the memory footprint, the model is loaded with 4-bit quantization using the bitsandbytes library. This allows the model to run efficiently on a wider range of hardware.
- Object and Anomaly Detection: The model is used to identify objects, assess their condition, and detect anomalies or potential issues in the factory equipment.
- Report Generation: The insights generated by the LLaVA model are used to create a comprehensive inspection report, which includes a summary of the findings, a list of identified objects, and a comparison with a reference image.
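The loading and prompting steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual model_loader.py or helper_functions.py: the function names (`build_prompt`, `extract_answer`, `load_model_and_processor`, `describe_image`) are hypothetical, the float16 compute dtype and `max_new_tokens` value are assumptions, and the prompt layout follows the `USER: <image>\n... ASSISTANT:` convention documented for the LLaVA-1.5 Hugging Face checkpoints. Running the loader requires torch, transformers, bitsandbytes, and a CUDA-capable GPU.

```python
MODEL_ID = "llava-hf/llava-1.5-7b-hf"


def build_prompt(question: str) -> str:
    """Wrap a question in the chat layout used by the LLaVA-1.5 HF checkpoints."""
    return f"USER: <image>\n{question} ASSISTANT:"


def extract_answer(generated_text: str) -> str:
    """Keep only the text that follows the last 'ASSISTANT:' marker."""
    return generated_text.split("ASSISTANT:")[-1].strip()


def load_model_and_processor(model_id: str = MODEL_ID):
    """Load the model with 4-bit weights via bitsandbytes (hypothetical helper)."""
    # Imported lazily so the pure helpers above work without the heavy dependencies.
    import torch
    from transformers import (
        AutoProcessor,
        BitsAndBytesConfig,
        LlavaForConditionalGeneration,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,  # assumed compute dtype
    )
    model = LlavaForConditionalGeneration.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor


def describe_image(model, processor, image, question: str, max_new_tokens: int = 200) -> str:
    """Ask the model one question about a PIL image and return its answer."""
    inputs = processor(
        images=image, text=build_prompt(question), return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    decoded = processor.decode(output_ids[0], skip_special_tokens=True)
    return extract_answer(decoded)
```

A call such as `describe_image(model, processor, image, "List any visible damage on this equipment.")` would then return the model's free-text answer, which a report generator could consume.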
- Create a virtual environment:

      python3 -m venv vinsp
      source vinsp/bin/activate

- Install the required dependencies:

      pip install -r requirements.txt
To start the application, run the following command:
    python3 app.py

This will start a Gradio server, and you can access the user interface in your web browser at http://0.0.0.0:7860.
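For orientation, a Gradio app is typically bound to that host and port along the following lines. This is a sketch under assumptions, not the contents of gradio_ui.py: `inspect_fn`, `browser_url`, and the single image-in, text-out layout are hypothetical. Note that 0.0.0.0 is a bind address meaning "all interfaces", so on the machine running the server the interface is usually reachable at http://localhost:7860 as well.

```python
HOST = "0.0.0.0"  # listen on all network interfaces
PORT = 7860       # port given in the instructions above


def browser_url(host: str = HOST, port: int = PORT) -> str:
    """Return a URL suitable for a local browser; swap the bind-all address for localhost."""
    shown_host = "localhost" if host == "0.0.0.0" else host
    return f"http://{shown_host}:{port}"


def launch(inspect_fn):
    """Serve a hypothetical image-to-text inspection function through Gradio."""
    # Imported lazily so browser_url() works without gradio installed.
    import gradio as gr

    demo = gr.Interface(fn=inspect_fn, inputs=gr.Image(type="pil"), outputs="text")
    demo.launch(server_name=HOST, server_port=PORT)
```

Binding to 0.0.0.0 also exposes the interface to other machines on the same network, which is worth keeping in mind on a shared factory network.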