FinancialRAG

A Retrieval-Augmented Generation (RAG) application for financial data analysis using DeepSeek LLM, built with Streamlit and powered by LangChain.

Overview

FinancialRAG is a document analysis tool that lets you upload financial PDF documents and ask questions about them in natural language. For each question, the application retrieves the most relevant passages from your documents and passes them to the LLM as context, so answers stay accurate and grounded in the source material.

Features

  • 📄 PDF Document Processing: Upload and process financial PDF documents using Docling
  • 🔍 Semantic Search: Uses FAISS vector database for efficient similarity search
  • 💬 Natural Language Q&A: Ask questions in plain English about your financial documents
  • 🖼️ PDF Preview: View uploaded PDFs directly in the sidebar
  • 💾 Persistent Storage: Save processed documents and reuse them without reprocessing
  • 🔄 Streaming Responses: Real-time answer generation with streaming support
  • 🎯 Context-Aware Answers: Leverages the DeepSeek-R1 model for accurate financial analysis

Prerequisites

Before running this application, you need to have:

  1. Python 3.11+ installed on your system
  2. Poppler (for PDF to image conversion):
    • Ubuntu/Debian: sudo apt-get install poppler-utils
    • macOS: brew install poppler
    • Windows: download a prebuilt "poppler for Windows" package and add its bin/ directory to your PATH
  3. Ollama installed and running locally
  4. Required Ollama models:
    • nomic-embed-text (for embeddings)
    • deepseek-r1:1.5b (for question answering)

Installing Ollama and Models

  1. Install Ollama from https://ollama.ai

  2. Pull the required models:

    ollama pull nomic-embed-text
    ollama pull deepseek-r1:1.5b

  3. Ensure Ollama is running (see the check below):

    ollama serve
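
To confirm from Python that Ollama is reachable and both models are pulled, a quick check against Ollama's model-listing endpoint (/api/tags) looks like the sketch below. It uses the requests package, which is not in this project's requirements.txt:

    # Sanity check: is Ollama up, and are both required models installed?
    # Assumes `pip install requests`; this script is not part of the repository.
    import requests

    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    installed = {m["name"] for m in resp.json()["models"]}

    for model in ("nomic-embed-text", "deepseek-r1:1.5b"):
        # Ollama reports names with a tag suffix such as ":latest".
        found = any(n == model or n.startswith(model + ":") for n in installed)
        print(model, "-> OK" if found else "-> missing, run: ollama pull " + model)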

Installation

  1. Clone the repository:

    git clone https://github.com/hyperion912/FinancialRAG.git
    cd FinancialRAG

  2. Install the required Python packages:

    pip install -r requirements.txt

Usage

  1. Start the Streamlit application:

    streamlit run app.py

  2. Open your browser and navigate to http://localhost:8501

  3. Upload a new document:

    • Select "Upload New Document" from the dropdown
    • Upload a PDF file containing financial data
    • Click "Process PDF and Store in Vector DB"
    • Wait for processing to complete

  4. Query existing documents:

    • Select a previously processed document from the dropdown
    • Enter your question in the text input field
    • Click "Submit Question" to get an answer

Project Structure

FinancialRAG/
├── app.py            # Main Streamlit application
├── rag.py            # RAG pipeline implementation
├── ragbot.ipynb      # Jupyter notebook for experimentation
├── requirements.txt  # Python dependencies
├── vector_db/        # Storage for FAISS vector databases and PDFs
├── .devcontainer/    # Dev container configuration
└── README.md         # This file

How It Works

  1. Document Processing: PDFs are converted to markdown using Docling
  2. Text Splitting: Markdown content is split into chunks based on headers
  3. Embedding: Text chunks are embedded using the nomic-embed-text model
  4. Vector Storage: Embeddings are stored in a FAISS vector database
  5. Retrieval: When a question is asked, relevant chunks are retrieved using MMR search
  6. Answer Generation: The DeepSeek model generates answers based on retrieved context
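
Condensed into code, the pipeline looks roughly like the sketch below. This is an illustration assembled from the project's listed dependencies, not the actual contents of rag.py; the file name, question, and prompt are made up:

    # Illustrative end-to-end pipeline; rag.py may differ in names and details.
    from docling.document_converter import DocumentConverter
    from langchain_text_splitters import MarkdownHeaderTextSplitter
    from langchain_ollama import OllamaEmbeddings, ChatOllama
    from langchain_community.vectorstores import FAISS

    # 1. Document processing: PDF -> markdown via Docling
    result = DocumentConverter().convert("report.pdf")  # hypothetical file
    markdown = result.document.export_to_markdown()

    # 2. Text splitting: chunk the markdown on its headers
    splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=[("#", "h1"), ("##", "h2"), ("###", "h3")]
    )
    chunks = splitter.split_text(markdown)

    # 3-4. Embedding and vector storage in FAISS
    embeddings = OllamaEmbeddings(model="nomic-embed-text",
                                  base_url="http://localhost:11434")
    store = FAISS.from_documents(chunks, embeddings)
    store.save_local("vector_db/report")  # persisted for reuse without reprocessing

    # 5. Retrieval: MMR search for the chunks most relevant to the question
    question = "What was the net revenue in Q3?"
    retriever = store.as_retriever(search_type="mmr", search_kwargs={"k": 5})
    docs = retriever.invoke(question)

    # 6. Answer generation: DeepSeek-R1 answers from the retrieved context
    llm = ChatOllama(model="deepseek-r1:1.5b", base_url="http://localhost:11434")
    context = "\n\n".join(d.page_content for d in docs)
    answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
    print(answer.content)

For the streaming responses listed under Features, llm.stream(...) can replace llm.invoke(...) to yield tokens as they are generated, and a saved index can be reloaded with FAISS.load_local(...) instead of reprocessing the PDF.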

Configuration

The application uses the following default configurations:

  • Ollama Base URL: http://localhost:11434
  • Embedding Model: nomic-embed-text
  • LLM Model: deepseek-r1:1.5b
  • Vector DB Folder: vector_db/
  • Retrieval Method: MMR (Maximal Marginal Relevance)
  • Top K Results: 5

To modify these settings, edit the respective files:

  • app.py for Streamlit UI settings
  • rag.py for RAG pipeline configuration
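
As a reference, the defaults could be collected into module-level constants like the following (hypothetical names; app.py and rag.py may organize them differently):

    # Hypothetical constants mirroring the defaults listed above.
    OLLAMA_BASE_URL = "http://localhost:11434"
    EMBEDDING_MODEL = "nomic-embed-text"
    LLM_MODEL = "deepseek-r1:1.5b"
    VECTOR_DB_FOLDER = "vector_db"
    SEARCH_TYPE = "mmr"  # Maximal Marginal Relevance
    TOP_K = 5            # chunks retrieved per question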

Dependencies

Core dependencies include:

  • langchain - LLM orchestration framework
  • langchain-community - Community integrations
  • langchain-ollama - Ollama integration
  • faiss-cpu - Vector similarity search
  • docling - PDF to markdown conversion
  • streamlit - Web UI framework
  • pdf2image - PDF rendering

See requirements.txt for the complete list.

Development

This project includes a dev container configuration for easy development setup. If you're using VS Code or GitHub Codespaces:

  1. Open the project in VS Code
  2. Click "Reopen in Container" when prompted
  3. The environment will be automatically configured

Troubleshooting

Ollama Connection Issues

  • Ensure Ollama is running: ollama serve
  • Check if models are installed: ollama list
  • Verify the base URL in the code matches your Ollama installation

PDF Processing Errors

  • Ensure the PDF is not corrupted
  • Check that the PDF contains actual text, not just scanned images (a quick check is sketched below)
  • Verify you have sufficient disk space for image conversion
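
To test whether a PDF actually has an extractable text layer, a quick check with pypdf works; pypdf is not one of this project's dependencies, and the file name below is made up:

    # Quick text-layer check; requires a separate `pip install pypdf`.
    from pypdf import PdfReader

    reader = PdfReader("statement.pdf")  # hypothetical file
    text = "".join(page.extract_text() or "" for page in reader.pages)
    print("text layer found" if text.strip() else "image-only PDF (needs OCR)")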

Memory Issues

  • Consider using smaller documents
  • Reduce the chunk size in rag.py (see the sketch below)
  • Switch to a lighter embedding model
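
If header-based splitting still yields oversized chunks, capping them with a secondary character-level split is a common approach. The settings below are illustrative, not the values rag.py actually uses:

    # Hypothetical secondary split to cap chunk size after header splitting.
    from langchain_core.documents import Document
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Stand-in for the header-split documents produced by the pipeline.
    chunks = [Document(page_content="Revenue discussion ... " * 200)]

    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    smaller = splitter.split_documents(chunks)
    print(len(smaller), "chunks; longest is",
          max(len(d.page_content) for d in smaller), "characters")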

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is open source. Please check the repository for license information.
