An AI-powered legal assistant for Canadian criminal law, built with Streamlit, Google Gemini AI, and FAISS vector search.
- Intelligent Search: Semantic search through the entire Criminal Code of Canada
- AI-Powered Responses: Context-aware answers using Google Gemini AI
- Source Citations: Always shows relevant legal sections with full references
- Custom Context: Upload legal documents for specialized searches
- Chat History: Track previous questions and responses
- Downloadable Reports: Export questions, answers, and sources
- Professional UI: Clean, responsive interface with legal disclaimers
pip install -r requirements.txt
Create a .env file in the root directory:
GOOGLE_API_KEY=your_gemini_api_key_here
Place the Criminal Code PDF (from https://laws-lois.justice.gc.ca/eng/acts/C-46/page-1.html) in the data/ folder:
data/
└── C-46.pdf # Canadian Criminal Code PDF
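Before running the scraper, it can help to confirm that both setup steps took effect. The following is a minimal sketch and not part of the project: `read_env_file` and `check_setup` are hypothetical helper names, and the parser only handles simple `KEY=value` lines. A package such as python-dotenv (its `load_dotenv()` function) is the more usual way to load a `.env` file in practice.

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_setup(root: Path) -> list[str]:
    """Return a list of readable problems; an empty list means setup looks fine."""
    problems: list[str] = []
    env = read_env_file(root / ".env")
    key = env.get("GOOGLE_API_KEY", "")
    # The placeholder value from the setup instructions counts as "not set".
    if not key or key == "your_gemini_api_key_here":
        problems.append("GOOGLE_API_KEY is missing or still the placeholder in .env")
    if not (root / "data" / "C-46.pdf").is_file():
        problems.append("data/C-46.pdf not found")
    return problems


if __name__ == "__main__":
    for problem in check_setup(Path(".")):
        print("Setup problem:", problem)
```

Run from the project root, the script prints nothing when the key and the PDF are both in place.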
python pdf_scraper.py
This will:
- Extract legal sections from the PDF
- Clean and validate content
- Save to data/processed/sections.json
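The extract-and-save steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of pdf_scraper.py, which is not shown here. It assumes the text has already been pulled out of the PDF (the project lists PyMuPDF for that), that each section begins with its number at the start of a line, and that a `{"section", "text"}` record layout is acceptable. The names `SECTION_START`, `split_sections`, and `save_sections` are hypothetical.

```python
import json
import re
from pathlib import Path

# Assumed heading shape: a section number at the start of a line,
# optionally with a decimal part (e.g. "265" or "83.01").
SECTION_START = re.compile(r"^(\d+(?:\.\d+)?)\s+(?=\S)", re.MULTILINE)


def split_sections(text: str) -> list[dict[str, str]]:
    """Split raw text into {"section", "text"} records at each heading."""
    matches = list(SECTION_START.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = " ".join(text[match.end():end].split())  # collapse whitespace
        if body:  # skip headings that have no content
            sections.append({"section": match.group(1), "text": body})
    return sections


def save_sections(sections: list[dict[str, str]], path: Path) -> None:
    """Write the records as UTF-8 JSON, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(sections, ensure_ascii=False, indent=2), encoding="utf-8")
```

A pattern this simple will also split on any body line that happens to start with a number, so a real scraper would likely need stricter validation, which is presumably what the "clean and validate" step covers.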
python build_index.py
This will:
- Create vector embeddings for all sections
- Build FAISS search index
- Save to the faiss_index/ directory
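To show what an index of this kind does conceptually, here is a dependency-free sketch. It is a stand-in, not the project's code: it uses word-count vectors instead of a sentence-transformer model, and a brute-force scan instead of FAISS. The names `embed`, `cosine`, and `TinyIndex` are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Stores one vector per section and scans all of them at query time."""

    def __init__(self, sections: list[dict[str, str]]):
        self.sections = sections
        self.vectors = [embed(s["text"]) for s in sections]

    def search(self, query: str, k: int = 5) -> list[tuple[float, dict[str, str]]]:
        """Return the k most similar sections as (score, section) pairs."""
        q = embed(query)
        scored = [(cosine(q, v), s) for v, s in zip(self.vectors, self.sections)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

The `k` argument here plays the same role as a top-k setting in any vector search: a larger value returns more candidate sections at the cost of more context to read.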
streamlit run app.py
.
├── app.py # Main Streamlit application
├── pdf_scraper.py # PDF extraction and processing
├── build_index.py # FAISS index creation
├── requirements.txt # Python dependencies
├── .env # Environment variables (LLM API Key)
├── data/
│   ├── C-46.pdf # Criminal Code PDF (you provide)
│   └── processed/
│       └── sections.json # Extracted sections (generated)
└── faiss_index/ # Vector search index (generated)
The default embedding model is all-MiniLM-L6-v2. To use a different model, modify build_index.py:
self.embedding_model = "sentence-transformers/all-mpnet-base-v2" # Higher quality
Adjust search results in app.py:
relevant_docs = assistant.get_relevant_documents(query, k=10) # More results
Switch to a different Gemini model in app.py:
model = genai.GenerativeModel(model_name="models/gemini-1.5-pro-001") # More powerful
- "FAISS index not found"
  - Run python build_index.py to create the search index
- "Sections file not found"
  - Run python pdf_scraper.py to extract sections from the PDF
- "GOOGLE_API_KEY not found"
  - Add your Gemini API key to the .env file
- Memory issues during indexing (Optional)
  - Reduce the batch size in build_index.py
  - Use a smaller embedding model
- PDF extraction errors
  - Ensure data/C-46.pdf exists and is readable
  - Check PDF format compatibility
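For the memory tip above, reducing the batch size usually means encoding the sections a few at a time rather than all at once. A minimal sketch, assuming the embedding call can be wrapped as a function that takes a list of texts: `encode_fn` stands in for whatever build_index.py actually calls, and `batched` and `embed_in_batches` are hypothetical names.

```python
from typing import Callable, Iterator, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    encode_fn: Callable[[Sequence[str]], list[list[float]]],
    batch_size: int = 32,
) -> list[list[float]]:
    """Encode texts batch by batch so only one batch is in flight at a time."""
    vectors: list[list[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(encode_fn(batch))
    return vectors
```

Halving `batch_size` roughly halves the working memory the encoder needs per call, though the finished vectors still accumulate in the result list.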
- First run: Initial model loading takes 30-60 seconds
- Subsequent runs: Models are cached, so the initial loading delay does not recur
- Large PDFs: Extraction may take several minutes
- Memory usage: ~2GB RAM recommended for full functionality
- "What constitutes assault under Canadian law?"
- "What are the penalties for theft over $5000?"
- "How is murder defined in the Criminal Code?"
- "Explain section 265 of the Criminal Code"
- "What does section 322 say about theft?"
- "What is the difference between summary and indictable offences?"
- "How does the bail system work in Canada?"
This application provides general information about Canadian criminal law for educational purposes only. It is NOT a substitute for professional legal advice. Always consult with a qualified lawyer for specific legal matters.
Contributions are welcome! Please ensure all changes maintain:
- Code quality and documentation
- Legal accuracy and appropriate disclaimers
- Performance optimizations
- User experience improvements
This project is for educational and research purposes. Please respect copyright laws and use responsibly.
Built with: Streamlit • Google Gemini AI • FAISS • LangChain • PyMuPDF