A comprehensive framework for testing and evaluating attacks against Retrieval-Augmented Generation (RAG) systems, implementing various attack methodologies from the research literature.
This project implements a complete RAG attack research framework that includes:
- Victim RAG System: A simplified but functional RAG implementation using LangChain
- Multiple RAG Attacks: Implementation of various attack methodologies from the literature, including PoisonedRAG, CorruptRAG, and other knowledge corruption attacks
- Evaluation Framework: Comprehensive metrics for assessing attack effectiveness
- Dataset Management: Support for BEIR benchmark datasets (Natural Questions, MS MARCO, HotpotQA)
The framework is designed for security research and educational purposes: understanding vulnerabilities in RAG systems and providing reference implementations of the growing body of research on RAG attacks.
This repository is under active development. Our goal is to implement a comprehensive collection of RAG attacks from the research literature. We are continuously working to expand the framework with additional attack methodologies from published papers and improvements to existing implementations.
Current status:
- ✅ PoisonedRAG attack implementation (complete)
- ✅ CorruptRAG attack implementation (complete, including the CorruptRAG-AS and CorruptRAG-AK variants)
- 🚧 Additional attack methods from the literature (in progress)
- 📋 Enhanced evaluation metrics (planned)
- 📋 Defense mechanisms (planned)
Available Attack Methods:
- PoisonedRAG: A knowledge poisoning attack that generates malicious documents using both generator attacks (creating documents with incorrect information) and retrieval attacks (optimizing documents for high retrieval relevance while containing misleading content)
- CorruptRAG: Template-based poisoning attacks, introduced in *Practical Poisoning Attacks against Retrieval-Augmented Generation*, with two variants:
  - CorruptRAG-AS (Adversarial Suffix): Constructs poisoned text from templates that combine the target query with an adversarial claim that the correct answer is outdated
  - CorruptRAG-AK (Adversarial Knowledge): Builds on AS by using LLM refinement to make malicious documents more natural and coherent while preserving the targeted misinformation
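To illustrate the knowledge-poisoning idea behind these attacks, here is a standalone sketch (not this framework's implementation; the function name and template wording are hypothetical) of how a PoisonedRAG-style passage satisfies both a retrieval condition and a generation condition:

```python
# Illustrative sketch of a PoisonedRAG-style poisoned passage. The
# template and function are hypothetical, but the structure mirrors the
# attack: prepend the target query so the passage ranks highly under
# embedding similarity, then append misinformation asserting the
# attacker's desired answer.

def craft_poisoned_passage(target_query: str, desired_answer: str) -> str:
    # Retrieval condition: including the query text itself pushes the
    # passage toward the top of the retrieved results for that query.
    retrieval_part = target_query
    # Generation condition: fabricated claim supplying the wrong answer.
    generation_part = (
        f"Recent authoritative sources confirm that the answer is "
        f"{desired_answer}."
    )
    return f"{retrieval_part} {generation_part}"

doc = craft_poisoned_passage(
    "What is the capital of France?",  # hypothetical target query
    "Lyon",                            # attacker's chosen wrong answer
)
print(doc)
```

Real attacks replace the fixed template with LLM-generated misinformation, but the two-part retrieval/generation structure is the same.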
```
awesome-rag-attacks/
├── src/                            # Main source code
│   ├── victim_rag.py               # RAG system implementation
│   ├── attacks/                    # Attack implementations
│   │   ├── poisoned_rag_attack.py  # PoisonedRAG attack
│   │   ├── corrupt_rag_attack.py   # CorruptRAG attack
│   │   └── attack_factory.py       # Attack selection factory
│   ├── dataset_loader.py           # BEIR dataset handling
│   ├── evaluation.py               # Evaluation metrics
│   ├── schemas.py                  # Data structures
│   └── prompts.py                  # Prompt templates
├── config/                         # Configuration files
│   ├── config.py                   # Configuration classes
│   └── config.yaml                 # Default settings
├── tests/                          # Test files
│   └── testing_rag.py              # RAG system tests
├── main.py                         # Main orchestrator
├── requirements.txt                # Python dependencies
└── README.md                       # This file
```
- Python 3.10+
- OpenAI API key (for language models)
- 8GB+ RAM (for dataset processing)
- Internet connection (for dataset downloads)
- Clone the repository:

```shell
git clone <repository-url>
cd awesome-rag-attacks
```

- Install dependencies:

```shell
pip install -r requirements.txt
```

- Set up your OpenAI API key:

```shell
export OPENAI_API_KEY="your-api-key-here"
```

Run the complete pipeline:

```shell
python main.py
```

You can also specify which attack to use:

```shell
python main.py --attack poisoned_rag  # Use PoisonedRAG attack
python main.py --attack corrupt_rag   # Use CorruptRAG attack
```

This will:
- Load and sample a dataset (Natural Questions by default)
- Build a RAG system with the benign documents
- Generate target queries for attack
- Create malicious documents using the selected attack method
- Insert malicious documents into the RAG system
- Compare responses before and after poisoning
Edit config/config.yaml to customize your configurations
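A sketch of what the settings file might look like; the `dataset_name`, `dataset_path`, and `sample_size` keys mirror the `DatasetLoaderConfiguration` fields used elsewhere in this README, while the remaining keys and values are illustrative assumptions, not the repository's actual schema:

```yaml
# Illustrative config.yaml sketch -- check the shipped file for the
# authoritative keys. Only dataset_name/dataset_path/sample_size are
# taken from DatasetLoaderConfiguration; the rest are assumptions.
rag_config:
  llm_model: gpt-4o-mini   # assumed OpenAI model name
  top_k: 5                 # assumed number of retrieved chunks
dataset_loader_config:
  dataset_name: nq         # nq | msmarco | hotpotqa
  dataset_path: data/
  sample_size: 100
attack_config:
  attack_type: poisoned_rag
```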
A LangChain-based RAG implementation designed for attack research:
```python
from langchain_core.documents import Document  # Document class used below

from src.victim_rag import VictimRAG
from config.config import RagConfig, load_settings

# Initialize RAG system
config = load_settings().rag_config
rag = VictimRAG(config)

# Load and process documents
documents = [Document(page_content="Your content here", metadata={})]
processed_docs = rag.prepare_documents(documents)
rag.build_vectorstore(processed_docs)
rag.setup_retrieval_chain()

# Query the system
answer = rag.query("What is machine learning?")
print(answer)
```

Implementation of various attacks against RAG systems from the research literature:
```python
from src.attacks.attack_factory import get_attack_class
from config.config import load_configuration

# Load configuration
config = load_configuration()

# Use the attack factory to select an attack method
attack = get_attack_class("poisoned_rag", config.attack_config)
# Or use the CorruptRAG attack
attack = get_attack_class("corrupt_rag", config.attack_config)

# Generate malicious documents for target queries
target_queries = ["What is the capital of France?"]
malicious_docs = attack.generate_malicious_corpus_for_target_queries(target_queries)

# Inject into the RAG system
rag.insert_text(malicious_docs)
```

Handles BEIR benchmark datasets:
```python
from src.dataset_loader import BeirDatasetLoader
from config.config import DatasetLoaderConfiguration

# Load dataset
config = DatasetLoaderConfiguration(
    dataset_name="nq",
    dataset_path="data/",
    sample_size=100,
)
loader = BeirDatasetLoader(config)
dataset = loader.load_beir_dataset()

# Convert to documents
documents = loader.create_documents_from_dataset(dataset)
```

Supported Datasets:
- Natural Questions (nq): Real questions from Google search
- MS MARCO (msmarco): Microsoft's reading comprehension dataset
- HotpotQA: Multi-hop reasoning questions
```python
# Compare RAG responses before and after the attack
from main import RagAttackOrchestrator
from src.victim_rag import VictimRAG
from src.dataset_loader import BeirDatasetLoader
from src.attacks.attack_factory import get_attack_class
from config.config import load_configuration

config = load_configuration()
attack_type = "poisoned_rag"  # or "corrupt_rag"

rag = VictimRAG(config.rag_config)
dataset_loader = BeirDatasetLoader(config.dataset_loader_config)
attack = get_attack_class(attack_type, config.attack_config)

orchestrator = RagAttackOrchestrator(
    rag,
    dataset_loader,
    attack,
)

# Set up the RAG system with benign documents
orchestrator.initialize_rag_system()

# Generate target queries
target_queries = orchestrator.benchmark_dataset.get_random_queries(num_queries=5)

# Get clean responses
clean_responses = orchestrator.victim_rag_system.answer_multiple_questions(target_queries)

# Poison the system
orchestrator.inject_malicious_documents(target_queries)

# Get poisoned responses
poisoned_responses = orchestrator.victim_rag_system.answer_multiple_questions(target_queries)

# Compare results
for query, clean, poisoned in zip(target_queries, clean_responses, poisoned_responses):
    print(f"Query: {query}")
    print(f"Clean: {clean}")
    print(f"Poisoned: {poisoned}")
    print("---")
```

Key dependencies:
- langchain>=0.2.14 - RAG framework
- langchain-community>=0.2.14 - Community components
- langchain-openai>=0.1.25 - OpenAI integration
- faiss-cpu>=1.8.0 - Vector similarity search
- sentence-transformers>=3.0.1 - Text embeddings
- pandas>=2.2.2 - Data manipulation
- loguru>=0.7.2 - Logging
See requirements.txt for complete list.
Contributions are welcome! Priority areas for development:
- RAG Attack Methods: Implementation of attacks from recent research papers
- Defense Mechanisms: Methods to detect and prevent various attacks
- Evaluation Metrics: More sophisticated assessment methods for different attack types
- Model Support: Integration with additional LLM providers
- Performance: Optimization for larger datasets
We especially welcome implementations of new attack methods from the literature. Please ensure any new attacks include proper attribution to the original research.
This framework is designed for:
- ✅ Security research and education
- ✅ Understanding RAG vulnerabilities
- ✅ Developing defense mechanisms
- ✅ Academic research

NOT for:
- ❌ Attacking production systems without permission
- ❌ Spreading misinformation
- ❌ Malicious activities
MIT License - see LICENSE file for details.
This framework implements attacks and methodologies from various research papers:
Primary Attack References:
- PoisonedRAG: Knowledge Poisoning Attacks on Retrieval-Augmented Generation
- Practical Poisoning Attacks against Retrieval-Augmented Generation
Dataset and Evaluation References:
Additional references will be added as more attack methods are implemented from the literature.
For issues and questions:
- Check existing GitHub issues
- Review configuration documentation
- Ensure proper API key setup
- Verify dataset downloads completed
For bugs or feature requests, please open an issue with:
- System information (OS, Python version)
- Error messages or logs
- Steps to reproduce
- Expected vs actual behavior