Radio-RAG

Retrieval-Augmented Generation (RAG) for radio regulations (e.g., ITU rules and spectrum management).
Index regulation PDFs with FAISS, retrieve the most relevant passages, and generate grounded answers with an LLM.

arXiv Paper

RAG pipeline overview (Fig. 2)




Overview

Radio-RAG implements a practical RAG pipeline tailored for telecom/spectrum regulations:

  1. Ingest regulation PDFs and split them into chunks.
  2. Embed those chunks and build a FAISS index.
  3. Retrieve the most relevant passages for a user question.
  4. Generate grounded answers with an LLM using the retrieved context.
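Step 1 is where `--chunk_size` and `--overlap` come into play. The following is a minimal sketch of overlapping chunking. It assumes sizes are measured in characters (the repository may count tokens or words instead), and `chunk_text` is a hypothetical helper, not a function taken from `utils/`.

```python
def chunk_text(text: str, chunk_size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Each window starts `chunk_size - overlap` characters after the previous one,
    so the last `overlap` characters of a chunk reappear at the start of the next.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of the one before it, so a provision cut at a boundary still appears intact in at least one neighbour.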

Radio-RAG Custom GPT

Prefer a chat interface instead of running the code locally?

Use the Radio-RAG Custom GPT, a specialized assistant built on top of this RAG pipeline:

  • ✅ Backed by the same PDF → chunks → FAISS retrieval used in this repo
  • 📚 Tailored for ITU Radio Regulations & spectrum management use-cases
  • 📎 Answers are grounded in the indexed documents, with references to the relevant provisions
  • 🧪 Great for quick checks, exploration, and validating how RAG behaves before full deployment

👉 Open the Radio-RAG Custom GPT


Features

  • 🔎 PDF → chunks → FAISS: simple, configurable ingestion pipeline
  • 🧠 Model-agnostic: choose your embedding and LLM backends
  • ⚙️ Tunable retrieval: chunk size, overlap, index type, top-K
  • 🧪 Experiment ready: compare vanilla LLM vs. RAG-augmented runs
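Exact FAISS indexes such as `flatl2` (and presumably the `innerproduct` type used in the evaluation command below, assumed here to map to a flat inner-product index) compare the query against every stored vector. `hnsw`, `ivfflat` and `ivfpq` trade some accuracy for speed. The pure-Python sketch below shows what the two exact metrics rank by. The function names are illustrative only and are not FAISS's API.

```python
import heapq
import math


def l2_distance(a, b):
    """Squared Euclidean distance, the quantity a flat L2 index ranks by (lower is better)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product(a, b):
    """Dot product, the score an inner-product index ranks by (higher is better)."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def search(query, vectors, top_k=5, metric="l2"):
    """Exhaustive top-k search returning (index, score) pairs, best first."""
    if metric == "l2":
        scored = ((i, l2_distance(query, v)) for i, v in enumerate(vectors))
        return heapq.nsmallest(top_k, scored, key=lambda p: p[1])
    if metric == "ip":
        scored = ((i, inner_product(query, v)) for i, v in enumerate(vectors))
        return heapq.nlargest(top_k, scored, key=lambda p: p[1])
    raise ValueError(f"unknown metric: {metric}")
```

With `query = [1.0, 0.0]` and `vectors = [[1.0, 0.0], [3.0, 0.5]]`, L2 picks index 0 (an exact match), while the raw inner product picks index 1 because it is longer. After `normalize`, the inner product equals cosine similarity and index 0 wins again, which is why embeddings are usually normalized before an inner-product index is built.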

Quick Start

1) Install

git clone https://github.com/Zakaria010/Radio-RAG.git
cd Radio-RAG

# (optional) create a virtual environment
python -m venv .venv
source .venv/bin/activate    # Windows: .venv\Scripts\activate

pip install -r requirements.txt

2) Add your PDFs

Create the data/ folder (if it doesn’t exist) and put your regulation PDFs inside:

data/
├─ 2400594-RR-Vol 1-E-A5.pdf
├─ 2400594-RR-Vol 2-E-A5.pdf
├─ 2400594-RR-Vol 3-E-A5.pdf
├─ 2400594-RR-Vol 4-E-A5.pdf
└─ your_other_regulation_book.pdf
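Before anything is chunked, ingestion has to locate the PDFs. Below is a minimal pathlib sketch. It assumes only top-level `*.pdf` files in the folder are indexed; the README does not say whether `local_rag.py` also scans subfolders. `find_pdfs` is a hypothetical helper.

```python
from pathlib import Path


def find_pdfs(pdf_folder: str = "./data") -> list[Path]:
    """Return the PDF files directly inside `pdf_folder`, sorted by path."""
    folder = Path(pdf_folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"PDF folder not found: {folder}")
    # Match the extension case-insensitively so files ending in ".PDF" are not skipped.
    return sorted(p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf")
```

On the layout above, `find_pdfs("./data")` would return the four RR volumes plus `your_other_regulation_book.pdf`. A missing folder fails loudly instead of silently producing an empty index.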

Project Structure

Radio-RAG/
├─ data/                 # Put regulation PDFs here
├─ tests/                # Evaluation / experiment scripts
├─ utils/                # Helpers (parsing, chunking, indexing, retrieval)
├─ local_rag.py          # CLI entry-point
├─ requirements.txt
├─ LICENSE
└─ README.md

Usage

Run the built-in help to see the exact flags supported by your current version.

python local_rag.py --help

Example

A) Ask a question

python local_rag.py --pdf_folder ./data

The app then starts, and you can type your question.

Common Arguments

  • --pdf_folder (str, default: ./data) — directory of PDFs
  • --chunk_size (int) — chunk length used for text splitting
  • --overlap (int) — overlap between adjacent chunks
  • --index_type (str) — FAISS index (flatl2, hnsw, ivfflat, ivfpq, …)
  • --model_name (str) — LLM ID/name
  • --top_k (int) — number of retrieved chunks
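The flag names and types listed above map directly onto `argparse`. The sketch below mirrors them. Only `--pdf_folder` gets a default, because the README documents no other defaults (check `python local_rag.py --help` for the real values). The overlap check is a hypothetical addition, not documented behaviour.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the common flags; only --pdf_folder has a (documented) default."""
    parser = argparse.ArgumentParser(description="Radio-RAG style CLI (sketch)")
    parser.add_argument("--pdf_folder", type=str, default="./data", help="directory of PDFs")
    parser.add_argument("--chunk_size", type=int, help="chunk length used for text splitting")
    parser.add_argument("--overlap", type=int, help="overlap between adjacent chunks")
    parser.add_argument("--index_type", type=str, help="FAISS index, e.g. flatl2 or hnsw")
    parser.add_argument("--model_name", type=str, help="LLM ID/name")
    parser.add_argument("--top_k", type=int, help="number of retrieved chunks")
    args = parser.parse_args(argv)
    # Hypothetical sanity check: an overlap as long as the chunk would never advance.
    if args.chunk_size is not None and args.overlap is not None and args.overlap >= args.chunk_size:
        parser.error("--overlap must be smaller than --chunk_size")
    return args
```

For instance, `parse_args(["--chunk_size", "400", "--overlap", "50", "--top_k", "5"])` returns a namespace with those integers and `pdf_folder="./data"`. Flags that are not passed stay `None`.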

Experiments

Evaluation utilities live in tests/. A typical pattern:

python -m tests.evaluation --chunk_size 400 --overlap 50 --index_type innerproduct --top_k 5 --model_name "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B" --test "splits/eval.json"

The tests/evaluation.py script includes a switch (e.g. --norag) to compare the vanilla LLM against RAG.
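A vanilla-vs-RAG comparison boils down to sending the same question to the same model with and without the retrieved passages. Here is a minimal sketch of that toggle. The prompt wording and `build_prompt` are assumptions; this is not the template used in `tests/evaluation.py`.

```python
def build_prompt(question: str, contexts: list[str], use_rag: bool = True) -> str:
    """Build an LLM prompt; with use_rag=False (or no passages) the bare question is sent."""
    if not use_rag or not contexts:
        return f"Question: {question}\nAnswer:"
    # Number the passages so the answer can point back to the provision it relied on.
    context_block = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below. "
        "Cite the passage numbers you rely on.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Keeping everything except the context block identical means any difference in the answers can be attributed to retrieval rather than to prompt wording.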


Hugging Face (ZeroGPU)

Prefer a hosted demo? Try the app on Hugging Face Spaces (ZeroGPU spins up on demand):

Open the HF Space


Figures

Figure 5 — Vanilla vs. RAG (Qualitative)

GPT-4o vs RAG with context (Fig. 5)


Troubleshooting

  • No/irrelevant answers → Confirm PDFs parse correctly; try larger --top_k; adjust --chunk_size / --overlap
  • Index performance → Start with flatl2 (baseline) or hnsw (fast). IVF variants can help at larger scale
  • Changed models/params → Rebuild the index to avoid stale vectors
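One way to avoid stale vectors is to store a fingerprint of every setting that affects them next to the index and rebuild when it changes. The README does not say whether `local_rag.py` caches its index on disk, so everything here is hypothetical: `index_fingerprint`, `needs_rebuild`, the `fingerprint.txt` file and the `embedding_model` setting. Note that the embedding model matters here, while changing only the answering LLM does not alter the stored vectors.

```python
import hashlib
import json
from pathlib import Path


def index_fingerprint(embedding_model: str, chunk_size: int, overlap: int,
                      index_type: str, pdf_paths: list[Path]) -> str:
    """Hash every setting that changes the stored vectors into one short key."""
    config = {
        "embedding_model": embedding_model,
        "chunk_size": chunk_size,
        "overlap": overlap,
        "index_type": index_type,
        # Include file names and sizes so adding or replacing a PDF also changes the key.
        "pdfs": sorted((p.name, p.stat().st_size) for p in pdf_paths),
    }
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def needs_rebuild(index_dir: Path, fingerprint: str) -> bool:
    """True when no fingerprint is stored yet or it differs from the current settings."""
    stamp = index_dir / "fingerprint.txt"
    return not stamp.exists() or stamp.read_text().strip() != fingerprint
```

After a rebuild, writing the new fingerprint to `fingerprint.txt` makes the next run with identical settings reuse the existing index.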

Citation

If you use this repository, please cite the paper:

@misc{kassimi2025retrieval,
      title={Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations}, 
      author={Zakaria El Kassimi and Fares Fourati and Mohamed-Slim Alouini},
      year={2025},
      eprint={2509.09651},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2509.09651}, 
}





