An AI-powered research paper analysis and Q&A API that helps researchers find and analyse academic papers from arXiv using natural language queries.

## Features
- Natural Language Query Processing: Convert research questions into optimised arXiv search queries using AI with temporal sorting support
- Intelligent Paper Analysis: AI-powered synthesis of research papers into comprehensive answers with confidence scoring
- Multiple Citation Formats: Generate citations in APA, MLA, Chicago, Harvard, BibTeX, and IEEE formats with modular formatter architecture
- Direct Paper Links: Access PDF, HTML, and arXiv links for each paper with full citation data
- RESTful API: Clean, well-documented API endpoints with OpenAPI/Swagger documentation
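The temporal sorting mentioned above can be illustrated with a minimal sketch. The real query conversion is AI-driven, and the names `TEMPORAL_KEYWORDS` and `choose_sort` are hypothetical, not the project's actual identifiers; the sketch only shows the effect on the arXiv sort field:

```python
# Hypothetical sketch: detect recency-oriented questions and switch the
# arXiv API sort field from relevance to submission date.
TEMPORAL_KEYWORDS = {"latest", "recent", "new", "newest", "current"}


def choose_sort(question: str) -> str:
    """Return the arXiv sortBy value: date-based when the question asks for recency."""
    words = set(question.lower().split())
    return "submittedDate" if words & TEMPORAL_KEYWORDS else "relevance"
```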
## Prerequisites

- Python 3.9+
- uv package manager

## Installation

1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd research-flow-api
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```

3. Set up environment variables:

   ```shell
   cp .env.example .env
   # Edit .env with your API keys
   ```

4. Run the application:

   ```shell
   uv run uvicorn main:app --reload --host 0.0.0.0 --port 8000
   ```

5. Access the API:

   - API: http://localhost:8000
   - Interactive docs: http://localhost:8000/docs
   - Alternative docs: http://localhost:8000/redoc
## Configuration

Create a `.env` file in the project root with the following variables:

```shell
# AI Configuration
GOOGLE_API_KEY=your_google_api_key_here
GEMINI_MODEL=gemini-2.5-flash

# LangChain Configuration (optional - only for tracing)
# LANGCHAIN_API_KEY=your_langchain_api_key_here
# LANGCHAIN_TRACING_V2=false
# LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
```

- Google API Key (required):
  - Go to Google AI Studio
  - Create a new API key
  - Add it to your `.env` file
- LangChain API Key (optional):
  - Go to LangSmith
  - Create an account and get your API key
  - Used for tracing and monitoring AI interactions
  - Not required - the API works without it
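As a rough sketch of how these variables might be consumed, the snippet below reads them with sensible fallbacks. This is an illustration only; the project's actual `config.py` may use a different mechanism, and `load_settings` is a hypothetical name:

```python
import os


def load_settings() -> dict:
    """Read API configuration from environment variables.

    GOOGLE_API_KEY is required; GEMINI_MODEL and the LangChain
    tracing flag fall back to the documented defaults.
    """
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; see .env.example")
    return {
        "google_api_key": api_key,
        "gemini_model": os.getenv("GEMINI_MODEL", "gemini-2.5-flash"),
        "langchain_tracing": os.getenv("LANGCHAIN_TRACING_V2", "false").lower() == "true",
    }
```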
## Project Structure

```text
research-flow-api/
├── agents/                           # AI agents for query processing
│   ├── __init__.py
│   ├── prompt_templates.py           # Centralised AI prompt templates
│   └── query_conversion_agent.py     # Converts natural language to arXiv queries
├── controllers/                      # API endpoint controllers
│   ├── __init__.py
│   └── research_paper_controller.py  # Main research question endpoint
├── models/                           # Pydantic data models
│   ├── __init__.py
│   ├── base.py                       # Base response models
│   ├── citation_types.py             # Citation format enums
│   ├── research_paper.py             # Research paper data model
│   └── research_paper_link.py        # Paper links and citations model
├── services/                         # Business logic services
│   ├── __init__.py
│   ├── arxiv_service.py              # arXiv API integration
│   ├── citation_service.py           # Citation generation service
│   ├── citation_formatters/          # Citation format implementations
│   │   ├── __init__.py
│   │   ├── base.py                   # Abstract citation formatter
│   │   ├── apa.py                    # APA citation formatter
│   │   ├── mla.py                    # MLA citation formatter
│   │   ├── chicago.py                # Chicago citation formatter
│   │   ├── harvard.py                # Harvard citation formatter
│   │   ├── bibtex.py                 # BibTeX citation formatter
│   │   └── ieee.py                   # IEEE citation formatter
│   ├── query_converter.py            # Query conversion orchestration
│   └── research_analyser.py          # AI-powered research analysis
├── tests/                            # Test suite
│   ├── __init__.py
│   ├── fixtures/                     # Test data fixtures
│   │   ├── minimal_arxiv.xml
│   │   ├── multiple_arxiv.xml
│   │   └── valid_arxiv.xml
│   ├── test_arxiv_service.py         # arXiv service tests
│   ├── test_citation_service.py      # Citation service tests
│   ├── test_main.py                  # API endpoint tests
│   ├── test_paper.py                 # Data model tests
│   └── test_prompt_templates.py      # AI prompt tests
├── config.py                         # Configuration management
├── main.py                           # FastAPI application entry point
├── pyproject.toml                    # Project configuration and dependencies
└── README.md                         # This file
```
## Architecture

- AI Agents: Handle natural language processing and research analysis with centralised prompt templates
- Citation Formatters: Modular citation format implementations (APA, MLA, Chicago, Harvard, BibTeX, IEEE) in separate files for maintainability
- Services: Business logic for arXiv integration, query conversion, and research analysis
- Models: Type-safe data structures using Pydantic v2
- Controllers: Clean API endpoints with proper error handling
- Tests: Comprehensive test coverage for all components with fixtures
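The modular formatter architecture can be sketched as follows. This is an illustrative outline, not the project's actual code: the real implementations live in `services/citation_formatters/`, and the `Paper`, `CitationFormatter`, and `ApaFormatter` names and the exact citation string are assumptions here:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Paper:
    """Minimal stand-in for the project's research paper model."""
    title: str
    authors: list
    year: int
    arxiv_id: str


class CitationFormatter(ABC):
    """Abstract base class; each format (APA, MLA, ...) subclasses it in its own module."""

    @abstractmethod
    def format(self, paper: Paper) -> str:
        ...


class ApaFormatter(CitationFormatter):
    """One concrete formatter; adding a new format means adding one new subclass."""

    def format(self, paper: Paper) -> str:
        authors = " & ".join(paper.authors)
        return f"{authors} ({paper.year}). {paper.title}. arXiv:{paper.arxiv_id}."
```

Keeping each format behind the same abstract interface means the citation service can build all six citation strings by iterating over a registry of formatters.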
## API Endpoints

### POST /api/research-papers

Answer research questions using AI-powered paper analysis.

Parameters:

- `question` (string, required): Research question in natural language
- `max_papers` (integer, optional): Maximum papers to analyse (default: 10, max: 50)
Example Request:

```shell
curl -X POST "http://localhost:8000/api/research-papers?question=What%20are%20the%20latest%20developments%20in%20machine%20learning%20for%20healthcare?&max_papers=5" \
  -H "accept: application/json"
```

Example Response:
```json
{
  "question": "What are the latest developments in machine learning for healthcare?",
  "answer": "The latest developments in machine learning for healthcare...",
  "key_findings": [
    "ML as a Catalyst for Value-Based Care...",
    "Ethical Considerations and Health Equity..."
  ],
  "relevant_papers": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "summary": "This paper highlights the strategic role of ML..."
    }
  ],
  "paper_links": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "authors": ["Matthew G. Crowson", "Timothy C. Y. Chan"],
      "published": "2020-05-15T13:22:08",
      "arxiv_id": "2005.07534v1",
      "links": {
        "pdf": "http://arxiv.org/pdf/2005.07534v1",
        "html": "http://arxiv.org/abs/2005.07534v1",
        "arxiv": "https://arxiv.org/abs/2005.07534v1",
        "doi": null
      },
      "citations": {
        "apa": "Matthew G. Crowson & Timothy C. Y. Chan (2020)...",
        "mla": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "chicago": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "harvard": "Matthew G. Crowson and Timothy C. Y. Chan 2020...",
        "bibtex": "@article{crowson2020,\n author = {Matthew G. Crowson...",
        "ieee": "Matthew G. Crowson, Timothy C. Y. Chan..."
      }
    }
  ],
  "confidence": 0.8,
  "search_strategy": "Multi-field search with synonym expansion",
  "papers_analysed": 5,
  "metadata": {
    "total_papers_found": 5,
    "query_used": "(ti:\"machine learning\" OR abs:\"machine learning\") AND (ti:healthcare OR abs:healthcare...)",
    "search_performed_at": "2024-01-15T10:30:00.000Z"
  }
}
```

### Health Check

Check API health status.
Response:

```json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

The API intelligently recognises temporal keywords and applies appropriate sorting to return the most relevant papers.
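As a convenience for scripting against the endpoint, the question needs URL-encoding (as the curl example above shows). A small helper can build the request URL with the standard library; `build_research_url` is a hypothetical name, not part of the project, and the `max_papers` bounds mirror the documented limits:

```python
from urllib.parse import urlencode


def build_research_url(base: str, question: str, max_papers: int = 10) -> str:
    """Build a /api/research-papers request URL with a URL-encoded question."""
    if not 1 <= max_papers <= 50:
        raise ValueError("max_papers must be between 1 and 50")
    query = urlencode({"question": question, "max_papers": max_papers})
    return f"{base}/api/research-papers?{query}"
```

The resulting URL can then be POSTed with any HTTP client.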
## Testing

```shell
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_citation_service.py -v
```

## Development

```shell
# Lint code
uv run flake8 --max-line-length=79

# Format code (if using black)
uv run black --line-length=79 .
```

Before committing:

```shell
# 1. Make your changes
# 2. Run linting
uv run flake8 --max-line-length=79 --exclude=.venv,uv.lock .

# 3. Run tests
uv run python -m pytest tests/ -v

# 4. Fix any issues
# 5. Commit only when all checks pass
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes following the development standards above
4. Add tests for new functionality
5. Ensure all tests pass and linting is clean
6. Submit a pull request
## Support
For issues and questions:
- Create an issue in the repository
- Check the API documentation at `/docs`
- Review the test files for usage examples