An AI-powered research paper analysis and Q&A API that helps researchers find and analyse academic papers from arXiv using natural language queries.

## Features
- Natural Language Query Processing: Convert research questions into optimised arXiv search queries using AI with temporal sorting support
- Intelligent Paper Analysis: AI-powered synthesis of research papers into comprehensive answers with confidence scoring
- Multiple Citation Formats: Generate citations in APA, MLA, Chicago, Harvard, BibTeX, and IEEE formats with modular formatter architecture
- Direct Paper Links: Access PDF, HTML, and arXiv links for each paper with full citation data
- RESTful API: Clean, well-documented API endpoints with OpenAPI/Swagger documentation
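The temporal sorting mentioned above can be illustrated with a minimal sketch. The real query conversion is AI-driven, and the names `TEMPORAL_KEYWORDS` and `choose_sort` are hypothetical, not the project's actual identifiers; the sketch only shows the effect on the arXiv sort field:

```python
# Hypothetical sketch: detect recency-oriented questions and switch the
# arXiv API sort field from relevance to submission date.
TEMPORAL_KEYWORDS = {"latest", "recent", "new", "newest", "current"}


def choose_sort(question: str) -> str:
    """Return the arXiv sortBy value: date-based when the question asks for recency."""
    words = set(question.lower().split())
    return "submittedDate" if words & TEMPORAL_KEYWORDS else "relevance"
```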
## Prerequisites

- Python 3.9+
- uv package manager

## Installation

1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd research-flow-api
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```

3. Set up environment variables:

   ```shell
   cp .env.example .env
   # Edit .env with your API keys
   ```

4. Run the application:

   ```shell
   uv run uvicorn main:app --reload --host 0.0.0.0 --port 8000
   ```

5. Access the API:

   - API: http://localhost:8000
   - Interactive docs: http://localhost:8000/docs
   - Alternative docs: http://localhost:8000/redoc
## Configuration

Create a `.env` file in the project root with the following variables:

```shell
# AI Configuration
GOOGLE_API_KEY=your_google_api_key_here
GEMINI_MODEL=gemini-2.5-flash

# LangChain Configuration (optional - only for tracing)
# LANGCHAIN_API_KEY=your_langchain_api_key_here
# LANGCHAIN_TRACING_V2=false
# LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
```

- Google API Key (required):
  - Go to Google AI Studio
  - Create a new API key
  - Add it to your `.env` file
- LangChain API Key (optional):
  - Go to LangSmith
  - Create an account and get your API key
  - Used for tracing and monitoring AI interactions
  - Not required - the API works without it
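As a rough sketch of how these variables might be consumed, the snippet below reads them with sensible fallbacks. This is an illustration only; the project's actual `config.py` may use a different mechanism, and `load_settings` is a hypothetical name:

```python
import os


def load_settings() -> dict:
    """Read API configuration from environment variables.

    GOOGLE_API_KEY is required; GEMINI_MODEL and the LangChain
    tracing flag fall back to the documented defaults.
    """
    api_key = os.getenv("GOOGLE_API_KEY")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; see .env.example")
    return {
        "google_api_key": api_key,
        "gemini_model": os.getenv("GEMINI_MODEL", "gemini-2.5-flash"),
        "langchain_tracing": os.getenv("LANGCHAIN_TRACING_V2", "false").lower() == "true",
    }
```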
## Project Structure

```text
research-flow-api/
├── agents/                           # AI agents for query processing
│   ├── __init__.py
│   ├── prompt_templates.py           # Centralised AI prompt templates
│   └── query_conversion_agent.py     # Converts natural language to arXiv queries
├── controllers/                      # API endpoint controllers
│   ├── __init__.py
│   └── research_paper_controller.py  # Main research question endpoint
├── models/                           # Pydantic data models
│   ├── __init__.py
│   ├── base.py                       # Base response models
│   ├── citation_types.py             # Citation format enums
│   ├── research_paper.py             # Research paper data model
│   └── research_paper_link.py        # Paper links and citations model
├── services/                         # Business logic services
│   ├── __init__.py
│   ├── arxiv_service.py              # arXiv API integration
│   ├── citation_service.py           # Citation generation service
│   ├── citation_formatters/          # Citation format implementations
│   │   ├── __init__.py
│   │   ├── base.py                   # Abstract citation formatter
│   │   ├── apa.py                    # APA citation formatter
│   │   ├── mla.py                    # MLA citation formatter
│   │   ├── chicago.py                # Chicago citation formatter
│   │   ├── harvard.py                # Harvard citation formatter
│   │   ├── bibtex.py                 # BibTeX citation formatter
│   │   └── ieee.py                   # IEEE citation formatter
│   ├── query_converter.py            # Query conversion orchestration
│   └── research_analyser.py          # AI-powered research analysis
├── tests/                            # Test suite
│   ├── __init__.py
│   ├── fixtures/                     # Test data fixtures
│   │   ├── minimal_arxiv.xml
│   │   ├── multiple_arxiv.xml
│   │   └── valid_arxiv.xml
│   ├── test_arxiv_service.py         # arXiv service tests
│   ├── test_citation_service.py      # Citation service tests
│   ├── test_main.py                  # API endpoint tests
│   ├── test_paper.py                 # Data model tests
│   └── test_prompt_templates.py      # AI prompt tests
├── config.py                         # Configuration management
├── main.py                           # FastAPI application entry point
├── pyproject.toml                    # Project configuration and dependencies
└── README.md                         # This file
```
## Architecture

- AI Agents: Handle natural language processing and research analysis with centralised prompt templates
- Citation Formatters: Modular citation format implementations (APA, MLA, Chicago, Harvard, BibTeX, IEEE) in separate files for maintainability
- Services: Business logic for arXiv integration, query conversion, and research analysis
- Models: Type-safe data structures using Pydantic v2
- Controllers: Clean API endpoints with proper error handling
- Tests: Comprehensive test coverage for all components with fixtures
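The modular formatter architecture can be sketched as follows. This is an illustrative outline, not the project's actual code: the real implementations live in `services/citation_formatters/`, and the `Paper`, `CitationFormatter`, and `ApaFormatter` names and the exact citation string are assumptions here:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Paper:
    """Minimal stand-in for the project's research paper model."""
    title: str
    authors: list
    year: int
    arxiv_id: str


class CitationFormatter(ABC):
    """Abstract base class; each format (APA, MLA, ...) subclasses it in its own module."""

    @abstractmethod
    def format(self, paper: Paper) -> str:
        ...


class ApaFormatter(CitationFormatter):
    """One concrete formatter; adding a new format means adding one new subclass."""

    def format(self, paper: Paper) -> str:
        authors = " & ".join(paper.authors)
        return f"{authors} ({paper.year}). {paper.title}. arXiv:{paper.arxiv_id}."
```

Keeping each format behind the same abstract interface means the citation service can build all six citation strings by iterating over a registry of formatters.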
## API Endpoints

### POST /api/research-papers

Answer research questions using AI-powered paper analysis.

Parameters:

- `question` (string, required): Research question in natural language
- `max_papers` (integer, optional): Maximum papers to analyse (default: 10, max: 50)
Example Request:

```shell
curl -X POST "http://localhost:8000/api/research-papers?question=What%20are%20the%20latest%20developments%20in%20machine%20learning%20for%20healthcare?&max_papers=5" \
  -H "accept: application/json"
```

Example Response:
```json
{
  "question": "What are the latest developments in machine learning for healthcare?",
  "answer": "The latest developments in machine learning for healthcare...",
  "key_findings": [
    "ML as a Catalyst for Value-Based Care...",
    "Ethical Considerations and Health Equity..."
  ],
  "relevant_papers": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "summary": "This paper highlights the strategic role of ML..."
    }
  ],
  "paper_links": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "authors": ["Matthew G. Crowson", "Timothy C. Y. Chan"],
      "published": "2020-05-15T13:22:08",
      "arxiv_id": "2005.07534v1",
      "links": {
        "pdf": "http://arxiv.org/pdf/2005.07534v1",
        "html": "http://arxiv.org/abs/2005.07534v1",
        "arxiv": "https://arxiv.org/abs/2005.07534v1",
        "doi": null
      },
      "citations": {
        "apa": "Matthew G. Crowson & Timothy C. Y. Chan (2020)...",
        "mla": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "chicago": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "harvard": "Matthew G. Crowson and Timothy C. Y. Chan 2020...",
        "bibtex": "@article{crowson2020,\n author = {Matthew G. Crowson...",
        "ieee": "Matthew G. Crowson, Timothy C. Y. Chan..."
      }
    }
  ],
  "confidence": 0.8,
  "search_strategy": "Multi-field search with synonym expansion",
  "papers_analysed": 5,
  "metadata": {
    "total_papers_found": 5,
    "query_used": "(ti:\"machine learning\" OR abs:\"machine learning\") AND (ti:healthcare OR abs:healthcare...)",
    "search_performed_at": "2024-01-15T10:30:00.000Z"
  }
}
```

### Health Check

Check API health status.
Response:

```json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

The API intelligently recognises temporal keywords and applies appropriate sorting to return the most relevant papers.
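As a convenience for scripting against the endpoint, the question needs URL-encoding (as the curl example above shows). A small helper can build the request URL with the standard library; `build_research_url` is a hypothetical name, not part of the project, and the `max_papers` bounds mirror the documented limits:

```python
from urllib.parse import urlencode


def build_research_url(base: str, question: str, max_papers: int = 10) -> str:
    """Build a /api/research-papers request URL with a URL-encoded question."""
    if not 1 <= max_papers <= 50:
        raise ValueError("max_papers must be between 1 and 50")
    query = urlencode({"question": question, "max_papers": max_papers})
    return f"{base}/api/research-papers?{query}"
```

The resulting URL can then be POSTed with any HTTP client.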
## Testing

```shell
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_citation_service.py -v
```

## Development

```shell
# Lint code
uv run flake8 --max-line-length=79

# Format code (if using black)
uv run black --line-length=79 .
```

Before committing:

```shell
# 1. Make your changes
# 2. Run linting
uv run flake8 --max-line-length=79 --exclude=.venv,uv.lock .

# 3. Run tests
uv run python -m pytest tests/ -v

# 4. Fix any issues
# 5. Commit only when all checks pass
```
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes following the development standards above
4. Add tests for new functionality
5. Ensure all tests pass and linting is clean
6. Submit a pull request
## Support
For issues and questions:
- Create an issue in the repository
- Check the API documentation at `/docs`
- Review the test files for usage examples