# Research Flow API

An AI-powered research paper analysis and Q&A API that helps researchers find and analyse academic papers from arXiv using natural language queries.

## Features

- Natural Language Query Processing: convert research questions into optimised arXiv search queries using AI, with temporal sorting support
- Intelligent Paper Analysis: AI-powered synthesis of research papers into comprehensive answers with confidence scoring
- Multiple Citation Formats: generate citations in APA, MLA, Chicago, Harvard, BibTeX, and IEEE formats with a modular formatter architecture
- Direct Paper Links: access PDF, HTML, and arXiv links for each paper with full citation data
- RESTful API: clean, well-documented API endpoints with OpenAPI/Swagger documentation

## Quick Start

### Prerequisites

- Python 3.9+
- uv package manager

### Installation

1. Clone the repository:

   ```shell
   git clone <repository-url>
   cd research-flow-api
   ```

2. Install dependencies:

   ```shell
   uv sync
   ```

3. Set up environment variables:

   ```shell
   cp .env.example .env
   # Edit .env with your API keys
   ```

4. Run the application:

   ```shell
   uv run uvicorn main:app --reload --host 0.0.0.0 --port 8000
   ```

5. Access the API: open the interactive documentation at `http://localhost:8000/docs`.

## Environment Setup

Create a .env file in the project root with the following variables:

```shell
# AI Configuration
GOOGLE_API_KEY=your_google_api_key_here
GEMINI_MODEL=gemini-2.5-flash

# LangChain Configuration (optional - only for tracing)
# LANGCHAIN_API_KEY=your_langchain_api_key_here
# LANGCHAIN_TRACING_V2=false
# LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
```

### Getting API Keys

1. Google API Key (required):

   - Create a key in Google AI Studio
   - Used by the Gemini model that powers query conversion and analysis

2. LangChain API Key (optional):

   - Go to LangSmith
   - Create an account and get your API key
   - Used for tracing and monitoring AI interactions
   - Not required - the API works without it
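The repository's `config.py` handles configuration management, though its exact contents are not shown here. The following is a minimal sketch of how these environment variables could be read, assuming plain `os.environ` access; the class and attribute names are illustrative, not the repository's actual API.

```python
import os


class Settings:
    """Illustrative settings loader mirroring the .env example above."""

    def __init__(self) -> None:
        # Required: key for the Google Gemini API.
        self.google_api_key = os.environ.get("GOOGLE_API_KEY", "")
        # Model name, defaulting to the value shown in .env.example.
        self.gemini_model = os.environ.get("GEMINI_MODEL", "gemini-2.5-flash")
        # Optional LangChain tracing flag; the API works without it.
        self.langchain_tracing = (
            os.environ.get("LANGCHAIN_TRACING_V2", "false").lower() == "true"
        )


settings = Settings()
```

A `pydantic-settings` model would offer the same behaviour with validation; the plain-`os.environ` form is shown only to keep the sketch dependency-free.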

## Project Structure

```
research-flow-api/
├── agents/                          # AI agents for query processing
│   ├── __init__.py
│   ├── prompt_templates.py          # Centralised AI prompt templates
│   └── query_conversion_agent.py    # Converts natural language to arXiv queries
├── controllers/                     # API endpoint controllers
│   ├── __init__.py
│   └── research_paper_controller.py # Main research question endpoint
├── models/                          # Pydantic data models
│   ├── __init__.py
│   ├── base.py                      # Base response models
│   ├── citation_types.py            # Citation format enums
│   ├── research_paper.py            # Research paper data model
│   └── research_paper_link.py       # Paper links and citations model
├── services/                        # Business logic services
│   ├── __init__.py
│   ├── arxiv_service.py             # arXiv API integration
│   ├── citation_service.py          # Citation generation service
│   ├── citation_formatters/         # Citation format implementations
│   │   ├── __init__.py
│   │   ├── base.py                  # Abstract citation formatter
│   │   ├── apa.py                   # APA citation formatter
│   │   ├── mla.py                   # MLA citation formatter
│   │   ├── chicago.py               # Chicago citation formatter
│   │   ├── harvard.py               # Harvard citation formatter
│   │   ├── bibtex.py                # BibTeX citation formatter
│   │   └── ieee.py                  # IEEE citation formatter
│   ├── query_converter.py           # Query conversion orchestration
│   └── research_analyser.py         # AI-powered research analysis
├── tests/                           # Test suite
│   ├── __init__.py
│   ├── fixtures/                    # Test data fixtures
│   │   ├── minimal_arxiv.xml
│   │   ├── multiple_arxiv.xml
│   │   └── valid_arxiv.xml
│   ├── test_arxiv_service.py        # arXiv service tests
│   ├── test_citation_service.py     # Citation service tests
│   ├── test_main.py                 # API endpoint tests
│   ├── test_paper.py                # Data model tests
│   └── test_prompt_templates.py     # AI prompt tests
├── config.py                        # Configuration management
├── main.py                          # FastAPI application entry point
├── pyproject.toml                   # Project configuration and dependencies
└── README.md                        # This file
```

## Key Components

- AI Agents: handle natural language processing and research analysis with centralised prompt templates
- Citation Formatters: modular citation format implementations (APA, MLA, Chicago, Harvard, BibTeX, IEEE) in separate files for maintainability
- Services: business logic for arXiv integration, query conversion, and research analysis
- Models: type-safe data structures using Pydantic v2
- Controllers: clean API endpoints with proper error handling
- Tests: comprehensive test coverage for all components, with fixtures
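The formatter layout above (an abstract `base.py` plus one module per style) can be sketched as follows. This is a hypothetical illustration assuming the obvious base-class design; the class and method names are not taken from the repository.

```python
from abc import ABC, abstractmethod


class CitationFormatter(ABC):
    """Abstract base: each style (APA, MLA, ...) subclasses this."""

    @abstractmethod
    def format(self, authors: list[str], title: str, year: int) -> str:
        """Return a citation string for one paper."""


class APAFormatter(CitationFormatter):
    """Simplified APA-style formatter for illustration only."""

    def format(self, authors: list[str], title: str, year: int) -> str:
        joined = " & ".join(authors)
        return f"{joined} ({year}). {title}."


formatter = APAFormatter()
citation = formatter.format(
    ["Matthew G. Crowson", "Timothy C. Y. Chan"],
    "Machine Learning as a Catalyst for Value-Based Health Care",
    2020,
)
```

Keeping one class per file behind a shared abstract base is what makes adding a new citation style a one-file change.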

## API Endpoints

### Research Papers

```
POST /api/research-papers
```

Answer research questions using AI-powered paper analysis.

Parameters:

- `question` (string, required): research question in natural language
- `max_papers` (integer, optional): maximum papers to analyse (default: 10, max: 50)

Example Request:

```shell
curl -X POST "http://localhost:8000/api/research-papers?question=What%20are%20the%20latest%20developments%20in%20machine%20learning%20for%20healthcare?&max_papers=5" \
  -H "accept: application/json"
```

Example Response:

```json
{
  "question": "What are the latest developments in machine learning for healthcare?",
  "answer": "The latest developments in machine learning for healthcare...",
  "key_findings": [
    "ML as a Catalyst for Value-Based Care...",
    "Ethical Considerations and Health Equity..."
  ],
  "relevant_papers": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "summary": "This paper highlights the strategic role of ML..."
    }
  ],
  "paper_links": [
    {
      "title": "Machine Learning as a Catalyst for Value-Based Health Care",
      "authors": ["Matthew G. Crowson", "Timothy C. Y. Chan"],
      "published": "2020-05-15T13:22:08",
      "arxiv_id": "2005.07534v1",
      "links": {
        "pdf": "http://arxiv.org/pdf/2005.07534v1",
        "html": "http://arxiv.org/abs/2005.07534v1",
        "arxiv": "https://arxiv.org/abs/2005.07534v1",
        "doi": null
      },
      "citations": {
        "apa": "Matthew G. Crowson & Timothy C. Y. Chan (2020)...",
        "mla": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "chicago": "Matthew G. Crowson and Timothy C. Y. Chan...",
        "harvard": "Matthew G. Crowson and Timothy C. Y. Chan 2020...",
        "bibtex": "@article{crowson2020,\n  author = {Matthew G. Crowson...",
        "ieee": "Matthew G. Crowson, Timothy C. Y. Chan..."
      }
    }
  ],
  "confidence": 0.8,
  "search_strategy": "Multi-field search with synonym expansion",
  "papers_analysed": 5,
  "metadata": {
    "total_papers_found": 5,
    "query_used": "(ti:\"machine learning\" OR abs:\"machine learning\") AND (ti:healthcare OR abs:healthcare...)",
    "search_performed_at": "2024-01-15T10:30:00.000Z"
  }
}
```
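The same request can be built from Python instead of curl. This sketch only constructs the URL with the standard library; the question and `max_papers` values are taken from the example above, and the client library in the comment is one suggestion, not a project dependency.

```python
from urllib.parse import urlencode

# Base endpoint from the running dev server (see Quick Start).
base_url = "http://localhost:8000/api/research-papers"

# Query parameters match the curl example above; urlencode handles
# escaping the spaces and punctuation in the question.
params = {
    "question": "What are the latest developments in machine learning for healthcare?",
    "max_papers": 5,
}
url = f"{base_url}?{urlencode(params)}"

# POST this URL with any HTTP client, e.g.:
#   import httpx
#   response = httpx.post(url, headers={"accept": "application/json"})
#   data = response.json()
```

Note the parameters travel in the query string, not the request body, exactly as in the curl example.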

### Health Check

```
GET /api/health
```

Check API health status.

Response:

```json
{
  "status": "healthy",
  "timestamp": "2024-01-15T10:30:00.000Z"
}
```

## Temporal Query Support

The API recognises temporal keywords in a question and applies date-based sorting so that the most recent papers are returned when recency matters.
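The README does not list the exact keywords or sort values, so the following is an illustrative sketch only: the keyword set is an assumption, and the sort strings are modelled on the arXiv API's `sortBy` parameter values.

```python
# Hypothetical keyword set; the real agent's triggers may differ.
TEMPORAL_KEYWORDS = {"latest", "recent", "newest", "current"}


def choose_sort(question: str) -> str:
    """Pick an arXiv sort order based on temporal cues in the question."""
    words = (w.strip("?.,!") for w in question.lower().split())
    if any(w in TEMPORAL_KEYWORDS for w in words):
        return "submittedDate"  # newest submissions first
    return "relevance"          # arXiv's default ranking
```

For example, "What are the latest developments in machine learning for healthcare?" would sort by submission date, while a timeless question would keep relevance ranking.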

## Development

### Running Tests

```shell
# Run all tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run specific test file
uv run pytest tests/test_citation_service.py -v
```

### Code Quality

```shell
# Lint code
uv run flake8 --max-line-length=79

# Format code (if using black)
uv run black --line-length=79 .
```

### Development Standards

Before committing, run the full lint-and-test workflow:

```shell
# 1. Make your changes
# 2. Run linting
uv run flake8 --max-line-length=79 --exclude=.venv,uv.lock .

# 3. Run tests
uv run python -m pytest tests/ -v

# 4. Fix any issues
# 5. Commit only when all checks pass
```


## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes following the development standards above
4. Add tests for new functionality
5. Ensure all tests pass and linting is clean
6. Submit a pull request


## Support

For issues and questions:
- Create an issue in the repository
- Check the API documentation at `/docs`
- Review the test files for usage examples
