Stratum

A CrewAI-based multi-agent system for recursive forensic audits of scientific research papers.

Overview

Stratum performs deep analysis of scientific papers by extracting their fundamental structure: hypotheses, data, and conclusions. Based on George Whitesides' definition of a scientific paper, Stratum deconstructs research into Knowledge Tables and builds citation networks for graph visualization in Obsidian.

Features

  • Three-Agent Architecture
    • Librarian Agent: Fetches papers, extracts citations, ranks them for recursive analysis
    • Analyst Agent: Deconstructs papers following the Whitesides Standard (hypothesis-driven, data-centric)
    • Archivist Agent: Converts analysis to Obsidian markdown with wikilinks
  • Model-Agnostic LLM Support
    • Switch between OpenAI, Anthropic, Google Gemini, or Ollama (local) by changing one config variable
    • Powered by LiteLLM for a unified LLM interface
  • Recursive Paper Analysis
    • Automatically analyzes foundational citations up to a configurable MAX_DEPTH
    • Deduplication prevents reprocessing papers
    • State persistence enables resuming after crashes
  • Knowledge Graph Output
    • Generates Obsidian markdown files with YAML frontmatter
    • Wikilinks create navigable citation networks (see the example note below)
    • Graph view visualizes paper dependencies
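
For instance, a generated note might look like the following (illustrative only; the actual frontmatter fields and layout are defined by the Archivist agent and may differ):

---
title: "Example Paper"
doi: "10.1000/example.2024"
year: 2024
---

Central Hypothesis: ...

Cites: [[10.1000/cited|Cited Paper]] (Foundational)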

Installation

Prerequisites

  • Python 3.10-3.13
  • Docker (for GROBID citation parser)
  • LLM API key (OpenAI/Anthropic) OR Ollama installed locally

Setup

  1. Clone the repository:
git clone <repository-url>
cd stratum
  2. Create virtual environment and install:
python3 -m venv venv
source venv/bin/activate
pip install -e ".[dev]"
  3. Configure environment:
cp .env.example .env
# Edit .env with your API keys and settings
  4. Start GROBID (citation parser):
docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.8.0
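
GROBID exposes a liveness endpoint; once the container is up, you can confirm it is responding:

curl http://localhost:8070/api/isalive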

Configuration

Edit .env to configure:

# LLM - Switch between providers
# Supported: OpenAI (gpt-4o), Anthropic (claude-3-5-sonnet-20241022),
#            Google Gemini (gemini/gemini-2.0-flash-exp), Ollama (ollama/llama3.2)
LLM_MODEL=gpt-4o
LLM_API_KEY=your_api_key_here

# Recursion settings
MAX_DEPTH=3
MAX_CITATIONS_PER_PAPER=5

# Paths
OUTPUT_DIR=./output
CACHE_DIR=./data
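
Under the hood, LiteLLM routes a single model string to the matching provider, which is what makes the one-variable switch work. A minimal sketch of that call path (the environment variable names mirror the .env keys above; Stratum's internal wiring may differ):

import os
from litellm import completion

# LiteLLM infers the provider from the model string:
# "gpt-4o" -> OpenAI, "claude-3-5-sonnet-20241022" -> Anthropic,
# "ollama/llama3.2" -> a local Ollama server.
response = completion(
    model=os.environ["LLM_MODEL"],
    messages=[{"role": "user", "content": "State the central hypothesis."}],
    api_key=os.environ.get("LLM_API_KEY"),
)
print(response.choices[0].message.content)

Note also that the recursion settings bound the total work per run: if depth counts citation hops from the root paper, MAX_DEPTH=3 with MAX_CITATIONS_PER_PAPER=5 allows at most 1 + 5 + 25 + 125 = 156 papers in the worst case.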

Usage

Check Dependencies

Before running, verify all dependencies are available:

stratum doctor

Analyze a Paper

Analyze a paper recursively:

stratum analyze <DOI> --max-depth 3 --max-citations 5

Example:

stratum analyze 10.1000/example.2024 --max-depth 2 --max-citations 3

Options:

  • --max-depth, -d: Maximum recursion depth (default: 3)
  • --max-citations, -c: Max citations per paper (default: 5)
  • --output-dir, -o: Output directory (default: ./output)
  • --model, -m: LLM model to use (overrides .env)
  • --verbose/--quiet, -v/-q: Enable/disable verbose output
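
Flags can be combined; for example, to analyze with a local Ollama model into a custom vault (illustrative values):

stratum analyze 10.1000/example.2024 -d 2 -m ollama/llama3.2 -o ./my-vault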

Check Status

View analysis progress and statistics:

stratum status

Shows:

  • Total papers processed
  • Papers by depth level
  • List of processed DOIs

Reset State

Clear analysis state to reprocess papers:

stratum reset --force

Version Info

stratum version

Project Structure

Stratum/
├── src/stratum/
│   ├── models/           # Pydantic data models
│   ├── agents/           # CrewAI agents
│   ├── tasks/            # CrewAI tasks
│   ├── tools/            # Custom tools (PDF, citations, etc.)
│   ├── llm/              # LLM abstraction (LiteLLM)
│   ├── config/           # Settings and agent configs
│   ├── utils/            # Utilities (recursion, obsidian)
│   ├── crew.py           # Crew orchestration
│   ├── flow.py           # Recursive workflow
│   └── main.py           # CLI entry point
│
├── output/papers/        # Generated Obsidian markdown
├── data/                 # Cache and state
└── tests/                # Unit and integration tests
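
The recursive workflow in flow.py boils down to bounded citation-following with deduplication and persisted state. A minimal sketch of that loop (run_crew and the state-file layout are illustrative stand-ins, not Stratum's actual API):

import json
from pathlib import Path

STATE_FILE = Path("data/state.json")

def load_state() -> set[str]:
    # Resume support: DOIs processed before a crash are reloaded.
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def analyze_recursively(doi: str, depth: int, max_depth: int,
                        max_citations: int, seen: set[str]) -> None:
    if depth > max_depth or doi in seen:
        return  # depth limit reached, or deduplication: already processed
    seen.add(doi)
    STATE_FILE.write_text(json.dumps(sorted(seen)))  # persist after each paper

    # run_crew: hypothetical helper running Librarian -> Analyst -> Archivist
    table = run_crew(doi)
    for cited in table.citation_network[:max_citations]:
        analyze_recursively(cited.target_paper_doi, depth + 1,
                            max_depth, max_citations, seen)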

Development Status

  • Phase 1: Project foundation and data models ✅
  • Phase 2: Agent tools (PDF, citations, paper fetcher) ✅
  • Phase 3: LLM abstraction and agent definitions ✅
  • Phase 4: Crew orchestration and recursive flow ✅
  • Phase 5: CLI and production polish ✅

🎉 MVP Complete! All core functionality implemented and tested.

Future Enhancements (Post-MVP)

  • Streamlit Web Interface: Interactive UI for exploring papers and citation graphs
    • Paper upload and real-time analysis
    • Interactive Obsidian graph visualization
    • Citation network navigation with filtering
    • Evidence strength visualization
  • Figure/Table Extraction: Computer vision for data extraction
  • Semantic Search: Vector embeddings for similarity queries
  • Collaboration Features: Shared vaults, annotations, discussions

Data Contract

All agents communicate using the KnowledgeTable schema:

{
  "kt_id": "KT_YYYY_XXX",
  "meta": {
    "title": "Paper Title",
    "authors": ["Author1", "Author2"],
    "year": 2024,
    "doi": "10.1000/example"
  },
  "core_analysis": {
    "central_hypothesis": "What question is being answered?",
    "methodology_summary": "How was it tested?",
    "significance": "Why does it matter?"
  },
  "key_points": [
    {
      "id": "KP1",
      "content": "Specific claim or finding",
      "evidence_anchor": "Table 2",
      "confidence_score": 0.95
    }
  ],
  "logic_chains": [
    {
      "name": "Argument Name",
      "argument_flow": "Step 1 -> Step 2 -> Conclusion",
      "conclusion_derived": "Final conclusion"
    }
  ],
  "citation_network": [
    {
      "target_paper_doi": "10.1000/cited",
      "target_paper_title": "Cited Paper",
      "usage_type": "Foundational",
      "notes": "Why this citation matters"
    }
  ]
}
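
Since src/stratum/models/ holds Pydantic data models, the schema above plausibly maps onto models along these lines (an illustrative sketch, not the repository's actual definitions):

from pydantic import BaseModel

class KeyPoint(BaseModel):
    id: str
    content: str
    evidence_anchor: str       # e.g. "Table 2" or "Figure 3"
    confidence_score: float    # 0.0 to 1.0

class CitationEdge(BaseModel):
    target_paper_doi: str
    target_paper_title: str
    usage_type: str            # "Foundational", "Comparison", or "Refuting"
    notes: str

class LogicChain(BaseModel):
    name: str
    argument_flow: str
    conclusion_derived: str

class KnowledgeTable(BaseModel):
    kt_id: str
    meta: dict
    core_analysis: dict
    key_points: list[KeyPoint]
    logic_chains: list[LogicChain]
    citation_network: list[CitationEdge]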

Testing

Run all tests:

pytest

Run with coverage:

pytest --cov=src/stratum --cov-report=html

Run specific test file:

pytest tests/unit/test_models.py -v

Architecture Principles

Whitesides Standard

Papers are organized descriptions of:

  1. Hypothesis - What question is being tested?
  2. Data - What evidence supports the answer?
  3. Conclusions - What was learned?

Analyst Agent Rules

  1. Hypothesis-Driven: Identify what was TESTED, not just what was done
  2. Data-Centric: Anchor every claim to specific evidence (Table X, Figure Y)
  3. Structure over Time: Organize by logic, not chronology
  4. Strict JSON: Output must validate against KnowledgeTable schema
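
Rule 4 can be enforced mechanically: with a Pydantic model like the KnowledgeTable sketch shown under Data Contract, the Analyst's raw output can be validated before it reaches the Archivist (raw_llm_output below stands in for the agent's response):

from pydantic import ValidationError

try:
    table = KnowledgeTable.model_validate_json(raw_llm_output)
except ValidationError as exc:
    # Reject or retry: the agent's JSON did not match the schema.
    print(exc)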

Contributing

Contributions welcome! Please ensure:

  • All tests pass: pytest
  • Code is formatted: black src/ tests/
  • Linting passes: ruff check src/ tests/

License

MIT

Knowledge Distillation

Stratum performs forensic knowledge distillation: extracting the logical structure from narrative papers. See knowledge/knowledge_distillation.md for details on:

  • The Whitesides Standard approach
  • Evidence anchoring and logic chain mapping
  • Typed citation networks (Foundational/Comparison/Refuting)
  • Recursive knowledge graph construction

Acknowledgments

  • Based on the Whitesides Standard for scientific writing
  • Powered by CrewAI, LiteLLM, and GROBID
  • Designed for Obsidian graph visualization
  • Supports OpenAI, Anthropic, Google Gemini, and Ollama (local) models
