Data Directory

This directory stores all data files for the AI-Tutor system, including knowledge bases, user data, logs, etc.

📁 Directory Structure

data/
├── knowledge_bases/          # Knowledge base storage directory
│   ├── kb_config.json        # Knowledge base configuration file
│   └── {kb_name}/            # Individual knowledge base directories
│       ├── metadata.json     # Knowledge base metadata
│       ├── numbered_items.json  # Numbered items (definitions, theorems, etc.)
│       ├── raw/               # Original documents (PDF/Markdown)
│       ├── images/            # Extracted images
│       ├── content_list/      # Document content list
│       └── rag_storage/       # RAG knowledge graph storage
│           ├── graph_chunk_entity_relation.graphml
│           ├── kv_store_*.json
│           └── vdb_*.json
│
└── user/                      # User data directory
    ├── solve/                 # Problem solving module output
    │   └── solve_YYYYMMDD_HHMMSS/
    │       ├── investigate_memory.json
    │       ├── solve_chain.json
    │       ├── citation_memory.json
    │       ├── final_answer.md
    │       └── artifacts/     # Code execution output
    │
    ├── question/              # Question generation module output
    │   └── question_YYYYMMDD_HHMMSS/
    │
    ├── research/              # Research module output
    │   ├── cache/             # Research cache
    │   │   └── research_*/    # Queue and intermediate results
    │   └── reports/           # Research reports
    │       └── research_*.md
    │
    ├── guide/                 # Guided learning output
    │   └── session_{session_id}.json
    │
    ├── notebook/              # Notebook data
    │   └── notebooks_index.json
    │
    ├── co-writer/             # Co-Writer output
    │   ├── audio/             # TTS audio files
    │   └── tool_calls/        # Tool call history
    │
    ├── logs/                  # System logs
    │   └── ai_tutor_*.log
    │
    ├── run_code_workspace/    # Code execution workspace
    │
    └── user_history.json      # User activity history

📋 Directory Description

knowledge_bases/

Stores all knowledge base data files. Each knowledge base contains:

  • metadata.json: Knowledge base metadata, including creation time, update time, update history, etc.
  • numbered_items.json: Extracted numbered items (Definition, Theorem, Formula, etc.)
  • raw/: Original uploaded documents (PDF, Markdown, etc.)
  • images/: Images extracted from documents
  • content_list/: Document content list (JSON format)
  • rag_storage/: RAG knowledge graph storage files
    • graph_chunk_entity_relation.graphml: Knowledge graph structure
    • kv_store_*.json: Key-value storage (documents, entities, relations, etc.)
    • vdb_*.json: Vector database indices
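
As a rough illustration of the layout above, the following Python sketch reports which of the listed entries are missing from a knowledge base directory. The function name `missing_kb_entries` and the decision to treat every listed entry as required are assumptions made for this example; they are not taken from the AI-Tutor code.

```python
from pathlib import Path

# Entries listed in this README for each knowledge base directory.
# Treating all of them as required is an assumption of this sketch.
EXPECTED_FILES = ("metadata.json", "numbered_items.json")
EXPECTED_DIRS = ("raw", "images", "content_list", "rag_storage")


def missing_kb_entries(kb_dir):
    """Return the names of expected files and directories absent from kb_dir."""
    kb_dir = Path(kb_dir)
    missing = [name for name in EXPECTED_FILES if not (kb_dir / name).is_file()]
    missing += [name + "/" for name in EXPECTED_DIRS if not (kb_dir / name).is_dir()]
    return missing
```

Calling `missing_kb_entries("data/knowledge_bases/ai-textbook")` would return an empty list for a directory that contains every entry described above.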

user/

Stores all user-generated data and output files.

solve/

Problem solving module output directory. Each solving task generates a timestamped directory:

  • investigate_memory.json: Analysis Loop memory data
  • solve_chain.json: Complete Solve Loop steps and tool call records
  • citation_memory.json: Citation management data
  • final_answer.md: Final answer (Markdown format)
  • artifacts/: Files generated by code execution (images, data files, etc.)
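
Since each run directory is named `solve_YYYYMMDD_HHMMSS`, the run time can be recovered from the name alone. The sketch below lists runs in chronological order; the helper names `parse_run_timestamp` and `list_solve_runs` are hypothetical and do not come from the AI-Tutor code base.

```python
from datetime import datetime
from pathlib import Path


def parse_run_timestamp(dir_name, prefix="solve_"):
    """Parse 'solve_YYYYMMDD_HHMMSS' into a datetime; return None if the name does not match."""
    if not dir_name.startswith(prefix):
        return None
    try:
        return datetime.strptime(dir_name[len(prefix):], "%Y%m%d_%H%M%S")
    except ValueError:
        return None


def list_solve_runs(solve_root):
    """Return (timestamp, path) pairs for solve runs under solve_root, oldest first."""
    runs = []
    for entry in Path(solve_root).iterdir():
        stamp = parse_run_timestamp(entry.name) if entry.is_dir() else None
        if stamp is not None:
            runs.append((stamp, entry))
    return sorted(runs, key=lambda run: run[0])
```

Passing `prefix="question_"` to `parse_run_timestamp` would handle the question module's directories in the same way, assuming they follow the naming shown in the directory tree.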

question/

Question generation module output directory. Each question generation task generates a timestamped directory containing generated questions and validation results.

research/

Research module output directory:

  • cache/: Intermediate data during research (queue state, planning results, etc.)
  • reports/: Final generated research reports (Markdown format)

guide/

Guided learning module output directory. Each learning session is saved as a JSON file containing session state, knowledge points, chat history, etc.

notebook/

Notebook data storage. notebooks_index.json contains index information for all notebooks.

co-writer/

Co-Writer module output directory:

  • audio/: TTS-generated audio files
  • tool_calls/: AI tool call history

logs/

System log files, named by date.

run_code_workspace/

Workspace used by the code execution tool to temporarily store files generated while running code.

🔧 Configuration

Data directory paths are configured in config/main.yaml:

paths:
  user_data_dir: "./data/user"
  knowledge_bases_dir: "./data/knowledge_bases"
  user_log_dir: "./data/user/logs"
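
One way a module might consume these settings is sketched below in Python. It takes the already parsed `paths` mapping as input (reading the YAML file itself, for example with a YAML library, is left out), falls back to the values shown above, and anchors relative entries at the project root. The names `DEFAULT_PATHS` and `resolve_data_paths` are assumptions for illustration, not the project's actual configuration loader.

```python
from pathlib import Path

# Defaults mirror the values shown in this README; adjust them if your config differs.
DEFAULT_PATHS = {
    "user_data_dir": "./data/user",
    "knowledge_bases_dir": "./data/knowledge_bases",
    "user_log_dir": "./data/user/logs",
}


def resolve_data_paths(paths_config=None, project_root="."):
    """Merge a parsed `paths` mapping over the defaults and anchor relative entries at project_root."""
    merged = {**DEFAULT_PATHS, **(paths_config or {})}
    root = Path(project_root)
    # Joining with an absolute value discards the root, so absolute overrides are kept as-is.
    return {key: root / value for key, value in merged.items()}
```

Resolving every path through a single helper like this is one way to follow the advice to avoid hardcoded paths.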

📝 Notes

  1. Backup Important Data: Back up knowledge_bases/ and any important user data regularly.
  2. Version Control: Add the data/ directory to .gitignore to avoid committing large files.
  3. Disk Space: Knowledge bases and user data can take up significant disk space; clean out old data regularly.
  4. Permission Management: Ensure the application has read/write permissions for the data directories.
  5. Path Consistency: All modules use the unified path configuration; avoid hardcoding paths.
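
To see where disk space is going before cleaning anything up, a small Python sketch such as the following can total file sizes per subdirectory. The function names are hypothetical, and the choice to skip symbolic links is an assumption of this example.

```python
import os
from pathlib import Path


def directory_size_bytes(root):
    """Sum the sizes of all regular files under root (symbolic links are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def usage_by_subdirectory(root):
    """Map each immediate subdirectory of root to its size in bytes, largest first."""
    sizes = {p.name: directory_size_bytes(p) for p in Path(root).iterdir() if p.is_dir()}
    return dict(sorted(sizes.items(), key=lambda item: item[1], reverse=True))
```

For example, `usage_by_subdirectory("data/user")` would show whether solve/, research/, or logs/ accounts for most of the space.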

🔗 Related Modules

  • Knowledge Base Management: src/knowledge/ - Knowledge base creation, updates, queries
  • User Data: Each functional module automatically manages its corresponding user data directory
  • Logging System: src/core/logging/ - Unified logging management

🛠️ Maintenance Operations

Clean Old Data

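# Clean old solving records (keep last 30 days)
# -mindepth/-maxdepth limit matches to the per-run directories, so data/user/solve
# itself and nested artifacts/ directories are never matched on their own
find data/user/solve -mindepth 1 -maxdepth 1 -type d -name "solve_*" -mtime +30 -exec rm -rf {} +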

# Clean old log files (keep last 7 days)
find data/user/logs -name "*.log" -mtime +7 -delete

Backup Knowledge Base

# Backup entire knowledge base directory
tar -czf knowledge_bases_backup_$(date +%Y%m%d).tar.gz data/knowledge_bases/

# Backup specific knowledge base
tar -czf ai_textbook_backup.tar.gz data/knowledge_bases/ai-textbook/

Restore Knowledge Base

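# Restore knowledge base (run from the project root; the archive stores paths
# beginning with data/knowledge_bases/, so extracting into data/ would create data/data/)
tar -xzf knowledge_bases_backup_20250101.tar.gz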
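
Because tar recreates the member paths stored in the archive relative to the extraction directory, it can help to inspect an archive before restoring it. The Python sketch below lists the top-level path components of an archive; the function name `archive_top_level` is hypothetical. An archive created with the backup commands above from the project root is expected to report `['data']`.

```python
import tarfile
from pathlib import PurePosixPath


def archive_top_level(archive_path):
    """Return the sorted first path components of all members in a tar archive."""
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
    # PurePosixPath drops '.' components, so './data/x' and 'data/x' are treated alike.
    tops = {PurePosixPath(name).parts[0] for name in names if PurePosixPath(name).parts}
    return sorted(tops)
```

If the result is `['data']`, extracting from the project root places the files back under data/knowledge_bases/.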