This directory stores all data files for the AI-Tutor system, including knowledge bases, user data, logs, etc.
data/
├── knowledge_bases/              # Knowledge base storage directory
│   ├── kb_config.json            # Knowledge base configuration file
│   └── {kb_name}/                # Individual knowledge base directories
│       ├── metadata.json         # Knowledge base metadata
│       ├── numbered_items.json   # Numbered items (definitions, theorems, etc.)
│       ├── raw/                  # Original documents (PDF/Markdown)
│       ├── images/               # Extracted images
│       ├── content_list/         # Document content list
│       └── rag_storage/          # RAG knowledge graph storage
│           ├── graph_chunk_entity_relation.graphml
│           ├── kv_store_*.json
│           └── vdb_*.json
│
└── user/                         # User data directory
    ├── solve/                    # Problem solving module output
    │   └── solve_YYYYMMDD_HHMMSS/
    │       ├── investigate_memory.json
    │       ├── solve_chain.json
    │       ├── citation_memory.json
    │       ├── final_answer.md
    │       └── artifacts/        # Code execution output
    │
    ├── question/                 # Question generation module output
    │   └── question_YYYYMMDD_HHMMSS/
    │
    ├── research/                 # Research module output
    │   ├── cache/                # Research cache
    │   │   └── research_*/       # Queue and intermediate results
    │   └── reports/              # Research reports
    │       └── research_*.md
    │
    ├── guide/                    # Guided learning output
    │   └── session_{session_id}.json
    │
    ├── notebook/                 # Notebook data
    │   └── notebooks_index.json
    │
    ├── co-writer/                # Co-Writer output
    │   ├── audio/                # TTS audio files
    │   └── tool_calls/           # Tool call history
    │
    ├── logs/                     # System logs
    │   └── ai_tutor_*.log
    │
    ├── run_code_workspace/       # Code execution workspace
    │
    └── user_history.json         # User activity history
The knowledge_bases/ directory stores all knowledge base data files. Each knowledge base ({kb_name}/) contains:
- metadata.json: Knowledge base metadata, including creation time, update time, update history, etc.
- numbered_items.json: Extracted numbered items (Definition, Theorem, Formula, etc.)
- raw/: Original uploaded documents (PDF, Markdown, etc.)
- images/: Images extracted from documents
- content_list/: Document content list (JSON format)
- rag_storage/: RAG knowledge graph storage files
  - graph_chunk_entity_relation.graphml: Knowledge graph structure
  - kv_store_*.json: Key-value storage (documents, entities, relations, etc.)
  - vdb_*.json: Vector database indices
The user/ directory stores all user-generated data and output files.
solve/ is the problem solving module output directory. Each solving task generates a timestamped directory (solve_YYYYMMDD_HHMMSS/) containing:
- investigate_memory.json: Analysis Loop memory data
- solve_chain.json: Complete Solve Loop steps and tool call records
- citation_memory.json: Citation management data
- final_answer.md: Final answer (Markdown format)
- artifacts/: Files generated by code execution (images, data files, etc.)
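Because run directories are named solve_YYYYMMDD_HHMMSS, sorting the names lexically also sorts them chronologically. The following is a minimal sketch for locating the most recent run, assuming the default data/user/solve location. The function name latest_solve_dir is hypothetical and not part of AI-Tutor.

```shell
# Hypothetical helper (not part of AI-Tutor): print the newest solve run directory.
# Assumes run directories follow the solve_YYYYMMDD_HHMMSS naming convention,
# so lexical order of the names matches chronological order.
latest_solve_dir() {
    local solve_root="${1:-data/user/solve}"
    local dir
    local latest=""
    for dir in "$solve_root"/solve_*/; do
        [ -d "$dir" ] || continue   # the pattern matched nothing
        # Glob results are sorted, so the last match is the newest run
        latest="${dir%/}"
    done
    [ -n "$latest" ] || return 1    # no runs found
    printf '%s\n' "$latest"
}
```

For example, `cat "$(latest_solve_dir)/final_answer.md"` would print the final answer of the latest run, provided at least one run exists.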
question/ is the question generation module output directory. Each question generation task creates a timestamped directory (question_YYYYMMDD_HHMMSS/) containing the generated questions and validation results.
research/ is the research module output directory:
- cache/: Intermediate data during research (queue state, planning results, etc.)
- reports/: Final generated research reports (Markdown format)
guide/ is the guided learning module output directory. Each learning session is saved as a JSON file (session_{session_id}.json) containing session state, knowledge points, chat history, etc.
notebook/ stores notebook data. notebooks_index.json contains index information for all notebooks.
co-writer/ is the Co-Writer module output directory:
- audio/: TTS-generated audio files
- tool_calls/: AI tool call history
logs/ holds system log files, named by date (ai_tutor_*.log).
run_code_workspace/ is the code execution tool's workspace for temporarily storing files generated by code execution.
Data directory paths are configured in config/main.yaml:
paths:
  user_data_dir: "./data/user"
  knowledge_bases_dir: "./data/knowledge_bases"
  user_log_dir: "./data/user/logs"

- Backup Important Data: Regularly back up knowledge_bases/ and important user data
- Version Control: Add the data/ directory to .gitignore to avoid committing large files
- Disk Space: Knowledge bases and user data may occupy significant disk space; clean old data regularly
- Permission Management: Ensure the application has read/write permissions on the data directories
- Path Consistency: All modules use the unified path configuration; avoid hardcoded paths
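The permission requirement can be checked before starting the application. The following is a minimal sketch, assuming the directories passed in match the paths configured in config/main.yaml. The function name check_data_dirs is hypothetical and not part of AI-Tutor.

```shell
# Hypothetical helper (not part of AI-Tutor): report data directories that are
# missing or not readable/writable by the current user.
check_data_dirs() {
    local status=0
    local dir
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir"
            status=1
        elif [ ! -r "$dir" ] || [ ! -w "$dir" ]; then
            echo "no read/write access: $dir"
            status=1
        fi
    done
    # Non-zero exit status if any directory failed a check
    return "$status"
}
```

A typical call would be `check_data_dirs ./data/user ./data/knowledge_bases ./data/user/logs`; it prints nothing and returns 0 when every directory is usable.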
- Knowledge Base Management: src/knowledge/ - Knowledge base creation, updates, queries
- User Data: Each functional module automatically manages its corresponding user data directory
- Logging System: src/core/logging/ - Unified logging management
# Clean old solving records (keep last 30 days)
# -mindepth/-maxdepth limit matches to the run directories, so solve/ itself is never removed
find data/user/solve -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
# Clean old log files (keep last 7 days)
find data/user/logs -name "*.log" -mtime +7 -delete

# Backup entire knowledge base directory
tar -czf knowledge_bases_backup_$(date +%Y%m%d).tar.gz data/knowledge_bases/
# Backup specific knowledge base
tar -czf ai_textbook_backup.tar.gz data/knowledge_bases/ai-textbook/

# Restore knowledge base (run from the project root; the archive already stores paths starting with data/knowledge_bases/)
tar -xzf knowledge_bases_backup_20250101.tar.gz
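Before deleting anything, it can help to preview what a cleanup would remove. The following is a minimal sketch of such a dry run, assuming run directories sit directly under data/user/solve and start with solve_. The function name list_old_solve_runs is hypothetical, and -mindepth/-maxdepth are GNU and BSD find extensions rather than POSIX options.

```shell
# Hypothetical helper (not part of AI-Tutor): list solve run directories whose
# modification time is older than a given number of days, without deleting anything.
list_old_solve_runs() {
    local solve_root="${1:-data/user/solve}"
    local days="${2:-30}"
    # -mindepth/-maxdepth restrict matches to the run directories themselves,
    # so neither solve/ nor nested artifacts/ folders are listed.
    find "$solve_root" -mindepth 1 -maxdepth 1 -type d -name 'solve_*' -mtime +"$days" -print
}
```

Running `list_old_solve_runs data/user/solve 30` and reviewing the output first makes it easier to confirm that only the intended directories would be deleted.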