AURORA is a lightweight research-oriented AI engineering framework for multi-task NLP training, evaluation, and FastAPI deployment.

Kevin28576/shion_ai

AURORA AI Engineering Project

AURORA is a research and engineering codebase for training and serving a multi-task machine-learning model on Wikipedia text streams. The project demonstrates clean architecture, reproducible experiments, an end-to-end training and evaluation pipeline, and a lightweight deployment through FastAPI and a CLI.

🚀 Features

  • Data engineering: cleaning, deduplication, anomaly checks, anonymization
  • Training: multi-task network (sentiment classification + intensity regression)
  • Evaluation: accuracy, precision, recall, F1, intensity MAE on held‑out data
  • Optimization: automated hyperparameter search using configuration files
  • Serving: FastAPI endpoint and CLI chat interface for inference
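
To make the evaluation metrics listed above concrete, here is a minimal pure-Python sketch. It assumes binary sentiment labels (1 = positive, 0 = negative) and one scalar intensity value per example. The function names `classification_metrics` and `intensity_mae` are hypothetical and not taken from the AURORA codebase, which may compute these metrics differently (for example, averaged over more than two classes).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def intensity_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between target and predicted intensity values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Reporting the classification metrics and the regression error separately keeps the two task heads comparable across runs, since a single combined score would hide which task regressed.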

📁 Repository Layout

shion_ai/
├── app.py                    # command-line entrypoint
├── api/server.py             # FastAPI web server
├── aurora/                   # core library
│   ├── core/                 # algorithm implementations
│   ├── data/                 # dataset utilities & streaming
│   ├── training/             # train / eval / optimize scripts
│   ├── serving/              # prediction and reply helpers
│   └── utils/                # configuration & logging helpers
├── cli/chat.py               # simple chat client
├── configs/                  # YAML configuration files
├── data/                     # wiki extracts + generated gold sets
├── docs/                     # supplementary documentation
├── artifacts/                # checkpoints, reports, outputs
└── requirements.txt          # Python dependencies

🛠 Installation

  1. Clone the repository:
    git clone https://github.com/Kevin28576/shion_ai.git
    cd shion_ai
  2. Create a virtual environment and install dependencies:
    python -m venv venv
    source venv/bin/activate   # or `.\venv\Scripts\activate` on Windows
    pip install -r requirements.txt

⚙️ Usage

Most functionality is accessed via the main CLI wrapper in app.py.

Examples:

python app.py train --config configs/base.yaml
python app.py eval --config configs/base.yaml
python app.py optimize --config configs/base.yaml
python app.py serve --reload            # start FastAPI server
python app.py chat                      # interactive command‑line chat
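
For readers curious how a single wrapper can dispatch the subcommands shown above, the following is a rough sketch using the standard-library `argparse` module. It is not the actual contents of app.py. The helper names `build_parser` and `parse` are hypothetical, and treating `--config` as required is an assumption made only for this illustration.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per example shown in this README."""
    parser = argparse.ArgumentParser(prog="app.py")
    sub = parser.add_subparsers(dest="command", required=True)

    # train / eval / optimize all take a YAML config path.
    # Assumption: --config is mandatory; the real CLI may provide a default.
    for name in ("train", "eval", "optimize"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--config", required=True, help="path to a YAML config file")

    # serve accepts an optional --reload switch.
    serve = sub.add_parser("serve")
    serve.add_argument("--reload", action="store_true", help="restart on code changes")

    # chat takes no arguments in the examples above.
    sub.add_parser("chat")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]) into a namespace."""
    return build_parser().parse_args(argv)
```

With this layout, `parse(["train", "--config", "configs/base.yaml"])` yields a namespace whose `command` attribute selects the handler to run.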

Generate or evaluate a gold dataset:

python app.py build-gold --config configs/base.yaml \
    --out data/gold/gold_candidates.jsonl --size 500
python app.py eval-gold --config configs/base.yaml \
    --gold data/gold/gold_candidates.jsonl

Notes

  • Input data: place WikiExtractor output under data/extracted/.
  • Splits happen on-the-fly using hashing; no static processed files are required.
  • Reports are written under artifacts/reports/.
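
The hash-based splitting mentioned in the notes can be sketched as follows. This is a minimal illustration, assuming each document has a stable string identifier and that the splits are train, validation, and test. The function name `assign_split`, the percentages, and the choice of SHA-256 are assumptions and not taken from the AURORA source.

```python
import hashlib


def assign_split(doc_id: str, val_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically map a document id to 'train', 'val', or 'test'."""
    if val_pct < 0 or test_pct < 0 or val_pct + test_pct > 100:
        raise ValueError("val_pct and test_pct must be non-negative and sum to at most 100")

    # A cryptographic hash is stable across processes and machines,
    # unlike Python's built-in hash(), which is salted per interpreter run.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # integer in 0..99

    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"
```

Because the assignment depends only on the identifier, the same document lands in the same split on every pass over the stream, so no processed split files need to be stored.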

📄 Contributing

Contributions are welcome! Please open issues or pull requests. Follow the Code of Conduct and describe your changes clearly.

  1. Fork the repo.
  2. Create a feature branch: git checkout -b feature/your-feature
  3. Commit your changes, push, and open a PR.

See CONTRIBUTING.md (to be added) for more details.

📜 License

This project is released under the MIT License.


This README was automatically generated for clarity on GitHub.
