AURORA is a research & engineering codebase for training and serving a multi-task machine‑learning model on Wikipedia text streams. The project demonstrates clean architecture, reproducible experiments, an end-to-end training/evaluation pipeline, and a lightweight FastAPI/CLI deployment.
- Data engineering: cleaning, deduplication, anomaly checks, anonymization
- Training: multi-task network (sentiment classification + intensity regression)
- Evaluation: accuracy, precision, recall, F1, intensity MAE on held‑out data
- Optimization: automated hyperparameter search using configuration files
- Serving: FastAPI endpoint and CLI chat interface for inference
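The evaluation metrics listed above (accuracy, precision, recall, F1, and intensity MAE) can be sketched in plain Python. This is an illustrative re-implementation for reference, not the project's actual evaluation code:

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Binary accuracy / precision / recall / F1 for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def intensity_mae(y_true, y_pred):
    """Mean absolute error for the intensity regression head."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```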
shion_ai/
├── app.py # command-line entrypoint
├── api/server.py # FastAPI web server
├── aurora/ # core library
│ ├── core/ # algorithm implementations
│ ├── data/ # dataset utilities & streaming
│ ├── training/ # train / eval / optimize scripts
│ ├── serving/ # prediction and reply helpers
│ └── utils/ # configuration & logging helpers
├── cli/chat.py # simple chat client
├── configs/ # YAML configuration files
├── data/ # wiki extracts + generated gold sets
├── docs/ # supplementary documentation
├── artifacts/ # checkpoints, reports, outputs
└── requirements.txt # Python dependencies
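The YAML files under `configs/` drive training, evaluation, and optimization. A hypothetical `configs/base.yaml` to illustrate how the pieces fit together (all field names here are illustrative assumptions, not the project's actual schema):

```yaml
# Hypothetical example only -- field names are illustrative assumptions.
data:
  extracted_dir: data/extracted/    # WikiExtractor output
model:
  tasks: [sentiment, intensity]     # classification + regression heads
training:
  epochs: 3
  batch_size: 32
  learning_rate: 3e-4
artifacts:
  reports_dir: artifacts/reports/
```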
- Clone the repository:
  git clone https://github.com/Kevin28576/shion_ai.git
  cd shion_ai
- Create a virtual environment and install dependencies:
  python -m venv venv
  source venv/bin/activate   # or `venv\Scripts\activate` on Windows
  pip install -r requirements.txt
Most functionality is accessed via the main CLI wrapper in app.py.
Examples:
python app.py train --config configs/base.yaml
python app.py eval --config configs/base.yaml
python app.py optimize --config configs/base.yaml
python app.py serve --reload # start FastAPI server
python app.py chat # interactive command-line chat

Generate or evaluate a gold dataset:
python app.py build-gold --config configs/base.yaml \
--out data/gold/gold_candidates.jsonl --size 500
python app.py eval-gold --config configs/base.yaml \
--gold data/gold/gold_candidates.jsonl

- Input data: place WikiExtractor output under data/extracted/.
- Splits happen on the fly using hashing; no static processed files are required.
- Reports are written under artifacts/reports/.
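The README does not spell out how the hash splits work, so the following is a minimal sketch of how deterministic on-the-fly splitting is commonly done (the function name and the split percentages are assumptions, not the project's actual values):

```python
import hashlib


def assign_split(doc_id: str, eval_pct: int = 5, test_pct: int = 5) -> str:
    """Deterministically bucket a document into train/eval/test.

    Hashing a stable document id (rather than sampling randomly) keeps
    splits reproducible across runs without writing processed files.
    """
    bucket = int(hashlib.sha256(doc_id.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + eval_pct:
        return "eval"
    return "train"
```

Because the assignment depends only on the id, re-running the pipeline on the same extracts always reproduces the same splits.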
Contributions are welcome! Please open issues or pull requests. Follow the Code of Conduct and describe your changes clearly.
- Fork the repo.
- Create a feature branch:
  git checkout -b feature/your-feature
- Commit your changes, push, and open a PR.
See CONTRIBUTING.md (to be added) for more details.
This project is released under the MIT License.