A robust data processing pipeline that converts JSNL files into Parquet format using DuckDB, with additional functionality to extract trade and equity data to MariaDB.
This project provides a data processing pipeline that:
- Processes JSNL files into Parquet format
- Extracts trade and equity data to MariaDB
- Supports hourly, daily, and monthly data aggregation
- Includes automated processing via a systemd timer
- Provides a flexible command-line interface for processing
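The core JSNL-to-Parquet step can be sketched with DuckDB's `read_json_auto` and `COPY ... TO`. This is a minimal illustration, not the project's actual code: it assumes JSNL means newline-delimited JSON, and the function names `build_copy_sql` and `convert_jsnl_to_parquet` are hypothetical.

```python
from pathlib import Path


def build_copy_sql(jsnl_path: str, parquet_path: str) -> str:
    """Build a DuckDB COPY statement that rewrites a JSNL file as Parquet."""
    # Escape single quotes so the paths are safe inside SQL string literals.
    src = str(jsnl_path).replace("'", "''")
    dst = str(parquet_path).replace("'", "''")
    return (
        f"COPY (SELECT * FROM read_json_auto('{src}', format='newline_delimited')) "
        f"TO '{dst}' (FORMAT PARQUET)"
    )


def convert_jsnl_to_parquet(jsnl_path: str, parquet_path: str) -> None:
    """Run the conversion on an in-memory DuckDB connection."""
    # Imported here so build_copy_sql stays usable without DuckDB installed.
    import duckdb

    # Make sure the output directory exists before DuckDB writes to it.
    Path(parquet_path).parent.mkdir(parents=True, exist_ok=True)
    con = duckdb.connect()  # in-memory database, nothing is persisted
    try:
        con.execute(build_copy_sql(jsnl_path, parquet_path))
    finally:
        con.close()
```

Calling `convert_jsnl_to_parquet("input.jsnl", "out/input.parquet")` would let DuckDB infer the schema from the JSON records; a production pipeline would more likely pin the schema explicitly.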
- Python 3.x
- MariaDB/MySQL
- DuckDB
- Systemd (for automated processing)
- Clone the repository:
git clone <repository-url>
cd report_data_processor
- Run the setup script:
sudo ./setup.sh
The setup script will:
- Create necessary directories
- Install required Python dependencies
- Set up systemd service and timer
- Configure the processing pipeline
The pipeline uses environment variables for database configuration. Create a .env file with the following variables:
DB_HOST=localhost
DB_PORT=3306
DB_USER=root
DB_PASSWORD=your_password
DB_NAME=trading
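As a rough sketch of how these variables might be read, the snippet below parses a `.env` file with the standard library and builds a connection-settings dictionary. The helper names `load_env_file` and `get_db_config` are hypothetical, and the fallback values simply mirror the example above; the actual processor may load its configuration differently.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_db_config(env_path: str = ".env") -> dict:
    """Merge .env values with real environment variables (the latter win)."""
    merged = {**load_env_file(env_path), **os.environ}
    return {
        "host": merged.get("DB_HOST", "localhost"),
        "port": int(merged.get("DB_PORT", "3306")),
        "user": merged.get("DB_USER", "root"),
        "password": merged.get("DB_PASSWORD", ""),
        "database": merged.get("DB_NAME", "trading"),
    }
```

The resulting dictionary uses the keyword names accepted by common MariaDB/MySQL Python connectors, so it could be passed along as `connect(**get_db_config())` with whichever driver the project uses.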
The processor can be run in several ways:
- Process all files in default directories:
python jsnl_processor.py
- Process a specific file:
python jsnl_processor.py --file /path/to/file.jsnl
- Process files from custom directories:
python jsnl_processor.py --source_path /custom/input --processed_path /custom/processed
- Limit processing to N files:
python jsnl_processor.py --limit 10
The pipeline is configured to run automatically via systemd timer. To check the status:
systemctl status jsnl_processor.timer
- /data/to_process/dashboard_data_archive - Input directory for JSNL files
- /data2/processed/dashboard_data_archive - Processed JSNL files
- /data/parquet/ - Output directory for Parquet files
- /temp - Temporary processing files
- /hourly - Hourly aggregated data
- /daily - Daily aggregated data
- /monthly - Monthly aggregated data
- /log - Log files
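The command-line options shown in the usage examples could be wired to these directories roughly as follows. This is an illustrative `argparse` sketch only: the flag names come from the examples in this README, but treating the input and processed directories as the flag defaults is an assumption, and `build_parser` is a hypothetical name.

```python
import argparse

# Directories taken from the list above; using them as defaults is an assumption.
DEFAULT_SOURCE = "/data/to_process/dashboard_data_archive"
DEFAULT_PROCESSED = "/data2/processed/dashboard_data_archive"


def build_parser() -> argparse.ArgumentParser:
    """Create a parser for the flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="Convert JSNL files to Parquet.")
    parser.add_argument("--file", help="process a single JSNL file")
    parser.add_argument(
        "--source_path",
        default=DEFAULT_SOURCE,
        help="directory scanned for JSNL files",
    )
    parser.add_argument(
        "--processed_path",
        default=DEFAULT_PROCESSED,
        help="directory that receives processed JSNL files",
    )
    parser.add_argument(
        "--limit",
        type=int,
        default=None,
        help="maximum number of files to process",
    )
    return parser
```

For example, `build_parser().parse_args(["--limit", "10"])` yields a namespace with `limit` set to 10 and both paths left at their defaults.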
Logs are written to /log/jsnl_processor.log with detailed information about the processing pipeline's operation.
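A file logger along these lines would produce such a log. This is a sketch using Python's standard `logging` module; the logger name, message format, and the `setup_logging` helper are assumptions rather than the project's confirmed configuration.

```python
import logging
from pathlib import Path


def setup_logging(log_file: str = "/log/jsnl_processor.log") -> logging.Logger:
    """Return a logger that appends timestamped records to log_file."""
    # Create the log directory if it is missing.
    Path(log_file).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("jsnl_processor")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeat calls
        handler = logging.FileHandler(log_file)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

A caller would then write `setup_logging().info("processed %d files", count)`; writing to `/log` typically requires the directory to be writable by the service user.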
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a new Pull Request
[Add your license information here]
[Add support information here]