Bigbytes is a hybrid framework for transforming and integrating data. It combines the best of both worlds: the flexibility of notebooks with the rigor of modular code.
- Extract and synchronize data from third-party sources.
- Transform data with real-time and batch pipelines using Python, SQL, and R.
- Load data into your data warehouse or data lake using our pre-built connectors.
- Run, monitor, and orchestrate thousands of pipelines without losing sleep.
The recommended way to install the latest version of Bigbytes is through Docker:

```bash
docker pull getbigbytes/bigbytes:latest
```
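Once the image is pulled, starting the container is typically a single `docker run`. A minimal sketch, assuming the web UI listens on port 6789 (the port and default command are assumptions, not confirmed here; check the documentation for the exact invocation):

```bash
# Sketch: run Bigbytes and expose its web UI.
# Port 6789 is an assumption for illustration only.
docker run -it -p 6789:6789 getbigbytes/bigbytes:latest
```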
You can also install Bigbytes using pip or conda, though this may cause dependency issues without a properly isolated environment:

```bash
pip install bigbytes
```

```bash
conda install -c conda-forge bigbytes
```

Looking for help? The fastest way to get started is by checking out our documentation.
Looking for quick examples? Open a demo project right in your browser or check out our guides.
Build and run a data pipeline with our demo app.
WARNING
The live demo is public to everyone, so please don't save anything sensitive (e.g., passwords or secrets).
A sample data pipeline defined across 3 files ➝
- Load data ➝
```python
import polars as pl


@data_loader  # decorator provided by Bigbytes at runtime
def load_csv_from_file() -> pl.DataFrame:
    # Read the raw Titanic dataset from the project directory.
    return pl.read_csv('default_repo/titanic.csv')
```
- Transform data ➝
```python
import polars as pl


@transformer  # decorator provided by Bigbytes at runtime
def select_columns_from_df(df: pl.DataFrame, *args) -> pl.DataFrame:
    # Keep only the columns needed downstream.
    return df[['Age', 'Fare', 'Survived']]
```
- Export data ➝
```python
import polars as pl


@data_exporter  # decorator provided by Bigbytes at runtime
def export_titanic_data_to_disk(df: pl.DataFrame) -> None:
    # polars uses write_csv, not pandas' to_csv.
    df.write_csv('default_repo/titanic_transformed.csv')
```
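Put together, the three blocks form a linear load ➝ transform ➝ export pipeline that Bigbytes wires up for you. For illustration, the same flow reduces to a short standalone polars script (a sketch reusing the paths from the example above; it assumes polars is installed and the input CSV exists):

```python
# Sketch: the demo pipeline's three steps chained as plain Python.
import polars as pl

df = pl.read_csv('default_repo/titanic.csv')           # load
df = df[['Age', 'Fare', 'Survived']]                   # transform
df.write_csv('default_repo/titanic_transformed.csv')   # export
```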