Oracle Data Science Bootcamp 2014
A series of workshops to explain both basic and advanced data science concepts using SQL, Python, Matplotlib, and Apache Spark.
- Download the Oracle Big Data Lite VM at http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
  This requires VirtualBox. All usernames/passwords for the VM are oracle/welcome1 unless stated otherwise.
- Clone this git repository: git clone https://github.com/dwmclary/odsb2014
- Change into the setup directory: cd odsb2014/setup
- Run the setup script: ./data_science_bootcamp_setup.sh
- Run the data download script: ./download_data.sh
- Run the PySpark installation script: ./setup_pyspark_notebook.sh
- Run the database setup script: sqlplus sys/welcome1 as sysdba @fludb.sql
- Start the database listener: lsnrctl start
- Source ~/.bashrc or open a new terminal window.
- Change to the odsb2014/notebooks directory and start IPython: ipython notebook --profile pyspark
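The clone-through-listener steps above can be collected into a single shell function. This is a minimal sketch, not a script shipped with the repository: the run and bootcamp_setup helpers and the DRY_RUN variable are hypothetical names introduced here. It assumes it is started in the directory where the repository should be cloned, inside the Big Data Lite VM where sqlplus and lsnrctl are on the PATH. Setting DRY_RUN=1 prints each command instead of executing it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of odsb2014): execute a command, or only
# print it (prefixed with "+") when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: runs the README setup steps in order and stops at
# the first failing step. Assumes the current directory is where the repo
# should be cloned and that sqlplus/lsnrctl are installed (as in the VM).
bootcamp_setup() {
  run git clone https://github.com/dwmclary/odsb2014 || return 1
  run cd odsb2014/setup || return 1                    # cd runs in the current shell
  run ./data_science_bootcamp_setup.sh || return 1
  run ./download_data.sh || return 1
  run ./setup_pyspark_notebook.sh || return 1
  run sqlplus sys/welcome1 as sysdba @fludb.sql || return 1
  run lsnrctl start || return 1
}
```

Preview the steps with DRY_RUN=1 bootcamp_setup, then run bootcamp_setup for real. The last two steps are left out on purpose: sourcing ~/.bashrc from inside a script does not update your interactive shell, so source it (or open a new terminal) yourself, then change to odsb2014/notebooks and start ipython notebook --profile pyspark.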
The flu_statitics and flu_news directories contain data download scripts that must be run
in order to complete the workshop. These can be run standalone, or by running setup/download_data.sh.
The flu_shots directory contains a script for fetching data from the US Department of Health and Human Services (HHS), but
collecting this data is part of the notebook series.