romantilly/bike-sharing

 
 


bike-sharing

Data Collection

src/data-preprocessing

Overview

  • create a database
  • set up the scripts on a server
  • run the scripts automatically with a cron job

Prerequisites

  • Python 3.6
  • Libraries:
    • requests
    • psycopg2

Install the packages:

cd src/data-preprocessing
pip install -r requirements.txt

Scripts

SQL script create_bikeDB.sql creates the database schema

Creates the database in which the data collected by the query scripts is stored.

Script query_bike_apis.py is used to query provider API data

Sends API requests to retrieve the current locations of all bikes from nextbike, lidlbike and mobike in Berlin (inner circle) and stores them in a single database.
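As an illustration of the "single database" idea, the following minimal sketch maps provider records onto one common row shape and drops bikes outside a bounding box. It is not the code from query_bike_apis.py: the bounding box values, the record keys (`id`, `lat`, `lng`) and the function names are assumptions made for this example, and real provider responses will look different.

```python
from typing import Dict, Iterable, List

# Hypothetical bounding box roughly covering central Berlin; the real
# script may delimit the inner circle differently.
BERLIN_BBOX = {"lat_min": 52.46, "lat_max": 52.57, "lng_min": 13.28, "lng_max": 13.48}


def in_bbox(lat: float, lng: float, bbox: Dict[str, float] = BERLIN_BBOX) -> bool:
    """Return True if the coordinate lies inside the bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["lng_min"] <= lng <= bbox["lng_max"])


def normalise(provider: str, records: Iterable[dict]) -> List[dict]:
    """Map provider records onto one row shape, keeping only bikes inside the box."""
    rows = []
    for rec in records:
        lat, lng = float(rec["lat"]), float(rec["lng"])
        if in_bbox(lat, lng):
            rows.append({"provider": provider, "bike_id": str(rec["id"]),
                         "lat": lat, "lng": lng})
    return rows


# Made-up sample records, not real API output.
sample = [
    {"id": 101, "lat": 52.52, "lng": 13.405},  # central Berlin, inside the box
    {"id": 102, "lat": 52.39, "lng": 13.06},   # Potsdam, outside the box
]
rows = normalise("nextbike", sample)
```

Rows in this common shape could then be written to one table regardless of the provider they came from.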

Script query_nextbike_stations.py is used to query the stations of nextbike

Config File

Add a config.py file to src/data-preprocessing containing the API keys for the Deutsche Bahn API (https://developer.deutschebahn.com/store/) and the database credentials (see the example in config-example.py).
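A config file of this kind might look roughly like the sketch below. Only `phone_no` is named in this README; every other name and value is a placeholder chosen for illustration, so check config-example.py for the field names the scripts actually expect.

```python
# Hypothetical config.py layout. Only phone_no is named in this README;
# all other names and values are placeholders, not the real field names.
DB_API_KEY = "YOUR-DEUTSCHE-BAHN-API-KEY"

DATABASE = {
    "host": "localhost",
    "port": 5432,
    "dbname": "bikes",
    "user": "bike_user",
    "password": "CHANGE-ME",
}

phone_no = "YOUR-PHONE-NUMBER"  # used by lime_access.py


def build_dsn(db: dict = DATABASE) -> str:
    """Build a libpq key/value connection string from the settings.

    Values containing spaces would need quoting; the placeholders here have none.
    """
    return " ".join("{}={}".format(key, value) for key, value in db.items())
```

A string built this way can be passed to `psycopg2.connect()`, which accepts a libpq connection string as its first argument. Keep the real config.py out of version control, since it holds credentials.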

Run the scripts automatically

Set up a cron job that runs the scripts at regular intervals. For example, this setup

  • runs the query_bike_apis.py script every 4 minutes
  • runs the query_nextbike_stations.py script once a day at 8 AM
  • runs a cleaning script (/src/clean_script.py) once a day at 11 PM, which deletes all unnecessary rows from the database.

CRON JOBS

    */4 * * * * python3 [PATH TO FOLDER]/src/query_bike_apis.py
    0 8 * * * python3 [PATH TO FOLDER]/src/query_nextbike_stations.py
    0 23 * * * python3 [PATH TO FOLDER]/src/clean_script.py

Query other cities or providers

To query the APIs for different cities, adapt the src/data-preprocessing/query_bike_apis.py script accordingly. To query other providers, this documentation is a good source of information.

To access the Lime bike API, add phone_no to config.py and follow the steps in lime_access.py (three manual steps are required).

Data Analysis

src/analysis

Jupyter notebooks for analysing the data.

  • preprocess.ipynb contains the preprocessing steps that turn the raw data into a usable format.

    • raw.csv contains the data from the database
    • preprocessed.csv contains the data with added columns and fixed lat / lng
    • routed.csv contains the data with distance and waypoints
    • cleaned.csv is the cleaned routed dataset (implausible data is removed)
    • pseudonomysed.csv is the anonymized, cleaned data, following this standard
    • pseudonomysed_raw.csv is the anonymized data (NOT cleaned).
  • analysis.ipynb contains the analysis of provider- and bike-specific data

  • pseudonomysed.ipynb contains the analysis of the anonymized dataset (without information on providers).
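One common way to produce a dataset without provider information and without the original bike IDs is to replace each ID with a keyed hash and drop the provider column. The sketch below shows that idea only. It is not the procedure used in preprocess.ipynb and does not claim to implement the standard referenced above; the column names and the `pseudonymise` helper are assumptions.

```python
import hashlib
import hmac
import secrets


def pseudonymise(bike_id: str, key: bytes) -> str:
    """Return a stable pseudonym for bike_id using a keyed SHA-256 hash (HMAC)."""
    digest = hmac.new(key, bike_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability


def strip_identifiers(rows, key: bytes):
    """Replace bike_id with its pseudonym and drop the provider column."""
    result = []
    for row in rows:
        clean = {k: v for k, v in row.items() if k not in ("provider", "bike_id")}
        clean["bike_id"] = pseudonymise(row["bike_id"], key)
        result.append(clean)
    return result


# Random secret key; discard it after processing so the mapping cannot be recomputed.
key = secrets.token_bytes(16)

# Made-up rows with hypothetical column names.
rows = [
    {"provider": "nextbike", "bike_id": "101", "lat": 52.52, "lng": 13.405},
    {"provider": "mobike", "bike_id": "202", "lat": 52.50, "lng": 13.39},
    {"provider": "nextbike", "bike_id": "101", "lat": 52.53, "lng": 13.41},
]
anonymised = strip_identifiers(rows, key)
```

The same bike keeps the same pseudonym across rows, so trips can still be reconstructed. Location traces can remain identifying even after this step, so hashing IDs alone should not be treated as full anonymization.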
