bike-sharing

Data Collection

src/data-preprocessing

Overview

Create a database
Set up the script on a server
Run the script automatically with a cron job

Prerequisites

Python 3.6
Libraries:
- requests
- psycopg2

Install packages:

cd src/data-preprocessing
pip install -r requirements.txt

Scripts

SQL Script create_bikeDB.sql to create the database schema

Creates the database in which the data queried by the scripts is stored.
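One possible way to apply the schema from Python instead of a SQL client is sketched below. It assumes create_bikeDB.sql contains only plain SQL statements that can run inside a transaction (a CREATE DATABASE statement, for example, would have to be run separately). The function name apply_schema and the connection values shown in the comment are illustrative and not taken from the repository.

```python
from pathlib import Path


def apply_schema(conn, sql_path):
    """Run the contents of a SQL file on an open DB-API connection and commit."""
    sql = Path(sql_path).read_text(encoding="utf-8")
    with conn.cursor() as cur:
        cur.execute(sql)
    conn.commit()


# Example usage with psycopg2 (hypothetical credentials, replace with your own):
#
#   import psycopg2
#   conn = psycopg2.connect(host="localhost", dbname="bikes",
#                           user="bike_user", password="change-me")
#   try:
#       apply_schema(conn, "create_bikeDB.sql")
#   finally:
#       conn.close()
```

Passing the connection in, rather than creating it inside the function, keeps the credentials in one place (config.py) and makes the helper easy to try against a test database.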

Script query_bike_apis.py is used to query provider API data

Sends API requests to retrieve the current locations of all bikes from nextbike, lidlbike and mobike in Berlin (inner circle) and stores them in a single database.
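The query-filter-store flow described above could look roughly like the following sketch. It is not the code in query_bike_apis.py. The bounding box values, the "id" / "lat" / "lng" keys, and the bike_locations table with its columns are all assumptions made for illustration; each provider returns its own payload format and needs its own mapping.

```python
from datetime import datetime, timezone

# Rough, illustrative bounding box around central Berlin (assumed values).
BERLIN_BBOX = {"min_lat": 52.46, "max_lat": 52.56, "min_lng": 13.28, "max_lng": 13.48}


def fetch_json(url, params=None, timeout=10):
    """GET a provider endpoint and return the decoded JSON body."""
    import requests  # third-party, listed under Prerequisites

    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return response.json()


def to_rows(provider, bikes, bbox=BERLIN_BBOX, now=None):
    """Turn bike dicts into (provider, bike_id, lat, lng, timestamp) rows.

    Each bike dict is assumed to have "id", "lat" and "lng" keys.
    Bikes outside the bounding box are dropped.
    """
    now = now or datetime.now(timezone.utc)
    rows = []
    for bike in bikes:
        lat, lng = float(bike["lat"]), float(bike["lng"])
        inside = (bbox["min_lat"] <= lat <= bbox["max_lat"]
                  and bbox["min_lng"] <= lng <= bbox["max_lng"])
        if inside:
            rows.append((provider, str(bike["id"]), lat, lng, now))
    return rows


def store_rows(conn, rows):
    """Insert rows into a hypothetical bike_locations table and commit."""
    with conn.cursor() as cur:
        cur.executemany(
            "INSERT INTO bike_locations (provider, bike_id, lat, lng, queried_at) "
            "VALUES (%s, %s, %s, %s, %s)",
            rows,
        )
    conn.commit()
```

Writing every provider into the same row shape is what allows a single table to hold all of them; only the mapping from the provider payload to the bike dicts differs per provider.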

Script query_nextbike_stations.py is used to query the stations of nextbike

Config File

Add a config.py file to src/data-preprocessing containing the API keys for the Deutsche Bahn API (https://developer.deutschebahn.com/store/) and the database credentials (see the example in config-example.py).
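As a rough idea of what such a file might contain, here is a sketch. Every variable name and value below is a hypothetical placeholder; config-example.py in the repository is the authoritative template for the names the scripts actually expect. The small helper is an optional addition that lets a script fail early when a setting is left empty.

```python
# config.py (sketch): all names and values below are hypothetical placeholders.
DB_CONFIG = {
    "host": "localhost",
    "port": 5432,
    "dbname": "bikes",
    "user": "bike_user",
    "password": "change-me",
}

# Key issued by the Deutsche Bahn API portal.
DEUTSCHE_BAHN_API_KEY = "your-api-key"


def missing_settings(db_config, api_key):
    """Return the names of settings that are empty or None."""
    missing = [name for name, value in db_config.items() if value in ("", None)]
    if not api_key:
        missing.append("DEUTSCHE_BAHN_API_KEY")
    return missing
```

Since the file holds credentials, it is usually kept out of version control.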

Run scripts automatically

Set up cron jobs that run the scripts at regular intervals. For example, this setup:

runs the query_bike_apis.py script every 4 minutes
runs the query_nextbike_stations.py script once a day at 8 AM
runs a cleaning script on the database (/src/clean_script.py) once a day at 11 PM, deleting all unnecessary rows in the database

CRON JOBS

    */4 * * * * python3 [PATH TO FOLDER]/src/query_bike_apis.py
    0 8 * * * python3 [PATH TO FOLDER]/src/query_nextbike_stations.py
    0 23 * * * python3 [PATH TO FOLDER]/src/clean_script.py

Query other cities or providers

To query APIs for different cities, the src/data-preprocessing/query_bike_apis.py script has to be adapted accordingly. To query other providers, this documentation is a good source of information.

To access the Lime bike API, add phone_no to config.py and follow the steps in lime_access.py (three manual steps required).

Data Analysis

src/analysis

Jupyter notebooks for analysing the data.

preprocess.ipynb contains the preprocessing steps that turn the raw data into a usable format.
- raw.csv contains the data from the database
- preprocessed.csv contains the data with added columns and fixed lat / lng
- routed.csv contains the data with distance and waypoints
- cleaned.csv is the cleaned routed dataset (implausible data is removed)
- pseudonomysed.csv is the anonymized, cleaned data, following this standard
- pseudonomysed_raw.csv is the anonymized data (NOT cleaned)
analysis.ipynb includes analysis of provider- and bike-specific data
pseudonomysed.ipynb includes analysis using the anonymized dataset (without information on providers)
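To illustrate the kind of cleaning step that turns routed.csv into cleaned.csv, here is a minimal standard-library sketch of a plausibility filter. It is not the logic used in preprocess.ipynb. The column names distance_m and duration_s and both thresholds are assumptions chosen for the example.

```python
import csv


def is_plausible(row, max_speed_kmh=40.0, min_distance_m=100.0):
    """Return True if a trip row has a believable distance and average speed."""
    try:
        distance_m = float(row["distance_m"])
        duration_s = float(row["duration_s"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or non-numeric values count as implausible
    if distance_m < min_distance_m or duration_s <= 0:
        return False
    speed_kmh = (distance_m / 1000.0) / (duration_s / 3600.0)
    return speed_kmh <= max_speed_kmh


def clean_csv(src_path, dst_path):
    """Copy only the plausible rows from src_path to dst_path; return kept count."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                                extrasaction="ignore")
        writer.writeheader()
        kept = 0
        for row in reader:
            if is_plausible(row):
                writer.writerow(row)
                kept += 1
    return kept
```

A call such as clean_csv("routed.csv", "cleaned.csv") would then write the filtered file. Keeping the rule in a separate is_plausible function makes the thresholds easy to adjust and to check against a few known trips.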
