Tools to work with data downloaded from the Open Humans research platform.
- Tool #1: Unzip, split files based on size if needed, and convert JSON to CSV, for a full batch of data downloaded from OH.
- Tool #2: Unzip, merge, and create a single output file from multiple data files in an OH download.
- Tool #3: Examples and descriptions of the four data file types from Nightscout.
- Tool #4: Pull ISF from devicestatus.
- Tool #5: Assess the amount of looping data.
- Tool #6: Outcomes.
Tool #1: Unzip, split files based on size if needed, and convert JSON to CSV, for a full batch of data downloaded from OH.
Unzip-Zip-CSVify-OpenHumans-data.sh. Note that this tool was designed for use with the OpenAPS and Nightscout Data Commons, which pulls Nightscout data into Open Humans as JSON files. Other users will need to specify their own data file types in the second "for" loop. (The first "for" loop is Nightscout-specific based on the data type, and uses an alternative JSON-to-CSV conversion; see the tips for other installation requirements.)
This script calls complex-json2csv and jsonsplit.sh. Both tools are part of a package (see repo here) that can be installed via npm (see this).
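The split-and-regroup step that produces the "splitting it... / Grouping split records into valid json..." output below can be sketched as follows. This is an illustrative stand-in, not the actual jsonsplit.sh: the file names and the 2-record chunk size are made up for the demo (the real tool splits by size), and it assumes one record per line between the array brackets.

```shell
# Mock "entries" file: a JSON array with one record per line (mock data).
cat > mock_entries.json <<'EOF'
[
{"sgv": 100, "date": 1},
{"sgv": 110, "date": 2},
{"sgv": 120, "date": 3},
{"sgv": 130, "date": 4}
]
EOF

# Split step: drop the array brackets and trailing commas, then cut the
# records into 2-record chunks (illustrative; the real script sizes by bytes).
sed '1d;$d' mock_entries.json | sed 's/,$//' | split -l 2 - mock_chunk_

# Regroup step: wrap each chunk back into its own valid JSON array.
n=0
for f in mock_chunk_*; do
  n=$((n+1))
  { echo '['; paste -sd, "$f"; echo ']'; } > "mock_entries_${n}.json"
  rm "$f"
done
echo "Created $n valid JSON files"
```

Each resulting `mock_entries_N.json` is a self-contained JSON array, so it can be fed to a JSON-to-CSV converter on its own.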
Progress output from the tool while running, with the script in its current form, looks like:
```
########
########_entries.json
########_entries.csv
Starting participant ########
Extracted ########_profile.json; splitting it...
.
Grouping split records into valid json...
-
Creating CSV files...
=
Participant ########: profile CSV files created: 1
Extracted ########_treatments.json; splitting it...
..............
Grouping split records into valid json...
--------------
Creating CSV files...
==============
Participant ########: treatments CSV files created: 14
Extracted ########_devicestatus.json; splitting it...
...................................
Grouping split records into valid json...
-----------------------------------
Creating CSV files...
===================================
Participant ########: devicestatus CSV files created: 35
```
Tool #2: Unzip, merge, and create a single output file from multiple data files in an OH download.
Unzip-merge-output.sh. Note that this tool was designed for use with the OpenAPS and Nightscout Data Commons, which pulls Nightscout data into Open Humans as JSON files. Other users will need to specify their own data file types in the second "for" loop, but can use this script as a template for taking various pieces of data from multiple files (e.g., timezone from devicestatus and BG data from entries) and creating one file, complete with headers, ready for data analysis.
Per the headers in the example output file for this script, I have Excel formulas that, if needed, calculate whether data is from a control period, an intervention period, or neither; the hour of day each data point is from, to determine whether it is day or nighttime; and (once the looping start date is manually added to the file) the number of days looping and the number of days of data in the upload, to calculate the control/intervention time frames based on the project protocol.
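The hour-of-day and day/night part of those spreadsheet formulas can also be scripted. A minimal sketch, using mock data: the column layout, the ISO-8601 timestamp format, and the 6am-10pm "day" cutoffs are all assumptions for the demo, not the project protocol's actual definitions.

```shell
# Mock merged file (assumed layout: ISO-8601 timestamp, then BG value).
cat > mock_merged.csv <<'EOF'
timestamp,sgv
2018-01-01T03:15:00Z,95
2018-01-01T14:30:00Z,140
2018-01-01T23:05:00Z,110
EOF

awk -F, 'NR==1 { print $0",hour,period"; next }
{
  hour = substr($1, 12, 2) + 0                          # hour from the ISO timestamp
  period = (hour >= 6 && hour < 22) ? "day" : "night"   # assumed day/night cutoffs
  print $0","hour","period
}' mock_merged.csv > mock_labeled.csv

cat mock_labeled.csv
```

The same pattern (derive a column from an existing one, append it, write a new CSV) extends to the control/intervention flags once a looping start date is available.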
Mock data in the output file, along with additional calculations for various variables as defined by a project protocol:
Tool #3: Examples and descriptions of the four data file types from Nightscout.
NS-data-types.md attempts to explain the nuances of, and what is contained in, each of the four data file types: profile, entries, devicestatus, and treatments.
Tool #4: Pull ISF from devicestatus.
This tool requires csvkit, so run sudo pip install csvkit before running the script. It also assumes your NS data files are already in CSV format, produced by tool #1 (Unzip-Zip-CSVify-OpenHumans-data.sh).
Note: depending on your install of six, you may get an attribute error. Following this rabbit hole about the error, various combinations of the solutions outlined in this Stack Overflow article may help.
When successful, the devicestatus-pull-isf-timestamp.sh script pulls the ISF and timestamp out of devicestatus to enable further ISF analysis.
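The core "pull two columns out of a wide CSV" idea can be sketched as below. This is not the actual script: the column names (`created_at`, `openaps/suggested/ISF`) and values are assumptions about how the flattened devicestatus CSV might look, made up for the demo.

```shell
# Mock flattened devicestatus CSV (column names are assumptions).
cat > mock_devicestatus.csv <<'EOF'
created_at,openaps/suggested/ISF,openaps/suggested/bg
2018-01-01T00:00:00Z,50,120
2018-01-01T00:05:00Z,45,130
EOF

# Find the timestamp and ISF columns by header name, then emit just those two.
awk -F, 'NR==1 {
  for (i = 1; i <= NF; i++) {
    if ($i == "created_at") t = i
    if ($i == "openaps/suggested/ISF") s = i
  }
  print "timestamp,ISF"; next
}
{ print $t","$s }' mock_devicestatus.csv > mock_isf.csv

cat mock_isf.csv
```

Looking columns up by header name rather than by position keeps the extraction working even if the converter emits the columns in a different order.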
Tool #5: Assess the amount of looping data.
There are two methods for assessing amounts of data.
- You can use howmuchBGdata.sh to see how much time's worth of BG entries someone has. However, this doesn't necessarily represent the amount of looping data.
- Or, you can use howmuchdevicestatusdata.sh to see how much looping data someone has in the Data Commons (OpenAPS only for now; someone can add Loop assessment later using the same principle).
Before running howmuchdevicestatusdata.sh, you'll first need to run devicestatustimestamp.sh to pull the timestamps out into a separate file. That requires csvkit (see Tool #4 for details). Also, both of these scripts may need chmod +x <filename> before running on your machine.
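The idea behind this kind of assessment can be sketched in a few lines: take the earliest and latest timestamps in the extracted file and report the span in days. The file name and the epoch-second values below are mock data, not what the real scripts produce.

```shell
# Mock extracted timestamps, as epoch seconds (illustrative values).
cat > mock_timestamps.txt <<'EOF'
1514764800
1517184000
1518134400
EOF

# Earliest and latest timestamp, then the span between them in whole days.
first=$(sort -n mock_timestamps.txt | head -1)
last=$(sort -n mock_timestamps.txt | tail -1)
days=$(( (last - first) / 86400 ))
echo "Days of data: $days"
```

Note this measures the span between the first and last record; gaps in looping inside that window would need a finer-grained check.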
Output on the command line of devicestatustimestamp.sh:

Then, run howmuchdevicestatusdata.sh, and the output in the command line also shows which files are being processed:

The original output of howmuchdevicestatusdata.sh is a CSV.
- Because some people have multiple uploads, there may be multiple lines for a single person. You can use Excel to de-duplicate these.
- Loop users will show up as 0 (until someone updates the script to also pull loop/enacted/timestamp). You may want to remove these before averaging to estimate the Data Commons' total looping data.
To-do:
- add Loop/enacted/timestamp to also assess Loop users
- add a script version that includes both BG and looping data in the same output CSV
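The Excel de-duplication step above can also be scripted. A sketch with mock data (the participant IDs, column names, and the "keep the upload with the most days" rule are assumptions for the demo):

```shell
# Mock per-upload output: one row per upload, so IDs can repeat (mock data).
cat > mock_looping.csv <<'EOF'
participant,days_of_data
11111111,120
11111111,45
22222222,0
33333333,300
EOF

# De-duplicate: keep, for each participant, the row with the most days.
awk -F, 'NR > 1 { if (!($1 in best) || $2+0 > best[$1]+0) best[$1] = $2 }
END { for (id in best) print id","best[id] }' mock_looping.csv | sort > mock_dedup.csv

# Optionally drop Loop users (reported as 0 days) before averaging.
awk -F, '$2 > 0' mock_dedup.csv > mock_openaps_only.csv
cat mock_openaps_only.csv
```

This leaves one row per participant, with the zero-day Loop rows filtered into a separate view so they don't drag down an average.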
Tool #6: Outcomes.
This script (outcomes.sh) assesses the start/end of BG and looping data to calculate time spent low (default: <70 mg/dL), time in range (default: 70-180 mg/dL), time spent high (default: >180 mg/dL), the number of high readings, and the average glucose for the time frame where there is entries data and the person is looping.
Tl;dr - this analyzes the time frame after someone began looping.
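The core of that calculation can be sketched as below, using the default thresholds from the description. The BG values are mock data, and unlike the real outcomes.sh this sketch does not align the window to the looping start/end.

```shell
# Mock BG readings, one per line, in mg/dL (mock data).
cat > mock_bg.csv <<'EOF'
65
95
120
190
210
150
EOF

awk '{
  n++; sum += $1
  if ($1 < 70) low++            # time spent low (default: <70 mg/dL)
  else if ($1 > 180) high++     # time spent high (default: >180 mg/dL)
  else inrange++                # time in range (default: 70-180 mg/dL)
}
END {
  printf "low: %.1f%%  in-range: %.1f%%  high: %.1f%%  avg: %.1f\n",
         100*low/n, 100*inrange/n, 100*high/n, sum/n
}' mock_bg.csv > mock_outcomes.txt

cat mock_outcomes.txt
```

Counting readings stands in for counting time here, which only holds if readings arrive at a steady interval (e.g., a CGM reading every 5 minutes).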