Skip to content

wri/decision-tree

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

TerraFund Decision Tree

Repository Overview

This repository contains the code for running a rule-based decision process to identify the appropriate verification method (field or remote) for a TerraFund project.

🧠 Purpose & Scope

The primary functions of this code are located within run_decision_tree.py.

Data Gathering & Cleaning

  • Query the TerraMatch API, Maxar API, and OpenTopo API to gather input data for the decision tree
  • Process, validate, and clean the API response into the various "branches" of the tree

Apply Logic

  • Apply the rules framework

Make Decisions

  • Apply the weighted scoring approach at the polygon level
  • Calculate the cost to monitor the project
  • Aggregate polygon scores to derive project score

Upload

  • Push results to Asana
  • Push results to S3

🎯 Goals & Non-Goals

  • Goals: create an individual verification score for each project in the TerraFund portfolio.
  • Non-Goals: score projects that are not part of TerraFund.

πŸ“¦ Inputs & Upstreams

Upstream Repositories

Repo Artifact Provided Interface Versioning/Contract Notes
maxar-tools API response --- ------------------- -----
terramatch-researcher-api API response -- ------------------- -----
tf-biophysical-monitoring patched tree cover -- ------------------- -----

External Data Sources

Source Type (API/S3/etc.) Endpoint/Path Auth Data Format Schema Link
OpenTopography API ------------- API key geotiff -----------

Expected Input Formats

  • mode dependent - mainly defers to params.yaml file or csv for processing

Input Validation

  • Schema checks: process_api_results.py checks dtypes, missing or incomplete data and duplicates
  • Range checks: None
  • Null/duplicate handling: process_api_results.py checks dtypes, missing or incomplete data and duplicates
  • Failure behavior (fail fast vs warn): None
  • Upstream repos:: tf-biophysical-indicators, terramatch-researcher-api, maxar-tools
  • External data sources: (S3/API/DB): Opentopo API
  • Expected input format(s): (e.g., Parquet, GeoTIFF, JSON)
  • Input validation logic: (schema checks, nulls, ranges):

πŸ“€ Outputs & Downstreams

Produced Artifacts

Artifact Format Path/Location Update Frequency Size Retention
project ids csv data/portfolio_csvs/prj_ids_{cohort}_{data_version}.csv N/A x Permanent
tm API response json data/tm_api_response/{cohort}_{data_version}.json N/A x Permanent
tn clean response csv data/feats/tm_api_{cohort}_{data_version}.csv N/A x Permanent
project polygon geojson s3://tree_verification/output/project_data/SHORTNAME/geojson/SHORTNAME_{date}.geojson N/A x Permanent
img metadata csv data/imagery_availability/comb_img_availability_{cohort}_{data_version}.csv N/A x Permanent
slope statistics csv data/slope/slope_statistics_{cohort}_{data_version}.csv N/A x Permanent
decision scores csv data/decision_scores/prj_output_{cohort}_{data_version}_{experiment_id}.csv N/A x Permanent

Downstream Repos / Consumers

| Repo | Consumed Artifact | Contract Type | Dependency Level (strong/soft) | | tree-verification | binary decision | ---- | strong | | maxar-tools (image ordering) | binary decision | ---- | strong | | maxar-tools (image availability) | image count | ---- | soft |

Output Guarantees / Completeness checks

  • Schema stability: None
  • Freshness SLO: none
  • Completeness / QA checks before publish: none
  • Artifacts produced: (files, tables, models):

βš™οΈ Configuration & Secrets

Config Files

Config files: params.yaml, rule_template.csv

criteria:
  canopy_threshold: 60                                        
  cloud_thresh: 50                                   
  off_nadir: 30                                               
  sun_elevation: 30
  img_count: 1                                                 
  baseline_range: [-365, 0]                                
  ev_range: [730, 1095]                                        
  drop_missing: False
  slope_thresh: 20
  rules: data/rule_template.csv

Configuration Keys

Don't actually list the config keys in plain text but describe what is needed.

Key Type Default Required Description

Secrets Handling

Config files: secrets.yaml

access_token:
opentopo_key: 
aws:
  aws_access_key_id: 
  aws_secret_access_key:
  aws_region:
asana:
  pat: 

πŸš€ Execution

Entry Points

Command Purpose
python3 run_decision_tree.py runs pipeline

Orchestration

There is currently no orchestration (manual trigger via CLI) but this pipeline needs to be run after the polygons have been approved on TerraMatch and the tree cover indicator has been calculated.

System Trigger Frequency Dependency Triggered?
N/A CLI Manual No

Runtime & Cost Expectations

  • Expected runtime: tbd
  • GPU/CPU requirements:
  • Storage footprint: ~1GB
  • Cost control levers (batch size, concurrency): N/A
  • Entry point(s):
  • CLI usage: (example commands)
  • Schedulers / Orchestrators: (Airflow / Cron / GitHub Actions):
  • Runtime + cost notes:

πŸ—οΈ Environment & Dependencies

Python & System Environment

can create a requirements.txt when the time comes.

πŸ” Observability

Logging

The logging for this repository should ideally create a .jsonl file within each SHORTNAME directory specifying success / failure.

  • Format: Currently only logging via print statements and capture to a txt file via python3 main.py > tmp_log.txt
  • Logging levels: Singular
  • Sample log message:

Metrics

Metric Type Labels Description

Dashboards/Monitoring

  • Links
  • Alerting rules
  • Logging: (format, sinks)

πŸ’₯ Failure Modes & Recovery

Expected Failures & Behavior

Failure Case *

Checkpoints & Resume

  • Where checkpoints stored:
  • Resume instructions:
  • Common failure scenarios

πŸ‘₯ Ownership & Access

Owners

  • Engineering owner: Jessica Ertel
  • Data owner
  • On-call/rotation (if exists)

Access Model

  • Code access
  • Data bucket access
  • Secrets access

πŸ—ΊοΈ Pipeline Position

Data Flow

<Upstream repo/data> β†’ This repo β†’ <Downstream repo/data>
  • Sync frequency with upstream/downstream repos
  • Dependency risks

About

A custom rule-based system to forecast TerraFund's monitoring and verification needs.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published