Volve Well Knowledge Graph (Well Dossier Manifests)

This repository generates per-well dossier manifests for the public Equinor Volve dataset by mapping:

well identifier -> relevant archive artifacts (paths only)

It does not redistribute Volve data files. It only produces structured indexes (Markdown and JSON) that reference file paths inside an existing Volve dataset mount (for example, Databricks Volume paths).

What this produces

For each whitelisted Volve well, the generator writes:

wells//manifest.md (human-readable)
wells//manifest.json (machine-readable)

Manifests are bucketed into lifecycle-relevant categories such as:

Well_Construction_Reports
DDR_XML, DDR_PDF, DDR_HTML
Logs (LAS, DLIS, LIS plus well log folders)
Survey_Trajectory
WellTechnical_General
Other

In addition, v1.3 applies:

De-duplication by normalized filename within each well and bucket using a deterministic path-preference rule
Cross-well reference flagging on file entries (directories are ignored)

Generating the input catalog

This repo expects a prebuilt master catalog CSV named:

out_wellkg_v3_catalog_v1.csv

You generate this catalog outside this repository (for example, using Databricks to crawl the Volve mount and export a CSV).

Reference pipeline:

Volve Metadata Discovery Index (Databricks catalog crawl + tagging): https://github.com/985185/volve-metadata-index

The output of that crawl can be filtered/renamed into out_wellkg_v3_catalog_v1.csv as long as it contains the required columns listed below.

Inputs

Minimum required columns in out_wellkg_v3_catalog_v1.csv:

path
name
type (file or dir)
ext_norm
top_folder
tags
well_final (the well identifier used by the scripts)

Note: Some older documentation refers to a column named well. The current scripts use well_final.

Quickstart

Windows PowerShell

Place out_wellkg_v3_catalog_v1.csv in the repository root.
Run:

.\scripts\run_all.ps1

Linux / macOS (bash)

Place out_wellkg_v3_catalog_v1.csv in the repository root.
Run:

bash scripts/run_all.sh

If you want to make it executable:

chmod +x scripts/run_all.sh
./scripts/run_all.sh

Outputs are written to:

wells/

Repository structure

Path	What it is
scripts/	Core pipeline scripts (whitelist build + manifest generation + entry points).
tools/	Utility helpers and experimental scripts (not part of the core pipeline).
docs/	Documentation, validation notes, and research context.
schema/	YAML schemas and structured definitions for downstream graph/DDR work.
examples/	Small example artifacts illustrating expected output format.
data/	Ground-truth or small reference artifacts (never raw Volve archive files).
wells/	Generated per-well manifests (output).
well_whitelist.csv	Default whitelist artifact used by the generator.
out_wellkg_v3_catalog_v1.csv	Input catalog (local only; do not commit).

De-duplication and cross-well flagging

De-duplication: Within each well and bucket, items are de-duplicated by normalized filename. When duplicates exist, the generator keeps a single “best” path using a deterministic preference score (for example, preferring Well_logs_pr_WELL over Well_logs).

Cross-well flagging: For file rows only, the generator scans the filename and path text for other well IDs (strict pattern match). If any are found, they are not excluded. They are included and annotated:

manifest.json: foreign_ref_wells: ["15/9-F-..."]
manifest.md: appended as (foreign-ref: ...)

Directories are not cross-well flagged.

Intended use

These manifests are designed to:

Speed up well-centric discovery in a large, inconsistently organized archive
Provide a deterministic evidence map for research and SPE-style studies
Serve as a foundation for later graph work (RDF, Neo4j, NetworkX)

This repository focuses on deterministic indexing, not ML or LLM retrieval.

Requirements

See requirements.txt.

License

MIT License. See License.

Citation

See CITATION.cff.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Volve Well Knowledge Graph (Well Dossier Manifests)

What this produces

Generating the input catalog

Inputs

Quickstart

Windows PowerShell

Linux / macOS (bash)

Repository structure

De-duplication and cross-well flagging

Intended use

Requirements

License

Citation

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 48 Commits
data		data
docs		docs
examples		examples
schema		schema
scripts		scripts
tools		tools
CITATION.cff		CITATION.cff
License		License
README.md		README.md
requirements.txt		requirements.txt
well_whitelist.csv		well_whitelist.csv

Folders and files

Latest commit

History

Repository files navigation

Volve Well Knowledge Graph (Well Dossier Manifests)

What this produces

Generating the input catalog

Inputs

Quickstart

Windows PowerShell

Linux / macOS (bash)

Repository structure

De-duplication and cross-well flagging

Intended use

Requirements

License

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages