Setup

python -m pip install ontologize

Usage

Vignette: Genes

Suppose we have a list of genes, perhaps ones that are upregulated in a certain environment, and we wish to understand the functional changes in the cell.

We can first build an Ontology object from a list of the genes' BioCyc IDs:

from ontologize.ontology import build_ontology

# cadA EG10131
# lacA EG10524 
# xylA EG11074 
ont = build_ontology(objects=["EG10131", "EG10524", "EG11074"], schema_type="Gene")

Ontology objects store an annotated ontology graph as a NetworkX DiGraph:

import networkx as nx
assert isinstance(ont.graph, nx.DiGraph)

Rich printing options are supported, including truncating the graph at a given depth, including or excluding leaf nodes, and coloring by depth.

print(ont.to_string(max_depth=None, include_leaves=False, colors=True))

In this example, we see that lacA and xylA are both involved in carbon utilization, while cadA is related to pH adaptation.
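To make the effect of max_depth and include_leaves concrete, here is a minimal, dependency-free sketch of depth-limited outline printing. It is not how ontologize implements to_string; the toy_graph mapping, the render function, and the category labels are hypothetical stand-ins based only on the grouping described above.

```python
# Hypothetical toy ontology as a parent -> children mapping.
# The labels are illustrative, not real BioCyc class names.
toy_graph = {
    "Genes": ["Carbon utilization", "pH adaptation"],
    "Carbon utilization": ["lacA", "xylA"],
    "pH adaptation": ["cadA"],
}


def render(graph, root, max_depth=None, include_leaves=False):
    """Return an indented outline of the tree below `root`."""
    lines = []

    def visit(node, depth):
        is_leaf = not graph.get(node)
        # Leaves are the ontologized objects themselves; skip them unless asked.
        if is_leaf and not include_leaves:
            return
        # Stop descending once the requested depth is exceeded.
        if max_depth is not None and depth > max_depth:
            return
        lines.append("  " * depth + node)
        for child in graph.get(node, []):
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)


print(render(toy_graph, "Genes", max_depth=None, include_leaves=False))
```

With include_leaves=True the gene names appear one level below their categories; with max_depth=0 only the root is printed.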

Command-Line Interface

Once installed, ontologize exposes a runnable script, and can also be called as a module:

ontologize <file> <schema_type> [flags]
python -m ontologize <file> <schema_type> [flags]

The required arguments are given as follows:

file: Path to a .csv, .tsv, or .xlsx file with BioCyc object IDs to ontologize. For a .csv or .tsv file, the IDs are assumed by default to be in the first column, with no header row. For a .xlsx file, the IDs are assumed by default to be in the first column of the first sheet, with the first entry treated as a header.
schema_type: Type of the objects (or properties) to be ontologized in the BioCyc schema. For example, this might be Gene, Pathway, Compound, etc.

Note that schema_type uses the singular form of the class name!
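The default .csv layout described above (IDs in the first column, no header row) can be produced with the standard library. This is only a sketch: the file name genes.csv and the helper names write_id_file and read_first_column are assumptions made for illustration, and the IDs are the ones from the vignette.

```python
import csv
import tempfile
from pathlib import Path

# BioCyc gene IDs from the vignette (cadA, lacA, xylA).
gene_ids = ["EG10131", "EG10524", "EG11074"]


def write_id_file(path, ids):
    """Write one ID per row, first column only, with no header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for object_id in ids:
            writer.writerow([object_id])


def read_first_column(path):
    """Read back the first column, skipping any empty rows."""
    with open(path, newline="") as handle:
        return [row[0] for row in csv.reader(handle) if row]


# Hypothetical file name, written to a temporary directory.
out_path = Path(tempfile.mkdtemp()) / "genes.csv"
write_id_file(out_path, gene_ids)
print(read_first_column(out_path))
```

Following the syntax above, such a file would then be passed as `ontologize genes.csv Gene`.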

Example:

# TODO

Flags

Ontology-building options:

-s <sheet_name>, --sheet <sheet_name>: For a .xlsx file, the name of the sheet containing BioCyc IDs. Ignored if file is not a .xlsx file.
-o <objects>, --objects <objects>: For a multi-column file, the name of the column containing BioCyc IDs for the objects to ontologize. Requires a header row containing column names.
-p <property>, --property <property>: For a multi-column file, the name of the column containing BioCyc IDs for the property to ontologize. Requires a header row containing column names. When using this option, the objects must also be specified using the -o option.
WARNING: -p, --property is NOT YET IMPLEMENTED.
--database <orgid>: BioCyc organism ID, used to specify the organism-specific database within which to search. ECOLI by default.
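As a sketch of the multi-column case, the snippet below writes a table with a header row and assembles (without running) a command line that names the ID column with -o. The column names gene_id and log2_fold_change, the file name expression.csv, the numeric values, and the build_command helper are all hypothetical placeholders; only the flags themselves come from the list above.

```python
import csv
import shlex
import tempfile
from pathlib import Path

# Placeholder rows: the IDs are from the vignette, the numbers are made up.
rows = [
    {"gene_id": "EG10131", "log2_fold_change": "2.1"},
    {"gene_id": "EG10524", "log2_fold_change": "1.4"},
    {"gene_id": "EG11074", "log2_fold_change": "3.0"},
]

table_path = Path(tempfile.mkdtemp()) / "expression.csv"
with open(table_path, "w", newline="") as handle:
    writer = csv.DictWriter(handle, fieldnames=["gene_id", "log2_fold_change"])
    writer.writeheader()  # -o requires a header row with column names
    writer.writerows(rows)


def build_command(file_name, schema_type, objects_column, database="ECOLI"):
    """Assemble the argument list for the ontologize script (not executed here)."""
    return ["ontologize", file_name, schema_type,
            "-o", objects_column, "--database", database]


print(shlex.join(build_command(table_path.name, "Gene", "gene_id")))
```

Building the arguments as a list keeps quoting correct if a column or file name contains spaces; the list can be handed to subprocess.run once ontologize is installed.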

Printing options:

--depth <depth>: Maximum depth of the ontology to print. No limit by default.
--leaves: Show leaf nodes, i.e., the ontologized objects themselves. Not shown by default.
--coloroff: Turns off colored printing.
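The printing flags appear to line up with the to_string arguments shown in the vignette (max_depth, include_leaves, colors). The sketch below illustrates that presumed mapping with argparse. It is not the project's actual parser: make_parser and to_string_kwargs are hypothetical names, and the mapping itself is an assumption.

```python
import argparse


def make_parser():
    """Hypothetical parser covering only the positional and printing options."""
    parser = argparse.ArgumentParser(prog="ontologize")
    parser.add_argument("file")
    parser.add_argument("schema_type")
    parser.add_argument("--depth", type=int, default=None)   # no limit by default
    parser.add_argument("--leaves", action="store_true")     # leaves hidden by default
    parser.add_argument("--coloroff", action="store_true")   # colors on by default
    return parser


def to_string_kwargs(args):
    """Assumed translation from CLI flags to to_string keyword arguments."""
    return {
        "max_depth": args.depth,
        "include_leaves": args.leaves,
        "colors": not args.coloroff,
    }


args = make_parser().parse_args(["genes.csv", "Gene", "--depth", "2", "--leaves"])
print(to_string_kwargs(args))
```

With no printing flags at all, the same function yields the defaults listed above: no depth limit, leaves hidden, colors on.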

TODO: graph options (not implemented), pkl options, --interactive (allows maintaining session)

References

[BioCyc19] Karp, P.D., et al. The BioCyc collection of microbial genomes and metabolic pathways. Briefings in Bioinformatics (2019).
