jorvis/GALES

GALES

Genomic Annotation Logic and Execution System (GALES): Annotate a genome locally or in the cloud in minutes.

Getting Started

These instructions describe how to get an annotation pipeline running on your machine. The current version contains a functional prokaryotic pipeline; metagenomic and eukaryotic pipelines are coming next. You can test run the prok-cheetah pipeline now. Please file an issue ticket if you encounter any errors or if anything is unclear.

The setup instructions below are for Ubuntu 18.04 LTS, but you can adjust as necessary for your OS.

Prerequisites

The pipeline and tools are represented using Common Workflow Language (CWL), with all dependent tools contained within Docker images. Docker and CWL are the only prerequisites, and both are easily installed.

Install Docker and prerequisites

The Docker site has detailed instructions for many architectures, but for some this may be as simple as:

sudo apt-get install docker.io python3 python3-pip python-pip zlib1g-dev libxml2-dev
sudo usermod -aG docker $USER
[restart]

If you get an error there about python-pip not being found, you probably need to enable the universe repository.

If this is the first time you've installed Docker Engine, reboot your machine (even if the docs leave this step out).
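After the reboot, it can help to confirm that Docker is usable without sudo before moving on. The following is a minimal sketch and not part of GALES: the helper names has_docker_group and check_docker are made up for illustration, and it assumes a POSIX shell with the standard id and docker commands available.

```shell
#!/bin/sh
# Hypothetical helpers (not shipped with GALES) for checking the Docker setup.

# Succeed if the space-separated group list in $1 contains "docker".
has_docker_group() {
    case " $1 " in
        *" docker "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check that the docker client exists, that the current user is in the
# docker group, and that the daemon answers without sudo.
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker not found on PATH"
        return 1
    fi
    if ! has_docker_group "$(id -nG)"; then
        echo "user is not in the docker group (has the session been restarted?)"
        return 1
    fi
    if ! docker info >/dev/null 2>&1; then
        echo "cannot reach the Docker daemon"
        return 1
    fi
    echo "docker OK"
}
```

Calling check_docker prints "docker OK" only when all three checks pass; otherwise it reports the first check that failed.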

Install CWL

sudo pip3 install cwlref-runner

If you get an error like "error: externally-managed-environment × This environment is externally managed ╰─> To install Python packages system-wide, try apt install...", you can check out the cwltool GitHub repo and follow the README instructions to install the library. One simple solution, which completes a system-wide install using apt, is shown below:

sudo apt-get install cwltool

This Stack Overflow thread has useful replies describing other solutions to this type of problem.
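Whichever install route you use, you can check which CWL runner ended up on your PATH. This is a small sketch, assuming the runner is exposed as either cwl-runner (the alias that the cwlref-runner package provides) or cwltool; the function name find_cwl_runner is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of GALES): print the first CWL runner
# found on PATH, or fail with a message on stderr.
find_cwl_runner() {
    for candidate in cwl-runner cwltool; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no CWL runner found on PATH" >&2
    return 1
}

# Example use:
#   runner=$(find_cwl_runner) && "$runner" --version
```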

Install igraph (OS X only)

brew install igraph

Install python modules

The Biocode scripts and libraries are used within GALES. Note that Biocode uses Python 3, so the version of pip called is pip3.

sudo pip3 install biocode jinja2

Get GALES

Now that you have the dependencies to run things, you need only the actual pipeline/tool CWL definitions.

git clone https://github.com/jorvis/GALES.git

Getting reference data

The pipelines depend on reference data against which searches will be performed. These only need to be downloaded once but can be large depending on the version of the pipeline you use. As an example, let's walk through running the 'cheetah' version of the prokaryotic annotation pipeline, which is the fastest and uses the smallest datasets.

cd GALES/bin
sudo mkdir /dbs
sudo chown $USER /dbs
./download_reference_data -rd /dbs -p prok-cheetah

I put my reference collection in /dbs, but you can choose another directory. The command above tells the script to check that directory for any datasets that are missing and download them there.
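If you choose a directory other than /dbs, it is worth confirming that it exists and is writable by your user before starting a large download. A minimal sketch follows; check_ref_dir is a hypothetical helper, not a GALES script.

```shell
#!/bin/sh
# Hypothetical helper (not part of GALES): verify that a reference data
# directory exists and is writable by the current user.
check_ref_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Example use:
#   check_ref_dir /dbs && ./download_reference_data -rd /dbs -p prok-cheetah
```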

Running

There are launchers for the different pipelines, which will check your system before running.

If you have multiple processors or threads available, pass the -t option to use more of them; for example, "-t 4" runs the thread-capable tools in the pipeline with four threads. When the run completes, the annotated GFF file is named 'attributor.annotation.gff3', and it is accompanied by many other files representing the evidence used to generate the annotation.

./run_prok_pipeline -i ../test_data/genomes/E_coli_k12_dh10b.fna -od /tmp/demo -v cheetah -rd /dbs
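The thread option and the output check can be scripted. The sketch below is illustrative only: thread_count and gff3_feature_counts are hypothetical helpers, and because the exact location of attributor.annotation.gff3 inside the output directory is not stated here, the usage example searches for it with find. The feature tally relies only on the standard GFF3 layout (nine tab-separated columns, feature type in column 3).

```shell
#!/bin/sh
# Hypothetical helpers (not part of GALES).

# Print the number of online processors, for use with the -t option.
thread_count() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        getconf _NPROCESSORS_ONLN
    fi
}

# Print "feature_type<TAB>count" for each feature type in a GFF3 file,
# skipping comment/directive lines and any line without nine columns.
gff3_feature_counts() {
    awk -F '\t' '!/^#/ && NF >= 9 { count[$3]++ }
                 END { for (t in count) print t "\t" count[t] }' "$1" |
        LC_ALL=C sort
}

# Example use:
#   ./run_prok_pipeline -i ../test_data/genomes/E_coli_k12_dh10b.fna \
#       -od /tmp/demo -v cheetah -rd /dbs -t "$(thread_count)"
#   gff3_feature_counts "$(find /tmp/demo -name attributor.annotation.gff3 | head -n 1)"
```

A tally with zero rows, or a missing file, is a quick signal that the run did not finish cleanly.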

Visualization

This is very experimental and under active development, but you can create a web interface to view the results of your annotation and evidence graphically like this:

./view_annotation -i /tmp/demo -f ../test_data/genomes/E_coli_k12_dh10b.fna

This will parse the database, generate a GO-slim mapping, and provide a local URL where you can view the browser.

Common issues and solutions

  • "requests module not found". This has been reported by some Mac users. A suggested fix is:
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3.8 -m pip install requests
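Before reinstalling, you can check whether the interpreter you are actually running can see the module. This is a rough sketch, assuming that the python3 on your PATH is the interpreter used to run the GALES scripts; py_has_module is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper (not part of GALES): succeed if the python3 on PATH
# can locate the named top-level module.
py_has_module() {
    python3 -c 'import importlib.util, sys
sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Example use: install with the same interpreter only when the module is missing.
#   py_has_module requests || python3 -m pip install requests
```

Invoking pip as "python3 -m pip" ties the install to that specific interpreter, which is the same idea as the full-path command above.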

Authors

See the list of contributors who participated in this project.

License

This project is licensed under the MIT License. See the LICENSE.md file for details.
