Wyss/pysciquery


PySciQuery

PySciQuery is a Python package for querying scientific databases like PubMed and PMC (PubMed Central). It provides a command-line interface for easy searching and downloading of scientific articles.

Installation

  1. Ensure you have Python 3.10 installed.

  2. Install the package from GitHub:

pip install git+https://github.com/Wyss/pysciquery.git

Usage

PySciQuery provides two main commands: search and download.

Search Command

Search PubMed or PMC databases using queries from an input JSON file.

pysciquery search <database> <input_file>
  • <database>: Either 'pubmed' or 'pmc'
  • <input_file>: Path to the input JSON file containing search queries and parameters

Example:

pysciquery search pubmed input_file.json

The search is performed using the NCBI API. Each term in the query list is searched on its own, and then each query term is searched again with each term from the modifier list appended (e.g., "thermal proteome profiling" and "thermal proteome profiling xenopus"). "Strict" queries return only results that contain the exact phrase (e.g., "thermal proteome profiling") in any field. "Full" queries return results that contain all of the individual words of the query, not necessarily as a contiguous phrase, in any field (e.g., thermal[All Fields] AND ("proteome"[MeSH Terms] OR "proteome"[All Fields]) AND profiling[All Fields]).
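The query expansion described above can be sketched as follows. This is a minimal illustration, not PySciQuery's own code: the function name `build_queries` is hypothetical, and the strict-mode syntax (wrapping the whole string in quotes with an `[All Fields]` tag) is an assumption about how a phrase search might be expressed.

```python
from itertools import product


# Hypothetical helper: PySciQuery's real function names and exact query syntax are not documented here.
def build_queries(query_list: list[str], modifier_list: list[str], query_type: str) -> list[str]:
    """Return every query string for one search run.

    Base queries come first, then each query term with each modifier appended.
    """
    if query_type not in ("strict", "full"):
        raise ValueError(f"query_type must be 'strict' or 'full', got {query_type!r}")

    terms = list(query_list)
    terms += [f"{query} {modifier}" for query, modifier in product(query_list, modifier_list)]

    if query_type == "strict":
        # Assumption: quote the whole string so NCBI treats it as an exact phrase.
        return [f'"{term}"[All Fields]' for term in terms]
    # "full": send the plain words and let NCBI expand them (e.g., to MeSH terms).
    return terms


queries = build_queries(["thermal proteome profiling"], ["xenopus", "zebrafish"], "full")
print(queries)
```

The sketch builds the full cross product and does not skip pairs where the same term appears in both lists; whether PySciQuery filters those out is not stated in this README.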

"All Fields" include:

  • Title
  • Abstract
  • Author names
  • Journal name
  • MeSH (Medical Subject Headings) terms
  • Substance names
  • Publication types
  • Personal name as subject
  • Corporate author
  • Secondary source ID
  • Comment/correction relations
  • Other terms field

The input JSON file should contain the following keys:

  • DATABASE_LIST: the databases to query ("pubmed", "pmc", or both)
  • NCBI_QUERY_TYPE: the query type, either "strict" or "full"
  • QUERY_LIST: the list of query terms
  • MODIFIER_LIST: the list of modifier terms appended to each query term
  • NCBI_EMAIL: your NCBI email address (optional)

Example structure:

{
    "DATABASE_LIST": [
        "pubmed", "pmc"
    ],
    "NCBI_QUERY_TYPE": "full", 
    "NCBI_EMAIL": "my_email@domain.com",
    "QUERY_LIST": [
        "thermal proteome profiling",
        "ketamine",
        "dexmedetomidine",
        "etomidate"
    ],
    "MODIFIER_LIST": [
        "xenopus",
        "xenopus laevis",
        "ketamine",
        "dexmedetomidine",
        "etomidate",
        "zebrafish",
        "danio rerio",
        "human",
        "homo sapiens",
        "mouse",
        "mus musculus",
        "anesthetic"
    ]
}
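A minimal sketch of reading and checking this file in Python, assuming the key names above. The function `parse_search_config` is hypothetical, and its validation rules are assumptions drawn from the options documented in this README rather than the package's actual checks.

```python
import json

# Key names come from the README; the validation below is an illustrative assumption.
REQUIRED_KEYS = ("DATABASE_LIST", "NCBI_QUERY_TYPE", "QUERY_LIST", "MODIFIER_LIST")
VALID_DATABASES = {"pubmed", "pmc"}
VALID_QUERY_TYPES = {"strict", "full"}


def parse_search_config(text: str) -> dict:
    """Parse and sanity-check the contents of a search input JSON file."""
    config = json.loads(text)

    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing keys: {', '.join(missing)}")

    unknown = set(config["DATABASE_LIST"]) - VALID_DATABASES
    if unknown:
        raise ValueError(f"unsupported databases: {sorted(unknown)}")

    if config["NCBI_QUERY_TYPE"] not in VALID_QUERY_TYPES:
        raise ValueError("NCBI_QUERY_TYPE must be 'strict' or 'full'")

    # NCBI_EMAIL is optional, so fill in None when it is absent.
    config.setdefault("NCBI_EMAIL", None)
    return config
```

To use it, read input_file.json as text and pass the contents to `parse_search_config`.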

The search command writes two Excel files to the current working directory:

  • <database>_api_<query_type>_<timestamp>.xlsx: Contains detailed information about each article
  • <database>_total_results_<query_type>_<timestamp>.xlsx: Contains a summary of the total number of results for each query

Download Command

Download PDF articles from PMC using a list of PMIDs from a JSON file. The command maps each PMID to its PMCID, downloads the full text of each article, and saves the files to the specified directory.

pysciquery download <id_file> --email <your_email> [--output-dir <directory>]
  • <id_file>: Path to the JSON file containing article PMIDs
  • --output-dir: (Optional) Directory to save downloaded files (default: ./downloads)

Example:

pysciquery download ids.json --output-dir ./pubmed_articles

Supported ID Types

  • Use PMIDs (e.g., "39290210")

The JSON file with IDs should have the following structure:

{
    "NCBI_EMAIL": "your.email@example.com",
    "PMIDS": [
        "39290210",
        "39028932"
    ]
}

Development

To set up the development environment:

  1. Clone the repository (if you haven't already)
  2. Install development dependencies:
    pipenv install
    
  3. Activate the virtual environment:
    pipenv shell
    

About

Search literature and databases using combinatorial keyword queries and download the results.
