PySciQuery

PySciQuery is a Python package for querying scientific databases like PubMed and PMC (PubMed Central). It provides a command-line interface for easy searching and downloading of scientific articles.

Installation

Ensure you have Python 3.10 installed.
Install the package from GitHub:

```
pip install git+https://github.com/Wyss/pysciquery.git
```

Usage

PySciQuery provides two main commands: search and download.

Search Command

Search PubMed or PMC databases using queries from an input JSON file.

```
pysciquery search <database> <input_file>
```

<database>: Either 'pubmed' or 'pmc'
<input_file>: Path to the input JSON file containing search queries and parameters

Example:

```
pysciquery search pubmed input_file.json
```
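To make the argument layout concrete, here is a minimal argparse sketch of a parser that would accept the search synopsis above. It is a hypothetical reconstruction (build_parser is not taken from PySciQuery's source) and only validates arguments; it performs no search.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `search` synopsis."""
    parser = argparse.ArgumentParser(prog="pysciquery")
    subcommands = parser.add_subparsers(dest="command", required=True)

    search = subcommands.add_parser("search", help="Search PubMed or PMC")
    # Restrict <database> to the two documented values.
    search.add_argument("database", choices=["pubmed", "pmc"])
    # Path to the JSON file with search queries and parameters.
    search.add_argument("input_file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["search", "pubmed", "input_file.json"])
    print(args.command, args.database, args.input_file)
    # prints: search pubmed input_file.json
```

With choices set, an unsupported database name such as "scopus" makes argparse print a usage error and exit instead of reaching any search code.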

The search is performed through the NCBI API. Each term in the query list is searched on its own, and then each term is searched again combined with each term in the modifier list (e.g., "thermal proteome profiling" and "thermal proteome profiling xenopus"). "Strict" queries return only results that contain the exact phrase (e.g., "thermal proteome profiling") in any of the searchable fields ("All Fields"). "Full" queries let NCBI split the query into individual words and search for each one, expanding words to MeSH terms where possible, so results must match every word but not necessarily the exact phrase (e.g., thermal[All Fields] AND ("proteome"[MeSH Terms] OR "proteome"[All Fields]) AND profiling[All Fields]).
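The query expansion described above can be sketched as follows. build_queries is a hypothetical helper, and how combined strict queries are formatted is an assumption: here the base phrase and the modifier are each quoted, tagged [All Fields], and joined with AND, while full queries are sent as plain text so NCBI's automatic term mapping can process them.

```python
from itertools import product


def build_queries(query_list, modifier_list, query_type):
    """Hypothetical sketch of the query expansion described above.

    Assumption: in "strict" mode the base phrase and the modifier are quoted
    separately and joined with AND; the README does not say how PySciQuery
    formats combined strict queries.
    """
    if query_type not in ("strict", "full"):
        raise ValueError(f"unknown query type: {query_type!r}")

    def fmt(*parts):
        if query_type == "strict":
            # Quote each phrase and tag it [All Fields] for an exact-phrase match.
            return " AND ".join(f'"{part}"[All Fields]' for part in parts)
        # Untagged text: NCBI's automatic term mapping splits it into words
        # and may expand them to MeSH terms.
        return " ".join(parts)

    # Base terms first, then every base term paired with every modifier.
    queries = [fmt(q) for q in query_list]
    queries += [fmt(q, m) for q, m in product(query_list, modifier_list)]
    return queries
```

For example, build_queries(["thermal proteome profiling"], ["xenopus"], "full") yields "thermal proteome profiling" and "thermal proteome profiling xenopus". With the example input file further down (4 query terms, 12 modifiers), every base-plus-modifier combination amounts to 4 + 4 × 12 = 52 queries per database.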

"All Fields" include:

Title
Abstract
Author names
Journal name
MeSH (Medical Subject Headings) terms
Substance names
Publication types
Personal name as subject
Corporate author
Secondary source ID
Comment/correction relations
Other terms field
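Whatever the query type, each query string ends up as the search term of an NCBI request. The sketch below builds (but does not send) an E-utilities esearch URL using only the standard library. Whether PySciQuery calls this endpoint directly or through a wrapper such as Biopython's Bio.Entrez is an assumption, and esearch_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities endpoint for searches.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(database, term, email=None, retmax=20):
    """Build (but do not send) an E-utilities esearch request URL."""
    params = {
        "db": database,      # "pubmed" or "pmc"
        "term": term,        # query string, field tags included
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    if email:
        # NCBI asks API users to identify themselves with an email address.
        params["email"] = email
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

In the JSON response, the esearchresult object carries a count field with the total number of matches, which is the kind of figure a per-query results summary would report.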

The input JSON file should contain the following keys:

DATABASE_LIST: The databases to query ("pubmed", "pmc", or both)
NCBI_QUERY_TYPE: The query type, either "strict" or "full"
QUERY_LIST: The base search terms
MODIFIER_LIST: The terms to combine with each base term
NCBI_EMAIL: (Optional) Your NCBI email address

Example structure:

```
{
    "DATABASE_LIST": [
        "pubmed", "pmc"
    ],
    "NCBI_QUERY_TYPE": "full",
    "NCBI_EMAIL": "my_email@domain.com",
    "QUERY_LIST": [
        "thermal proteome profiling",
        "ketamine",
        "dexmedetomidine",
        "etomidate"
    ],
    "MODIFIER_LIST": [
        "xenopus",
        "xenopus laevis",
        "ketamine",
        "dexmedetomidine",
        "etomidate",
        "zebrafish",
        "danio rerio",
        "human",
        "homo sapiens",
        "mouse",
        "mus musculus",
        "anesthetic"
    ]
}
```
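A file with this structure can be loaded and sanity-checked with the standard json module before any request is made. load_search_config is a hypothetical helper; treating every key except NCBI_EMAIL as required follows the description above, but the exact checks PySciQuery performs are not documented.

```python
import json

REQUIRED_KEYS = ("DATABASE_LIST", "NCBI_QUERY_TYPE", "QUERY_LIST", "MODIFIER_LIST")


def load_search_config(text):
    """Parse and sanity-check a search input file (hypothetical helper)."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"missing keys: {', '.join(missing)}")
    unknown_dbs = set(config["DATABASE_LIST"]) - {"pubmed", "pmc"}
    if unknown_dbs:
        raise ValueError(f"unsupported databases: {sorted(unknown_dbs)}")
    if config["NCBI_QUERY_TYPE"] not in ("strict", "full"):
        raise ValueError("NCBI_QUERY_TYPE must be 'strict' or 'full'")
    # NCBI_EMAIL is optional; fall back to None when it is absent.
    config.setdefault("NCBI_EMAIL", None)
    return config
```

Usage: load_search_config(Path("input_file.json").read_text(encoding="utf-8")), with Path imported from pathlib.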

The search command writes two Excel files to the current working directory:

<database>_api_<query_type>_<timestamp>.xlsx: Contains detailed information about each article
<database>_total_results_<query_type>_<timestamp>.xlsx: Contains a summary of the total number of results for each query
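Because both names include <database>, a search over several databases presumably produces one pair of files per database. The naming pattern can be sketched as below; output_filenames is hypothetical and the timestamp format is an assumption, since the README does not specify one.

```python
from datetime import datetime


def output_filenames(database, query_type, when=None):
    """Build the two documented output names (timestamp format assumed)."""
    if when is None:
        when = datetime.now()
    # Assumed timestamp format; the README does not specify one.
    stamp = when.strftime("%Y%m%d_%H%M%S")
    return (
        f"{database}_api_{query_type}_{stamp}.xlsx",
        f"{database}_total_results_{query_type}_{stamp}.xlsx",
    )
```

To analyze the results afterwards, pandas.read_excel can open either file, provided an Excel engine such as openpyxl is installed.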

Download Command

Download full-text PDF articles from PMC using a list of PMIDs from a JSON file. The command matches each PMID to its PMCID, downloads the full text of the article, and saves it in the specified directory. Only articles available in PMC (i.e., that have a PMCID) can be downloaded.
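One way to perform the PMID-to-PMCID matching is the E-utilities elink endpoint with the pubmed_pmc link. The sketch below only builds the request URL; pubmed_to_pmc_url is hypothetical, and whether PySciQuery uses elink or NCBI's PMC ID Converter service is not stated in this README.

```python
from urllib.parse import urlencode

ELINK_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def pubmed_to_pmc_url(pmids, email=None):
    """Build an elink request asking which PMC records link to each PMID."""
    params = [
        ("dbfrom", "pubmed"),
        ("db", "pmc"),
        ("linkname", "pubmed_pmc"),
        ("retmode", "json"),
    ]
    # One id= parameter per PMID asks elink to return a separate link set
    # for each input ID, so results can be matched back to their PMID.
    params += [("id", pmid) for pmid in pmids]
    if email:
        params.append(("email", email))
    return f"{ELINK_URL}?{urlencode(params)}"
```

The linked PMC records come back as numeric PMC UIDs, which correspond to PMCIDs without the "PMC" prefix. A PMID with no link has no PMC copy to download.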

```
pysciquery download <id_file> --email <your_email> [--output-dir <directory>]
```

<id_file>: Path to the JSON file containing article PMIDs
--email: Your email address for NCBI requests
--output-dir: (Optional) Directory to save downloaded files (default: ./downloads)

Example:

```
pysciquery download ids.json --output-dir ./pubmed_articles
```
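The --output-dir handling can be sketched with pathlib. prepare_output_path and the <PMCID>.pdf file naming are hypothetical; the README does not say how downloaded files are named.

```python
from pathlib import Path


def prepare_output_path(pmcid, output_dir="./downloads"):
    """Create the output directory if needed and return a target PDF path.

    The <PMCID>.pdf naming scheme is an assumption for illustration.
    """
    directory = Path(output_dir)
    # parents=True creates intermediate folders; exist_ok=True avoids errors on reruns.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{pmcid}.pdf"
```

Creating the directory up front means a run with a new path such as ./pubmed_articles works without a separate mkdir step.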

Supported ID Types

Only PMIDs are supported (e.g., "39290210").

The JSON file with IDs should have the following structure:

```
{
    "NCBI_EMAIL": "your.email@example.com",
    "PMIDS": [
        "39290210",
        "39028932"
    ]
}
```
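An ID file like this can be parsed and checked with the standard json module before any download. load_pmids is a hypothetical helper; requiring each PMID to be all ASCII digits and dropping duplicates are illustrative choices, not documented PySciQuery behavior.

```python
import json


def load_pmids(text):
    """Parse an ID file and return (email, pmids), checking PMID format."""
    data = json.loads(text)
    # PMIDs are positive integers; keep them as strings, as in the example file.
    pmids = [str(p).strip() for p in data["PMIDS"]]
    bad = [p for p in pmids if not (p.isascii() and p.isdigit())]
    if bad:
        raise ValueError(f"not PMIDs: {bad}")
    # Drop duplicates while preserving the original order.
    pmids = list(dict.fromkeys(pmids))
    return data.get("NCBI_EMAIL"), pmids
```

A PMCID such as "PMC1234567" in the PMIDS list is rejected here, which matches the note that only PMIDs are supported.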

Development

To set up the development environment:

Clone the repository (if you haven't already)
Install development dependencies:
```
pipenv install --dev
```
Activate the virtual environment:
```
pipenv shell
```
