Skip to content
View shipatel302's full-sized avatar

Block or report shipatel302

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Maximum 250 characters. Please don't include any personal information such as legal names or email addresses. Markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
shipatel302/README.md

Shivani Patel

I turn biological chaos into actionable insight.

Single cells. Whole genomes. Drug targets. Clinical records. From raw data to decisions — end to end.

LinkedIn Email University


The Short Version

I am a bioinformatician, data scientist, and statistical modeler with 4+ years of experience. I build things that scale, run analyses that hold up under scrutiny, and communicate findings that people actually understand and use.

The data I work with: single-cell RNA-seq · bulk RNA-seq · whole genome & exome · proteomics · HTS compound screens · clinical records · multi-omics

The problems I solve: disease subtype classification · drug target identification · somatic variant detection · pathway dysregulation · compound hit prioritization · pipeline bottlenecks

The standard I hold: reproducible · statistically rigorous · biologically interpretable · production-ready


What I Actually Do

Raw Data                         Insight
────────                         ───────
FASTQ files          ──▶   QC → Alignment → Variant Calling → Annotation
scRNA-seq counts     ──▶   Normalization → Clustering → Cell Type ID → DEGs
Mass spec outputs    ──▶   LFQ → Protein quantification → Multi-omics fusion
HTS screening data   ──▶   Plate QC → Dose-response → IC50 → Hit ID
Clinical records     ──▶   Cleaning → Modeling → Statistical reports
All of the above     ──▶   Machine learning → Prediction → SHAP → Publication

Four Roles. One Person.

🧬 Bioinformatician

Building end-to-end NGS pipelines that actually run in production. Snakemake workflows on SLURM HPC and AWS Batch. scRNA-seq with Seurat and Scanpy. Somatic variant calling with GATK Mutect2. Differential expression with DESeq2. 30% faster. 26% more throughput. Not by accident — by design.

Tools: Seurat · Scanpy · DESeq2 · STAR · BWA-MEM · GATK · Mutect2 · samtools · bcftools · Snakemake · BaseSpace

🤖 Data Scientist

Machine learning on biological data — where the features are genes, the labels are disease subtypes, and the stakes are real. Random Forest classifier at AUC 0.85. SHAP-based feature importance. 5-fold cross-validation. UMAP and PCA for dimensionality reduction. Models that generalize, not just memorize.

Tools: R · Python · randomForest · caret · xgboost · SHAPforxgboost · scikit-learn · pROC

📊 Statistical Modeler

Statistics as the backbone of every analysis — not an afterthought. FDR correction on differential expression. LASSO for feature selection. Multivariate regression on clinical datasets. Proper experimental design. Cross-validation that means something. Results rigorous enough to publish and clear enough for a clinician to act on.

Tools: R (primary) · DESeq2 · LASSO · logistic regression · PCA · FDR · Biostatistics

🗄️ Data Analyst

Raw data is almost never clean. Clinical records, formulation datasets, structured databases — every dataset has problems. SQL for querying and validation. Python scripts for integrity checks. QC protocols that catch errors before they propagate. Normalization, transformation, and reporting that R&D teams can trust.

Tools: SQL · Python · R · pandas · dplyr · ggplot2 · tidyr · Excel


Featured Projects

Single-cell RNA-seq pipeline + Machine Learning classifier for HPV+ cancer cohorts

A complete analysis workflow — from raw count matrices to a validated Random Forest subtype classifier. Built in R using Seurat for preprocessing, DESeq2 for differential expression, and SHAP for model interpretability.

QC & Filtering → Normalization → UMAP Clustering → FDR-controlled DEGs
→ Feature Matrix (top 200 DEGs + TMB scores)
→ Random Forest (5-fold CV) → AUC 0.85 → SHAP Feature Importance

R Seurat DESeq2 randomForest SHAP pROC ggplot2


Production-grade NGS pipeline: somatic variant detection + transcriptomic analysis

A Snakemake workflow covering the full NGS stack — QC, alignment, duplicate marking, base recalibration, Mutect2 variant calling, and DESeq2 differential expression. Deployable on SLURM HPC and AWS Batch. Achieves 30% runtime reduction through parallelization and optimized resource allocation.

FastQC → Trimmomatic → BWA-MEM → MarkDuplicates → BQSR
→ Mutect2 → FilterMutectCalls → bcftools merge
→ featureCounts → DESeq2 → MultiQC report

Snakemake GATK4 BWA-MEM Mutect2 DESeq2 AWS SLURM Docker


HTS compound screening pipeline for proximity-based drug discovery (PROTACs, molecular glues)

An R-based pipeline for processing high-throughput screening data — plate QC (Z' factor, SSMD), 4-parameter log-logistic dose-response curve fitting, IC50 estimation with 95% CI, hit identification, and downstream integration with transcriptomic and proteomic data for multi-omics candidate prioritization.

Plate QC (Z' factor) → 4PL Dose-Response Fitting → IC50 Estimation
→ Hit Identification → Transcriptomic + Proteomic Integration
→ Composite Prioritization Score → Ranked Candidate List

R drc ggplot2 DESeq2 LFQ proteomics pheatmap


Technical Stack

languages    <- c("R (primary)", "Python", "SQL", "Bash", "Linux")

omics        <- c("scRNA-seq", "Bulk RNA-seq", "WGS", "WES",
                  "Proteomics (MS/LFQ)", "Somatic Variants",
                  "TMB", "CNV", "Clinical Data", "HTS Screening")

ml_stats     <- c("Random Forest", "LASSO", "Logistic Regression",
                  "PCA", "UMAP", "SHAP", "Cross-Validation",
                  "FDR Correction", "Differential Expression",
                  "Multivariate Regression", "Dimensionality Reduction")

bio_tools    <- c("Seurat", "Scanpy", "DESeq2", "STAR", "BWA-MEM",
                  "GATK4", "Mutect2", "samtools", "bcftools",
                  "IGV", "Snakemake", "BaseSpace", "featureCounts")

infra        <- c("SLURM HPC", "AWS S3", "AWS EC2", "AWS Batch",
                  "Docker", "Git", "GitHub")

By The Numbers

Metric Value
Years of experience 4+
Random Forest classifier AUC 0.85
Pipeline throughput improvement +26%
Pipeline runtime reduction -30%
Omics data types worked with 8+
Programming languages 4
GPA (M.S. Bioinformatics) 3.8 / 4.0

Education

🎓 M.S. Bioinformatics Data Science · University of Delaware · GPA 3.8 · Expected May 2026 Systems Biology · Biostatistics · Machine Learning · Bioinformatics · Electronic Health Records

🎓 B.Pharm · Dr. Babasaheb Ambedkar Technological University · 2022


Currently Open To

roles = [
    "Bioinformatics Analyst",
    "Senior Bioinformatics Analyst",
    "Computational Biologist",
    "Data Scientist (Biotech / Pharma / Healthtech)",
    "Statistical Analyst / Biostatistician",
    "Clinical Data Analyst"
]

domains = [
    "Drug Discovery",
    "Oncology",
    "Rare Disease",
    "Genomics",
    "Multi-Omics",
    "Clinical Research"
]

If your team works in any of these spaces — let's talk.

📧 shivanip8369@gmail.com · LinkedIn


"Three years of working with data that does not cooperate has a way of building a very specific skill set."

Popular repositories Loading

  1. shipatel302 shipatel302 Public

    Bioinformatics Analyst & Data Scientist | 3+ years · scRNA-seq · Multi-Omics · Drug Discovery · PROTACs · HTS · Statistical Modeling · Machine Learning · R · Python · SQL · Snakemake · AWS · SLURM …

    1

  2. cancer_driver_analysis cancer_driver_analysis Public

    cancer_driver_analysis

  3. breast_cancer_ML breast_cancer_ML Public

    breast_cancer_ML

    Jupyter Notebook

  4. cancer_mortality cancer_mortality Public

    HTML

  5. scRNAseq-cancer-analysis scRNAseq-cancer-analysis Public

    Single-cell RNA-seq pipeline with Random Forest classifier (AUC 0.85) for HPV+ cancer subtype prediction | R · Seurat · DESeq2 · SHAP · AWS

    R

  6. lifesci-rag-pipeline lifesci-rag-pipeline Public

    Python