I'm a research-driven Machine Learning Engineer passionate about building trustworthy, interpretable, and domain-aligned ML systems, especially in high-stakes fields like healthcare.
With a background in Electrical Engineering (B.Tech, NIT) and an M.Eng from the University of Toronto, I've worked across the ML stack, from backend systems to LLM fine-tuning and clinical QA evaluation. My recent work focuses on latent fragility and diagnostic instability in clinical LLMs, presented at the Agentic & GenAI Evaluation Workshop @ KDD 2025.
My core focus lies in making LLMs robust, explainable, and clinically aligned. I specialize in:
- LLM Evaluation for Clinical QA: Developed LAPD (Latent Agentic Perturbation Diagnostics) and LDFR (Latent Diagnosis Flip Rate), a geometric framework for analyzing hidden fragility in foundation models under masking, negation, and synonym perturbations.
- Multimodal and Privacy-Aware AI in Healthcare
  - Clinical Panda: generates diagnosis-grounded explanations from synthetic clinical notes.
  - PIIguardLLM: enhances privacy via structured LLM-based redaction, preserving utility while masking sensitive data.
  - Wang Lab (UofT): investigating vision-language alignment for robust multimodal medical LLMs.
- Interpretability & Sparse Reasoning: Built neuro-inspired explanation techniques for CNNs, focusing on localized, human-aligned visual saliency under sparse constraints.
KDD 2025 (Agentic & GenAI Evaluation Workshop)
- Embeddings to Diagnosis: Latent Fragility under Agentic Perturbations in Clinical LLMs
- Introduced a diagnostic framework that captures latent instabilities missed by surface metrics like BERTScore.
Research Engineer @ Vector Institute
- Fine-tuned clinical LLMs using Retrieval-Augmented Generation (RAG) pipelines.
- Integrated diagnostic agents with MedGemma and ClinicalBERT for QA tasks.
Independent Researcher
- Evaluated LLM robustness using synthetic and real clinical notes (DiReCT, MIMIC-IV).
- Quantified boundary flips in PCA-projected latent space across OpenAI, Mistral, and Meta models.
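The flip-rate idea behind this work can be illustrated with a small sketch. The actual LAPD/LDFR pipeline isn't reproduced here; this is a minimal, self-contained version assuming synthetic embeddings in place of real encoder outputs and a hypothetical nearest-centroid rule for assigning a diagnosis label in the PCA-projected latent space:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto their top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def latent_diagnosis_flip_rate(orig_emb, pert_emb, centroids):
    """Fraction of notes whose nearest diagnosis centroid changes after
    perturbation, measured in a shared PCA-projected latent space."""
    n = len(orig_emb)
    proj = pca_project(np.vstack([orig_emb, pert_emb, centroids]), k=2)
    o, p, c = proj[:n], proj[n:2 * n], proj[2 * n:]
    # nearest-centroid "diagnosis" label for each projected embedding
    label = lambda Z: np.argmin(((Z[:, None, :] - c[None]) ** 2).sum(-1), axis=1)
    return float(np.mean(label(o) != label(p)))

# toy example: random vectors stand in for note embeddings
rng = np.random.default_rng(0)
orig = rng.normal(size=(8, 16))
pert = orig + rng.normal(scale=0.5, size=orig.shape)  # e.g. negation edits
cents = rng.normal(size=(3, 16))  # one centroid per candidate diagnosis
ldfr = latent_diagnosis_flip_rate(orig, pert, cents)
assert 0.0 <= ldfr <= 1.0
```

The point of measuring flips rather than embedding distance is that a small latent shift only matters when it crosses a decision boundary; surface similarity metrics miss exactly those cases.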
I'm actively seeking roles where I can drive evaluation-first ML, especially in:
- Applied Scientist / Research Engineer roles focused on LLM evaluation, clinical robustness, or model safety
- Teams that value original, responsible AI thinking over resume checkboxes
- Startups, labs, or product teams working on health, legal, or high-risk LLM deployment
- Portfolio: https://unni12345.github.io
- Reach me at raj.vijayaraj@mail.utoronto.ca

