Hi 👋, I'm Sayantan Pal

Associate Software Engineer | AI/ML Enthusiast | Digital VLSI Developer

🚀 About Me

I'm a passionate technologist currently working as an Associate Software Engineer at Prismforce, where I architect scalable recruitment platforms and AI-powered solutions. My journey spans from digital VLSI design to full-stack development, machine learning, and production-grade AI systems.

🔭 Currently building SelectPrism - a high-scale recruitment platform handling 1,000+ concurrent AI interview sessions
🎓 B.Tech in Electronics and Communication Engineering from Jalpaiguri Government Engineering College (CGPA: 7.647/10.0)
💼 2+ years of experience in software development, AI/ML, and systems architecture
🏆 Won $100 at Quine Quest 22 for an AI-powered document summarizer
📝 Published research at ICDEC-2025 and NCRTST-2025 on ML-based cardiac risk prediction and FPGA timing systems
🌱 Diving deep into Deep Learning, Computer Vision, and VLSI Design

💻 Current Work @ Prismforce

Building SelectPrism from Scratch (Feb 2025 - Present)

Architected a production-grade recruitment platform using Node.js, TypeScript, Python (FastAPI), MongoDB, and AWS
Engineered voice AI interview agents with WebRTC and LiveKit, achieving P95 latency under 200ms for STT→LLM→TTS pipeline
Developed a resume parser using Ollama with local LLMs, achieving 92% accuracy on 500+ manually labeled resumes
Implemented security hardening: Redis-based rate limiting (100 req/min), bcrypt authentication, Google reCAPTCHA
Built automated IVR campaigns via Ozonetel and Bull queue email system, scaling from 100 to 5,000+ daily notifications
Optimized API performance with MongoDB schema validation and RTK Query caching, reducing response time from 340ms to 220ms
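
The rate limit mentioned above (100 req/min) can be illustrated with a fixed-window counter. This is a minimal in-memory sketch, not the SelectPrism implementation: the class name `FixedWindowLimiter` and its parameters are hypothetical, and a dictionary stands in for Redis, where the same idea is commonly built from `INCR` plus `EXPIRE` on a per-client key.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window_s`-second window."""

    def __init__(self, limit=100, window_s=60, clock=time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock      # injectable so the limiter can be tested without sleeping
        self._counts = {}       # (client_id, window index) -> request count

    def allow(self, client_id):
        window = int(self.clock() // self.window_s)
        # Drop counters from earlier windows (Redis would expire these keys itself).
        self._counts = {k: v for k, v in self._counts.items() if k[1] == window}
        key = (client_id, window)
        self._counts[key] = self._counts.get(key, 0) + 1
        return self._counts[key] <= self.limit
```

A fixed window is the simplest variant; it can admit short bursts around a window boundary, which a sliding-window or token-bucket limiter would smooth out.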

🏆 Featured Projects

🎯 Questionify - AI Question Generation Platform

Tech Stack: Next.js 14, Node.js, TypeScript, Redux Toolkit, MongoDB, AWS SQS, Claude AI

The most sophisticated project in my portfolio: a production-grade multi-agent system for AI-powered assessment question generation with advanced quality assurance.

Key Achievements:

Multi-Agent Architecture (V2): Implemented asynchronous message-passing system with Research Agent → Question Generation Agent → Judge Agent pipeline using AWS SQS for enterprise-scale reliability
Advanced Quality Control: Every question evaluated on 6 criteria (Requirements Alignment 25%, Research Accuracy 15%, Difficulty Match 15%, Uniqueness 15%, Clarity 15%, Industry Standards 15%) with configurable 85+ threshold
Intelligent Clarification System: Built a conversational AI that gathers full context over 2+ rounds, or until the AI confirms it has sufficient information, using multi-turn context management
Comprehensive Feature Engineering: 494 total features including non-linear transformations, statistical moments, network analysis, pathway scores, and complexity measures
Full-Stack Excellence: Next.js 14 App Router with Redux state management, real-time progress tracking, and responsive UI components
Production-Ready Infrastructure: LocalStack for development, containerized with Docker, comprehensive error handling and dead-letter queues
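
The six-criterion quality check above can be read as a weighted score compared against a threshold. The sketch below uses the weights listed in the Advanced Quality Control item; the 0-100 scale per criterion, the weighted-sum aggregation, and the function names are assumptions made for illustration, not the Judge Agent's actual code.

```python
# Weights copied from the criteria list above; they sum to 100%.
WEIGHTS = {
    "requirements_alignment": 0.25,
    "research_accuracy": 0.15,
    "difficulty_match": 0.15,
    "uniqueness": 0.15,
    "clarity": 0.15,
    "industry_standards": 0.15,
}


def quality_score(scores):
    """Weighted sum of per-criterion scores, each assumed to be on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing criteria: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def passes(scores, threshold=85.0):
    """True when the weighted score meets the configurable threshold."""
    return quality_score(scores) >= threshold
```

A question that fails `passes` would go back for another refinement round, which matches the iterative loop described in this section.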

Technical Highlights:

Multiple LLM provider support (Claude, OpenAI, Gemini, Ollama) with factory pattern
Cloud-agnostic design (AWS/GCP/Azure) with provider abstraction
Microservices-ready architecture - each agent can be independently scaled
Real-time status polling with WebSocket-style updates
Advanced file generation (PDF, DOCX, JSON, CSV) with custom formatting
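
The factory pattern behind the multi-provider support can be sketched as a registry that maps a provider name to a builder. Everything named here (`LLMProvider`, `complete`, `EchoProvider`, `create_provider`) is hypothetical, and the stand-in provider only echoes the prompt so the example runs without network access or API keys.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    """Common interface every provider adapter implements."""

    @abstractmethod
    def complete(self, prompt: str) -> str:
        ...


class EchoProvider(LLMProvider):
    """Stand-in adapter: tags the prompt with the provider name instead of calling an API."""

    def __init__(self, name):
        self.name = name

    def complete(self, prompt):
        return f"[{self.name}] {prompt}"


_REGISTRY = {}  # provider name -> zero-argument builder


def register(name, builder):
    _REGISTRY[name.lower()] = builder


def create_provider(name):
    """Factory entry point: look up the builder by name and construct the provider."""
    try:
        builder = _REGISTRY[name.lower()]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
    return builder()


# Register the four providers listed above, each backed by the stand-in adapter.
for _name in ("claude", "openai", "gemini", "ollama"):
    register(_name, lambda n=_name: EchoProvider(n))
```

Calling code depends only on `LLMProvider`, so swapping Claude for Ollama becomes a configuration change rather than a code change.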

Impact: Transforms manual question creation (hours) → automated high-quality generation (minutes) with iterative refinement until quality standards are met.

🧬 MPEG-G Track 4 - Latent Health State Discovery

Tech Stack: PyTorch, Python, UMAP, HDBSCAN, Optuna, Scikit-learn

Competition-winning deep learning solution achieving 0.8125+ silhouette score (baseline: 0.747, +8.8% improvement) for health state embedding discovery.

Key Innovations:

Multi-Modal Deep Learning Architecture:
- Cytokine Encoder: Multi-head attention transformer (8 heads, 2 layers) for capturing complex cytokine relationships
- Clinical Encoder: Specialized MLP for metabolic features
- Temporal Encoder: Bidirectional GRU for longitudinal patterns
- Cross-Modal Attention Fusion: Allows modalities to dynamically attend to each other
Advanced Contrastive Learning:
- Combined loss function: NT-Xent (SimCLR-style) + Triplet Loss + Supervised Contrastive + Temporal Contrastive
- Optimized weights: Supervised (50%) + Temporal (49%) dominant after 100 Optuna trials
- Temperature scaling and hard negative mining for improved embedding quality
Systematic Hyperparameter Optimization:
- 100 trials using Optuna TPE sampler
- Discovered shallow architecture (2 layers) outperforms deep (3-4 layers)
- Comprehensive search across 15+ hyperparameters
UMAP-64 Preprocessing Pipeline:
- Dimensionality reduction from 256D → 64D before clustering
- Scientifically sound approach validated across multiple runs
- Reproducible evaluation with fixed random seeds
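
Of the loss terms listed above, NT-Xent is the easiest to show in isolation. The following is a dependency-free sketch of the standard SimCLR-style formulation, written with plain Python lists rather than PyTorch tensors; it is not the competition code, the function names are illustrative, and it assumes every embedding is a non-zero vector.

```python
import math


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss for N positive pairs (view_a[i], view_b[i]), averaged over 2N anchors."""
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        # Similarities to every embedding except the anchor itself (positive included).
        logits = [_cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = _cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)
```

The loss drops when the two views of a sample sit closer to each other than to the other samples in the batch; a lower temperature sharpens that contrast.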

Performance Metrics:

Validation Silhouette: 0.8125
Discovered 10+ distinct health state clusters
Noise: <15% with optimized HDBSCAN (min_cluster=20, min_samples=15, metric='manhattan')

Deliverables: Complete submission package with embeddings.csv, visualizations (UMAP 2D, t-SNE, cluster distributions), performance metrics, and comprehensive documentation.

🌍 Adaptation Atlas - Climate Data Storytelling

Tech Stack: Observable Framework, D3.js, JavaScript, Python, Google Earth Engine

Climate adaptation research platform for infectious disease analysis with interactive data visualizations.

Features:

Observable notebooks with custom styling and IBM Plex Sans typography
Integration with Adaptation Atlas datasets (GAUL 2024 administrative boundaries, WMO watershed data)
Python + Google Earth Engine pipeline for soil and climate data processing
Responsive visualizations optimized for web and mobile
Export to standalone HTML for distribution

Impact: Enables data-driven insights for climate-health nexus research with accessible, shareable visualizations.

🎾 Tennis Analysis System with YOLO & CNN

Tech Stack: Python, YOLOv8, PyTorch, OpenCV, CNNs

Fine-tuned YOLOv8 on custom tennis dataset for multi-object tracking across 1,200+ frames without ID loss
Trained PyTorch CNN for court keypoint detection achieving 92% accuracy on 17 keypoints per frame
Built end-to-end pipeline integrating detection, tracking, and keypoint models for real-time match analysis
Extracted player positions, court geometry, and movement analytics from match footage

🤖 AI-Powered Resume Matching Engine

Tech Stack: Python, FastAPI, Ollama, MongoDB, Docker, LocalStack

Developed complete recruitment pipeline: PDF parsing → embedding generation → vector search → ranked candidate output
Validated with 200 recruiter-labeled job-resume pairs, achieving 88% top-5 recommendation accuracy
Containerized entire stack with Docker and simulated AWS (S3, SQS) locally using LocalStack for cost-efficient development
Implemented semantic search using Ollama embeddings for intelligent candidate-job matching
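
The vector search and ranking stages of the pipeline above reduce to scoring each resume embedding against the job embedding and keeping the best matches. This sketch assumes cosine similarity over plain lists of floats and reads "top-5 accuracy" as the share of jobs whose recruiter-labeled resume appears among the top k results; both choices, and all function names, are illustrative rather than taken from the project.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_candidates(job_vec, resume_vecs, k=5):
    """Return the ids of the k resumes most similar to the job embedding."""
    scored = sorted(
        resume_vecs.items(),
        key=lambda item: cosine(job_vec, item[1]),
        reverse=True,
    )
    return [resume_id for resume_id, _ in scored[:k]]


def top_k_accuracy(job_vecs, resume_vecs, labels, k=5):
    """Fraction of jobs whose labeled resume id appears in the top-k ranking."""
    hits = sum(
        labels[job_id] in rank_candidates(vec, resume_vecs, k)
        for job_id, vec in job_vecs.items()
    )
    return hits / len(job_vecs)
```

A brute-force scan like this is fine for a few thousand resumes; larger collections would typically move to an approximate nearest-neighbor index.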

💬 Chat with Resume - AI Career Assistant

Tech Stack: React.js, Node.js, Express.js, CopilotKit, LangGraph, Material-UI

Built AI-powered resume interaction platform with multiple specialized CoAgents
Features: Resume evaluation, job description tailoring, and interview preparation simulation
Integrated LangGraph for multi-agent orchestration and conversation flow management
Clean, responsive Material-UI interface for seamless user experience

🏥 Healthcare Diagnosis Assistant

Tech Stack: Python, Flask, Machine Learning, Medical AI

ML-based diagnostic system for symptom analysis and disease prediction
User-friendly web interface for patient information input and diagnosis output
Integrated explainable AI for transparent prediction reasoning

📅 Scheduling Calendar Assistant with CoAgent

Tech Stack: Python, Google Calendar API, CoAgent, Natural Language Processing

AI-powered calendar management system with natural language query understanding
OAuth2 integration with Google Calendar for seamless event management
CoAgent NLP capabilities for diverse user query interpretation
Automated scheduling, event creation, and calendar conflict resolution

💬 AI Customer Support App with MindsDB

Tech Stack: Python, Flask/FastAPI, React, MindsDB, REST APIs

Full-stack customer support application with AI-driven response generation
MindsDB integration for real-time ML-powered query handling
Live chat interface and ticket management system
Scalable architecture suitable for production deployment

🚀 Daytona Pydantic AI Flask App

Tech Stack: Python, Flask, Pydantic, OpenAI API, Daytona, Tailwind CSS

Built for Daytona Challenge 023 demonstrating streamlined dev environment management
AI-powered prompt responses using OpenAI integration
Pydantic for robust data validation and type safety
Responsive design with Tailwind CSS

🔐 Daytona Authorizer

Tech Stack: Node.js, Express.js, JWT, bcrypt, MongoDB

Secure authentication and authorization system with JWT implementation
Password hashing with bcrypt, email-based password reset functionality
Role-based access control (admin/user) and protected route management
Comprehensive error handling and security best practices

🎓 Research & Publications

📄 Published Papers

"An Advanced Framework For Cardiac Risk Prediction And Real-Time Monitoring Using Machine Learning And IoT"
- Presented at ICDEC-2025 (International Conference on Digital Electronics and Communications)
- Developed ML-based system for real-time cardiac risk flagging using IoT sensor data
- Integrated edge computing with cloud-based ML models for continuous health monitoring
"FPGA-Based Precision Timing Generator for Cold Collision Experiments"
- Published at NCRTST-2025 (National Conference on Recent Trends in Science and Technology)
- Achieved timing precision of ±1ns for quantum physics experimental setups
- Implemented on FPGA for high-reliability, deterministic timing control

🛠️ Technical Skills

Backend Development

Frameworks: Node.js, Express.js, FastAPI, Flask
APIs: REST APIs, WebRTC, LiveKit
Real-time: WebSockets, Server-Sent Events, Bull Queue

Frontend Development

Frameworks: React.js, Next.js
State Management: Redux Toolkit, RTK Query
Styling: Bootstrap, Tailwind CSS, Material-UI

Databases & Caching

NoSQL: MongoDB, Redis
SQL: PostgreSQL, MySQL
Vector DBs: Experience with embedding-based search

Machine Learning & AI

Frameworks: PyTorch, TensorFlow, Scikit-learn
Computer Vision: YOLOv8, OpenCV, CNN architectures
NLP: Ollama, LangChain, CopilotKit, LangGraph
Libraries: Pandas, NumPy, Seaborn, Matplotlib

DevOps & Cloud

Cloud Platforms: AWS (EC2, S3, SQS, Lambda)
Containerization: Docker, LocalStack
Workflow Orchestration: Apache Airflow
Version Control: Git, GitHub
CI/CD: GitHub Actions, automated deployment pipelines

Hardware & VLSI

Tools: MATLAB, Arduino
Design: Digital circuit design, FPGA programming
Testing: Logic analyzer, oscilloscope

📊 Previous Experience

Data Science Intern @ Celebal Technologies

May 2024 - July 2024 | Remote

Analyzed employee turnover patterns using K-means clustering
Discovered 40% lower turnover in mid-tenure employees (3-5 years) earning 27L-40L
Mapped salary-vs-experience retention curves and pitched restructuring to HR leadership
Delivered data-driven insights for talent retention strategy optimization

🌟 Open Source Contributions

🏆 stdlib.js - Production DevOps Fix (12.5k ⭐)

PR #8600: Fix DevContainer Build Failures in GitHub Codespaces - MERGED ✅

Impact: Fixed critical infrastructure issue affecting all 538+ contributors trying to use GitHub Codespaces for stdlib development.

The Problem:

DevContainer builds consistently failing with write error: no space left on device
32GB Codespaces exhausted by massive 10GB+ universal base image
Broken ShellCheck dependency blocking container initialization
Python support missing despite being required for development

My Solution:

{
  "image": "mcr.microsoft.com/devcontainers/javascript-node:1-22-bookworm", // ⚡ 70% smaller
  "features": {
    "ghcr.io/devcontainers/features/python:1": {},                          // ✅ Restored
    "ghcr.io/devcontainers-extra/features/shellcheck:1": {},                // ✅ Fixed dependency
    "ghcr.io/rocker-org/devcontainer-features/r-apt:0": {},
    "ghcr.io/julialang/devcontainer-features/julia:1": {},
    "ghcr.io/rocker-org/devcontainer-features/pandoc:1": {}
  }
}

Technical Achievements:

Optimized Base Image: Migrated from universal:2 (10GB+) to javascript-node:1-22-bookworm - reducing disk footprint by ~70%
Fixed Broken Dependencies: Updated unmaintained ShellCheck feature (marcozac/) to actively maintained fork (devcontainers-extra/)
Restored Python Support: Explicitly added Python feature that was missing from smaller base image
Verified Multi-Language Support: Ensured Node.js, Python, R, Julia, ShellCheck, and Pandoc all working post-migration

Results:

✅ Container builds successfully on standard 32GB Codespaces
⚡ 3x faster rebuild times due to smaller image
🔧 All required development tools functional
📊 Approved by 2 maintainers (@batpigandme, @Planeshifter)
🎯 137/137 CI checks passed

Community Response:

"LGTM. I got a high CPU usage warning at one point, but build succeeds without a write error. Thanks for this fix!"
— @batpigandme (stdlib maintainer)

"Thank you @sayantan007pal for this PR; much appreciated!"
— @Planeshifter (stdlib core maintainer)

Skills Demonstrated:

DevOps troubleshooting in complex multi-language environments
Docker optimization and container image selection
Dependency management and upstream feature tracking
Cross-platform development environment setup (Node.js + Python + R + Julia)
GitHub Codespaces infrastructure understanding

Daytona (13.8k ⭐)

Pull Request #1545: Updated samples index for Daytona development environment manager
Contributed to open-source dev environment standardization project ($7M funded startup)

Active Community Member

Regular contributor to developer communities on DEV.to
Published tutorials on AI/ML, DevOps, and full-stack development
Mentored developers on Daytona, Fluvio, MindsDB, and CoAgent implementations

🏅 Achievements & Certifications

🏆 $100 Winner - Quine Quest 22 for AI-powered document summarizer
📜 ICDEC-2025 Presenter - Cardiac Risk Prediction using ML and IoT
📜 NCRTST-2025 Publisher - FPGA-based Timing Generator
🎯 Hackathon Participant - Daytona Challenge 023
💻 Active Open Source Contributor - Multiple repositories across AI/ML domain

📫 Connect With Me

📧 Email: sayantanpal100@gmail.com
💼 LinkedIn: sayantan-pal-05b99b125
🐙 GitHub: @sayantan007pal
📊 Kaggle: @sayantan007pal
Zindi: @sayantan007pal

💡 What I'm Learning

🧠 Advanced Deep Learning architectures (Transformers, GANs, Diffusion Models)
🔬 Quantum Computing and Quantum ML
⚡ Advanced VLSI Design and Verification
🎯 MLOps and Production ML Systems
🌐 Distributed Systems and Microservices Architecture

⚡ "Building the future, one commit at a time" ⚡

📌 Note: Currently exploring opportunities in AI/ML Engineering, Full-Stack Development, and VLSI Design roles. Open to collaborations on innovative projects!