MS Computer Science (Machine Learning) • Georgia Tech • Senior Infrastructure Engineer | Distributed Systems
I'm obsessed with large-scale distributed systems, ML infrastructure, and making compute resources work smarter. I spend my time architecting systems that process hundreds of commits daily, managing bare-metal clusters, and building AI-driven optimizers.
Distributed Build Orchestration
I love building multi-threaded, event-driven systems that coordinate complex workflows across hundreds of machines. Dynamic resource allocation, load balancing across heterogeneous hardware, and sub-second latency in orchestration layers are what excite me. My sweet spot is designing schedulers that intelligently distribute workload while maintaining 99.8%+ uptime.
ML-Powered Infrastructure Optimization
Applying machine learning to real infrastructure problems is where I thrive. I've built TimeBox, a time-series forecasting system that uses genetic algorithms for workload prediction. I'm passionate about using Random Forests, ensemble methods, and distributed training to solve capacity planning and predictive quality problems at scale.
CI/CD Pipeline Architecture
I currently work as a Senior Software Engineer at Cisco, building tools that integrate with 50+ downstream systems, orchestrate parallel validation paths, and scale sub-linearly. The challenge of making complex pipelines feel instantaneous to developers is what drives me.
Cryptographic Fingerprinting & Security
I've designed systems that encode intellectual capital through one-way fingerprinting for ML model ingestion. Working with crypto libraries to build tamper-proof telemetry pipelines that feed into defect prediction models is incredibly satisfying. Patent awarded for this work.
ETL & Data Pipeline Engineering
Building asynchronous data pipelines that consolidate multi-source streams into unified models. I've built multiple tools from scratch, end to end, parsing, cleaning, and transforming heterogeneous data at scale; in one project, that turned a 6-day manual process into a 10-second automated one.
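A minimal sketch of that consolidation pattern, using only asyncio (source names, record shapes, and the cleaning step are all hypothetical stand-ins):

```python
import asyncio

# Hypothetical sources: each yields raw records with source-specific formatting.
async def source(name, records, delay):
    for rec in records:
        await asyncio.sleep(delay)  # simulate I/O latency
        yield {"source": name, "raw": rec}

def clean(record):
    # Normalize a heterogeneous raw record into the unified model.
    return {"source": record["source"], "value": record["raw"].strip().lower()}

async def consolidate(sources):
    queue = asyncio.Queue()

    async def drain(gen):
        async for rec in gen:
            await queue.put(clean(rec))
        await queue.put(None)  # sentinel: this source is exhausted

    tasks = [asyncio.create_task(drain(s)) for s in sources]
    done, unified = 0, []
    while done < len(sources):
        item = await queue.get()
        if item is None:
            done += 1
        else:
            unified.append(item)
    await asyncio.gather(*tasks)
    return unified

unified = asyncio.run(consolidate([
    source("a", ["  Foo ", "BAR"], 0.001),
    source("b", ["Baz  "], 0.001),
]))
```

The queue-plus-sentinel shape lets arbitrarily many sources feed one consumer without blocking on the slowest stream.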
Architectures: Microservices • Event-Driven • Multi-threaded • Distributed Systems
Protocols: TCP/IP • HTTP/S • REST • CORBA
Specializations: Concurrency patterns • Race condition mitigation • Network protocol analysis
Resource Scheduler for 900+ Bare-Metal Servers
Designed dynamic allocation algorithms that distribute build workloads across a heterogeneous cluster. The challenge: minimize idle time while respecting hardware constraints and priority queues. Solution involved building custom heuristics that outperformed naive round-robin.
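A toy version of that idea: greedy least-loaded assignment that respects a hardware constraint and a priority queue (job/server shapes are illustrative; the production heuristics are richer than this):

```python
# Minimal constraint-aware greedy scheduler sketch.
def schedule(jobs, servers):
    """Assign each job to the least-loaded server that satisfies its
    hardware requirement, processing higher-priority jobs first."""
    load = {s["name"]: 0 for s in servers}
    assignment = {}
    for job in sorted(jobs, key=lambda j: -j["priority"]):
        candidates = [s for s in servers if job["needs"] in s["tags"]]
        if not candidates:
            continue  # no eligible hardware; leave the job queued
        best = min(candidates, key=lambda s: load[s["name"]])
        assignment[job["id"]] = best["name"]
        load[best["name"]] += job["cost"]
    return assignment

servers = [
    {"name": "gpu-01", "tags": {"gpu", "x86"}},
    {"name": "cpu-01", "tags": {"x86"}},
    {"name": "cpu-02", "tags": {"x86"}},
]
jobs = [
    {"id": "build-1", "needs": "x86", "priority": 1, "cost": 4},
    {"id": "train-1", "needs": "gpu", "priority": 2, "cost": 8},
    {"id": "build-2", "needs": "x86", "priority": 1, "cost": 4},
]
plan = schedule(jobs, servers)
```

Unlike round-robin, the load-aware pick keeps the GPU box free for GPU jobs and spreads the CPU builds across idle machines.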
AI-Driven Workload Predictor
Built ML models that analyze historical usage patterns to predict future compute demand. Used time-series forecasting with genetic algorithms and gradient boosting to optimize scheduling decisions. The system actively prevents resource contention and reduces cloud waste through intelligent pre-allocation.
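A toy illustration of the genetic-algorithm half of this: evolving lag weights for a one-step demand forecast (the series, fitness function, and GA parameters here are all made up for the sketch):

```python
import random

random.seed(0)

# Toy history of compute demand; the real system uses much richer features.
history = [10, 12, 14, 13, 15, 17, 16, 18, 20, 19, 21, 23]

def forecast(weights, series, t):
    # Predict step t as a weighted sum of the previous three observations.
    return sum(w * series[t - i - 1] for i, w in enumerate(weights))

def fitness(weights):
    # Mean absolute error over the usable part of the series (lower is better).
    errors = [abs(forecast(weights, history, t) - history[t])
              for t in range(3, len(history))]
    return sum(errors) / len(errors)

def evolve(pop_size=30, generations=40):
    pop = [[random.uniform(0, 1) for _ in range(3)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]      # elitist selection
        children = []
        for _ in range(pop_size - len(survivors)):
            a, b = random.sample(survivors, 2)
            child = [(x + y) / 2 + random.gauss(0, 0.05)  # crossover + mutation
                     for x, y in zip(a, b)]
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
```

Because survivors carry over unchanged, the best individual's error can only improve generation over generation.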
Cryptographic Device Fingerprinting Pipeline
Engineered a system that creates tamper-proof device signatures using one-way hashing, feeding telemetry into Random Forest models for defect prediction. The fingerprinting approach ensures data integrity while enabling ML inference on sensitive hardware telemetry from millions of devices.
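To make the one-way property concrete, here is a minimal keyed-hash fingerprint over canonical JSON. This is an illustrative stand-in, not the patented scheme, and the field names and salt are invented:

```python
import hashlib
import hmac
import json

def fingerprint(telemetry: dict, salt: bytes = b"pipeline-salt") -> str:
    """One-way fingerprint of device telemetry: a keyed SHA-256 digest over a
    canonical JSON encoding. The raw values cannot be recovered from the
    digest, but any tampering with the input changes it."""
    canonical = json.dumps(telemetry, sort_keys=True, separators=(",", ":"))
    return hmac.new(salt, canonical.encode(), hashlib.sha256).hexdigest()

reading = {"device": "rtr-9001", "temp_c": 61.5, "fan_rpm": 4200}
fp = fingerprint(reading)
tampered = fingerprint({**reading, "temp_c": 40.0})
```

Sorting keys before hashing is what makes the digest stable across producers that serialize fields in different orders.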
Automated Test Framework with Remote Execution
Created a test orchestration system that decouples UI from logic layers in large-scale Java applications. Used CORBA middleware and Façade pattern to enable headless test execution without rebuilding.
US Patent 11,121,952 B2 — Fingerprinting-Based Defect Detection
Co-inventor on a novel approach to encoding device telemetry for ML-driven quality prediction
Cisco Innovate Everywhere Challenge — Semi-Finalist (AI resource optimizer)
Ericsson North America IoT Showcase — 1st Place
Engineering Design Competitions — Multiple podium finishes
MS Computer Science | Machine Learning Specialization | Georgia Institute of Technology | 3.8 GPA
Coursework: Deep Learning, Reinforcement Learning, Artificial Intelligence, Machine Learning for Trading
BEng Computer Engineering | Concordia University
I'm energized by:
- Systems that scale elegantly — watching a well-designed distributed system handle 10x load without breaking a sweat
- ML applied to real problems — not research for research's sake, but ML that saves money, prevents failures, and optimizes resources
- Infrastructure as code — treating systems like software, with proper versioning, testing, and deployment pipelines
- Performance optimization — finding the bottlenecks, profiling the hot paths, and making things 10x faster
- Cross-functional collaboration — integrating 50+ tools and making them feel like one cohesive platform
I thrive in environments where I can architect systems from scratch, own the full stack, and see direct impact from my work. I'm most comfortable when dealing with complex concurrency, distributed state, and resource optimization problems.
End-to-End ML Pipeline Orchestration
Building production-grade ML pipelines with Kubeflow and MLflow for model versioning, experiment tracking, and automated retraining workflows. Exploring feature stores (Feast, Tecton) for low-latency feature serving and managing feature drift in production environments.
MLOps & Model Deployment at Scale
Deep-diving into model serving infrastructure with TorchServe and TensorFlow Serving, implementing A/B testing frameworks for gradual rollouts, and building monitoring systems for model performance degradation detection. Working on automated retraining triggers based on data drift metrics using Evidently AI.
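One drift metric behind such triggers is the Population Stability Index; a from-scratch sketch (this is the generic formula, not Evidently AI's API, and the sample data is synthetic):

```python
import math

def psi(expected, actual, bins=4):
    """Population Stability Index between a reference sample and a live
    sample: sum over bins of (a - e) * ln(a / e), where a and e are bin
    proportions. A common rule of thumb treats PSI > 0.2 as drift worth
    acting on, e.g. triggering retraining."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def proportions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        n = len(sample)
        # small floor keeps empty bins from blowing up the log
        return [max(c / n, 1e-4) for c in counts]

    e_p = proportions(expected)
    a_p = proportions(actual)
    return sum((a - e) * math.log(a / e) for a, e in zip(a_p, e_p))

reference = [0.1 * i for i in range(100)]      # training-time feature values
stable = [0.1 * i + 0.01 for i in range(100)]  # same distribution
shifted = [0.1 * i + 5.0 for i in range(100)]  # drifted upward
```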
LLM Fine-tuning & Parameter-Efficient Methods
Experimenting with LoRA (Low-Rank Adaptation) and QLoRA for memory-efficient fine-tuning of large language models. Exploring PEFT (Parameter-Efficient Fine-Tuning) techniques, quantization strategies (INT8/INT4), and distributed training across multi-GPU setups using DeepSpeed and FSDP (Fully Sharded Data Parallel).
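The LoRA math in miniature: instead of updating a d_out × d_in weight matrix W, learn two small factors B (d_out × r) and A (r × d_in) and apply W + (alpha / r) · B · A. Plain-list matmul here for self-containment; real fine-tuning uses torch and the peft library:

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d_out, d_in, r, alpha = 8, 8, 2, 4
W = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]
# Trainable low-rank factors. LoRA usually zero-initializes B so the
# adapter starts as a no-op; nonzero here purely for illustration.
B = [[0.1] * r for _ in range(d_out)]
A = [[0.1] * d_in for _ in range(r)]

scale = alpha / r
delta = [[scale * v for v in row] for row in matmul(B, A)]
W_adapted = [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

full_params = d_out * d_in          # 64 if we fine-tuned W directly
lora_params = d_out * r + r * d_in  # 32 trainable parameters instead
```

The savings grow with dimension: at d = 4096 and r = 8, the adapter is roughly 0.4% of the full matrix, which is why LoRA fits on a single GPU where full fine-tuning does not.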
RAG Architecture & Vector Databases
Building production RAG (Retrieval-Augmented Generation) systems with a focus on:
- Chunking strategies and semantic splitting for optimal context windows
- Dense vs. sparse retrieval trade-offs (embedding models vs. BM25 hybrid approaches)
- Vector database optimization (Pinecone, Weaviate, Milvus) for sub-100ms retrieval at scale
- Embedding model selection and fine-tuning for domain-specific retrieval
- Reranking pipelines with cross-encoders for precision improvements
- Context compression techniques to maximize token utilization
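As a baseline for the first bullet, fixed-size chunking with overlap (a toy sketch; semantic splitting replaces the fixed window with sentence- or section-boundary-aware splits):

```python
def chunk(text, size=40, overlap=10):
    """Fixed-size character chunking with overlap: consecutive chunks share
    `overlap` characters so no phrase is split without context."""
    step = size - overlap
    return [text[i:i + size]
            for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Retrieval-augmented generation grounds model answers in retrieved context."
chunks = chunk(doc, size=40, overlap=10)
```

The overlap is the knob that trades index size against the chance of cutting a relevant passage in half.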
Advanced Distributed Training
Implementing model parallelism (pipeline, tensor, sequence) for training models that don't fit on single GPUs. Exploring gradient accumulation strategies, mixed precision training (FP16/BF16), and communication optimization in multi-node setups. Profiling NCCL/GLOO backends for optimal inter-GPU bandwidth.
Reinforcement Learning for Systems Optimization
Applying PPO (Proximal Policy Optimization) and DQN variants to dynamic resource allocation problems. Building gym-style environments for modeling distributed system behavior and training agents for intelligent job scheduling and capacity planning.
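A minimal gym-style environment for the scheduling case (state, action, and reward shapes are hypothetical; a real setup would follow the gymnasium API and train PPO or DQN against it):

```python
import random

class SchedulingEnv:
    """State: free capacity per server. Action: index of the server for the
    next job. Reward: +1 if the job fits, -1 if the chosen server is full."""

    def __init__(self, capacities, job_cost=1, horizon=10, seed=0):
        self.capacities = list(capacities)
        self.job_cost = job_cost
        self.horizon = horizon
        self.rng = random.Random(seed)  # reserved for stochastic job arrivals

    def reset(self):
        self.free = list(self.capacities)
        self.t = 0
        return tuple(self.free)

    def step(self, action):
        self.t += 1
        if self.free[action] >= self.job_cost:
            self.free[action] -= self.job_cost
            reward = 1.0
        else:
            reward = -1.0
        done = self.t >= self.horizon
        return tuple(self.free), reward, done, {}

env = SchedulingEnv(capacities=[2, 2], horizon=4)
state = env.reset()
total, done = 0.0, False
while not done:
    action = max(range(len(state)), key=lambda i: state[i])  # greedy baseline
    state, reward, done, _ = env.step(action)
    total += reward
```

The greedy baseline gives the trained agent something to beat once arrivals and costs become stochastic.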
Kubernetes Operators for ML Workloads
Writing custom Kubernetes operators for ML-specific workflows — automated GPU allocation, spot instance management for training jobs, and custom schedulers that understand model training phases (data loading, forward pass, backprop) for optimal resource packing.
I love talking about distributed systems architecture, ML infrastructure challenges, and performance optimization war stories.
Open to discussing: Senior Engineer / Tech Lead roles | ML Infrastructure | Build/Release Engineering at Scale

