Senior Data Engineer focused on streaming data systems with regular exposure to ML, NLP, and backend engineering.
I design and operate pipelines where reliability, scalability, and correctness matter, often at the intersection of real-time data, machine learning inference, and downstream analytics.
- Real-time and near-real-time data pipelines
- Streaming architectures (event-driven, stateful processing)
- Large-scale batch processing
- Formalizing requirements for systems that support critical processes
- ML & NLP integration in production data systems
- Data quality, observability, and backfills
- Bridging data engineering with backend services
Languages
- Python, SQL
- Java, Scala
- Rust (systems & streaming)
Streaming & Data
- Kafka, Redis, RabbitMQ
- Spark / Flink
- Airflow / Dagster
ML & NLP
- Feature & inference pipelines
- Text processing & embeddings
- Model integration and monitoring
Storage & Analytics
- PostgreSQL
- Cloud object storage (S3/GCS)
- NoSQL databases (Cassandra, ScyllaDB, DynamoDB, Firebase)
- Iceberg / Delta Lake
Infra
- Docker
- CI/CD (GitHub Actions)
- Linux
- Streaming systems in Rust
- Event-time correctness and state management
- ML observability in data pipelines
- Vector search & retrieval in production
- LinkedIn: https://www.linkedin.com/in/utsav-goel


