Hi, I'm Vootla Pushpak, a data engineer with 2 years of experience building data pipelines and working with a variety of data systems. I currently work at Ensono, where I design and implement data pipelines that drive business insights.
- Data Engineering: I'm fascinated by the challenges of handling large-scale data and building systems that can process, store, and analyze it efficiently.
- Programming languages: Python, SQL
- Database Management Systems: MySQL, Microsoft SQL Server
- Cloud Services: Azure Data Lake, Azure Databricks, Azure Data Factory (ADF), Synapse Analytics, Azure SQL Database, Azure Key Vault; AWS EC2, S3, RDS, Elastic Beanstalk, DynamoDB, Lambda
- Big Data Technologies: Apache Spark, HDFS, Delta Lake
- Built a metadata-driven ingestion framework using Azure Data Factory to migrate operational data from on-prem SQL Server to Azure SQL Database, processing over 1 million records/day across 8+ tables.
- Designed parameterized ADF pipelines driven by JSON configurations stored in ADLS Gen2, enabling dynamic table selection and reducing manual intervention by 40% (a config sketch follows this list).
- Implemented incremental load logic using watermark columns, improving performance and cutting the data volume moved per run by over 80% (see the watermark sketch below).
- Applied data validation techniques such as row-count checks and checksums to ensure 100% data consistency between source and target systems (validation sketch below).
- Engineered a Medallion Architecture in Azure Databricks using PySpark and Delta Lake, transforming data across bronze, silver, and gold layers to support analytics and reporting (see the medallion sketch below).
- Utilized Delta Lake features such as schema evolution, ACID transactions, and time travel to manage complex transformations and data lineage.
- Tuned Spark jobs using partitioning, caching, and cluster resource configuration, reducing runtime by 30% and lowering compute costs by 15% (tuning sketch below).
- Delivered curated gold-layer datasets built from large-scale retail data to analytics teams, enabling a 25% improvement in sales forecast accuracy and a 15% reduction in inventory overhead.
- Integrated Databricks notebooks into CI/CD pipelines using Azure DevOps, automating version-controlled deployments across dev, test, and prod environments.
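As a flavor of the metadata-driven approach, here is a minimal Python sketch of the per-table JSON config idea. The field names (`source_table`, `watermark_column`, etc.) are illustrative, not the framework's actual schema:

```python
import json

# Hypothetical per-table config of the kind an ADF ForEach activity can
# iterate over; field names are illustrative only.
tables = json.loads("""
[
  {"source_schema": "dbo", "source_table": "Orders",
   "target_table": "orders", "watermark_column": "ModifiedDate"},
  {"source_schema": "dbo", "source_table": "Customers",
   "target_table": "customers", "watermark_column": "ModifiedDate"}
]
""")

for t in tables:
    # One copy activity per entry: adding a table means editing the config
    # file in ADLS Gen2, not cloning a pipeline.
    query = (f"SELECT * FROM {t['source_schema']}.{t['source_table']} "
             f"WHERE {t['watermark_column']} > @last_watermark")
    print(f"{t['target_table']} <- {query}")
```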
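The watermark-based incremental load, sketched with toy in-memory DataFrames; in the real pipeline the source is read over JDBC and merged into Delta, and every name here is made up for illustration:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Toy stand-ins for the real source/target tables.
target = spark.createDataFrame([(1, "2024-01-01")], ["id", "modified_date"])
source = spark.createDataFrame(
    [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-03")],
    ["id", "modified_date"],
)

# 1. High-water mark: the newest change already present in the target.
last_wm = target.agg(F.max("modified_date")).first()[0]

# 2. Incremental slice: only source rows the target has not seen yet.
delta_rows = source.where(F.col("modified_date") > F.lit(last_wm))

# 3. In production this slice would be MERGEd into a Delta table keyed on id;
#    here we just show that only the two new rows survive the filter.
delta_rows.show()
```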
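A rough PySpark version of the row-count and checksum checks, assuming both sides expose the same columns in the same order; the `validate` helper is mine, not the project's:

```python
from pyspark.sql import DataFrame, functions as F

def validate(source_df: DataFrame, target_df: DataFrame) -> None:
    """Row-count and checksum comparison between source and target."""
    # Row-count check: the cheapest signal that a load is incomplete.
    assert source_df.count() == target_df.count(), "row counts differ"

    # Checksum check: hash every row, then fold the hashes into one value
    # per side (any overflow wraps identically on both sides).
    def checksum(df: DataFrame):
        row_hash = F.xxhash64(*[F.col(c) for c in df.columns])
        return df.select(F.sum(row_hash)).first()[0]

    assert checksum(source_df) == checksum(target_df), "checksums differ"
```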
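And the bronze/silver/gold flow with the Delta features mentioned above, as a sketch for a Delta-enabled workspace such as Databricks; paths and column names are hypothetical:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: land the raw feed as-is (paths are hypothetical mount points).
bronze = spark.read.json("/mnt/raw/sales/")
bronze.write.format("delta").mode("append").save("/mnt/bronze/sales")

# Silver: de-duplicated and conformed; mergeSchema tolerates additive
# schema drift (the schema-evolution feature mentioned above).
silver = (
    spark.read.format("delta").load("/mnt/bronze/sales")
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
)
(silver.write.format("delta").mode("append")
       .option("mergeSchema", "true").save("/mnt/silver/sales"))

# Gold: business-level aggregates ready for reporting.
gold = silver.groupBy("order_date").agg(F.sum("amount").alias("daily_sales"))
gold.write.format("delta").mode("overwrite").save("/mnt/gold/daily_sales")

# Time travel: re-read an earlier version of a table for audits or debugging.
first_load = spark.read.format("delta").option("versionAsOf", 0).load("/mnt/silver/sales")
```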
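Finally, a small example of the kind of Spark tuning involved: caching a reused DataFrame and repartitioning before a partitioned write. Again, paths and keys are illustrative:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical table; store_id and order_date stand in for real keys.
df = spark.read.format("delta").load("/mnt/silver/sales")

# Cache a DataFrame that several downstream aggregations reuse, so the
# scan and filter work is paid once rather than per action.
df.cache()
df.count()  # materializes the cache

# Repartition on the aggregation key to spread the shuffle evenly, then
# write partitioned by a low-cardinality column so readers can prune files.
(df.repartition("store_id")
   .write.format("delta")
   .partitionBy("order_date")
   .mode("overwrite")
   .save("/mnt/gold/sales_by_store"))
```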
- Microsoft - Azure Fundamentals (AZ-900), Azure Data Fundamentals (DP-900), Azure AI Fundamentals (AI-900)
- AWS - Certified Cloud Practitioner (Foundational)
- Neo4j - Neo4j Fundamentals, Cypher Fundamentals, Graph Data Modelling Fundamentals
- Google - Cloud Digital Leader
- Oracle -
- Coursera - Data Engineering Essentials
- Databricks - Generative AI Fundamentals, Lakehouse Fundamentals
- DeepLearning.AI - Generative AI for Everyone
Feel free to reach out to me on GitHub or LinkedIn if you'd like to discuss data engineering, collaborate on a project, or simply say hello!
