Job Description
Key Responsibilities:
- Data Pipeline Development: Build and optimize ETL/ELT workflows using Airflow, Prefect, or similar orchestration tools (a minimal illustrative DAG sketch follows this list).
- Data Integration & Transformation: Ingest data from multiple sources (transactional, behavioural, streaming) and prepare feature-ready datasets.
- Application Integration: Connect pipelines to Salesforce, Gainsight, and other enterprise applications for real-time data sync.
- BI Semantic Modelling: Design and maintain semantic layers for BI tools (Power BI, Looker, Tableau) to enable self-service analytics.
- Version Control & Experiment Tracking: Implement DVC for dataset versioning and integrate MLflow for experiment reproducibility.
- Distributed Data Processing: Use Spark, Ray, or Databricks for large-scale data transformations and feature engineering.
- Cloud Infrastructure: Deploy and manage pipelines on Azure Data Factory, Azure Synapse, GCP Dataflow, and BigQuery; leverage cloud storage (ADLS, GCS) and managed compute clusters.
- MLOps Enablement: Collaborate on CI/CD workflows for ML models, feature store integration, and monitoring pipelines.
- Data Quality & Governance: Implement validation, lineage tracking, and compliance checks for secure and reliable data.
- Performance Optimization: Profile and tune pipelines for cost efficiency and low latency.
- Collaboration: Partner with Data Scientists to ensure timely delivery of clean, well-structured data for ML/DL models.
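To make the pipeline-development responsibility above concrete, here is a minimal Airflow DAG sketch (assuming Airflow 2.4+); the task logic, schedule, and data are placeholders and not part of the actual role or stack.

```python
# Minimal illustrative ETL DAG (assumes Airflow 2.4+); all task logic and
# names below are placeholders, not part of the actual role or stack.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Pull raw records from a source system (placeholder data).
    return [{"id": 1, "amount": 42.0}]


def transform(ti, **context):
    # Reshape the extracted records into a feature-ready form.
    rows = ti.xcom_pull(task_ids="extract")
    return [{**row, "amount_usd": row["amount"]} for row in rows]


def load(ti, **context):
    # Write the transformed records to the warehouse (stubbed out here).
    rows = ti.xcom_pull(task_ids="transform")
    print(f"Loading {len(rows)} rows")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> transform_task >> load_task
```

In practice, the same extract-transform-load pattern maps onto the Azure/GCP services and Spark jobs listed in the skills below.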
Required Skills:
Technical:
- Python (pandas, PySpark), SQL; familiarity with Scala is a plus.
- Orchestration: Airflow, Prefect, dbt.
- Distributed frameworks: Spark, Ray, Databricks (see the PySpark sketch after this list).
- Cloud platforms: Azure (Data Factory, Synapse, ADLS) and GCP (Dataflow, BigQuery, GCS).
- BI Semantic Modelling: Power BI, Looker, Tableau.
- Application Integration: Salesforce, Gainsight, REST APIs.
- Containerization: Docker; basic Kubernetes.
- Versioning: Git, DVC; experiment tracking: MLflow (see the MLflow sketch after this list).
- Streaming: Kafka or Pub/Sub for real-time ingestion.
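As a rough illustration of the distributed-processing skill set, here is a small PySpark feature-engineering sketch; the bucket paths and column names are hypothetical.

```python
# Illustrative PySpark feature-engineering job; paths and column names are
# hypothetical stand-ins for the role's actual behavioural event data.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature_prep").getOrCreate()

# Read raw behavioural events (placeholder path).
events = spark.read.parquet("gs://example-bucket/events/")

# Aggregate events into per-customer features for downstream ML models.
features = (
    events.groupBy("customer_id")
    .agg(
        F.count("*").alias("event_count"),
        F.avg("session_seconds").alias("avg_session_seconds"),
        F.max("event_ts").alias("last_seen_ts"),
    )
)

# Persist the feature table (placeholder path).
features.write.mode("overwrite").parquet("gs://example-bucket/features/")
```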
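Similarly, a minimal sketch of the experiment-tracking requirement, assuming an MLflow tracking server is already configured; the experiment name, parameters, metric value, and tag are purely illustrative.

```python
# Minimal MLflow tracking sketch; the experiment name, parameters, metric
# value, and DVC tag below are illustrative placeholders.
import mlflow

mlflow.set_experiment("example-feature-experiments")

with mlflow.start_run(run_name="baseline"):
    # Record what produced this run so results are reproducible.
    mlflow.log_param("feature_set", "v1")
    mlflow.log_param("training_rows", 120_000)

    # Record evaluation results for comparison across runs.
    mlflow.log_metric("auc", 0.87)

    # Tag the run with the DVC-tracked data version so the experiment can be
    # tied back to the exact dataset snapshot (tag value is a placeholder).
    mlflow.set_tag("dvc_data_version", "v1.2.0")
```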