Job Description

Job Title: Data Analyst/Engineer

Location: India (Bangalore)  

About the Role

As a Data Analyst & Engineer for O2C Phase-1, you will build robust batch pipelines into a managed PostgreSQL data layer to ingest from CUBE/RegBook, MetricStream and the client's Entity master. You will implement high-quality, auditable data flows with strong contracts, lineage and idempotency.

You will collaborate with the Data Architect, Integrations Engineer and Reporting team to deliver reliable datasets and views that power persona-based dashboards.

Key Responsibilities

  1. Pipeline Engineering
  • Build and operate batch ingestion jobs (files/APIs) with retries, alerting and replay.
  • Implement source-to-target mappings, data quality checks and safe schema evolution.
  2. Data Layer Build
  • Create and optimize tables, indexes and views for analytics and application use.
  • Contribute to PDM standards, partitioning, retention and performance baselines.
  3. Lineage & Controls
  • Capture lineage and provenance; ensure auditability of changes and versioning.
  • Handle PII/sensitive fields per policy; follow least-privilege patterns.
  4. Collaboration
  • Work with Data Integrations to stabilize upstream feeds; support Reporting on semantic models.
  • Support QA with data fixtures and automated validation for UAT.

Preferred Skills & Experience

  • 5–9 years in data engineering with strong SQL and ETL/ELT experience.
  • Proficiency in Python and SQL for data manipulation and data analysis. 
  • Hands-on experience with AWS services, including RDS/Aurora PostgreSQL, Step Functions, Lambda, Glue and S3.
  • Strong understanding of data modelling, schema design, and performance tuning. 
  • Experience integrating with enterprise systems via batch/APIs; strong understanding of DQ and idempotency.
  • Familiarity with Azure data services, CI/CD for data pipelines, and Amazon SageMaker is a plus.


Role Summary

Build batch ingestion pipelines into managed PostgreSQL (Flexible Server) for CUBE/RegBook, MetricStream and Entity Master. Implement source-to-target mappings, data quality checks, idempotent loads, lineage capture and schedules per the O2C Phase-1 scope (no AI). Optimize the schemas, indexes and views used by reporting.