Job Description
Job Title: Data Analyst/Engineer
Location: India (Bangalore)
About the Role
As a Data Analyst & Engineer for O2C Phase-1, you will build robust batch pipelines that ingest data from CUBE/RegBook, MetricStream and the client's Entity Master into a managed PostgreSQL data layer. You will implement high-quality, auditable data flows with strong data contracts, lineage and idempotency.
You will collaborate with the Data Architect, the Integrations Engineer and the Reporting team to deliver reliable datasets and views that power persona-based dashboards.
Key Responsibilities
- Pipeline Engineering
  - Build and operate batch ingestion jobs (files/APIs) with retries, alerting and replay.
  - Implement source-to-target mappings and data quality checks, and manage schema evolution safely.
- Data Layer Build
  - Create and optimize tables, indexes and views for analytics and application use.
  - Contribute to PDM standards, partitioning, retention and performance baselines.
- Lineage & Controls
  - Capture lineage and provenance; ensure that changes are auditable and versioned.
  - Handle PII and other sensitive fields according to policy; follow least-privilege access patterns.
- Collaboration
  - Work with the data integrations team to stabilize upstream feeds; support the Reporting team with semantic models.
  - Support QA with data fixtures and automated validation for UAT.
Preferred Skills & Experience
- 5–9 years in data engineering with strong SQL and ETL/ELT experience.
- Proficiency in Python and SQL for data manipulation and analysis.
- Hands-on experience with managed PostgreSQL and AWS services such as Step Functions, Lambda, Glue and S3.
- Strong understanding of data modelling, schema design, and performance tuning.
- Experience integrating with enterprise systems via batch/APIs; strong understanding of DQ and idempotency.
- Familiarity with Azure data services, CI/CD for data pipelines, and Amazon SageMaker is a plus.
O2C Phase-1 scope (no AI): build batch ingestion pipelines from CUBE/RegBook, MetricStream and Entity Master into managed PostgreSQL (Flexible Server); implement source-to-target mappings, data quality checks, idempotent loads, lineage capture and schedules; and optimize the schemas, indexes and views used by reporting.