Job Description

This role covers the day-to-day work of refactoring legacy Spark jobs to new standards, upgrading Airflow jobs, and completing migrations. The manager notes that AI-based automations are available to assist with code refactoring, and three full-time engineers will provide support. Required skills include expertise in Airflow and Spark; AWS exposure is a plus, and experience with modern editors such as Cursor would be beneficial. The role involves refactoring existing data pipelines, so new hires are not expected to build them from scratch.

Non-negotiable Skills:

At least 3 years of experience with Python, Spark, and Airflow, though the Airflow experience may be less than 3 years.

Candidates without all three required skills might be considered if they have strong experience in Python and Spark.

Nice-to-Haves:

AWS experience, particularly with S3 and EMR, is desirable but not strictly required.

Interview process: 2 rounds 

Coding round focusing on Python and SQL/PySpark

The manager is open to candidates who request to work from home (WFH), provided they are willing to come to the office when required.

Regular hours: 10 am to 7 pm

Duration: 6 months (no obligation for renewal; depends on business needs and performance)

Get to Know the Role
You will support the team's mission by maintaining and extending the platform's capabilities through implementation of new features and continuous improvements. You will also explore new developments in the space and continuously bring them to our platform, thereby helping the data community at Client.

The Critical Tasks You Will Perform
- You will maintain and extend the Python/Go/Scala backend for Client's Airflow, Spark, Trino, and StarRocks platform.
- You will modify and extend Python/Scala Spark applications and Airflow pipelines for better performance, reliability, and cost (see the illustrative sketch after this list).
- You will design and implement architectural improvements for new use cases or efficiency.
- You will build platforms that can scale to the 3 Vs of Big Data (Volume, Velocity, Variety).
- You will follow testing and SRE best practices to ensure system stability and reliability.
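
To give a concrete sense of the Airflow/Spark pipeline work referenced above, here is a minimal, hypothetical sketch of an Airflow DAG that submits a PySpark application. The DAG id, file path, and connection name are placeholders rather than details from this role, and it assumes the apache-spark Airflow provider is installed.

    # Minimal illustrative sketch only; names and paths below are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    with DAG(
        dag_id="example_refactored_pipeline",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                     # Airflow 2.4+; older versions use schedule_interval
        catchup=False,
    ) as dag:
        run_spark_job = SparkSubmitOperator(
            task_id="run_spark_job",
            application="/opt/jobs/example_job.py",            # hypothetical PySpark script
            conn_id="spark_default",                           # assumes a configured Spark connection
            conf={"spark.dynamicAllocation.enabled": "true"},  # example tuning knob
        )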

 

Qualifications

What Essential Skills You Will Need
- An undergraduate degree in Software Engineering, Computer Science, or a related field.
- Proficient in at least one of Python, Go, or Scala, with a strong appetite to learn other programming languages.
- You have 3-5 years of relevant professional experience.
- Good working knowledge of 3 or more of the following: Airflow, Spark, relational databases (ideally MySQL), Kubernetes, StarRocks, Trino, and backend API implementation, with a passion for learning the others.
- Experience with AWS services (S3, EKS, IAM) and infrastructure-as-code tools like Terraform.
- Proficiency in CI/CD tools (Jenkins, GitLab, etc.).
- You are highly motivated to work smart, making intelligent use of the AI resources available at Client.

Skills that are Good to Have
- Proficient in Kubernetes, with hands-on experience building custom resources using frameworks like kubebuilder.
- Proficient in Apache Spark, with good knowledge of resource managers such as YARN and Kubernetes and how Spark interacts with them.
- Advanced understanding of Apache Airflow and how it works with the Celery and/or Kubernetes executor backends, with exposure to the Python SQLAlchemy framework.
- Advanced knowledge of other query engines such as Trino and StarRocks.
- Advanced knowledge of AWS Cloud.
- Good understanding of lakehouse table formats such as Iceberg and Delta Lake, and how query engines work with them.