Job Description:

Get to Know the Role:
You will support the team's mission by maintaining and extending the platform's capabilities through new features and continuous improvements. You will also explore new developments in the space and bring them to our platform, thereby helping the data community at client.

The Critical Tasks You Will Perform:
You will maintain and extend the Python/Go/Scala backend for client's Airflow, Spark, Trino, and StarRocks platform.
You will modify and extend Python/Scala Spark applications and Airflow pipelines for better performance, reliability, and cost efficiency.
You will design and implement architectural improvements for new use cases or improved efficiency.
You will build platforms that can scale to the 3 Vs of Big Data (volume, velocity, variety).
You will follow testing and SRE best practices to ensure system stability and reliability.

Qualifications:
What Essential Skills You Will Need:
An undergraduate degree in Software Engineering, Computer Science, or a related field.
Proficiency in at least one of Python, Go, or Scala, and a strong appetite to learn other programming languages.
You have 3-5 years of relevant professional experience.
Good working knowledge of three or more of the following: Airflow, Spark, relational databases (ideally MySQL), Kubernetes, StarRocks, Trino, and backend API implementation, plus a passion for learning the others.
Experience with AWS services (S3, EKS, IAM) and infrastructure-as-code tools such as Terraform.
Proficiency in CI/CD tools (Jenkins, GitLab, etc.).
You are highly motivated to work smart, making intelligent use of the AI resources available at client.

Skills That Are Good to Have:
Proficient in Kubernetes, with hands-on experience building custom resources using frameworks like Kubebuilder.
Proficient in Apache Spark, with good knowledge of resource managers such as YARN and Kubernetes and how Spark interacts with them.
Advanced understanding of Apache Airflow and how it works with the Celery and/or Kubernetes executor backends, with exposure to the Python SQLAlchemy framework.
Advanced knowledge of other query engines such as Trino and StarRocks.
Advanced knowledge of AWS.
Good understanding of lakehouse table formats such as Iceberg and Delta Lake, and how query engines work with them.