Job Description

As a Senior Software Engineer - MLOps, you’ll architect and manage ML deployment pipelines, ensuring models are delivered to scalable, secure, production-grade environments. You will implement CI/CD, container orchestration, monitoring, and governance frameworks, serving as a critical link between ML engineering and enterprise infrastructure, in line with HD Supply’s tech stack.

JOB REQUIREMENTS

Education and Certifications

·         Bachelor’s or Master’s degree in computer science, software engineering, or a related field

·         Certifications in DevOps, Kubernetes, GCP, or MLOps preferred

Required Experience

·         4–7 years of experience in ML/DevOps roles, building CI/CD pipelines and deployment frameworks for ML applications

·         Experience implementing CI/CD pipelines for ML artifacts and model packaging

·         Proficiency in containerization (Docker), orchestration (Kubernetes / EKS / GKE), and Airflow/Prefect pipelines

·         Hands-on experience supporting production ML deployments, including caching, load balancing, and version rollback

Essential Skills

·         Experience building automated ML pipelines using CI/CD tools such as Jenkins, GitLab CI, or Azure DevOps

·         Proficiency in Linux administration, containerization (Docker), and Kubernetes orchestration

·         Strong hands-on experience with Google Cloud Platform (GCP)

·         Experience working with Vertex AI for scalable ML pipeline deployment

·         Knowledge of monitoring, logging, and alerting frameworks (Prometheus, Grafana, ELK stack)

·         Proficiency in Python/Bash scripting and automated testing frameworks

·         Familiarity with deploying ML models as scalable API services (e.g., Seldon, KFServing)

Desired Skills

·         Familiarity with Google Vertex AI Pipelines

·         Understanding of Snowflake architecture and its integration points

·         Experience with feature store implementation and MLOps platform architecture

·         Certifications in cloud-native technologies, MLOps, or Kubernetes

·         Understanding of security/risk controls around ML deployments

·         Familiarity with machine learning and model development

·         Familiarity with machine learning test automation and continuous validation frameworks

·         Experience monitoring logs by enabling or setting up log analytics dashboards

ROLES & RESPONSIBILITIES

Delivery and Execution

·         Define the MLOps pipeline architecture, including version control, model validation, deployment, and rollback mechanisms

·         Work closely with ML engineers to design scalable, reliable system-level integration plans

·         Architect model lifecycle flows consistent with enterprise standards and service-level requirements

·         Build and maintain CI/CD pipelines for ML workflows, including model packaging, testing, and serving

·         Deploy containers and microservices to Kubernetes or managed cloud services

·         Implement monitoring solutions to track model performance, drift, and system health

Support and Enablement

·         Automate operational tasks such as deployments, scaling, canary releases, and job scheduling (for example, via Airflow)

·         Document system configurations, incident protocols, and deployment playbooks

·         Conduct post-mortems and root cause analyses for platform incidents