AI Infrastructure Engineer (DevOps/MLOps)

Our Company:

    Techvantage.ai is a next-generation technology and product engineering company at the forefront of innovation in Generative AI, Agentic AI, and autonomous intelligent systems. We build intelligent, cutting-edge solutions designed to scale and evolve with the future of artificial intelligence.

Role Overview:

    We are looking for a skilled and versatile AI Infrastructure Engineer (DevOps/MLOps) to build and manage the cloud infrastructure, deployment pipelines, and machine learning operations behind our AI-powered products. You will work at the intersection of software engineering, ML, and cloud architecture to ensure that our models and systems are scalable, reliable, and production-ready.


What we are looking from an ideal candidate?

    • Design and manage CI/CD pipelines for both software applications and machine learning workflows.
    • Deploy and monitor ML models in production using tools like MLflow, SageMaker, Vertex AI, or similar.
    • Automate the provisioning and configuration of infrastructure using IaC tools (Terraform, Pulumi, etc.).
    • Build robust monitoring, logging, and alerting systems for AI applications.
    • Manage containerized services with Docker and orchestration platforms like Kubernetes.
    • Collaborate with data scientists and ML engineers to streamline model experimentation, versioning, and deployment.
    • Optimize compute resources and storage costs across cloud environments (AWS, GCP, or Azure).
    • Ensure system reliability, scalability, and security across all environments.

What skills do you need?

    • 5+ years of experience in DevOps, MLOps, or infrastructure engineering roles.
    • Hands-on experience with cloud platforms (AWS, GCP, or Azure) and services related to ML workloads.
    • Strong knowledge of CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI).
    • Proficiency in Docker, Kubernetes, and infrastructure-as-code frameworks.
    • Experience with ML pipelines, model versioning, and ML monitoring tools.
    • Scripting skills in Python, Bash, or similar for automation tasks.
    • Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK, CloudWatch, etc.).
    • Understanding of ML lifecycle management and reproducibility.

Preferred Qualifications:

    • Experience with Kubeflow, MLflow, DVC, or Triton Inference Server.
    • Exposure to data versioning, feature stores, and model registries.
    • AWS or GCP certification in DevOps or Machine Learning Engineering.
    • A background in software engineering, data engineering, or ML research.

What We Offer:

    • Work on cutting-edge AI platforms and infrastructure
    • Cross-functional collaboration with top ML, research, and product teams
    • Competitive compensation package, with no constraints for the right candidate