Techvantage.ai

Our Company:

Techvantage.ai is a next-generation technology and product engineering company at the forefront of innovation in Generative AI, Agentic AI, and autonomous intelligent systems. We build intelligent, cutting-edge solutions designed to scale and evolve with the future of artificial intelligence.

Role Overview:

We are looking for a skilled and versatile AI Infrastructure Engineer (DevOps/MLOps) to build and manage the cloud infrastructure, deployment pipelines, and machine learning operations behind our AI-powered products. You will work at the intersection of software engineering, ML, and cloud architecture to ensure that our models and systems are scalable, reliable, and production-ready.

What we are looking for in an ideal candidate:

Design and manage CI/CD pipelines for both software applications and machine learning workflows.
Deploy and monitor ML models in production using tools like MLflow, SageMaker, Vertex AI, or similar.
Automate the provisioning and configuration of infrastructure using IaC tools (Terraform, Pulumi, etc.).
Build robust monitoring, logging, and alerting systems for AI applications.
Manage containerized services with Docker and orchestration platforms like Kubernetes.
Collaborate with data scientists and ML engineers to streamline model experimentation, versioning, and deployment.
Optimize compute resources and storage costs across cloud environments (AWS, GCP, or Azure).
Ensure system reliability, scalability, and security across all environments.

What skills do you need?

5+ years of experience in DevOps, MLOps, or infrastructure engineering roles.
Hands-on experience with cloud platforms (AWS, GCP, or Azure) and services related to ML workloads.
Strong knowledge of CI/CD tools (e.g., GitHub Actions, Jenkins, GitLab CI).
Proficiency in Docker, Kubernetes, and infrastructure-as-code frameworks.
Experience with ML pipelines, model versioning, and ML monitoring tools.
Scripting skills in Python, Bash, or similar for automation tasks.
Familiarity with monitoring/logging tools (Prometheus, Grafana, ELK, CloudWatch, etc.).
Understanding of ML lifecycle management and reproducibility.

Preferred Qualifications:

Experience with Kubeflow, MLflow, DVC, or Triton Inference Server.
Exposure to data versioning, feature stores, and model registries.
Certification in AWS/GCP DevOps or Machine Learning Engineering is a plus.
Background in software engineering, data engineering, or ML research is a bonus.

What We Offer:

Work on cutting-edge AI platforms and infrastructure.
Cross-functional collaboration with top ML, research, and product teams.
Competitive compensation package, with no constraints for the right candidate.