Information Technology | Full Time | Verified

Lead AI Infrastructure Engineer | San Francisco, CA

Nexus Future Systems
San Francisco
Estimated Salary
USD 180,000 – USD 250,000
Live Update
17 May 2026
Deadline
17 May 2027

Job Description

We are building the foundation for the next decade of intelligent computing. As a Lead AI Infrastructure Engineer at Nexus Future Systems, you will architect the high-performance, scalable infrastructure that powers our proprietary generative AI models. You will bridge the gap between cutting-edge machine learning research and robust, production-grade engineering.

In this pivotal role, you will lead a team of engineers in deploying containerized microservices, optimizing large-scale data pipelines, and ensuring our systems are resilient, secure, and ready for the demands of 2026 and beyond. If you thrive in a fast-paced, innovative environment and want to define the future of tech, we want you on our team.

Responsibilities

  • Architect & Deploy: Design and implement scalable, cloud-native infrastructure (Kubernetes, AWS, GCP) for high-velocity AI workloads.
  • Performance Optimization: Analyze and optimize ML training and inference pipelines to reduce latency and improve cost-efficiency.
  • Collaboration: Partner with Data Scientists and ML Engineers to integrate models seamlessly into production environments.
  • Security & Compliance: Enforce enterprise-grade security protocols and data governance standards across all infrastructure layers.
  • Team Leadership: Mentor junior engineers, conduct code reviews, and drive technical best practices within the engineering organization.
  • Disaster Recovery: Develop and maintain robust disaster recovery plans to ensure 99.99% system uptime.

Qualifications

  • Experience: 8+ years in software engineering, including at least 3 years in a lead role focused on infrastructure.
  • Core Tech: Deep expertise in Python, Go, or Rust, and experience with containerization tools (Docker, Kubernetes).
  • Cloud Mastery: Proven track record of architecting solutions on AWS, GCP, or Azure with a focus on serverless or managed services.
  • ML Knowledge: Strong understanding of machine learning operations (MLOps) and large-scale data processing frameworks (Spark, Kafka, Airflow).
  • Problem Solving: Exceptional ability to troubleshoot complex distributed-system issues in real time.

Required Skills

Python, Kubernetes, AWS, GCP, Docker, Machine Learning, MLOps, Data Engineering, Cloud Architecture, Go, Rust

Ready to Take This Challenge?

Make sure your resume is ready. Submit your application now before the deadline.
