Machine Learning Engineer Job at Evolve Group, Hayward, CA

  • Evolve Group
  • Hayward, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco-based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you've built infrastructure that runs across hundreds of GPUs, thrive on technical complexity, and want to work side by side with elite AI researchers, this is the role.

Key Responsibilities:

  • Build and scale distributed training systems for large models across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong experience with PyTorch, CUDA, NCCL, and Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start

Similar Jobs

Wieland Electric North America

Logistics and Material Management Coordinator Job at Wieland Electric North America

 ...USA. About the Role We are seeking a highly motivated Logistics and Material Management Coordinator to join our growing team...  ...strong asset. Proven negotiation skills and experience in international supply chain management. Working Conditions Full-time... 

Sterling Distributors

Sales Support Associate Job at Sterling Distributors

 ...Sterling Distributors, we're more than just a wholesale distributor of medical devices. Sterling Distributors is a trusted supplier of high-...  ...integrity. Role Overview This is a full-time, on-site Sales Support role. The position supports our sales team with critical... 

HealthFitness

Health Fitness Professional Job at HealthFitness

HealthFitness, a Trustmark company, is a proven leader in providing fitness solutions that engage and connect people both on-site and online, to create a strong community of health. Our work focuses on creating meaningful connections with each of our participants to help... 

ASM International

Publishing Coordinator Job at ASM International

 ...Position Summary The Publishing Coordinator plays a key role in the end-to-end production of high-quality print and digital publications, including books, journals, magazines, and online content. This position is responsible for coordinating content assets, copyediting... 

Oakland Family Services

Child Care Assistant (Temporary) Job at Oakland Family Services

 ...has proudly been named a Top Workplace for ten (10) consecutive years, voted on by our own staff. We offer a warm, engaging, equitable,...  .... Experience working with children 12 months through 12 years old or demonstrated knowledge of child development. Experience utilizing...