Machine Learning Engineer Job at Evolve Group, Hayward, CA

  • Evolve Group
  • Hayward, CA

Job Description

Machine Learning Engineer

Tech start-up

San Francisco based

We’ve partnered with one of the most ambitious and technically rigorous AI research labs in the world. Based in San Francisco, this team is building foundation models entirely from scratch.

They are now hiring ML Infrastructure Engineers to design and scale the systems that power large-scale, distributed model training. If you’ve built infrastructure that runs across hundreds of GPUs, thrive under technical complexity, and want to work side-by-side with elite AI researchers — this is the role.

Key Responsibilities:

  • Build and scale distributed training systems for large-scale model training across LLMs, vision, and robotics.
  • Set up and run large-scale training across many GPUs using tools like Kubernetes, DeepSpeed, and FSDP.
  • Troubleshoot system issues (GPU errors, network problems) and build tools to monitor and recover from failures.
  • Optimize PyTorch pipelines, sharding, and sampling strategies.
  • Collaborate closely with researchers to support novel model training at scale.

Requirements:

  • 3–15 years in ML infrastructure, systems, or research engineering roles.
  • Proven experience scaling distributed training for large models.
  • Strong with PyTorch, CUDA, NCCL, Kubernetes.
  • Familiar with setting up distributed training clusters.
  • Deep understanding of PyTorch dataloaders, data sharding, and sampling.
  • Strong communicator with a collaborative, mission-driven mindset.

This is a fully in-person role based in San Francisco. It's ideal for engineers excited to build at the edge of what's possible in AI.

Job Tags

Immediate start
