EngineerJobs.io

Job Description

This onsite role in San Francisco puts you on a small, high-ownership team building data infrastructure for AI systems. You’ll work hands-on with engineers and researchers on scalable pipelines that handle tens to hundreds of terabytes of multimodal data, with a focus on speech-to-speech research and on taking models to production. The role offers clear ownership and close collaboration, along with competitive compensation and comprehensive benefits.

Benefits

  • Competitive base salary of USD 180,000 to 250,000 per year
  • Significant equity as an early team member
  • Immigration support
  • Fully covered medical, dental, and vision insurance
  • 401(k) plan
  • Onsite collaboration in San Francisco
  • Opportunity to work with 100+ TB of data and large-scale AI models

Responsibilities

  • Build and scale infrastructure and distributed data pipelines for large-scale AI and ML systems
  • Process and manage tens to hundreds of terabytes of multimodal data
  • Support data systems used for training, evaluation, and improvement of speech-to-speech AI models
  • Develop batch processing, real-time streaming, and distributed orchestration capabilities
  • Design reliable pipelines for speech data transformation, filtering, evaluation, and model improvement
  • Collaborate closely with a small, high-performing engineering and research team
  • Bridge cutting-edge AI research with real-world production environments

Requirements

  • Experience building infrastructure and distributed data pipelines to process tens of terabytes of data
  • Proven track record working with multimodal data in AI/ML products or systems
  • Strong expertise in batch processing, real-time streaming systems, and distributed orchestration
  • Hands-on experience with Spark, Kafka, Flyte, Kubernetes, or similar technologies
  • Solid software engineering fundamentals and ability to build reliable, scalable systems
  • Fast learner who adapts well in a dynamic startup environment
  • Strong ownership mindset and ability to work independently with high autonomy
  • Comfort working in person with the San Francisco team

Technologies

Key tools include Spark, Kafka, Flyte, and Kubernetes.

Nice to Have

  • Experience in early-stage startups
  • Independent projects, side projects, or open-source work
  • Background in transformation pipelines for speech processing
  • Experience with transcription, diarization, speech enhancement, filtering, or audio data processing
  • Background working with large-scale AI models or ML infrastructure
  • Interest in voice AI, speech systems, conversational AI, or multimodal AI products

Interview Process

  • 30-minute introductory conversation
  • Two technical interviews
  • Two culture interviews
  • Onsite co-working session collaborating with the team on a data system
