This onsite opportunity in San Francisco offers you the chance to join a nimble, high-ownership team building data infrastructure for cutting-edge AI systems. You’ll work hands-on with a small group of engineers and researchers, shaping scalable pipelines that handle tens to hundreds of terabytes of multimodal data, with a focus on speech-to-speech research and productionizing models. The role combines impactful development with strong collaboration and clear ownership, backed by competitive compensation and comprehensive benefits.

Benefits

Competitive base salary of USD 180,000 to 250,000 per year
Significant equity as an early team member
Immigration support
Fully covered medical, dental, and vision insurance
401(k) plan
Onsite collaboration in San Francisco
Opportunity to work with 100+ TB of data and large-scale AI models

Responsibilities

Build and scale infrastructure and distributed data pipelines for large-scale AI and ML systems
Process and manage tens to hundreds of terabytes of multimodal data
Support data systems used for training, evaluation, and improvement of speech-to-speech AI models
Develop batch processing, real-time streaming, and distributed orchestration capabilities
Design reliable pipelines for speech data transformation, filtering, evaluation, and model improvement
Collaborate closely with a small, high-performing engineering and research team
Bridge cutting-edge AI research with real-world production environments

Requirements

Experience building infrastructure and distributed data pipelines to process tens of terabytes of data
Proven track record working with multimodal data in AI/ML products or systems
Strong expertise in batch processing, real-time streaming systems, and distributed orchestration
Hands-on experience with Spark, Kafka, Flyte, Kubernetes, or similar technologies
Solid software engineering fundamentals and ability to build reliable, scalable systems
Fast learner who adapts well in a dynamic startup environment
Strong ownership mindset and ability to work independently with high autonomy
Comfort working in person with the San Francisco team

Technologies

Key tools include Spark, Kafka, Flyte, and Kubernetes

Nice to Have

Experience in early-stage startups
Independent projects, side projects, or open-source work
Background in transformation pipelines for speech processing
Experience with transcription, diarization, speech enhancement, filtering, or audio data processing
Background working with large-scale AI models or ML infrastructure
Interest in voice AI, speech systems, conversational AI, or multimodal AI products

Interview Process

30-minute introductory conversation
Two technical interviews
Two culture interviews
Onsite co-working session collaborating with the team on a data system

Software Engineer (Data & Evals)
