Software Engineer (Data & Evals)
Job Description
This onsite opportunity in San Francisco offers you the chance to join a nimble, high-ownership team building data infrastructure for cutting-edge AI systems. You’ll work hands-on with a small group of engineers and researchers, shaping scalable pipelines that handle tens to hundreds of terabytes of multimodal data, with a focus on speech-to-speech research and productionizing models. The role combines impactful development with strong collaboration and clear ownership, backed by competitive compensation and comprehensive benefits.
Benefits
- Competitive base salary of USD 180,000 to 250,000 per year
- Significant equity as an early team member
- Immigration support
- Fully covered medical, dental, and vision insurance
- 401(k) plan
- Onsite collaboration in San Francisco
- Opportunity to work with 100+ TB of data and large-scale AI models
Responsibilities
- Build and scale infrastructure and distributed data pipelines for large-scale AI and ML systems
- Process and manage tens to hundreds of terabytes of multimodal data
- Support data systems used for training, evaluation, and improvement of speech-to-speech AI models
- Develop batch processing, real-time streaming, and distributed orchestration capabilities
- Design reliable pipelines for speech data transformation, filtering, evaluation, and model improvement
- Collaborate closely with a small, high-performing engineering and research team
- Bridge cutting-edge AI research with real-world production environments
Requirements
- Experience building infrastructure and distributed data pipelines to process tens of terabytes of data
- Proven track record working with multimodal data in AI/ML products or systems
- Strong expertise in batch processing, real-time streaming systems, and distributed orchestration
- Hands-on experience with Spark, Kafka, Flyte, Kubernetes, or similar technologies
- Solid software engineering fundamentals and ability to build reliable, scalable systems
- Fast learner who adapts well in a dynamic startup environment
- Strong ownership mindset and ability to work independently with high autonomy
- Comfort working in person with the San Francisco team
Technologies
Key tools include Spark, Kafka, Flyte, and Kubernetes.
Nice to Have
- Experience in early-stage startups
- Independent projects, side projects, or open-source work
- Background in transformation pipelines for speech processing
- Experience with transcription, diarization, speech enhancement, filtering, or audio data processing
- Background working with large-scale AI models or ML infrastructure
- Interest in voice AI, speech systems, conversational AI, or multimodal AI products
Interview Process
- 30-minute introductory conversation
- Two technical interviews
- Two culture interviews
- Onsite co-working session collaborating with the team on a data system