Machine Learning Engineer (Agent & Inference) - Mandarin Chinese Speaker
Job Description
Bitus Labs seeks a Machine Learning Engineer to develop agent systems and production inference for an online gaming product, with fluent Mandarin Chinese required.
Responsibilities
- Design, build, and optimize LLM-powered agents, covering planning, tool use, workflow orchestration, and multi-step reasoning
- Architect memory systems, including short-term memory, long-term memory, context management, and session state
- Build and optimize retrieval-augmented generation (RAG) pipelines for relevance, grounding, freshness, and retrieval quality
- Design and operate vector store infrastructure such as pgvector, Milvus, Qdrant, and Weaviate
- Define evaluation methodologies for agents, prompts, and workflows
- Optimize end-to-end agent quality, latency, reliability, and operating cost
- Build and operate low-latency, high-concurrency, highly reliable production inference services
- Serve online learning models, such as contextual bandits and reinforcement learning policies, with real-time inference and online parameter or weight updates
- Deploy and optimize AI inference systems for latency, throughput, reliability, and resource efficiency
- Analyze and resolve inference serving bottlenecks
- Support deployment and serving of recommendation, ranking, and reinforcement learning models developed by research scientists
- Apply lightweight model adaptation techniques (LoRA, QLoRA, PEFT) when appropriate for domain-specific requirements
- Build and maintain deployment pipelines, observability systems, and tracing infrastructure for agents and serving endpoints
- Monitor for quality regressions, performance degradation, and model drift
- Maintain version control for models, prompts, datasets, and agent configurations
- Contribute to automated validation, testing, and CI/CD workflows for AI systems
- Partner with research scientists, backend engineers, and data scientists to integrate AI systems into production products
- Document systems, best practices, and internal tooling
- Contribute to engineering standards and operational excellence across AI initiatives
Requirements
- Bachelor's or Master's degree in Computer Science, Machine Learning, or a related field
- 3+ years of industry experience in Machine Learning Engineering or related roles
- Strong software and systems engineering experience building low-latency, reliable production services in Go, Rust, C++, or an equivalent language
- Experience building or supporting real-time inference systems for recommendation, ranking, contextual bandits, reinforcement learning, or similar adaptive ML applications
- Strong experience with PyTorch and the Hugging Face ecosystem
- Experience building production LLM or agent applications (for example, with LangGraph, LlamaIndex, or equivalent frameworks)
- Hands-on experience with RAG systems, embeddings, and vector databases
- Experience evaluating and monitoring LLM or agent systems in production
- Experience deploying and optimizing production machine learning or LLM systems
- Understanding of inference runtime behavior, resource utilization, latency optimization, and production serving performance
- Experience with Docker and Kubernetes
- Experience with cloud platforms such as AWS, GCP, or Azure
- Fluent Mandarin Chinese
Technologies
- Go
- Rust
- C++
- PyTorch
- Hugging Face
- LlamaIndex
- pgvector
- Milvus
- Qdrant
- Weaviate
- Docker
- Kubernetes
- AWS
- GCP
- Azure
- LoRA
- QLoRA
- PEFT
- CUDA
- OpenAI Triton
- TFLite
- CoreML
- FSDP
- DeepSpeed
- Spark
- Hadoop
Benefits
- 401(k)
- 401(k) matching
- Dental insurance
- Health insurance
- Life insurance
- Paid time off
- Parental leave
- Retirement plan
- Vision insurance
Pay
Salary: USD 130,000 per year
Location details
- Ability to commute: Irvine, CA 92618 (Required)
- Ability to relocate: Irvine, CA 92618; must relocate before starting work (Required)
- Work location: In person