Senior Lead Machine Learning Engineer
Job Description
The Senior Lead Machine Learning Engineer at Capital One is responsible for productionizing ML applications at scale, with a focus on architecture, design, and deployment within Agile teams.
Responsibilities
- Design and deliver ML models and components that address real business needs, collaborating with Product and Data Science teams.
- Guide ML infrastructure choices by applying knowledge of modeling techniques, including model selection, data handling, feature engineering, training, hyperparameter tuning, dimensionality, bias-variance considerations, and validation.
- Solve complex problems by developing and validating code and ML models, automating tests, and supporting deployment to production.
- Work within a cross-functional Agile team to build software powering advanced big data and ML applications.
- Retrain, maintain, and monitor models in production to ensure performance and reliability.
- Leverage cloud-based architectures and platforms to deliver scalable ML solutions.
- Construct efficient data pipelines that feed ML models.
- Apply CI/CD best practices, test automation, and monitoring to ensure reliable deployment of models and code.
- Maintain secure, well-governed code and ensure ML practices align with Responsible and Explainable AI standards.
- Work with Python, Scala, or Java to implement solutions.
Requirements
- Bachelor’s degree
- Minimum 8 years of experience designing and building data-intensive solutions on distributed computing platforms (internship experience does not count).
- Minimum 4 years programming in Python, Scala, or Java.
- Minimum 3 years building, scaling, and optimizing ML systems.
- Minimum 2 years leading teams that develop ML solutions.
Technologies
- Python
- Scala
- Java
- scikit-learn
- PyTorch
- Dask
- Spark
- TensorFlow
- AWS
- Azure
- Google Cloud Platform
Benefits
- Health benefits
- Financial benefits
- Performance-based incentives (cash bonuses and long-term incentives)
Preferred Qualifications
- Master’s or doctoral degree in computer science, electrical engineering, mathematics, or a related field
- Experience developing and deploying ML solutions in a public cloud such as AWS, Azure, or Google Cloud Platform
- 4+ years using industry-recognized ML frameworks such as scikit-learn, PyTorch, Dask, Spark, or TensorFlow
- 3+ years delivering performant, resilient, and maintainable code
- 3+ years of data gathering and preparation for ML models
- 3+ years of people management experience
- Contributions to the ML industry through conference talks, papers, blog posts, open-source work, or patents
- 3+ years building production-ready data pipelines for ML models
- Ability to communicate complex technical concepts clearly to diverse audiences