Staff Machine Learning Engineer
Job Description
Join The Home Depot (THD) in a remote-friendly role based in Atlanta, GA, where you will lead production machine learning initiatives at scale. This position offers a competitive salary range of $120,000 to $190,000 and the opportunity to shape model development, deployment, monitoring, and lifecycle management across the ML stack. You will collaborate with cross-functional teams, engage in ongoing learning, and participate in conferences and communities of practice to stay at the forefront of technology.
Responsibilities
- Partner with UX, engineering, and product management teams to design secure, reliable, and scalable ML solutions.
- Collaborate with the Product Team to write user stories that are clear to developers and testable.
- Configure off-the-shelf solutions to meet evolving business needs.
- Develop dashboards, logging, alerts, and response plans to proactively identify and address issues.
- Engage in learning activities around modern software design, ML, and core development practices through communities of practice.
- Continuously explore articles, tutorials, and videos to stay informed about new technologies and industry best practices.
- Attend conferences to assess how new innovations can be applied where appropriate.
- Analyze business trends and behavioral data to identify opportunities for improvement and new initiatives.
- Lead evaluations and recommendations of technology products and platforms to deliver cost-effective solutions that meet requirements.
- Research and design suitable infrastructure, network, database, security, and ML architectures for products.
- Create and maintain monitoring and support tools.
- Contribute to project planning and management across multiple efforts.
- Develop formal training courses.
- Answer questions from other product or support teams and foster cross-team collaboration.
- Provide production application support and monitor service level objectives for products.
- Review performance and capacity of production components, including code, infrastructure, data, messaging, and prediction quality.
Requirements
- Must be eighteen years of age or older.
- Must be legally permitted to work in the United States.
Technologies
- Python
- SQL
- Git
- Linux / Unix
- Google Cloud Platform
- Vertex AI
- BigQuery / BigQuery ML / AutoML
- Jupyter Notebooks
- Pandas / SciPy / Scikit-learn
- Gensim
- TensorFlow / PyTorch
- REST
- CI/CD
- Datastore
Travel Requirements
- Typically requires overnight travel 5% to 20% of the time.
Physical Requirements
- Most time spent seated with occasional movement; light items may be lifted on rare occasions.
Working Conditions
- Located in a comfortable indoor environment; adverse conditions are infrequent and not objectionable.
Minimum Education
- High School Diploma or GED
Preferred Qualifications
- 3 to 6 years of relevant work experience.
- Proven ability to design, train, evaluate, and deploy ML models in production, including batch and real-time inference.
- Experience with ML lifecycle management, including feature engineering, versioning, experimentation, validation, and monitoring for data drift and model degradation.
- Experience building and operating ML pipelines using cloud-native services, data platforms, and CI/CD practices for reproducible deployments.
- Strong grasp of statistics, model evaluation metrics, and tradeoffs among accuracy, interpretability, latency, and cost.
- Familiarity with algorithms such as clustering, forecasting, anomaly detection, and neural networks.
- Experience with NLP, CNNs, autoencoders, GANs, embeddings, and related architectures.
- Experience training models with very large datasets and integrating ML tools (Jupyter, Pandas, SciPy, Scikit-learn, Gensim, TensorFlow, PyTorch) into scalable systems.
- Experience with Google Cloud Platform components (Vertex AI, BigQuery ML, AutoML) and data engineering practices with BigQuery and Datastore.
- Proficiency in Python, SQL, Git, Linux/Unix, and CI/CD toolchains, plus experience with REST APIs and scalable web service design.
- Awareness of production system design considerations, including high availability, disaster recovery, performance, efficiency, and security.
- Familiarity with modern ML architectures and techniques, including GANs, GRUs, LSTMs, RNNs, CNNs, and style transfer.