Data Engineer
Job Description
Amazon.com Services LLC is seeking a Data Engineer to design scalable data pipelines and ML-ready infrastructure that enable GenAI-powered analytics across its Operations Technology ecosystem. This onsite role in Austin, TX focuses on delivering robust data platforms that support global operations and data-driven decision making. The position offers an annual salary range of USD 125,500 to 178,800.
Responsibilities
- Design, implement, and maintain production-grade ETL/ELT pipelines and scalable big data infrastructure that underpin operational intelligence for Operations Technology.
- Develop feature engineering workflows and ML-ready data pipelines to support data science experimentation and production model serving.
- Contribute to data governance and quality standards across analytical and ML data products.
- Assist in deploying GenAI solutions for automated reporting, diagnostics, and predictive and prescriptive analytics.
- Build and maintain semantic layers and dashboard data models that inform global operations decisions.
- Collaborate with Program Managers, BI teams, ML Engineers, Data Scientists, and operational stakeholders to prioritize work aligned with OTS goals.
- Follow and help shape data engineering best practices, including code reviews, testing, monitoring, and documentation.
Requirements
- At least 3 years of data engineering experience.
- At least 3 years of experience developing and operating large-scale data structures for business intelligence analytics, including data modeling.
- Experience in data modeling, data warehousing, and building ETL pipelines.
- Experience with AWS technologies such as Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions.
- Background in data warehouse architectures, data modeling, infrastructure components, ETL/ELT processes, reporting and analytic tools, data structures, and hands-on SQL coding.
- Bachelor’s degree or higher in computer science, machine learning, engineering, or related fields, or equivalent experience building and maintaining data flows and pipelines.
- Proficiency in Python and SQL; experience with PySpark or Apache Spark.
- Experience with infrastructure-as-code (CDK, CloudFormation) and CI/CD pipelines for data and ML systems.
- Experience with data modeling and relational/non-relational database design.
Technologies
- Redshift
- S3
- AWS Glue
- EMR
- Kinesis
- Firehose
- Lambda
- IAM
- Python
- SQL
- PySpark
- Apache Spark
- CDK
- CloudFormation
Benefits
- Medical, Dental, and Vision Coverage
- Maternity and Parental Leave Options
- Paid Time Off (PTO)
- 401(k) Plan