Senior Data Engineer
Job Description
General Motors seeks a Senior Data Engineer for a hybrid role based in Austin, TX or Michigan to design, build, and optimize industrial data assets and pipelines that support business intelligence and advanced analytics.
Responsibilities
- Aggregate large, complex data sets to meet both functional and non-functional business requirements.
- Identify and implement process improvements, including automation, data delivery optimization, and scalable infrastructure redesign.
- Lead the development and delivery of data-driven solutions across multiple languages, tools, and technologies.
- Contribute to architecture discussions, solution design, and strategic technology adoption.
- Design and optimize highly scalable data pipelines with complex transformations and efficient code.
- Create new source system integrations from diverse formats (files, database extracts, APIs).
- Build solutions for delivering data that meets SLA requirements.
- Collaborate with operations teams to troubleshoot production issues impacting the platform.
- Apply industry best practices including Agile, design thinking, and continuous deployment.
- Develop tooling and automation to streamline deployments and production monitoring.
- Partner with business and technology stakeholders, offering leadership, guidance, and best practices.
- Mentor peers and junior engineers and share knowledge on emerging industry trends and technologies.
Requirements
- Bachelor’s degree in Computer Science, Software Engineering, or a related field, or equivalent practical experience.
- At least 7 years of experience in data engineering or development, with proficiency in Python or Scala, SQL, relational and non-relational (NoSQL) data storage, ETL frameworks, and big data processing.
- At least 3 years of experience with distributed data processing using Spark and container orchestration using Kubernetes.
- Experience delivering data streams via Kubernetes and Kafka.
- Experience with cloud platforms, with Azure preferred; AWS or GCP also considered.
- Strong understanding of CI/CD principles and tooling.
- Familiarity with big data technologies such as Hadoop, Hive, HBase, object storage (ADLS/S3), and event queues.
- Solid grasp of performance optimization techniques including partitioning, clustering, and caching.
- Proficiency in SQL, as well as in key-value and document stores.
- Familiarity with data architecture and modeling concepts to support efficient data consumption.
- Advanced understanding of data normalization and denormalization techniques.
- Ability to translate enterprise requirements into effective data models and address project needs when applicable.
- Design, build, and optimize scalable batch and streaming data pipelines using Databricks (Apache Spark, Delta Lake) to support a medallion architecture.
- Contribute to the design and operational management of a cloud-native data platform on Azure, integrating Azure Data Lake Storage, Event Hubs, Azure SQL, and AKS.
- Monitor and troubleshoot data pipelines and platform workloads; optimize Spark jobs, cluster configurations, and SQL warehouses to improve performance.
- Strong collaboration and communication skills; ability to work across multiple teams and disciplines.
Technologies
- Python
- Scala
- SQL
- Spark
- Kubernetes
- Hadoop
- Hive
- HBase
- ADLS
- S3
- Event Queues
- Kafka
- Databricks
- Delta Lake
- Azure Data Lake Storage
- Event Hubs
- Azure SQL
- AKS
- Snowflake
- Azure
- AWS
- GCP
Benefits
- Health and wellbeing benefit programs
- Medical
- Dental
- Vision
- Health Savings Account
- Flexible Spending Accounts
- Retirement savings plan
- Sickness and accident benefits
- Life insurance
- Paid vacation and holidays
- Tuition assistance programs
- Employee assistance program
- GM vehicle discounts
Compensation
- The expected base compensation for this role is $129,400 to $168,650. Actual base compensation within the identified range will vary based on factors relevant to the position.
- Bonus Potential: An incentive pay program offers payouts based on company performance, job level, and individual performance.
Preferred Qualifications
- Master’s degree in Computer Science, Software Engineering, or a related field
- Knowledge of data governance, metadata management, or data quality/observability
- Familiarity with schema design and data contracts
- Experience handling various file formats (video, audio, image)
- Experience with Databricks, Snowflake, or similar platforms