ETL/Data Engineer
Job Description
Remote opportunity for a Senior Azure Data Engineer / ETL Data Engineer to architect, build, and operate an enterprise data platform on Microsoft Azure, owning end-to-end data pipelines and data products for analytics, regulatory reporting, dashboards, and AI/ML use cases.
Responsibilities
- Create scalable, reusable, parameter-driven ingestion and transformation pipelines using Azure Data Factory, Synapse Pipelines, Azure Databricks, and Microsoft Fabric Data Factory.
- Implement a medallion-style architecture (Bronze / Silver / Gold) on Azure Data Lake Storage Gen2 using Delta Lake, Parquet, and structured streaming patterns.
- Develop high-performance ELT workflows with pushdown to source systems such as Synapse Dedicated SQL Pool, Azure SQL, and Teradata where applicable.
- Build and optimize PySpark notebooks and jobs on Azure Databricks or Synapse Spark.
- Design analytics-ready dimensional models (Kimball star/snowflake) and data vault patterns for consumption.
- Apply Slowly Changing Dimensions (Type 1/2/3), Change Data Capture, and late-arriving data patterns.
- Tune distributed SQL workloads in Synapse Dedicated SQL Pool or Fabric Warehouse, including distribution keys, partitioning, and clustered columnstore indexes.
- Set up CI/CD for data pipelines using Azure DevOps (YAML pipelines, ARM/Bicep/Terraform) across Dev, SIT, UAT, and Prod environments.
- Instrument pipelines with comprehensive logging, auditing, and monitoring using Azure Monitor, Log Analytics, and KQL.
- Define and enforce coding standards, code reviews, branching strategies, and release management processes.
- Contribute to legacy-to-cloud migrations such as Informatica PowerCenter to Azure Data Factory and on-premises Teradata / Oracle / SQL Server to Synapse or Fabric.
- Perform workload assessment, capacity planning, and cost modeling for target-state architectures.
- Provide production incident response for critical pipelines.
Requirements
- Hands-on expertise with Azure Data Factory, including pipelines, datasets, linked services, triggers, parameterization, mapping data flows, and all three Integration Runtime types (Azure, Self-hosted, Azure-SSIS).
- Strong experience with Azure Databricks and PySpark.
- Production experience with one or more of the following: Azure Synapse Analytics (Dedicated and Serverless SQL Pools, Spark Pools), Azure Databricks (Delta Lake, Unity Catalog), or Microsoft Fabric (Warehouse, Lakehouse, OneLake).
- Solid working knowledge of Azure Data Lake Storage Gen2 (hierarchical namespace, RBAC + ACLs, lifecycle management, security).
- Experience with Azure Key Vault, Azure AD / Entra ID (including managed identities and service principals), and private networking (VNet integration, private endpoints).
- Monitoring and troubleshooting with Azure Monitor, Log Analytics, and KQL.
- Advanced SQL including window functions, common table expressions, query optimization, execution plan analysis, and performance tuning.
- Strong Python for data engineering (pandas, PySpark, REST API integration, unit testing with pytest).
- Proficient in T-SQL; familiarity with Spark SQL, KQL, PowerShell, and Bash scripting.
Technologies
- Azure Data Factory
- Synapse Pipelines
- Microsoft Fabric Data Factory
- Delta Lake
- Parquet
- Structured streaming
- PySpark
- Azure Databricks
- Synapse Spark
- Unity Catalog
- Azure Data Lake Storage Gen2
- Azure Key Vault
- Azure AD / Entra ID
- Managed identities
- Service principals
- Azure Monitor
- Log Analytics
- KQL
- T-SQL
- Spark SQL
- PowerShell
- Bash
- Python
- pandas
- REST API
- pytest
- Informatica PowerCenter
- Azure DevOps
- YAML pipelines
- ARM
- Bicep
- Terraform
- SQL Server
- Teradata
- Oracle
- Snowflake
- Azure Synapse Analytics
- Microsoft Fabric
- OneLake