EngineerJobs.io
RCR Technology Corporation

ETL/Data Engineer

Remote | Full time | Posted 2d ago

Job Description

Remote opportunity for a Senior Azure Data Engineer / ETL Data Engineer to architect, build, and operate an enterprise data platform on Microsoft Azure, owning end-to-end data pipelines and data products for analytics, regulatory reporting, dashboards, and AI/ML use cases.

Responsibilities

  • Create scalable, reusable, parameter-driven ingestion and transformation pipelines using Azure Data Factory, Synapse Pipelines, Databricks, and Microsoft Fabric Data Factory.
  • Implement a medallion-style architecture (Bronze / Silver / Gold) on Azure Data Lake Storage Gen2 using Delta Lake, Parquet, and Structured Streaming patterns.
  • Develop high-performance ELT workflows with pushdown to source systems such as Synapse Dedicated SQL Pool, Azure SQL, and Teradata where applicable.
  • Build and optimize PySpark notebooks and jobs on Azure Databricks or Synapse Spark.
  • Design analytics-ready dimensional models (Kimball star/snowflake) and data vault patterns for consumption.
  • Apply Slowly Changing Dimensions (Type 1/2/3), Change Data Capture, and late-arriving data patterns.
  • Tune distributed SQL workloads in Synapse Dedicated SQL Pool or Fabric Warehouse, including distribution keys, partitioning, and clustered columnstore indexes.
  • Set up CI/CD for data pipelines using Azure DevOps (YAML pipelines, ARM/Bicep/Terraform) across Dev, SIT, UAT, and Prod environments.
  • Instrument pipelines with comprehensive logging, auditing, and monitoring using Azure Monitor, Log Analytics, and KQL.
  • Define and enforce coding standards, code reviews, branching strategies, and release management processes.
  • Contribute to legacy-to-cloud migrations such as Informatica PowerCenter to Azure Data Factory and on-premises Teradata / Oracle / SQL Server to Synapse or Fabric.
  • Perform workload assessment, capacity planning, and cost modeling for target-state architectures.
  • Provide production incident response for critical pipelines.

Requirements

  • Hands-on expertise with Azure Data Factory, including pipelines, datasets, linked services, triggers, parameterization, mapping data flows, and all three Integration Runtime types (Azure, Self-hosted, Azure-SSIS).
  • Strong experience with Databricks and PySpark.
  • Production experience with one or more of: Azure Synapse Analytics (Dedicated and Serverless SQL Pools, Spark Pools), Azure Databricks (Delta Lake, Unity Catalog), or Microsoft Fabric (Warehouse, Lakehouse, OneLake).
  • Solid working knowledge of Azure Data Lake Storage Gen2 (hierarchical namespace, RBAC + ACLs, lifecycle management, security).
  • Experience with Azure Key Vault, Azure AD / Entra ID (including managed identities and service principals), and private networking (VNet integration, private endpoints).
  • Monitoring and troubleshooting with Azure Monitor, Log Analytics, and KQL.
  • Advanced SQL including window functions, common table expressions, query optimization, execution plan analysis, and performance tuning.
  • Strong Python for data engineering (pandas, PySpark, REST API integration, unit testing with pytest).
  • Proficient in T-SQL; familiarity with Spark SQL, KQL, PowerShell, and Bash scripting.

Technologies

  • Azure Data Factory
  • Synapse Pipelines
  • Databricks
  • Microsoft Fabric Data Factory
  • Delta Lake
  • Parquet
  • Structured Streaming
  • PySpark
  • Azure Databricks
  • Synapse Spark
  • Unity Catalog
  • Azure Data Lake Storage Gen2
  • Azure Key Vault
  • Azure AD / Entra ID
  • Managed identities
  • Service principals
  • Azure Monitor
  • Log Analytics
  • KQL
  • T-SQL
  • Spark SQL
  • PowerShell
  • Bash
  • Python
  • pandas
  • REST API
  • pytest
  • Informatica PowerCenter
  • Azure DevOps
  • YAML pipelines
  • ARM
  • Bicep
  • Terraform
  • SQL Server
  • Teradata
  • Oracle
  • Snowflake
  • Informatica
  • Azure Synapse Analytics
  • Fabric
  • OneLake
