Senior Data Engineer - Vice President
Job Description
Join Citi in Irving, Texas, as a Senior Data Engineer - Vice President and shape enterprise-scale data platforms. This onsite role offers a competitive salary, comprehensive benefits, and the chance to lead cloud-based data infrastructure, orchestrate scalable pipelines, and collaborate with data scientists on AI initiatives, including retrieval-augmented generation (RAG) and Agentic AI. You will provide technical leadership in an Agile, client-facing environment and mentor junior engineers while driving data quality and governance across the analytics lifecycle.
Salary: $125,760 - $188,640 per year • Location: Irving, Texas (onsite)
Benefits and culture
- Medical, dental, and vision coverage
- 401(k)
- Life, accident, and disability insurance
- Wellness programs
- Planned time off (vacation)
- Unplanned time off (sick leave)
- Paid holidays
Responsibilities
- Design, build, and maintain scalable ETL and ELT pipelines using PySpark, Spark SQL, and Delta Lake on Databricks to ingest, transform, and integrate large datasets across cloud platforms.
- Manage cloud data platforms on AWS, GCP, or Azure, leveraging cloud-native services for storage, processing, and analytics.
- Work with Databricks, Snowflake, and open table formats such as Apache Iceberg to process petabyte-scale data.
- Optimize Spark workloads and Databricks clusters through tuning, partitioning strategies, caching, and autoscaling to enhance performance and control costs.
- Implement and govern Lakehouse architectures using Delta Lake and Unity Catalog to ensure data quality, manage schema evolution, and secure data for analytics.
- Lead the design and architecture of Starburst-based data solutions, ensuring scalability, performance, and reliability for enterprise platforms.
- Develop and manage data federation strategies using Starburst connectors to query across diverse systems.
- Identify bottlenecks and optimize data pipelines and queries for performance, storage efficiency, and cost effectiveness.
- Build and maintain robust data pipelines with strong governance, ensuring data quality, lineage, and compliant data movement from ingestion to consumption.
- Collaborate with data scientists on AI model development and deployment, supporting RAG and Agentic AI initiatives through reliable data infrastructure.
- Operate within an Agile framework, participating in sprint planning, daily stand-ups, and retrospectives to deliver iterative milestones.
- Provide technical leadership, mentor junior engineers, and guide projects to align with client needs and organizational strategy.
- Serve as a primary point of contact for stakeholders and clients, translating complex requirements into actionable technical tasks.
Requirements
- Expert-level Python proficiency with data ecosystem experience (Pandas, NumPy, Dask), including production-grade code for data processing, automation, and API development.
- Extensive PySpark experience with DataFrame API, Spark SQL, and performance tuning for distributed processing.
- Proven Databricks Lakehouse Platform experience, including Delta Lake, Structured Streaming, and optimizing Spark jobs.
- Strong Ab Initio experience (GDE, Co>Operating System, Conduct>It) for enterprise ETL design and execution.
- Snowflake expertise in data warehouse design, data modeling, security (RBAC), performance tuning, Snowpipe, and Time Travel.
- Experience with Starburst or Trino for federated querying across multiple data sources.
- Familiarity with Apache Iceberg for managing large analytic datasets.
- In-depth experience with at least one major cloud provider (AWS, Google Cloud Platform, or Azure).
- Hands-on experience building cloud-native data pipelines using services such as AWS Glue, Lambda, S3, and Redshift; Azure Data Factory and Synapse Analytics; or Google Cloud Composer, Dataflow, and BigQuery.
- Understanding of the data lifecycle for machine learning and experience building data pipelines to support AI/ML models, including preparing data for vector databases used in RAG and Agentic AI applications.
- Strong Agile and Scrum proficiency with a track record of delivering iteratively.
- Demonstrated leadership ability, a record of influencing architecture decisions, and successful project delivery, including stakeholder management.
- 6 to 10 years of hands-on data engineering experience in a large-scale enterprise or financial services setting.
- Experience leading project workstreams and mentoring junior team members; Applications Development managerial experience and prior senior-level roles preferred.
- Relevant certifications such as AWS Certified Big Data, Google Professional Data Engineer, or Snowflake SnowPro.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Solid understanding of data governance, data quality, and data security principles.
- Excellent analytical and problem-solving abilities; ability to work independently or as part of a team; clear written and verbal communication.
Anticipated posting close date: May 18, 2026