Data Solutions Engineer
Job Description
This hybrid Data Solutions Engineer role at Citi offers the chance to shape next-generation data platforms while working on cloud migration initiatives with cross-functional teams. The role is based in Irving, Texas, and is also listed in Jacksonville, Florida. The salary range is USD 107,120 to USD 160,680 per year. Citi's benefits package includes discretionary and formulaic incentive programs, medical, dental, and vision coverage, a 401(k), life and disability insurance, wellness programs, and paid time off.
Responsibilities
- As a core member of the Data Engineering team, design and build scalable Big Data solutions.
- Collaborate with domain experts, product managers, analysts, and data scientists to create robust pipelines in Hadoop or Snowflake environments.
- Deliver a data-as-a-service framework to enable data accessibility and governance across the organization.
- Lead and execute the migration of all legacy workloads to cloud platforms, coordinating across stakeholders.
- Engage with stakeholders to elicit and document requirements, including detailed data flow specifications.
- Assess solution options and work with cross-functional teams to drive optimal implementations.
- Partner with data scientists to build pipelines from heterogeneous data sources and provide engineering services for data science applications.
- Research and evaluate open-source technologies, recommending and integrating suitable components into designs.
- Serve as a technical expert, mentoring teammates on Big Data and Cloud technology stacks.
- Define requirements for maintainability, testability, performance, security, quality, and usability across the data platform.
- Drive the implementation of consistent patterns, reusable components, and coding standards across data engineering processes.
- Convert SAS-based pipelines into modern languages such as PySpark and Scala for Hadoop and non-Hadoop ecosystems.
- Optimize Big Data applications on Hadoop and non-Hadoop platforms for peak performance.
- Evaluate new IT developments and evolving business needs, recommending system enhancements aligned with industry standards.
- Assess risk and ensure compliance in decision-making, safeguarding Citi, its clients, and its assets while escalating and addressing control issues transparently.
Requirements
- 5+ years of experience with Hadoop and Big Data technologies.
- Proficiency in Python, PySpark, and Scala, including hands-on experience with fundamental machine learning libraries.
- Experience building robust data solutions on Google Cloud or AWS; relevant certifications preferred.
- Experience with SAS.
- Experience with containerization and related technologies such as Docker and Kubernetes.
- Comprehensive understanding of software engineering and data analytics.
- Hands-on knowledge of the Hadoop ecosystem and Big Data technologies (HDFS, MapReduce, Hive, Pig, Impala, Kafka, Kudu, Solr).
- Knowledge of Agile (Scrum) development methodologies.
- Strong development and automation skills.
- System-level understanding of data structures, algorithms, distributed storage, and compute.
- A proactive approach to solving complex business problems, with strong interpersonal and teamwork skills.
- Bachelor’s degree in a related field.
Technologies
- Hadoop
- Snowflake
- Python
- PySpark
- Scala
- Google Cloud
- AWS
- SAS
- Docker
- Kubernetes
- HDFS
- MapReduce
- Hive
- Pig
- Impala
- Kafka
- Kudu
- Solr
- Java
- Apache Beam