Data Engineer
Job Description
Codoxo is seeking a Data Engineer to design, build, and maintain scalable data pipelines that power analytics, reporting, and machine learning initiatives. This onsite role in Duluth, GA offers an opportunity to work under the guidance of senior engineers and help shape the data infrastructure supporting the company’s analytics needs.
Responsibilities
- Contribute to the design, construction, and upkeep of scalable ETL and ELT pipelines.
- Build and optimize batch and streaming workflows using AWS Glue, Spark, and Airflow.
- Facilitate data integration across multiple structured and unstructured sources.
- Deliver clean, efficient code in Python, PySpark, and SQL.
- Monitor, troubleshoot, and enhance pipeline reliability and performance.
- Improve database performance with an emphasis on PostgreSQL and cloud environments.
- Maintain and support AWS-based infrastructure (EC2, S3, Glue, and related services).
- Implement data validation, quality checks, and monitoring processes.
- Ensure adherence to data governance, security, and regulatory standards.
- Collaborate with data scientists and analysts to translate data requirements into scalable solutions.
- Document data flows, architecture decisions, and technical processes.
- Use AI-assisted development tools to accelerate work, improve test coverage, and enhance code quality.
Requirements
- Bachelor’s degree in Computer Science, Data Engineering, Information Systems, or a related technical field, or equivalent practical experience.
- 2+ years of experience in data engineering, software engineering, or related roles (internships included).
- Proficiency in Python, PySpark, and SQL.
- Familiarity with ETL/ELT concepts and data pipeline architecture.
- Experience with relational databases, particularly PostgreSQL.
- Basic understanding of cloud computing concepts, preferably AWS.
- Exposure to distributed data processing frameworks such as Spark.
- Experience working in Linux environments and basic shell scripting.
- Strong analytical and problem-solving abilities.
- Ability to collaborate effectively in a team under mentorship.
- Strong written and verbal communication skills.
Technologies
- Python
- PySpark
- SQL
- AWS
- AWS Glue
- Spark
- Airflow
- PostgreSQL
- Linux
- Shell scripting
- Git
Benefits
- Health, dental, and vision insurance with 100% of employee premiums covered starting day one
- Unlimited PTO
- Annual professional development stipend
- Annual home office stipend
- 401(k) match after 90 days
Preferred Qualifications
- Experience working with medical claims data (strongly preferred).
- Hands-on experience with AWS services such as EC2, S3, Glue, and IAM.
- Experience with workflow orchestration tools like Apache Airflow.
- Exposure to data warehousing concepts and dimensional modeling.
- Familiarity with CI/CD pipelines and version control (e.g., Git).
- Understanding of data security, governance, and compliance best practices.
- Experience supporting machine learning pipelines or analytics platforms.
- Demonstrated use of AI tools to improve development efficiency.
Physical Requirements
- Work is performed in an office setting (office or remote) and requires the ability to work at a desk, use a computer, and operate standard office equipment.