Lead Analytics Engineer - Data Modeling & Quality
Job Description
The Lead Analytics Engineer focuses on data modeling and data quality within Arcadia's healthcare analytics platform. The role owns the DBT/SQL layer that transforms clinical and claims data into trusted datasets, takes end-to-end ownership of data quality, and collaborates across functions.
Responsibilities
- Author, review, and maintain DBT models built on Spark/Hudi, spanning ingest through bronze and silver layers.
- Explain data model concepts to clients, communicating assumptions and limitations and backing them with deliberate validation.
- Troubleshoot issues, implement fixes, and add DBT tests that keep the same problems from recurring.
- Optimize SQL performance for slow-running pipelines and queries.
- Collaborate with Data Engineering on Hudi table design, partition strategies, and incremental processing patterns.
- Triage data quality alerts, distinguishing source-level problems from transform-layer failures.
- Design and maintain volume monitors and data quality monitors (null rates, distribution checks, future-date validations).
- Author and enforce clinical DQ rules (entity volume, field coverage, LOINC coverage, referential integrity) and claims validation rules across silver and gold layers.
- Lead quality reviews for connector promotions, evaluating silver entity coverage, validation pass rates, and bronze-to-silver transformation correctness.
- Own the ticket queue for data quality, attribution, hierarchy, and customer-specific issues, delivering clear, customer-facing findings.
- Guide data quality reviews during connector installation and promotion (UAT/PRD), including claims validation playbooks and null analyses.
- Partner with Data Engineering on root-cause analysis for errors, ingestion anomalies, and silver table issues surfaced by monitoring.
- Coordinate with the Measure Implementation Team when data quality affects quality measure scores.
- Contribute to and enforce data modeling standards across teams.
- Work across the team's stack: DBT-Spark, SQL, and Claude for data modeling; Redshift, Hudi, and AWS Athena for warehousing; Argo Workflows and Airflow for orchestration; Grafana and Loki for observability; Jira for issue tracking.
- Maintain robust source control through Git and GitHub with PR-based workflows.
- Work with healthcare data domains including claims (plan, professional, pharmacy), EHR clinical entities, and MPI.
Requirements
- Bachelor's or Master's degree in Computer Science, Statistics, Business, Economics, or a related field.
- Advanced SQL skills with window functions, complex CTEs, multi-step aggregations, and performance tuning on columnar databases.
- Hands-on DBT experience: authoring models, tests, macros, and YAML documentation; familiarity with incremental strategies.
- Healthcare data literacy covering claims data (professional, institutional, pharmacy), clinical data (EHR entities), and quality dimensions (member months, coverage rates, null patterns).
- Data quality mindset with the ability to separate source data issues from transform problems, design systematic validation checks, and communicate findings clearly.
- Clear communicator capable of translating technical insights for clients and non-technical stakeholders.
- Strong analytical judgment and the ability to identify anomalies in data distributions.
- Ability to manage multiple projects concurrently, leveraging AI tooling for organization and efficiency.
- Genuine interest in learning and applying AI tools to improve operations and processes.
Preferred qualifications
- Experience with Spark SQL and the Hudi table format.
- Familiarity with data quality monitoring tools.
- Comfort working in an AI-first environment using Claude to build and verify workflows.
- Exposure to population health analytics concepts such as HEDIS measures, risk adjustment, and value-based care metrics.
- Python scripting for data investigation and automation.
- Experience with Argo Workflows or similar orchestration platforms.
- Familiarity with healthcare data standards: ICD-10, CPT, NDC, LOINC, NPI.
Technologies
- DBT-Spark
- SQL
- Claude
- Amazon Redshift
- Apache Hudi
- AWS Athena
- Argo Workflows
- Airflow
- Git
- GitHub
- Grafana
- Loki
- Jira
Benefits
- Collaborate with a talented team tackling complex healthcare data challenges.
- Flexible, fully remote work environment with strong support resources.
- Exposure to senior leadership and strategic initiatives.
- Be at the forefront of AI adoption, leveraging cutting-edge tools to accelerate work and shape team processes.
- Help improve data quality and reliability that inform patient care decisions.
- Join a mission-driven organization transforming the healthcare industry.
- Become part of a diverse, energized Arcadian community united by a shared purpose.