Data Scientist / Data Analytics Engineer
Job Description
Transflo is seeking a Data Scientist / Data Analytics Engineer to design, build, and operationalize analytics solutions across transportation and logistics, delivering both predictive and point-in-time insights on AWS.
Responsibilities
- Design, train, validate, and deploy predictive models across regression, classification, time-series forecasting, survival analysis, clustering, anomaly detection, and gradient-boosted or deep learning approaches as appropriate.
- Lead model selection, hyperparameter tuning, cross-validation, and performance evaluation using business-aligned metrics such as precision/recall trade-offs, MAPE, RMSE, lift, and calibration.
- Develop data products in transportation domains including operational metrics, fraud signals, pricing analytics, and industry trends.
- Establish model monitoring, drift detection, retraining cadence, and explainability practices (SHAP, feature importance, partial dependence) to maintain production reliability.
- Produce point-in-time analytics, KPI scorecards, and exception reporting to inform daily decisions across dispatch, fleet, customer success, finance, and product teams.
- Partner with business stakeholders to translate questions into well-scoped analyses and deliver defensible insights with documented assumptions and data lineage.
- Build and maintain reusable analytical datasets, semantic layers, and certified metrics to ensure a single source of truth.
- Design and maintain data pipelines (batch and streaming) on AWS using Redshift, S3, Glue, Lambda, Step Functions, Kinesis / MSK, EMR, Athena, and SageMaker.
- Apply medallion architecture (bronze, silver, gold) to progressively refine raw operational data into analytics-ready and ML-ready datasets.
- Use star-schema (dimensional) modeling to create performant, business-friendly data models in Redshift and the warehouse layer.
- Drive data selection, curation, profiling, and quality enforcement with source-of-truth datasets, lineage documentation, and data contracts.
- Collaborate with data engineering and platform teams on CI/CD for data and ML assets, infrastructure as code, and cost-aware AWS design.
- Take customer-facing analytics features from concept to implementation in partnership with product, design, and engineering.
- Contribute to product discovery through interviews, opportunity sizing, prototyping, and rapid iteration on analytics concepts.
- Own the analytical correctness of customer-facing metrics, models, and visualizations, including edge cases and explanations for non-technical users.
- Define success metrics for shipped analytics features and drive iterative improvements post-launch.
- Translate complex analyses into clear narratives and visuals for technical and non-technical audiences, including executives and customers.
- Partner across product, engineering, operations, and commercial teams to embed analytics into workflows and customer-facing products.
- Mentor analysts and engineers on statistical rigor, modeling practices, and modern data architecture.
Requirements
- Bachelor's degree in Statistics, Mathematics, or Supply Chain Management; Computer Science is acceptable. A Master's degree is preferred but not required.
- Professional experience in transportation, trucking, freight, logistics, or the broader supply chain, with working knowledge of loads, stops, shipments, ELD/telematics, TMS, dispatch, and billing data.
- Proven track record taking customer-facing analytics products from idea through launch, including discovery, scoping, metric and model design, and production support with real customers. Candidates should be prepared to discuss at least one end-to-end example.
- Strong ability to build advanced analytical models end-to-end: problem framing, data selection, feature engineering, model training/validation, and deployment.
- Hands-on experience with AWS PaaS and analytics tooling, including Redshift and services such as S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, and SageMaker.
- Proficiency in SQL (advanced window functions, performance tuning on Redshift or similar warehouses) and at least one analytics language (Python preferred), with libraries such as pandas, scikit-learn, statsmodels, XGBoost/LightGBM, and PyTorch or TensorFlow as appropriate.
- Experience designing and operating production data pipelines with clear orchestration, idempotency, observability, and data quality practices.
- Solid grounding in statistics including hypothesis testing, experimental design, regression, time-series, and uncertainty quantification.
Technologies
- AWS, Redshift, S3, Glue, Lambda, Step Functions, Kinesis, MSK, EMR, Athena, SageMaker
- Python, pandas, scikit-learn, statsmodels, XGBoost, LightGBM, PyTorch, TensorFlow
- Jupyter, SQL, QuickSight, Power BI, Looker
- Airflow, Git, CI/CD, Terraform, CloudFormation
- Medallion architecture, star-schema (dimensional) modeling
Preferred Qualifications
- Master’s degree in Statistics, Mathematics, Operations Research, Supply Chain, Computer Science, or a related field.
- Experience implementing medallion architecture in cloud data lakehouse or warehouse environments.
- Experience designing star-schema dimensional models for analytics consumption.
- Experience with streaming and event-driven data (Kinesis, Kafka/MSK) for near real-time analytics on transportation events.
- Experience deploying and monitoring ML models in production using SageMaker, MLflow, or equivalent MLOps tooling.
- Familiarity with BI tools and semantic layer concepts (QuickSight, Power BI, Looker).
- Exposure to optimization and operations research techniques applied to transportation problems.
- Experience with ELD/HOS data, telematics feeds, geospatial data, TMS/dispatch data, and brokerage data, plus an understanding of transportation back-office operations.
Core Competencies
- Analytical rigor and the ability to defend methodology and assumptions
- Business pragmatism and value-driven problem solving
- Product mindset with focus on end-user experience
- Engineering discipline and reproducible, testable code
- Stakeholder partnership and clear communication of trade-offs
- Curiosity, ownership, and root-cause problem-solving
Representative Tech Environment
- Cloud & Data Platform: AWS (Redshift, S3, Glue, Lambda, Step Functions, Athena, Kinesis, EMR, SageMaker)
- Modeling & Analysis: Python (pandas, scikit-learn, statsmodels, XGBoost/LightGBM, PyTorch/TensorFlow), SQL, Jupyter
- Data Architecture: Medallion (bronze/silver/gold), star-schema dimensional models, data contracts, lineage tooling
- Orchestration & DevOps: Airflow / Step Functions, Git, CI/CD, Terraform or CloudFormation
- Visualization: QuickSight, Power BI, or Looker