Forward Deployed Data Engineer
Job Description
Join Spear AI as a Forward Deployed Data Engineer and help build real-time and offline data pipelines, scalable data warehouses, and high-performance models for maritime domain awareness. The position is remote-friendly, includes on-site collaboration with customers in Groton, CT, and requires a U.S. Secret clearance. You’ll work in a practical, impact-driven environment with direct access to leadership and a culture that values clear thinking, quality work, and real-world contributions.
Benefits
- Unlimited PTO to support work–life balance
- Dedicated sick time to safeguard health
- Comprehensive medical, dental, and vision coverage
- 11 paid holidays
- Professional development with educational resources
- Collaborative environment with direct access to leadership and a flat structure
- Mission-driven work that supports national security and real-world impact
- Growth opportunities during a phase of expansion
- 401(k) with company match
- On-site, remote, flexible, or hybrid work arrangements depending on role
- Relocation assistance where applicable
- Referral and performance bonuses
- Life insurance and disability coverage
- Home office stipend for technology setup
- Professional certification reimbursement where applicable
Why Work With Us
- We deliver tangible products, not long-running projects
- Our work directly supports national security and submarine integration
- We prioritize responsible growth and value-aligned teammates
- We collaborate remotely using real-time tools and asynchronous workflows
- We are profitable and backed by committed investors
- We encourage meticulous, high-quality work without unnecessary bureaucracy
- We maintain a light, collaborative culture focused on meaningful impact
Responsibilities
- Design and implement real-time data pipelines using MQTT and Redpanda for stream processing
- Build offline data pipelines for batch processing with Dagster
- Parse and process binary message formats from diverse data sources
- Develop data warehouses using Postgres, Apache Iceberg, Parquet, and S3
- Craft data models that support high-performance query workloads
- Validate and normalize data sources to ensure reliability
- Improve local development and CI/CD workflows with modern tooling and GitHub Actions
Requirements
- Active or current U.S. Secret clearance
- Expertise in time-series data processing and analysis (windowing, resampling, interpolation, etc.)
- Proficiency in Python and Rust for data engineering tasks
- Experience with binary message parsing
- Experience with both row-based and columnar data formats
- Familiarity with OLTP and OLAP databases
- Understanding of distributed systems, streaming architectures, and batch processing patterns
- Hands-on experience with batch orchestrators such as Dagster or Airflow
- Hands-on experience with streaming platforms such as Redpanda or Kafka
- Hands-on experience with binary formats like Protobuf
Technologies
- MQTT
- Redpanda
- Dagster
- Apache Iceberg
- Parquet
- Postgres
- S3
- Python
- Rust
- Protobuf
- Airflow
- GitHub Actions
- Kafka
Nice to Have
- Experience with IoT devices and sensors
- Digital signal processing experience
- Geospatial analysis and GIS experience
- Familiarity with working in monorepos