Senior Vice President, Data Pipeline Engineer – Ontology & Investment Data Standard (IDS)
Job Description
Bank of New York Mellon Corporation seeks a Senior Vice President, Data Pipeline Engineer to architect and scale the data pipelines that power the Investment Data Standard (IDS) and its knowledge graph. This leadership role partners with the ontology and knowledge-architecture teams to create a unified data ecosystem that handles data from both internal platforms and external vendors. The position is onsite in Pittsburgh, PA, with Lake Mary, FL as an alternative location.
Responsibilities
- Design and implement scalable pipelines that ingest and process data across batch, streaming, and near real-time patterns from internal systems and external vendors.
- Transform a variety of data formats (including APIs, flat files, streaming data, and unstructured sources) into clean, standardized time-series and event-driven datasets aligned to IDS entity models.
- Create reusable frameworks to normalize identifiers, symbology, units, hierarchies, and event data such as corporate actions and transactions.
- Collaborate with the Ontology/Knowledge architecture team to map source data to canonical entities, relationships, and attributes, enabling graph ingestion and entity resolution.
- Establish robust data quality controls with full lineage, provenance, and traceability tied to the IDS product lineage.
- Support multi-vendor data ingestion, comparison, and reconciliation, including source prioritization, hierarchy logic, and coverage/quality analytics.
- Build modular, cloud-native pipelines optimized for scale, performance, and cost, leveraging platforms like Snowflake, with monitoring and SLA-driven reliability.
- Work across functions to translate business and data requirements into production-ready pipelines and support downstream distribution via APIs, data products, and client platforms.
Requirements
- Bachelor’s degree in a related discipline or equivalent work experience; an advanced degree, ideally in statistics or statistical analysis, is preferred.
- Minimum six years of total experience, including at least three years focused on data analysis and business intelligence.
- Extensive background in data engineering, including building and scaling production-grade pipelines.
- Hands-on expertise in Python, Spark, and SQL, with strong experience in ETL/ELT frameworks and orchestration tools.
- Proven ability to design and operate high-volume, resilient pipelines across batch, streaming, and distributed environments.
- Solid understanding of structured and semi-structured data modeling, including time-series and event-driven architectures.
- Experience designing data transformation and normalization layers, covering schema evolution and backward compatibility.
- Proficiency with modern data platforms (Snowflake, AWS, Databricks), lakehouse architectures, and API-based data integration.
- Strong skills in performance tuning, cost optimization, and implementing data quality, monitoring, logging, and lineage frameworks.
- Domain experience with financial datasets (market data, pricing, reference data, portfolio holdings, transactions, corporate actions) and familiarity with vendors such as Bloomberg, ICE, and MSCI.
- Exposure to knowledge graph/ontology-driven systems, entity resolution workflows, AI/LLM-based unstructured data integration (documents, PDFs), and data entitlements, licensing, and usage tracking is preferred.
Technologies
- Python
- Spark
- SQL
- Snowflake
- AWS
- Databricks