
ROLE SUMMARY
The AI Acceleration (AIA) function within the Chief Marketing Office (CMO) is the single, business-led engine that owns the design, delivery, and scale-up of priority AI capabilities across Commercial. AIA works in tight collaboration with various Pfizer functions to deploy and maintain production-grade AI solutions that simplify how we work and drive measurable value across all processes.
As a Data Engineer in the newly formed AIA team, you will design, build, integrate, curate, and operationalize data and models into a semantic layer that powers AI-enabled products. You will also ensure the interpretability and lineage of reusable data assets and uphold the bar on governance, performance measurement, and responsible AI.
Data Pipeline Development
- Build the semantic layer that enables contextualized and explainable AI-driven workflows, including ontology development, entity models, and knowledge graphs.
- Build robust data pipelines to ingest, transform, and prepare structured and unstructured datasets from diverse internal and external sources, such as CRM platforms (e.g., Veeva), field force deployment or alignment tools, HCP engagement data, digital metrics, and campaign data sources.
- Ensure data is clean, normalized, and optimized for downstream AI/ML and analytics use.
Infrastructure & Architecture
- Develop and manage data infrastructure using cloud platforms (e.g., AWS, Azure, GCP).
- Implement data lake, data warehouse, and real-time streaming architectures.
- Support containerization and orchestration using data management tools.
- Enable real-time and batch data access for AI agents, LLM-based applications, and analytical products.
Data Quality & Governance
- Implement data validation, profiling, and monitoring processes to ensure accuracy and reliability.
- Collaborate with compliance teams to ensure data handling aligns with HIPAA, FDA requirements, and other U.S. healthcare regulations.
- Maintain metadata, lineage, and audit trails for all data assets.
Collaboration & Cross-Functional Support
- Collaborate with data scientists, ML engineers, and product managers to optimize data for use in retrieval-augmented generation (RAG), autonomous AI agents, and retrieval pipelines.
- Support rapid prototyping and iterative development of AI solutions.
Performance Optimization
- Tune data workflows for performance, scalability, and cost-efficiency.
- Implement caching, indexing, and partitioning strategies to support high-volume data processing.
- Monitor system health and troubleshoot bottlenecks in data pipelines.
BASIC QUALIFICATIONS
- Bachelor’s or Master’s degree in Computer Science, Engineering, or related field.
- Up to 4 years of experience in data engineering, preferably in healthcare or life sciences.
- Proficiency in SQL, Python, and data pipeline frameworks (e.g., Apache Airflow, Spark, Kafka).
- Experience with cloud data platforms (e.g., AWS Redshift, Azure Synapse, Google BigQuery).
- Familiarity with pipeline orchestration (e.g., Airflow, dbt, Prefect, Dagster).
- Excellent problem-solving, communication, and collaboration skills.
- Extensive experience working in an agile setting, or the ability to bring agile best-practice mentorship to the team.
- Familiarity with data privacy standards, pharma industry practices, and GDPR compliance is preferred.
- Prioritizes excellence in Data Engineering by following F.A.I.R. principles and adhering to the engineering and documentation standards set forth by the organization.
Pfizer is an equal opportunity employer and complies with all applicable equal employment opportunity legislation in each jurisdiction in which it operates.
Information & Business Tech