Senior Manager, Evaluation & QA Engineer
- Greece-Thessaloniki Chortiatis

ROLE SUMMARY
As Senior Manager, Evaluation & QA Engineer (Individual Contributor), you will own the quality, evaluation, and reliability strategy for our pip‑installable AI Capabilities SDK and its reusable modules (RAG components, agent/tool cookiecutters, MCP adaptors, standardized interfaces). You will design evaluation frameworks, test harnesses, and automation pipelines that make quality measurable and repeatable—from unit and integration tests to adversarial/fuzz testing, performance/cost benchmarking, and safety/privacy checks.
You will collaborate closely with:
Embedded AI Architects and Solution Designers to codify evaluation requirements that mirror real solution patterns.
AI Engineering to validate PI increments, integrate external technologies, and capture model/agent‑level failure modes.
DevOps Engineers to wire evaluation gates into CI/CD and release trains.
Developer Experience Engineers to ensure developers receive clear, actionable evaluation reports and quickstarts.
Digital Creation Centers and Forward Impact Teams (FITs) to gather field feedback and ensure tests reflect production reality.
Why this Role Matters
For modular AI capabilities to scale safely and predictably, we need evidence‑driven quality. This role turns evaluation into a first‑class, automated discipline—ensuring every SDK change is validated for correctness, performance, safety, and operability. You will reduce regressions, raise confidence, and accelerate adoption across products and services that rely on the SDK as their backbone.
ROLE RESPONSIBILITIES
1) Evaluation Strategy & Quality Architecture
Define the evaluation strategy for SDK modules (parsers, retrievers, rankers, connectors, agent tools, MCP adaptors), including metrics, datasets/fixtures, benchmarks, and acceptance thresholds.
Establish quality gates for release: test coverage minimums, latency/throughput/cost budgets, stability envelopes, and backward compatibility checks.
Establish source code writing principles that align with the widely accepted coding standards and ensure high maintainability degree.
Create reproducibility standards (seeds, dataset/version pins, environment parity) and evidence capture (reports, artifacts) for auditability.
2) Test Harnesses & Automation
Build unit, integration, contract, property‑based, and fuzz/adversarial test harnesses for SDK components; standardize fixtures for RAG and agent scenarios (vector stores, document corpora, tool execution).
Partner with DevOps to integrate tests and quality gates into CI/CD (parallelization, matrix runs, caching) and to fail builds on significant regressions or unmet quality bars.
Design benchmark suites for performance and cost profiling (e.g., latency distributions, throughput, memory/CPU footprints) with clear pass/fail criteria.
3) Safety, Privacy & Guardrails
Codify safety evaluations (toxicity/abuse prompts, jailbreak resistance, sensitive data leakage risks) and policy adherence tests for agent/tool behaviors.
Validate security-by-design aspects of SDK usage (authn/z integration points, secrets handling, data minimization); coordinate with Platform/Security for service consumers.
Ensure privacy and compliance checks are represented in test plans and evidence bundles (PII handling, retention, deletion workflows).
4) Observability-by-Design for Libraries
Work with Capability Architecture and DevOps to embed diagnostic hooks suitable for libraries (structured logging, metrics/traces toggles) and define SLIs relevant to consuming services.
Verify error models (exceptions/types/codes) and retry/backoff behaviors; ensure runbooks and troubleshooting guides are informed by evaluation results.
5) Data & Datasets Governance for Evaluation
Curate and version evaluation datasets (synthetic and real-world samples where appropriate); maintain data lineage and provenance.
Implement dataset health checks (drift, representativeness, coverage) and red‑team suites that evolve with product use cases.
Align dataset usage with legal/privacy guidance; document constraints and approvals.
6) Developer Experience & Enablement
Produce evaluation reports and quality dashboards that developers can interpret quickly (summary + drill‑downs + diffs vs. baseline).
Collaborate with Technical Writers and Developer Experience engineers to include evaluation guidance, quickstarts, and troubleshooting in SDK docs.
Provide templates for teams to extend evaluation suites when contributing new modules via feature branches.
7) Collaboration & Continuous Improvement
Close feedback loops with Embedded Architecture, Solution Design, AI Engineering, Creation Centers, and FITs to keep evaluation aligned with production realities.
Lead post‑release and incident reviews to improve test coverage, resilience, and guardrails.
Track and improve quality KPIs (escaped defects, regression counts, benchmark variance, time-to-detect/repair).
MEASURES OF SUCCESS
Safety & compliance: Guardrail tests pass; reduced policy violations; documented evidence meets audit requirements.
DX & adoption: Clear evaluation reports improve developer confidence; faster time‑to‑first‑success for consuming teams.
Continuous Improvement: Actionable insights from incidents and field feedback lead to expanded test suites and stronger guardrails.
Process Excellence: Data-driven evaluation of the efficiency, quality and consistency of the SDK testing and quality workflows.
QUALIFICATIONS
Basic Qualifications
7+ years in QA/Quality Engineering/Evaluation for SDKs, libraries, or platform components in software/AI contexts.
Hands‑on experience building automated test frameworks (unit/integration/contract) and benchmark suites; strong Python testing fluency (e.g., pytest, property-based testing, fuzzing).
Proficiency with CI/CD systems (GitHub Actions, Azure DevOps, GitLab CI) and integrating quality gates (coverage, performance, security).
Working knowledge of AI/ML or GenAI patterns (RAG, vector stores, agents/tools) and evaluation considerations for these workloads.
Strong grasp of reproducibility, data governance, and observability‑by‑design for libraries.
Preferred Qualifications
Experience with adversarial/red‑teaming methodologies for LLM‑enabled features and safety guardrails.
Background in performance engineering (profiling, optimization, latency/cost analysis) for Python libraries.
Familiarity with security and privacy testing (static analysis, secrets scanning, dependency hygiene, PII handling).
Exposure to regulated environments (healthcare/pharma/finance) and audit‑ready evidence capture.
Comfortable collaborating across architecture, DevOps, DX, LLMOps, platform, and product teams in matrixed organizations.
Work Location Assignment: Hybrid
Purpose
Breakthroughs that change patients' lives... At Pfizer we are a patient centric company, guided by our four values: courage, joy, equity and excellence. Our breakthrough culture lends itself to our dedication to transforming millions of lives.
Digital Transformation Strategy
One bold way we are achieving our purpose is through our company wide digital transformation strategy. We are leading the way in adopting new data, modelling and automated solutions to further digitize and accelerate drug discovery and development with the aim of enhancing health outcomes and the patient experience.
Flexibility
We aim to create a trusting, flexible workplace culture which encourages employees to achieve work life harmony, attracts talent and enables everyone to be their best working self. Let’s start the conversation!
Equal Employment Opportunity
We believe that a diverse and inclusive workforce is crucial to building a successful business. As an employer, Pfizer is committed to celebrating this, in all its forms – allowing for us to be as diverse as the patients and communities we serve. Together, we continue to build a culture that encourages, supports and empowers our employees.
Disability Inclusion
Our mission is unleashing the power of all our people and we are proud to be a disability inclusive employer, ensuring equal employment opportunities for all candidates. We encourage you to put your best self forward with the knowledge and trust that we will make any reasonable adjustments to support your application and future career. Your journey with Pfizer starts here!