AI for Clinical R&D · Course 03

AI for Protocol Design & Optimization

The protocol is the single most consequential document in a clinical trial. It defines who can enroll, what is measured, where the study runs, and how long it takes. This course shows clinical, operational, regulatory, and data professionals how artificial intelligence is responsibly applied to design leaner, faster, more inclusive, and inspection-ready protocols — always under human judgment and within FDA, EMA, NIST, and ICH guardrails.

⏱ ~75–90 min 🎓 Intermediate · no coding required 📘 8 lessons · 5 interactive labs 🏆 Certificate of Completion ♿ WCAG 2.1 AA · SCORM 1.2

~$535K · Median direct cost of a single substantial protocol amendment (illustrative est., Tufts CSDD-range)

~57% · Of protocol amendments are considered potentially avoidable by design (illustrative est.)

~80% · Of trials fail to enroll on time; a major driver is over-restrictive design (illustrative est.)

Design levers this course teaches you to optimize with AI

Complexity · Eligibility · Feasibility · Recruitment

◆ Why this matters

Decisions locked into the protocol propagate through every downstream system — EDC build, site contracts, regulatory submissions, the eTMF, and the statistical analysis plan. A flaw caught at design costs a conversation; the same flaw caught after first-patient-in costs an amendment, re-consent, re-training, and months of timeline. AI moves error-detection and optimization upstream, where it is cheapest and safest to act.

01 The four design levers you will master

This course is organized around the four protocol-design decisions where AI delivers the most defensible value. Each has a dedicated lesson with a hands-on lab.

Figure 0.1 — The Aurelyn protocol-design value map. Four levers, one governance foundation.

02 Who this course is for

Clinical & medical

Medical writers, clinical scientists, and physicians who own protocol content and need to evaluate AI-generated suggestions critically.

Operations & feasibility

Study start-up, feasibility, and clinical operations leads who select countries and sites and forecast enrollment.

Regulatory, quality & data

Regulatory affairs, QA, biostatistics, and data managers responsible for validation, governance, and inspection readiness.

03 How this course builds your expertise

This is a professionally engineered learning experience. Every lesson’s objectives and outcomes are mapped to Bloom’s Taxonomy — the six-level model of cognitive skill — so you don’t merely memorize AI concepts, you climb to analyzing, evaluating, and ultimately creating defensible, inspection-ready protocol-design decisions. Select any level to see what it means and exactly where you practise it in this course.

✓ How to use this course

Work through lessons in order using the outline on the left. Each lesson opens with objectives and outcomes framed in Bloom's taxonomy, then teaches the concept in plain language, then gives you an interactive lab and a knowledge check. Mark each lesson complete to advance the progress bar. Your place is saved automatically — close the tab and click Resume to return exactly where you left off. Finish the final assessment to unlock your printable certificate.

No prior AI or data-science experience required.

Lesson 1

The Strategic Case & the Regulatory Map

Before touching a single AI tool, you need to understand two things: why protocol design is the highest-leverage decision in a trial, and which rules govern AI when it touches that decision. This lesson builds both foundations.

⏱ ~12 min 🧪 Interactive: Regulatory Stack Explorer 📋 Knowledge check ×2

◎ Learning Objectives

▸ Recall the principal cost and timeline consequences of poor protocol design. (Remember)
▸ Explain how a design decision propagates through the downstream trial data chain. (Understand)
▸ Map each AI use case to the FDA, EMA, NIST, and ICH instrument that governs it. (Analyze)

◎ Learning Outcomes

▸ You can articulate the business case for AI-assisted design to a sponsor or steering committee. (Evaluate)
▸ You can name the governing authority for any AI-in-protocol scenario you encounter. (Apply)
▸ You can position "human-in-the-loop" correctly as a regulatory expectation, not a courtesy. (Understand)

1.1 The economics of a protocol decision

A clinical protocol is a contract with reality. Every inclusion criterion narrows your eligible population. Every procedure adds site workload and patient burden. Every visit adds dropout risk. These choices feel small on paper, but they compound.

The clearest signal of design strain is the protocol amendment — a formal change to an approved protocol. Substantial amendments require re-review by ethics committees and regulators, re-consent of enrolled participants, site re-training, and EDC re-programming. Research from the Tufts Center for the Study of Drug Development has consistently found that a large share of amendments are potentially avoidable — driven by design choices that could have been caught earlier.

Figure 1.1 — The cost-of-change curve. AI's job is to pull detection leftward, into the cheap zone.

1.2 The design-to-data chain

A protocol decision is never isolated. Consider one over-restrictive lab threshold in the exclusion criteria. It flows downstream like this:

Protocol design → smaller eligible pool → slower enrollment → more screen failures → site frustration & dropout → timeline slip → amendment to relax criteria → re-consent & re-train → delayed database lock → delayed submission.

AI cannot make the scientific judgment for you. What it does brilliantly is surface the consequence of a choice before you commit to it — quantifying the eligible pool, the procedure burden, the enrollment curve, and the amendment risk while the decision is still cheap to change.

1.3 The regulatory map: who governs AI in protocol design

AI used to inform a regulatory decision is not unregulated. A coherent stack of guidance now applies, and its instruments share one defining principle: AI use must be risk-based, tied to a defined context of use, and overseen by humans. Nothing here permits an AI to design a trial autonomously; everything here expects a qualified human to remain accountable.

Authority / Instrument	What it is	What it requires for AI in protocol design
FDA Draft Guidance on AI to support regulatory decision-making (Jan 2025)	A risk-based credibility assessment framework for AI models used in drug & biologic development.	Define the model's context of use, assess model risk (model influence × decision consequence), and provide credibility evidence proportionate to that risk.
FDA Diversity Action Plans (FDORA 2022)	Statutory expectation to enroll representative populations.	Eligibility-criteria optimization must improve, not narrow, demographic representativeness; AI used to model populations should surface diversity impact.
EMA Reflection Paper on AI in the medicinal product lifecycle (2024)	EMA's principles for AI across development, authorization, and post-market.	Risk-based approach, human oversight, data quality/representativeness, transparency, and lifecycle governance for any AI touching a regulatory dataset.
NIST AI Risk Management Framework 1.0 (2023)	The de-facto governance backbone: functions Govern · Map · Measure · Manage.	Not law, but the structure regulators and sponsors expect: identify context (Map), test for validity/bias (Measure), control risk through the lifecycle (Manage), under accountable governance (Govern).
ICH E8(R1) General Considerations	Modern trial-design philosophy: Quality by Design and Critical-to-Quality (CtQ) factors.	The conceptual home of complexity reduction — design only what is fit-for-purpose; AI helps identify non-CtQ procedures that add burden without value.
ICH E6(R3) Good Clinical Practice	The GCP standard, risk-proportionate and technology-aware.	Computerized systems (including AI) must protect data integrity, traceability, and participant safety; sponsors retain oversight of any tool used.
EU AI Act (Reg. 2024/1689)	Horizontal, risk-tiered AI law.	AI influencing health decisions can fall in high-risk tiers, triggering data-governance, documentation, human-oversight, and transparency obligations.
EU CTR 536/2014 · 21 CFR Parts 11/50/56/312	Core clinical-trial and electronic-records law.	The protocol and any system supporting it must meet submission, consent, IRB/EC, and electronic-records (Part 11 / Annex 11) requirements; AI system validation follows GAMP 5.

⚑ The one principle to remember

Across every framework above, the message is identical: AI advises; a qualified human decides and remains accountable. "Human-in-the-loop" is not a soft nicety — in FDA's credibility framework, EMA's reflection paper, the NIST AI RMF, and the EU AI Act it is an explicit, documented control. Build your AI workflows so that a person reviews, can override, and signs off on every AI-influenced design choice.

1.4 The global regulatory landscape

AI in protocol design is not governed by a single rulebook. A web of authorities — anchored by ICH harmonization — shapes what “good” looks like in each region. Explore the landscape below: select any authority (in the map or the buttons) to see the instruments that bear on your design and what they require of you.

Regulatory references reflect publicly available frameworks current as of authoring and are subject to change. Always consult the applicable authority and your organization’s quality and regulatory functions.

Interactive Lab 1

Regulatory Stack Explorer

Select an AI use case below. The explorer shows which authority governs it and what you must do to stay compliant. Click through all four to complete the activity.


Knowledge Check 1.1

A sponsor wants to use an AI model to recommend relaxing three exclusion criteria. Under the FDA's 2025 draft AI guidance, what determines how much credibility evidence the model needs?

Knowledge Check 1.2

Which statement best reflects the shared principle across FDA, EMA, NIST, and the EU AI Act regarding AI in protocol design?

Lesson 2 · Lever ①

AI for Protocol Complexity Analysis

Protocol complexity has risen steadily for two decades — more endpoints, more procedures, more data points per patient. Much of it adds cost and burden without adding scientific value. This lesson teaches you to measure complexity objectively and use AI to strip out what ICH E8(R1) would call non-critical-to-quality.

⏱ ~16 min 🧪 Lab: Complexity Simulator 📋 Knowledge check ×2

◎ Learning Objectives

▸ Define the measurable dimensions of protocol complexity. (Remember)
▸ Describe how NLP and benchmarking convert a protocol document into a complexity score. (Understand)
▸ Compute and interpret a Protocol Complexity Index and an amendment-risk band. (Apply)

◎ Learning Outcomes

▸ You can identify non-critical-to-quality procedures that inflate burden. (Analyze)
▸ You can justify a complexity-reduction recommendation to a clinical team using ICH E8(R1). (Evaluate)
▸ You can operate a complexity model and translate its output into design action. (Apply)

2.1 What "complexity" really means

Complexity is not a feeling — it is a set of countable design dimensions. AI tools quantify each one and roll them into a single comparable index.

Endpoint load

Number and type of primary, secondary, and exploratory endpoints. Each endpoint implies assessments, data, and analysis.

Procedure burden

Distinct procedures per visit × number of visits — the Schedule of Activities (SoA) footprint experienced by every participant.

Eligibility density

Count and restrictiveness of inclusion/exclusion criteria — a leading driver of screen failures.

Visit frequency

Total scheduled visits and their spacing. More visits = more dropout risk and site workload.

Geographic scope

Number of countries and regulatory regimes — each adds translation, import, and submission complexity (ICH E17).

Data volume

Total unique data fields collected. A large fraction of collected data is never used in any analysis.

2.2 How AI analyzes complexity

Modern complexity analysis depends on having the protocol in a machine-readable structure. The CDISC USDM (Unified Study Definitions Model) and the ICH M11 template turn a prose protocol into structured data an algorithm can parse. From there, four AI techniques do the work:

AI technique	What it does	Design value
NLP / LLM extraction	Reads the protocol text and extracts endpoints, criteria, procedures, and the SoA into structured fields.	Turns a 120-page document into a quantified profile in minutes, not days.
Benchmarking	Compares the profile against a library of historical protocols in the same phase and indication.	Flags where your design is an outlier — e.g., "twice the median procedure count for Phase 2 oncology."
Amendment-risk modeling	A trained model predicts the likelihood of a substantial amendment from the design profile.	Quantifies downstream risk while the design is still cheap to change.
CtQ / burden flagging	Identifies procedures not linked to any endpoint or safety need — candidate "data we collect but never use."	Directly supports ICH E8(R1) Critical-to-Quality reduction.
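For readers who want to see the logic behind CtQ / burden flagging, here is a minimal Python sketch. It assumes the Schedule of Activities has already been extracted into structured records; the `Procedure` class, its fields, and the sample procedures are hypothetical illustrations, not part of any real tool. The flagged items are only candidates for review: a clinician still decides what is truly non-critical.

```python
from dataclasses import dataclass, field


@dataclass
class Procedure:
    """One row of a structured Schedule of Activities (hypothetical shape)."""
    name: str
    endpoints: set = field(default_factory=set)  # endpoints this procedure supports
    safety_need: bool = False                    # True if required for participant safety


def flag_non_ctq(procedures):
    """Return names of procedures linked to no endpoint and no safety need."""
    return [p.name for p in procedures if not p.endpoints and not p.safety_need]


# Illustrative data only.
soa = [
    Procedure("12-lead ECG", safety_need=True),
    Procedure("HbA1c", endpoints={"primary: HbA1c change"}),
    Procedure("Quality-of-life survey B"),
    Procedure("Extra PK sample, week 10"),
]

candidates = flag_non_ctq(soa)
print(candidates)
```

The design choice worth noticing is that the check is purely structural: it can only find procedures with no recorded link, so its quality depends on how completely endpoints and safety needs were mapped in the first place.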

⚑ Regulatory anchor — ICH E8(R1)

ICH E8(R1) introduced Quality by Design and the concept of Critical-to-Quality (CtQ) factors — the elements that truly protect participants and the reliability of results. The guidance explicitly encourages designing trials to be fit-for-purpose, avoiding unnecessary complexity. An AI complexity tool is, in effect, an automated CtQ screen: it helps you keep what matters and challenge what does not. This is reinforced by ICH E6(R3), which expects sponsors to apply effort proportionate to risk.

Interactive Lab 2

Protocol Complexity Simulator

Adjust the six design dimensions below. The simulator computes a Protocol Complexity Index (PCI, 0–100), an estimated per-participant procedure burden, and an amendment-risk band — then offers AI-style reduction recommendations. Values are illustrative modeling, not regulatory thresholds.

Endpoints (primary + secondary + exploratory) 8

Distinct procedures per visit 9

Scheduled visits 12

Eligibility criteria (I + E) 28

Countries 6

Unique data fields collected 1200

Simulator outputs: Protocol Complexity Index, procedure-events per participant, modeled amendment likelihood, and AI reduction recommendations.

✓ Practitioner takeaway

Notice how the index responds far more to procedures × visits than to any single dimension. The highest-yield complexity reduction is almost always trimming the Schedule of Activities — removing procedures that map to no endpoint and consolidating low-value visits. That is where AI flagging earns its keep.
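The arithmetic behind an index of this kind can be sketched in a few lines of Python. The procedure-events figure is simply procedures per visit multiplied by visits. Everything else here is an assumption made for illustration: the `CAPS` and `WEIGHTS` values are hypothetical and are not the simulator's actual formula or any regulatory threshold.

```python
def procedure_events(procedures_per_visit, visits):
    """Per-participant procedure burden: procedures per visit x scheduled visits."""
    return procedures_per_visit * visits


# Hypothetical normalization caps and weights (assumptions, not course values).
CAPS = {"endpoints": 20, "procedure_events": 300, "criteria": 50,
        "countries": 20, "data_fields": 3000}
WEIGHTS = {"endpoints": 0.15, "procedure_events": 0.40, "criteria": 0.20,
           "countries": 0.10, "data_fields": 0.15}


def complexity_index(design):
    """Weighted sum of capped, normalized dimensions, scaled to 0-100."""
    score = sum(w * min(design[k] / CAPS[k], 1.0) for k, w in WEIGHTS.items())
    return round(100 * score, 1)


# The lab's default slider values.
events = procedure_events(procedures_per_visit=9, visits=12)
design = {"endpoints": 8, "procedure_events": events, "criteria": 28,
          "countries": 6, "data_fields": 1200}

print(events)                    # 108 procedure-events per participant
print(complexity_index(design))
```

Because procedures and visits enter as a product, trimming either one moves the index more than an equal proportional change in a single-count dimension, which is the pattern the takeaway above describes.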

Knowledge Check 2.1

An AI tool flags that 14 of a protocol's procedures are not linked to any endpoint or safety assessment. Which ICH concept best supports removing them?

Knowledge Check 2.2

Why is a structured digital protocol (CDISC USDM / ICH M11) a prerequisite for reliable AI complexity analysis?

Lesson 3 · Lever ②

AI for Eligibility Criteria Optimization

Eligibility criteria are where scientific caution and operational reality collide. Every restriction protects something — but stacked together, restrictions can shrink your eligible population to a sliver, slow enrollment, and exclude the very patients who will use the drug. AI lets you see the population cost of each criterion and broaden safely.

⏱ ~16 min 🧪 Lab: Eligibility Optimizer 📋 Knowledge check ×2

◎ Learning Objectives

▸ Recognize the trade-off between internal validity and generalizability in eligibility design. (Understand)
▸ Explain how RWD/EHR mining and computable phenotypes estimate a criterion's population cost. (Understand)
▸ Evaluate which criteria to keep, relax, or remove using population and diversity evidence. (Evaluate)

◎ Learning Outcomes

▸ You can quantify the eligible-pool impact of a criteria set. (Apply)
▸ You can connect eligibility decisions to FDA Diversity Action Plan goals. (Analyze)
▸ You can defend a broadening recommendation without compromising participant safety. (Evaluate)

3.1 The eligibility tension

Tight eligibility criteria buy you a clean, homogeneous study population — easier to power, easier to interpret. But the same tightness has four costs that compound:

↓ Pool

A smaller eligible population means slower enrollment and longer timelines

↑ Fails

More criteria means more screen failures and wasted site effort

↓ Diversity

Age caps, comorbidity and language exclusions disproportionately remove underrepresented groups

↓ Generalize

Results may not apply to the real-world patients who receive the drug

The art is to keep restrictions that are truly about safety or scientific necessity and challenge restrictions that are merely convention or copy-paste from a previous protocol. AI makes that distinction visible by attaching a population cost to each criterion.

3.2 How AI optimizes eligibility

Technique	What it does	Output you act on
RWD / EHR mining	Queries real-world databases (EHR, claims, registries) to estimate how many real patients meet — or fail — each criterion.	A population-impact percentage per criterion.
Computable phenotypes	Translates a free-text criterion into a precise, executable definition (codes, labs, time windows).	Consistent, reproducible eligibility logic that sites apply uniformly.
"What-if" simulation	Recomputes the eligible pool as you relax or remove criteria.	The marginal enrollment gain of each loosening.
Diversity / equity modeling	Estimates how a criterion shifts the demographic composition of the eligible pool.	Evidence for an FDA Diversity Action Plan.
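Two of the techniques in the table, computable phenotypes and "what-if" simulation, can be illustrated together. In this minimal Python sketch each criterion is written as an executable rule, and the eligible pool is recomputed with one criterion removed at a time. The six patient records, the thresholds, and the criterion names are invented for illustration; a real analysis would run against governed real-world data.

```python
# Illustrative patient records (fabricated for the example, not real data).
patients = [
    {"age": 58, "egfr": 72, "hba1c": 8.1},
    {"age": 71, "egfr": 65, "hba1c": 7.9},
    {"age": 44, "egfr": 48, "hba1c": 9.2},
    {"age": 67, "egfr": 80, "hba1c": 8.4},
    {"age": 52, "egfr": 90, "hba1c": 6.4},
    {"age": 79, "egfr": 55, "hba1c": 8.8},
]

# Each free-text criterion expressed as a precise, executable rule.
criteria = {
    "age <= 65": lambda p: p["age"] <= 65,
    "eGFR >= 60": lambda p: p["egfr"] >= 60,
    "HbA1c >= 7.0": lambda p: p["hba1c"] >= 7.0,
}


def eligible_fraction(records, active):
    """Share of records that satisfy every criterion named in `active`."""
    ok = [p for p in records if all(criteria[name](p) for name in active)]
    return len(ok) / len(records)


baseline = eligible_fraction(patients, list(criteria))

# What-if: marginal gain in eligible pool from removing each criterion alone.
gains = {
    name: eligible_fraction(patients, [c for c in criteria if c != name]) - baseline
    for name in criteria
}

for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"remove '{name}': +{gain:.0%} of the indication population")
```

In this toy data set the age cap has the largest population cost, but the output is only the population side of the argument. Whether a criterion can safely be relaxed remains a clinical judgment.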

⚑ Regulatory anchor — FDA eligibility & diversity

FDA has issued a series of guidances urging sponsors to avoid unnecessarily restrictive eligibility criteria (notably in oncology) and to broaden trial populations. The Food and Drug Omnibus Reform Act (FDORA, 2022) established the expectation of Diversity Action Plans for many pivotal trials. When AI helps you relax a criterion that excluded older adults, patients with controlled comorbidities, or organ-function ranges with no safety basis, it is directly serving these expectations — provided a clinician confirms the change is safe. EMA shares this representativeness principle.

⚠ Watch for bias

RWD is not neutral. If the EHR data feeding the model under-represents a group, the model's population estimates inherit that gap. Under the NIST AI RMF Measure function you must test the data and model for representativeness before trusting its diversity projections. Garbage in, biased out.
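One simple form such a representativeness test can take is a comparison of each group's share in the data source against a reference population. The Python sketch below is an assumption-laden illustration: the function name, the 5-percentage-point tolerance, and all counts and reference shares are hypothetical, and a real Measure activity would cover far more than a single share comparison.

```python
def representation_gaps(sample_counts, reference_shares, tolerance=0.05):
    """Flag groups whose share in the data falls short of the reference share.

    Returns {group: shortfall} for every group under-represented by more
    than `tolerance` (expressed as a fraction, so 0.05 = 5 percentage points).
    """
    total = sum(sample_counts.values())
    gaps = {}
    for group, reference in reference_shares.items():
        share = sample_counts.get(group, 0) / total
        if reference - share > tolerance:
            gaps[group] = round(reference - share, 3)
    return gaps


# Hypothetical numbers: the EHR extract versus the real-world disease population.
ehr_counts = {"65+": 120, "under 65": 880}
reference = {"65+": 0.30, "under 65": 0.70}

gaps = representation_gaps(ehr_counts, reference)
print(gaps)
```

A flagged gap does not tell you how to fix the data, only that diversity projections for that group should not be trusted until the shortfall is understood.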

Interactive Lab 3

Eligibility Optimizer

Below is a sample criteria set for a chronic-disease Phase 2 study. Toggle a criterion off to remove it and watch the eligible pool and diversity index respond. The AI recommendation tag suggests whether each is safety-critical (keep), conventional (relax), or low-value (remove). Population figures are illustrative.

Optimizer outputs: relative eligible pool (the share of the unrestricted indication population that would qualify), diversity index (higher = closer to representative of the real-world population), and modeled screen-failure rate at the clinic.

Knowledge Check 3.1

An AI model estimates that an upper age limit of 65 excludes 31% of real-world patients with no supporting safety rationale. The most defensible action is to:

Knowledge Check 3.2

A "computable phenotype" improves eligibility design primarily because it:

Lesson 4 · Lever ③

AI for Feasibility Assessments

Feasibility is the discipline of asking, "Can this protocol actually be executed — here, now, on time?" The wrong countries and sites are the quiet killers of a trial: a site that never enrolls still consumes a contract, training, and oversight. AI turns feasibility from a survey-and-gut-feel exercise into a data-driven prediction.

⏱ ~15 min 🧪 Lab: Feasibility Scorecard 📋 Knowledge check ×2

◎ Learning Objectives

▸ Identify the inputs that drive study-, country-, and site-level feasibility. (Remember)
▸ Explain how predictive models score and rank candidate geographies and sites. (Understand)
▸ Analyze how changing strategic priorities re-orders a feasibility ranking. (Analyze)

◎ Learning Outcomes

▸ You can build a weighted feasibility scorecard for a country shortlist. (Create)
▸ You can flag likely non-enrolling sites before contracting them. (Analyze)
▸ You can apply ICH E17 reasoning to a multiregional design. (Apply)

4.1 The three altitudes of feasibility

Study-level

Is the overall design executable at all? Are endpoints measurable in routine practice? Is the comparator the regional standard of care?

Country-level

Disease prevalence, competing-trial density, regulatory and ethics timelines, import logistics, infrastructure, and data-privacy regime.

Site-level

Historical enrollment performance, patient access, staff capacity, data quality, and the all-important risk of being a non-enroller.

4.2 How AI predicts feasibility

AI feasibility models learn from the outcomes of thousands of past sites and studies. Instead of relying solely on a site's self-reported optimism on a feasibility questionnaire, the model weighs what actually happened at comparable sites.

Model capability	Signal it uses	Decision it supports
Country scoring	Epidemiology/prevalence (often from RWD), regulatory timelines, competing trials.	Which countries to include and how to allocate targets.
Predictive site selection	Each site's historical start-up time, enrollment rate, and data quality.	Which sites to activate — and which to avoid.
Competition density	Count of active trials competing for the same patients in a region.	Whether your enrollment assumptions are realistic.
Non-enroller risk	Patterns that historically preceded zero-enrollment sites.	De-risking the site list before contracts are signed.

⚑ Regulatory anchor — ICH E17, E6(R3), GDPR

ICH E17 governs multiregional clinical trials — it asks you to justify the choice of regions and to consider whether results will be consistent and interpretable across them. AI country-selection must respect that scientific framing, not just chase the cheapest enrollment. ICH E6(R3) holds the sponsor accountable for oversight of every selected site. And when feasibility models use real-world data across borders, GDPR (EU) and HIPAA (US) constrain how patient data may be processed — a key input to your data-governance plan.

Interactive Lab 4

Country Feasibility Scorecard

Five candidate countries are pre-scored (0–100) on five feasibility factors. Use the weight sliders to reflect your study's strategic priorities — the scorecard recomputes a weighted feasibility score and re-ranks the countries live. Country data is illustrative and not a recommendation about any real jurisdiction.

FACTOR WEIGHTS

Disease prevalence / patient access 30%

Site performance history 25%

Regulatory / ethics speed 20%

Low competing-trial density 15%

Infrastructure / data quality 10%

Weights normalize automatically. Total raw: 100
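The weighting and re-ranking described above can be sketched as follows. The default weights match the sliders in the lab; the three countries and their factor scores are hypothetical placeholders (the lab uses five), and the function names are illustrative rather than taken from any tool.

```python
FACTORS = ["prevalence", "site_history", "regulatory_speed",
           "low_competition", "infrastructure"]


def normalize(raw_weights):
    """Scale raw slider values so the weights sum to 1."""
    total = sum(raw_weights.values())
    return {k: v / total for k, v in raw_weights.items()}


def rank_countries(countries, raw_weights):
    """Return (country, weighted score) pairs, best first."""
    w = normalize(raw_weights)
    scores = {name: sum(w[f] * s[f] for f in FACTORS)
              for name, s in countries.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical 0-100 factor scores, not data about any real jurisdiction.
countries = {
    "Country A": dict(zip(FACTORS, [80, 60, 50, 70, 90])),
    "Country B": dict(zip(FACTORS, [60, 85, 80, 50, 70])),
    "Country C": dict(zip(FACTORS, [95, 50, 40, 60, 60])),
}

default_weights = dict(zip(FACTORS, [30, 25, 20, 15, 10]))
prevalence_first = dict(zip(FACTORS, [60, 10, 10, 10, 10]))

default_ranking = rank_countries(countries, default_weights)
prevalence_ranking = rank_countries(countries, prevalence_first)
print([name for name, _ in default_ranking])
print([name for name, _ in prevalence_ranking])
```

With these placeholder scores, shifting weight toward disease prevalence reverses the top and bottom of the ranking, which shows why the weights themselves are a strategic decision that a person should own and document.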

Knowledge Check 4.1

A predictive model flags a proposed site as high risk for never enrolling, based on its historical patterns. The best use of this output is to:

Knowledge Check 4.2

When an AI model selects countries purely to maximize fast enrollment, which ICH guideline reminds you to also justify regional choices scientifically?

Lesson 5 · Lever ④

AI for Recruitment Modeling & Enrollment Forecasting

A protocol that cannot recruit on time fails regardless of its scientific elegance. Enrollment forecasting predicts how fast, from how many sites, to a target N — and AI sharpens those predictions and re-forecasts continuously as real data arrives. This is where design assumptions meet the calendar.

⏱ ~16 min 🧪 Lab: Recruitment Forecaster 📋 Knowledge check ×2

◎ Learning Objectives

▸ Describe the recruitment funnel and where participants are lost. (Understand)
▸ Explain classical (Poisson–Gamma) versus AI-enhanced enrollment forecasting. (Understand)
▸ Model the timeline impact of site count, activation speed, and screen-fail rate. (Apply)

◎ Learning Outcomes

▸ You can produce an enrollment forecast and read its sensitivity to assumptions. (Create)
▸ You can locate the recruitment bottleneck and target an intervention. (Analyze)
▸ You can name the consent, privacy, and advertising rules that bound AI-driven recruitment. (Remember)

5.1 The recruitment funnel

Every enrolled participant survives a funnel. Loss at any stage moves your finish line. AI predicts the conversion rate at each step so you can intervene where it matters.

Figure 5.1 — The recruitment funnel. AI estimates the conversion rate at each narrowing.

5.2 From Poisson–Gamma to AI

The classical workhorse is the Poisson–Gamma model: it treats each site's enrollment as a random (Poisson) process with rates that vary across sites (a Gamma distribution). It is robust and explainable. AI extends it by adding covariates — site type, indication, seasonality, competition, digital-recruitment signals — and by Bayesian updating: as the first patients enroll, the forecast re-calibrates to reality instead of clinging to the original plan.
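The Bayesian updating step has a compact textbook form when the site rate has a Gamma prior and enrollments are Poisson counts: observing n enrollments over t months turns a Gamma(α, β) prior into a Gamma(α + n, β + t) posterior. The sketch below shows that conjugate update for a single site. The prior values and the observed counts are assumptions chosen for illustration, and a production model would also include the covariates mentioned above.

```python
def update_site_rate(alpha, beta, enrolled, months_observed):
    """Gamma-Poisson conjugate update of a site's monthly enrollment rate.

    alpha: Gamma shape (acts like prior enrollments)
    beta:  Gamma rate  (acts like prior months of observation)
    """
    return alpha + enrolled, beta + months_observed


def expected_rate(alpha, beta):
    """Mean of a Gamma(shape=alpha, rate=beta) distribution."""
    return alpha / beta


# Assumed prior: mean 0.8 enrollments per site per month.
prior = (4.0, 5.0)

# Assumed observation: the site enrolled 1 participant in its first 5 months.
posterior = update_site_rate(*prior, enrolled=1, months_observed=5)

print(expected_rate(*prior))      # planning assumption
print(expected_rate(*posterior))  # re-calibrated after real data
```

The prior behaves like a small amount of earlier evidence, so a weak prior gives way quickly to what the site actually does, while a strong prior moves slowly. Choosing that strength is a modeling decision to document.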

Predictive curves

Project cumulative enrollment over time with confidence bands.

Screen-fail prediction

Estimate how many screens are needed per enrollment — the hidden multiplier on site workload.

Bottleneck detection

Pinpoint whether the constraint is activation, referral, or screen-fail — so you fix the right thing.

⚑ Regulatory anchor — consent, privacy & advertising

AI-driven recruitment touches several controls at once. Informed consent is governed by 21 CFR Part 50; recruitment materials and advertising require IRB/EC approval (Part 56). Any system handling candidate data must satisfy 21 CFR Part 11 / EU Annex 11 for electronic records and HIPAA / GDPR for privacy. If an AI model profiles or targets potential participants, the EU AI Act and GDPR's automated-decision provisions add transparency and fairness duties — and you must ensure targeting does not systematically exclude protected groups.

Interactive Lab 5

Recruitment Forecaster

Set your study parameters. The forecaster plots cumulative enrollment for a base plan and an AI-optimized plan (faster site activation and lower screen-fail from better eligibility design), and reports months-to-target for each. Modeled illustration, not a guarantee.

Active sites 25

Enrollments / site / month 0.8

Months to activate all sites 6

Screen-failure rate 35%

Target randomized (N) 300
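To make the relationship between these parameters concrete, here is a deliberately simple, deterministic Python sketch using the default values above. It assumes sites activate linearly over the activation period and that every active site enrolls at the same average rate; both are simplifying assumptions of this example, not the forecaster's actual model. The 4-month activation and 25% screen-failure figures in the comparison are hypothetical what-if inputs.

```python
import math


def months_to_target(sites, rate_per_site, activation_months, target):
    """Whole months until cumulative enrollment reaches the target.

    Assumes a linear activation ramp, then all sites active.
    """
    enrolled, month = 0.0, 0
    while enrolled < target:
        month += 1
        if activation_months <= 0:
            active_share = 1.0
        else:
            active_share = min(month / activation_months, 1.0)
        enrolled += sites * rate_per_site * active_share
    return month


def screens_needed(target, screen_fail_rate):
    """Screens required to randomize `target` participants."""
    return math.ceil(target / (1.0 - screen_fail_rate))


base_months = months_to_target(25, 0.8, 6, 300)
faster_months = months_to_target(25, 0.8, 4, 300)   # hypothetical faster activation

print(base_months, screens_needed(300, 0.35))
print(faster_months, screens_needed(300, 0.25))     # hypothetical lower screen-fail
```

A point estimate like this hides uncertainty. The probabilistic models described in 5.2 add the confidence bands that a deterministic calculation cannot provide.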

Forecaster outputs: months to target for the base plan, months to target for the AI-optimized plan, and time saved and screening load.

Knowledge Check 5.1

Why is Bayesian updating valuable in AI-enhanced enrollment forecasting?

Knowledge Check 5.2

An AI tool targets potential participants using profiling. Beyond consent (Part 50) and IRB-approved advertising (Part 56), which additional concern is most directly raised?

Lesson 6 · Foundation

Responsible AI, Regulation & Governance

Everything in Lessons 2–5 sits on one foundation: can you trust, validate, and defend the AI you used? This lesson assembles the governance stack — NIST AI RMF, the FDA credibility framework, the EMA reflection paper, GAMP 5 / Part 11 / ALCOA+, and the EU AI Act — into a workflow you can operate and an audit you can pass.

⏱ ~14 min 🧪 Lab: NIST AI RMF Readiness 📋 Knowledge check ×2

◎ Learning Objectives

▸ State the four functions of the NIST AI RMF and what each demands. (Remember)
▸ Explain FDA's risk-based credibility assessment and "context of use." (Understand)
▸ Assess an AI workflow's governance maturity against the framework. (Evaluate)

◎ Learning Outcomes

▸ You can run a readiness check on an AI-in-design use case. (Apply)
▸ You can map validation duties to GAMP 5, Part 11, and ALCOA+. (Analyze)
▸ You can position a use case in the EU AI Act risk tiers. (Evaluate)

6.1 The NIST AI RMF: your governance backbone

The NIST AI Risk Management Framework is not a law, but it has become the common language regulators and sponsors use. Its four functions form a continuous loop.

① Govern

Establish accountability, policies, roles, and a risk culture. Who owns this AI? Who signs off?

② Map

Define the context of use, the decision it informs, and what could go wrong. What is this model actually for?

③ Measure

Test for accuracy, robustness, bias, and representativeness. Is it valid and fair on our data?

④ Manage

Control risk across the lifecycle, including drift monitoring and human override. How do we keep it safe over time?

6.2 FDA's credibility assessment

FDA's 2025 draft guidance gives a risk-based, seven-step credibility framework for AI used to support regulatory decisions. The two ideas you must internalize:

Context of use (COU)

A precise statement of what specific question the model answers and how its output is used in the decision. Credibility evidence is judged against the COU — not the model in the abstract.

Model risk

A function of model influence (how much the decision relies on the model) × decision consequence (how serious an error would be). Higher risk demands more credibility evidence.
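One way to make this two-factor idea operational inside a team is a simple lookup that combines the two ratings. The Python sketch below is only an illustration: the three-level scale, the numeric scoring, and the cut-offs are assumptions of this example, since the draft guidance describes the two factors but does not prescribe a numeric formula.

```python
# Hypothetical ordinal scale (an assumption of this sketch).
LEVELS = {"low": 1, "medium": 2, "high": 3}


def model_risk(influence, consequence):
    """Combine model influence and decision consequence into a risk band."""
    score = LEVELS[influence] * LEVELS[consequence]
    if score >= 6:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


# Example: AI output is one input among several (medium influence) into an
# eligibility change where an error could affect participant safety (high).
print(model_risk("medium", "high"))
print(model_risk("low", "low"))
```

The band then sets how much credibility evidence to assemble for the stated context of use. The rating itself should be made and signed off by a qualified person, not by the tool.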

6.3 Validation & data integrity

Standard	Domain	What you must do
GAMP 5 (2nd ed.)	Computerized-system validation	Apply a risk-based, fit-for-intended-use validation lifecycle to the AI system; document specification, verification, and ongoing control.
21 CFR Part 11 / EU Annex 11	Electronic records & signatures	Ensure audit trails, access controls, and record integrity for AI outputs that feed regulated records.
ALCOA+	Data integrity	AI-generated data and decisions must be Attributable, Legible, Contemporaneous, Original, Accurate — plus Complete, Consistent, Enduring, Available.
EU AI Act	Horizontal AI law	Classify the system by risk tier; high-risk uses require data governance, documentation, human oversight, transparency, and post-market monitoring.
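To show what an attributable, contemporaneous, tamper-evident record of an AI-influenced decision might look like, here is a minimal Python sketch of a hash-chained audit log. It is a teaching illustration only: the field names and helper functions are hypothetical, and this is not a validated Part 11 or Annex 11 system.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


def _digest(body):
    """SHA-256 over a canonical JSON form of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def audit_entry(prev_hash, user, model_version, ai_output, decision, rationale):
    """Create one record: who decided what, when, on which AI output."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # contemporaneous
        "user": user,                                         # attributable
        "model_version": model_version,
        "ai_output": ai_output,
        "decision": decision,                                 # human sign-off
        "rationale": rationale,
        "prev_hash": prev_hash,                               # links to prior entry
    }
    record["hash"] = _digest(record)
    return record


def verify_chain(chain):
    """Return True only if every record is unaltered and correctly linked."""
    prev = GENESIS
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev or _digest(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


e1 = audit_entry(GENESIS, "j.doe", "elig-model 1.2", "relax age cap", "rejected", "safety signal in elderly")
e2 = audit_entry(e1["hash"], "j.doe", "elig-model 1.2", "drop PK sample wk10", "accepted", "no endpoint link")
print(verify_chain([e1, e2]))
```

Because each entry's hash covers the previous entry's hash, editing an earlier record after the fact breaks verification for the chain, which is the property an inspector looks for in an audit trail.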

◆ The governance one-liner

Map the context, Measure validity and bias, Manage the lifecycle with human override, all under accountable Governance — and keep a Part-11-grade, ALCOA+ audit trail of every AI-influenced design decision. Do that, and you can defend any of the four levers in an inspection.

6.4 Primary source: the FDA–EMA Guiding Principles

In January 2026, the U.S. FDA and the European Medicines Agency jointly published Guiding Principles of Good AI Practice in Drug Development — ten principles meant to lay a common foundation for good practice as AI is used to generate evidence across the drug product life cycle. Every lever in this course is built to operationalize them. Expand each principle to see what it means in plain language and where you practised it.

U.S. FDA · EMA · January 2026 · 10 principles · joint guidance

Guiding Principles of Good AI Practice in Drug Development

A joint statement of common principles to inform, enhance, and promote the responsible use of AI for generating evidence across nonclinical, clinical, post-marketing, and manufacturing phases of the drug product life cycle.

1. Human-centric by design

Build and use AI around ethical, human-centred values: patient benefit and safety come first, and a qualified human stays in the loop on every consequential decision.

Theme throughout; Lessons 1 & 6

2. Risk-based approach

Scale validation, mitigation, and oversight to the model's context of use and assessed risk — apply more scrutiny precisely where a wrong answer would matter most.

Lesson 6 · COU & model risk; Lesson 2 · complexity risk

3. Adherence to standards

Meet the applicable legal, ethical, technical, scientific, cybersecurity, and Good-Practice (GxP) standards rather than treating an AI system as exempt from them.

Lesson 6 · GAMP 5 · Part 11 · ALCOA+

4. Clear context of use

State precisely what the model is for — the exact question it answers and how its output is used — before judging whether it is fit for purpose.

Lesson 6 · context of use; Lesson 1 · regulatory map

5. Multidisciplinary expertise

Combine AI/technical skill with domain expertise — clinical, regulatory, operational, statistical — at every stage of the model's life cycle.

Lesson 4 · feasibility; Lesson 0 · who this is for

6. Data governance and documentation

Document data provenance, processing steps, and analytical choices in a traceable, verifiable way, and protect privacy and sensitive data throughout the life cycle.

Lesson 3 · RWD provenance & bias; Lesson 6 · governance

7. Model design and development practices

Follow sound model and software-engineering practice on fit-for-use data, balancing interpretability, explainability, and predictive performance for reliable, robust systems.

Lesson 6 · model risk & transparency

8. Risk-based performance assessment

Evaluate the whole system — including the human–AI interaction — with fit-for-use data and metrics suited to the intended context, validated by well-designed testing.

Lesson 6 · validation; Lesson 5 · forecast validation

9. Life cycle management

Run a risk-based quality management system across the model's life: capture and resolve issues, and monitor and re-evaluate on a schedule to catch data drift.

Lesson 6 · QMS & drift monitoring

10. Clear, essential information

Use plain language to tell users and patients what the AI is for, how it performs, its limits, the data behind it, and how it changes over time.

This course's plain-language design; Lesson 6 · transparency

◆ How Aurelyn operationalizes all ten

Across the four levers, the same discipline recurs: a clear context of use, a risk-based depth of validation, a documented, ALCOA+ data trail, and plain-language transparency — with AI that advises while a qualified, named human decides and remains accountable.


Source: U.S. Food & Drug Administration and European Medicines Agency, "Guiding Principles of Good AI Practice in Drug Development," January 2026. Reproduced for educational reference within this course package.

Interactive Lab 6

NIST AI RMF Readiness Self-Assessment

Answer 12 questions about an AI use case in your protocol-design workflow. The tool scores each NIST function, plots a readiness radar, and returns an overall maturity tier. Educational self-assessment — not a formal audit.
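The scoring logic behind a self-assessment of this kind can be sketched in Python. This is a minimal illustration, not the lab's actual implementation: the 0 to 4 rating scale, the split of three questions per function, the equal weighting of functions, and the tier names and cut-offs are all assumptions introduced here.

```python
from statistics import mean

# The four NIST AI RMF 1.0 functions.
FUNCTIONS = ("Govern", "Map", "Measure", "Manage")

# Assumed scale: each answer is rated 0 (not in place) to 4 (fully in place).
MAX_RATING = 4

# Assumed tier cut-offs on the 0-100 overall score (hypothetical labels).
TIERS = ((80, "Advanced"), (60, "Established"), (40, "Developing"), (0, "Initial"))


def score_readiness(answers):
    """Return per-function scores (0-100), the overall score, and a tier.

    `answers` maps each function name to the list of ratings given
    to that function's questions.
    """
    per_function = {}
    for fn in FUNCTIONS:
        ratings = answers[fn]
        if not ratings or any(not 0 <= r <= MAX_RATING for r in ratings):
            raise ValueError(f"{fn}: ratings must be in 0..{MAX_RATING}")
        per_function[fn] = round(100 * mean(ratings) / MAX_RATING, 1)

    # Equal weighting across the four functions is an assumption.
    overall = round(mean(list(per_function.values())), 1)
    tier = next(label for cutoff, label in TIERS if overall >= cutoff)
    # The lowest-scoring function shows where to act first.
    weakest = min(per_function, key=per_function.get)
    return {
        "per_function": per_function,
        "overall": overall,
        "tier": tier,
        "weakest": weakest,
    }


# Twelve illustrative answers, three per function.
example = {
    "Govern": [4, 3, 3],
    "Map": [3, 3, 2],
    "Measure": [1, 2, 1],
    "Manage": [2, 2, 3],
}
result = score_readiness(example)
print(result["per_function"])
print(result["overall"], result["tier"], "weakest:", result["weakest"])
```

Reporting the weakest function alongside the overall tier matters because an average can hide a gap: in this example a respectable overall score sits on top of a low Measure score, which is the function concerned with testing and evaluation.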


Knowledge Check 6.1

In FDA's credibility framework, "model risk" is best described as:

Knowledge Check 6.2

Which NIST AI RMF function is most directly concerned with testing an AI model for bias and representativeness before use?

Lesson 7 · Capstone

Final Assessment & Certification

This capstone confirms that you can reason across the full protocol-design lifecycle — complexity, eligibility, feasibility, recruitment, and governance — the way an AI-augmented study team must. Score 80% or higher to unlock your Certificate of Completion.

🎓 Learning Objectives & Outcomes

By the end of this lesson you will be able to:

Evaluate a protocol scenario and judge where AI adds defensible value versus where human accountability must govern.
Analyze trade-offs between complexity, eligibility breadth, feasibility, and recruitment speed.
Create a defensible, regulation-aligned recommendation under the human-in-the-loop principle.

Demonstrated outcomes:

You integrate evidence from five protocol-design domains into a single judgement.
You correctly map each AI use to its governing framework (FDA, EMA, NIST, ICH, EU AI Act).
You articulate the accountability boundary between model and investigator.

Course Recap — The Through-Line

Across this course one idea recurs: AI amplifies whatever is already in your data and your design. A protocol that is internally inconsistent, an eligibility set that quietly excludes the target population, or a feasibility model fed optimistic assumptions will be made faster and more confidently wrong by automation. The discipline you have practised — interrogating inputs, demanding representativeness, and preserving a documented human decision — is what converts AI from a liability into a controlled instrument of study quality.

Complexity: Score burden early; every avoided amendment protects timeline, budget, and data integrity. Reference: ICH E8(R1) Quality-by-Design.

Eligibility: Optimize for a population in which the study question is answerable and which is representative. Reference: FDA Diversity Action Plans / FDORA 2022.

Feasibility: Rank countries and sites on evidence, not habit; document the basis. Reference: ICH E17 multi-regional principles.

Recruitment: Forecast with uncertainty, update with Bayesian discipline, plan for the failure mode. Reference: 21 CFR 50/56/11; HIPAA/GDPR.

Governance: Wrap every model in NIST's Govern–Map–Measure–Manage and FDA context-of-use. Reference: NIST AI RMF 1.0; EU AI Act 2024/1689.

The Constant: AI advises; a qualified, named human decides and remains accountable on the record.
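The Recruitment advice to forecast with uncertainty and update with Bayesian discipline can be illustrated with a small Python sketch. It assumes a deliberately simple model that is not necessarily the one used in the recruitment lesson: enrollment per site follows a Poisson process with a constant monthly rate, the rate has a Gamma prior, and all sites are active throughout. The function names and every number below are hypothetical.

```python
import random
from statistics import quantiles


def update_rate(prior_shape, prior_rate, enrolled, site_months):
    """Gamma-Poisson conjugate update of the per-site monthly enrollment rate.

    Prior Gamma(shape, rate) plus `enrolled` patients observed over
    `site_months` of exposure gives Gamma(shape + enrolled, rate + site_months).
    """
    return prior_shape + enrolled, prior_rate + site_months


def months_to_target(shape, rate, remaining, n_sites, draws=20000, seed=7):
    """Monte Carlo forecast of months needed to enroll `remaining` patients."""
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    samples = [
        remaining / (rng.gammavariate(shape, 1.0 / rate) * n_sites)
        for _ in range(draws)
    ]
    cuts = quantiles(samples, n=10)  # nine decile cut points
    return {"p10": cuts[0], "median": cuts[4], "p90": cuts[8]}


# Planning assumption: 1.0 patient per site per month, held with the
# weight of 10 site-months of evidence (prior shape 10, prior rate 10).
# Observed so far: 12 patients over 30 site-months.
shape, rate = update_rate(10, 10, enrolled=12, site_months=30)
posterior_mean = shape / rate  # 22 / 40 = 0.55 patients per site per month

forecast = months_to_target(shape, rate, remaining=188, n_sites=20)
print(f"posterior mean rate: {posterior_mean:.2f}")
print({k: round(v, 1) for k, v in forecast.items()})
```

The 90th-percentile value is the one to plan the failure mode around: it is the duration the team should have a mitigation for, rather than the median it hopes to achieve. Whether to act on the forecast remains a documented human decision.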

Final Assessment

10 questions · 80% to pass · unlimited attempts
