11  Agentic Safety and Guardrails

Published April 26, 2026

There is a meaningful safety boundary between an AI system that suggests a course of action and one that executes it. On one side of that boundary, a clinician reads the output and decides what to do next; the human is the actor, and the AI is a source of input. On the other side, the AI plans and executes a sequence of steps — placing a message in a patient’s portal, submitting a prior authorization request, updating a medication list — while a human nominally supervises but does not approve each individual action. Most AMC governance frameworks were designed for advisory AI. They ask how to validate a model’s output and how to surface it appropriately in the clinical workflow. They were not designed to govern systems that initiate actions in the world.

That gap matters now because the agentic threshold is being crossed — not in research settings, but in production deployments. Epic’s “Chart with Art” can autonomously query clinical data and prepopulate order sets. Prior authorization agents submit benefit determinations to payers without manual triggering. AI-enabled inbox management tools route, draft, and in some configurations send patient responses. The safety failures that arise in these systems are qualitatively different from those that arise when a clinician misreads an AI recommendation. They are faster, harder to trace, and more likely to cascade into downstream harm before anyone notices. This chapter gives AMC clinical and operational leaders the framework for governing these systems before the first incident, not after.

11.1 From Advisory to Agentic: The Autonomy Spectrum

It is more useful to think of AI autonomy as a spectrum than as a binary distinction. At the advisory end, the AI generates text and stops — it drafts a clinical note, and the clinician reviews every word before attesting. The human is the sole actor in any consequential sense. Moving along the spectrum, systems begin to initiate actions without per-step human approval: routing a message to a triage queue, flagging a result for follow-up, prefilling a prior authorization form. At the far end, systems take actions that write to authoritative records — the medication list, the order queue, the problem list — based on autonomous reasoning about what the clinical situation requires.

```{mermaid}
flowchart LR
    A["**Advisory**\nAmbient scribe\nDrafts only\nHITL: attestation"] --> B["**Semi-Agentic**\nInbox triage\nDrafts + routes\nHITL: periodic review"]
    B --> C["**Autonomous Admin**\nPrior auth agent\nSubmits via API\nHOTL: exception alerts"]
    C --> D["**Fully Agentic**\nMedication reconciliation\nWrites to EHR\nHITL: per-action approval"]

    style A fill:#d4f1d4,stroke:#2d6a2d
    style B fill:#fff3cd,stroke:#856404
    style C fill:#ffe0b2,stroke:#7d4e00
    style D fill:#ffcccc,stroke:#721c24
```
Figure 11.1: The advisory-to-agentic spectrum with current deployed examples and required oversight tier at each level. HITL = human-in-the-loop (approves each action). HOTL = human-on-the-loop (reviews logs after the fact).

The critical observation in Figure 11.1 is that risk does not simply increase with autonomy; it depends on the oversight model attached to each tier. A fully agentic system that writes to the medication list is obviously high-risk, but it is also gated by per-action approval. An autonomous prior authorization agent operating at the “autonomous administrative” tier carries substantial risk of its own: it interprets clinical data, matches it to insurance criteria, and submits a determination on behalf of the institution, all without a clinician reviewing the specific case. When the agent makes an error, the error is already in an external system before anyone looks at it. The delay between an action and its discovery is the relevant safety variable, and it grows wherever oversight shifts from approving actions in advance to reviewing them after the fact.

11.2 Agentic Failure Modes

Advisory AI fails in familiar ways: a model produces an incorrect output, the clinician reads it, and ideally the clinician catches the error. The human is the circuit breaker. Agentic AI introduces failure modes where the circuit breaker either is not in position to act, or has been trained by experience to defer.

Cascading errors. In a multi-agent architecture — a diagnostic agent feeds a recommendation to a scheduling agent, which triggers a prior authorization agent — an error in the first agent propagates through the chain. Each downstream agent treats its input as authoritative, because that is how the system was designed. By the time a human reviews the output, the consequences of the initial error may have ramified across multiple systems and cannot be easily undone.

Scope creep. An agent authorized to perform task A may, when the instructions are ambiguous, take action on task B because it appears adjacent or necessary. An inbox triage agent authorized to route messages may, in the course of routing, infer that a result requires follow-up and draft an outreach message the clinician never requested. The agent is not malfunctioning; it is being helpful in a way the governance design did not anticipate.

Action on stale data. An agent’s knowledge of the patient context — diagnosis, current medications, recent labs — is as fresh as its last synchronization with the EHR. If that synchronization is minutes or hours old, and the clinical situation has changed in the interval, the agent may take actions that were appropriate for a patient who no longer exists. This is not a theoretical risk; it is a timing problem that arises whenever an agent operates asynchronously against a live clinical record.
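
A concrete guard against this failure mode is a freshness check immediately before execution: the agent re-reads the record it planned against and refuses to act if the context is too old or has changed. A minimal sketch, assuming the EHR exposes a last-updated timestamp and a version identifier (as FHIR resources do via meta.versionId); the function name and staleness threshold are illustrative:

```python
from datetime import datetime, timedelta, timezone

MAX_CONTEXT_AGE = timedelta(minutes=5)  # illustrative; set per action type


def safe_to_execute(context_synced_at: datetime,
                    planned_version_id: str,
                    current_version_id: str) -> bool:
    """Return False if the agent's view of the patient may be stale.

    planned_version_id: the record version the agent reasoned over.
    current_version_id: the version re-read from the EHR just before acting.
    """
    too_old = datetime.now(timezone.utc) - context_synced_at > MAX_CONTEXT_AGE
    record_changed = current_version_id != planned_version_id
    return not (too_old or record_changed)
```

When the check fails, the correct behavior is to re-retrieve context and re-plan, not to proceed with the queued action.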

Automation surprise. The clinician discovers that something happened — a message was sent, a form was submitted, a list was updated — without their explicit direction. The surprise itself is a safety event, independent of whether the action was correct. A clinician who does not understand what actions an agent may take cannot be a reliable overseer of those actions (Parasuraman and Manzey 2010). Automation surprise is the signal that the human oversight model was not designed correctly for the system’s actual autonomy level.

Sycophancy. Research on agentic AI in reasoning tasks has documented a tendency for models to converge on the perspective of the most recent authoritative input rather than reason independently. In clinical terms, an agent asked to verify a clinician’s tentative diagnosis may confirm that diagnosis because the clinician proposed it — not because the evidence supports it. This is particularly dangerous in contexts where the agent’s role is explicitly supervisory, such as medication reconciliation or prior authorization review.

11.3 Human-in-the-Loop Architecture

“Human-in-the-loop” has become a compliance phrase in healthcare AI governance. It appears in policies, vendor contracts, and regulatory guidance as a shorthand for adequate oversight. The problem is that it describes an architectural design choice without specifying what that choice requires. A human who clicks “approve” on an AI-generated order in under five seconds is technically in the loop; they are not, in any meaningful sense, exercising oversight.

The automation bias literature is unambiguous on this point. Operators consistently over-trust automated systems, especially under cognitive load, and consistently under-correct for errors that the system presents in a confident, well-formatted output (Parasuraman and Manzey 2010). The more accurate the system generally is, the worse this effect becomes: a clinician who has approved 200 consecutive AI recommendations without incident will review the 201st less carefully than the first. This is not a character failure; it is a predictable consequence of human attention operating on probabilistic feedback.

The governance implication is that the question is not “is there a human in the loop?” but “is the HITL checkpoint likely to catch errors, given the cognitive conditions under which it operates?” A high-stakes HITL checkpoint — a human reviewing an autonomous medication order — should be designed for scrutiny, not for speed. That means surfacing the specific clinical reasoning the agent used, not just the recommended action; setting an expected review time that is long enough to be meaningful; and tracking override rates as a quality metric. A near-zero override rate is not a sign that the AI is performing well; it may be a sign that the checkpoint has become a rubber stamp.
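
Override-rate tracking requires little machinery once each HITL decision is logged. A minimal sketch, assuming every review is recorded as approved or overridden; the class name, window size, and alert threshold are illustrative and should be set by the governance committee:

```python
from collections import deque


class OverrideRateMonitor:
    """Flags HITL checkpoints whose override rate suggests rubber-stamping."""

    def __init__(self, window_size: int = 200, min_override_rate: float = 0.01):
        self._decisions = deque(maxlen=window_size)  # True = reviewer overrode the agent
        self._min_override_rate = min_override_rate

    def record(self, overridden: bool) -> None:
        self._decisions.append(overridden)

    def rubber_stamp_suspected(self) -> bool:
        if len(self._decisions) < self._decisions.maxlen:
            return False  # insufficient history to judge
        rate = sum(self._decisions) / len(self._decisions)
        return rate < self._min_override_rate
```

A flag from this monitor is not evidence that the agent is wrong; it is a prompt to examine the review conditions: how much time reviewers spend, whether the agent's reasoning is surfaced, and what else competes for their attention.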

For lower-stakes actions, a human-on-the-loop (HOTL) architecture may be appropriate: the agent acts, and the human reviews a log of actions after the fact. The distinction is consequential. HOTL is only acceptable when the actions are readily reversible, the volume is too high for per-action review, and the exception-detection mechanism is robust enough to surface anomalous actions before they cause harm.

11.4 Kill Switches and Circuit Breakers

Every agentic AI system deployed in a clinical environment requires a mechanism to pause or terminate its operation in response to anomalous behavior, and that mechanism must be tested before deployment, not designed after an incident.

The aviation analogy is instructive. Aircraft autopilot systems do not operate without limits; they operate within a flight envelope, and they disengage automatically when the aircraft departs that envelope. The pilot has a manual disconnect that takes effect immediately. Neither feature requires the pilot to monitor every output of the autopilot to remain safe; the system enforces its own limits and returns control explicitly when those limits are exceeded.

Clinical AI governance needs the same architectural discipline. For each deployed agentic system, the governance design should specify: (1) the conditions under which the system will automatically pause, (2) who has authority to manually pause and resume, (3) what the fallback workflow is while the system is paused, and (4) how often the kill switch is tested. An untested circuit breaker is not a safety feature.
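
The four elements can be captured as a structured specification that travels with the deployment documentation and is revisited whenever the tool's scope changes. A minimal sketch, with illustrative field names and values:

```python
from dataclasses import dataclass


@dataclass
class CircuitBreakerSpec:
    system_name: str
    auto_pause_conditions: list[str]  # (1) conditions that trigger an automatic pause
    pause_authority: list[str]        # (2) roles authorized to manually pause and resume
    fallback_workflow: str            # (3) how the work gets done while the system is paused
    test_interval_days: int           # (4) how often the kill switch is exercised


inbox_agent_breaker = CircuitBreakerSpec(
    system_name="inbox-triage-agent",
    auto_pause_conditions=["> 20% of messages in any hour routed to an unusual destination"],
    pause_authority=["on-call clinical informatics lead", "CMIO"],
    fallback_workflow="all inbound messages revert to the manual triage pool",
    test_interval_days=90,
)
```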

The specific trigger conditions depend on the system. An inbox management agent might pause automatically if more than a certain percentage of messages in a defined time window are routed to an unusual destination — an indicator that something in the routing logic has drifted or been compromised. A prior authorization agent might pause if the approval rate for a specific payer departs significantly from the institutional baseline. A medication reconciliation agent might pause if it encounters a clinical context it has not seen in training. The triggers should be specified by the clinical team and the informatics team together, not left to the vendor.
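
The trigger logic itself is usually simple threshold arithmetic over recent activity; the institutional work is agreeing on what counts as anomalous. A minimal sketch of the first two triggers above, with illustrative thresholds:

```python
def routing_anomaly(unusual_route_count: int, total_routed: int,
                    max_fraction: float = 0.20) -> bool:
    """True if too large a share of recent messages went to unusual destinations."""
    return total_routed > 0 and unusual_route_count / total_routed > max_fraction


def approval_rate_drift(recent_rate: float, institutional_baseline: float,
                        max_deviation: float = 0.15) -> bool:
    """True if a payer-specific approval rate departs materially from baseline."""
    return abs(recent_rate - institutional_baseline) > max_deviation
```

Either function returning True should route to the circuit breaker in Figure 11.2 and page a human, not merely write to a dashboard.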

```{mermaid}
flowchart TD
    A([Trigger]) --> B[Input Guardrail\nScope + PHI check]
    B --> C[Agent Plans\nMulti-step reasoning]
    C --> D[Retrieve Context\nEHR read access only]
    D --> E[Reasoning Guardrail\nPlausibility · Bounds check]
    E --> F[Propose Action]
    F --> G{HITL Required?\nRisk tier check}
    G -->|High-risk action| H[Human Review\nFull reasoning surfaced]
    G -->|Low-risk action| I[Execute]
    H -->|Approved| I
    H -->|Rejected| J[Log + Discard]
    I --> K[Granular Audit Log]
    K --> L{Anomaly\nDetected?}
    L -->|No| M([Continue])
    L -->|Yes| N[Circuit Breaker\nPause + Alert Supervisor]
```
Figure 11.2: Agentic AI loop architecture with layered safety controls. Each guardrail operates independently; the circuit breaker is the final catch for unanticipated failures.

11.5 Least Privilege and Scope Limitation

The principle of least privilege — a system should have access only to the resources it needs to perform its specified function, and no more — is a foundational concept in information security that applies with particular force to clinical AI agents. An agent that needs to read a patient’s problem list and draft a prior authorization request does not need write access to the medication list. An agent that needs to route inbox messages does not need access to the full longitudinal record.

The technical mechanism for enforcing least-privilege access in the EHR context is the SMART on FHIR authorization framework, which supports granular, parameterized access scopes. Rather than granting an agent broad access to a patient record, the institution can specify that the agent may read specific resource types (e.g., Observation, Condition) for specific patients, during specific session windows. The agent cannot read or write outside that scope, because the authorization framework physically prevents it.
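
Concretely, an agent registered as a SMART Backend Services client requests an access token whose scopes name only the resource types it needs, and the authorization server refuses anything broader. A minimal sketch, assuming a read-only agent limited to Observation and Condition resources; the token endpoint and client assertion are placeholders:

```python
import requests

TOKEN_URL = "https://ehr.example.org/oauth2/token"  # placeholder endpoint

# SMART v2-style scopes: read and search only; no write scopes are granted at all.
REQUESTED_SCOPES = "system/Observation.rs system/Condition.rs"

response = requests.post(
    TOKEN_URL,
    data={
        "grant_type": "client_credentials",
        "scope": REQUESTED_SCOPES,
        "client_assertion_type": "urn:ietf:params:oauth:client-assertion-type:jwt-bearer",
        "client_assertion": "<signed JWT identifying the registered agent>",  # elided
    },
    timeout=30,
)
response.raise_for_status()
granted = response.json()
# The scopes actually granted may be narrower than requested; the agent should operate
# only within granted["scope"], and any out-of-scope request is rejected by the
# authorization server rather than discouraged by policy alone.
```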

Operationally, this requires a change in how clinical AI tools are procured and configured. Many commercial tools request broad EHR access during initial setup because it simplifies integration. AMC informatics teams should treat this as a negotiating point in the procurement process: what is the minimum access scope required for this tool to function as advertised? Any access beyond that minimum is a risk surface that the institution is accepting without commensurate benefit.

Audit logging for agent actions should be as granular as audit logging for human actions. If a clinician querying the EHR generates a log entry, an agent querying the EHR should generate an equivalent log entry. The log should capture: which agent, on behalf of which patient, accessed which resource, at what time, and what action resulted. This is not primarily a compliance requirement; it is the mechanism by which automation surprises are investigated and understood.
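
A minimal sketch of the fields named above as a structured event record; the field names are illustrative, but each maps to a question an incident reviewer will ask:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class AgentAuditEvent:
    agent_id: str                # which agent, including version or model identifier
    patient_id: str              # on behalf of which patient
    resource: str                # which resource was touched, e.g. "MedicationRequest/123"
    action: str                  # "read", "route", "draft", "submit", "write"
    outcome: str                 # what resulted, e.g. "message routed to nurse triage queue"
    occurred_at: datetime        # when the action happened
    triggering_event: Optional[str] = None  # upstream request or schedule that initiated it
```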

11.6 The Regulatory Picture for Agentic AI

The regulatory landscape for agentic clinical AI is unsettled, and AMC leaders should approach it with the same expectation of ambiguity that currently governs AI medical devices generally. Several regulatory frameworks bear directly on agentic systems, even where they do not address agentic behavior explicitly.

The FDA’s Predetermined Change Control Plan guidance, discussed in Chapter 6, applies with special force to agentic systems that learn and adapt as they operate. A prior authorization agent that updates its matching criteria based on approval outcomes is an adaptive system; if it meets the Software as a Medical Device definition, it requires either a PCCP or a new regulatory filing for each adaptation (U.S. Food and Drug Administration 2024). The PCCP pathway is available but requires pre-specifying the bounds of the adaptation — which is precisely the kind of disciplined governance that agentic systems require in any case.

The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F) mandates that payers respond to prior authorization requests within 72 hours for urgent requests and seven days for standard requests (Centers for Medicare and Medicaid Services 2024). This regulatory pressure is a primary driver of agentic PA deployment: health systems need to submit faster, and automation is the only technically feasible path at volume. It does not, however, change the safety requirements for those systems; it increases the operational pressure to skip them.

California AB 3030’s disclosure requirement for AI-generated patient communications (AB 3030 2024) becomes more complex in agentic contexts. When a human clinician reviews and attests to an AI-drafted message, the disclosure is straightforward. When an agent routes and sends a communication without clinician review, the disclosure architecture requires more thought: what is being disclosed, to whom, and when? AMCs deploying agentic communication tools should consult legal counsel on whether AB 3030 and similar state laws require the disclosure to appear before the communication is sent, not after.

The NIST AI Risk Management Framework’s Govern/Map/Measure/Manage structure (National Institute of Standards and Technology 2024) provides the most practical scaffold for AMC agentic AI governance. Specifically, the Measure function — developing metrics for agentic AI performance, error rates, and override patterns — is where most institutions are currently underprepared. You cannot govern what you are not measuring.

11.7 The Action Risk Authorization Matrix

The table below provides a working framework for assigning governance controls to categories of agent action. The tiers are illustrative; specific institutions will need to calibrate thresholds based on their risk tolerance, patient population, and clinical context.

Table 11.1: Risk-level and authorization matrix for agentic AI actions in the clinical environment. Rows with “Required” HITL tiers should have override rate monitoring built into the governance framework.

| Action Type | Authorization Model | Audit Logging | HITL Tier | Kill-Switch |
|---|---|---|---|---|
| Read EHR data | Agent identity + scoped token | Summary level | Not required | Not required |
| Route message to clinical queue | Per-session scope | Granular | Optional | Recommended |
| Draft patient communication | Per-session scope | Granular | Required: clinician attestation | Recommended |
| Submit prior authorization | Periodic institutional review | Granular | Optional for low-risk PA | Required |
| Update problem list or medication | Per-action explicit approval | Granular | Required: per-action | Required |
| Place clinical order | Per-action explicit approval | Granular | Required: per-action | Required |
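
Encoded in software, the table becomes a lookup the agent framework consults before executing any proposed action (the “HITL Required?” decision in Figure 11.2). A minimal sketch; the keys are normalized action categories, and unknown actions fail closed:

```python
from enum import Enum


class HITLTier(Enum):
    NOT_REQUIRED = "not required"
    OPTIONAL = "optional"
    REQUIRED = "required"


# Mirrors Table 11.1.
HITL_POLICY: dict[str, HITLTier] = {
    "read_ehr_data": HITLTier.NOT_REQUIRED,
    "route_message": HITLTier.OPTIONAL,
    "draft_patient_communication": HITLTier.REQUIRED,
    "submit_prior_authorization": HITLTier.OPTIONAL,  # low-risk PA only, per Table 11.1
    "update_problem_list_or_medication": HITLTier.REQUIRED,
    "place_clinical_order": HITLTier.REQUIRED,
}


def requires_human_review(action_type: str) -> bool:
    """Unknown or unlisted action types fail closed and require review."""
    return HITL_POLICY.get(action_type, HITLTier.REQUIRED) is HITLTier.REQUIRED
```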

11.8 Where to Start

The governance gap for agentic AI is real, but it does not require a complete rebuild of existing frameworks. The two projects below are tractable starting points for institutions that have already begun deploying semi-agentic tools and need to formalize oversight before the autonomy level increases.

11.8.1 Starter Project 1: Agentic AI Audit and Tiering

What it is: An inventory and autonomy-tiering of every deployed AI tool that takes any action — routes, submits, updates, or schedules — without per-action human approval. For each tool, assign an autonomy tier using the spectrum in Figure 11.1, assess whether the current governance controls match the tier requirements in Table 11.1, and document the gaps.

Why now: Many institutions have deployed ambient scribes, inbox triage tools, and PA assistants without formally categorizing them as agentic systems. The governance gap is often not a deliberate decision; it is an artifact of tools being piloted and scaled before the governance framework caught up.

How to execute: This is an extension of the clinical AI inventory recommended in Section 6.4. The additional work is adding an autonomy tier column, auditing the access scopes currently granted to each tool (via the EHR’s OAuth/SMART application registry), and reviewing audit log coverage. The output is a tiered register with documented gaps and a prioritized remediation plan.
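
The register itself can live in a spreadsheet; what matters is that the autonomy tier, granted scopes, and gaps are captured per tool. A minimal sketch of one row's fields, with illustrative names and values:

```python
from dataclasses import dataclass


@dataclass
class AgenticInventoryRow:
    tool_name: str
    vendor: str
    autonomy_tier: str         # advisory / semi-agentic / autonomous admin / fully agentic
    granted_scopes: list[str]  # pulled from the EHR's OAuth/SMART application registry
    audit_log_coverage: str    # "granular", "summary only", or "none"
    control_gaps: list[str]    # controls required by Table 11.1 but not yet in place


example_row = AgenticInventoryRow(
    tool_name="inbox-triage-agent",
    vendor="(vendor name)",
    autonomy_tier="semi-agentic",
    granted_scopes=["system/Communication.rs", "system/Patient.rs"],
    audit_log_coverage="summary only",
    control_gaps=["granular audit logging", "kill-switch trigger definitions"],
)
```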

Buy vs. build: Governance exercise. The tiering framework and audit process are institutional work; the only technology question is whether the EHR’s application registry surfaces the information needed for an access scope audit.

11.8.2 Starter Project 2: Kill-Switch Design and Testing

What it is: For each agentic tool at the “autonomous administrative” tier or above, define the circuit-breaker trigger conditions, pause mechanism, fallback workflow, and test schedule. Run at least one tabletop exercise in which the kill switch is activated and the fallback workflow is executed.

Why now: Kill switches that have never been tested do not work reliably under pressure. The tabletop exercise surfaces both technical gaps (the pause mechanism does not propagate to all system components) and operational gaps (the fallback workflow requires a staff role that is not adequately covered on nights and weekends).

How to execute: Work with the vendor, the EHR team, and the clinical operations team to document the pause mechanism for each tool. Define trigger conditions with the clinical team — what anomaly pattern, at what threshold, triggers a pause? Schedule a tabletop exercise with the on-call clinical informatics lead and relevant clinical leadership. Use the exercise findings to update the governance documentation before the next deployment expansion.

Buy vs. build: Design and process work, not a technology purchase. Most commercial vendors have a pause mechanism; the gap is typically in the institutional design of when and how to use it.