OSRA Worked Example: EuroBank Sentinel
Running the Operational Substrate Risk Audit against a DORA-regulated AI deployment
The Scenario
EuroBank is a mid-tier European bank with €45B in total assets, headquartered in Frankfurt and regulated under DORA, the EU AI Act, and ECB supervisory requirements. In Q3 2025, the bank deployed "Sentinel" — an AI-powered transaction monitoring and fraud detection system that replaced a legacy rules-based engine.
Sentinel handles roughly 2.1 million transactions a day. Flagged transactions enter a review workflow that can block payments, freeze accounts, or escalate to compliance. Unflagged transactions pass straight through with no human review.
Three teams were involved in the procurement. The CTO's office handled infrastructure. Compliance set the regulatory requirements. An external AI consultancy, DataForge GmbH, selected and fine-tuned the model. Each team owns a piece of the system; nobody owns the whole thing.
On paper, EuroBank looks solid. ISO 27001 certified. Progressing toward ISO 42001. DORA compliance procedures documented. The board gets a quarterly AI risk report, and it currently shows green across every metric.
Phase 1 — Substrate Mapping
System Boundary
| Field | Value |
|---|---|
| AI System Name | Sentinel — AI Transaction Monitoring |
| System Owner | Compliance (functional), CTO Office (infrastructure), DataForge GmbH (model) |
| Regulatory Classification | EU AI Act: High-Risk (Annex III). DORA: Critical ICT service. |
| Primary Function | Real-time transaction monitoring for fraud, sanctions, and AML compliance |
| Users / Consumers | Compliance analysts, automated payment system (pass/block), regulators (audit trail) |
| Deployment Date | September 2025 |
| Downstream Dependencies | Payment processing, regulatory reporting, customer account status |
Dependency Chain
Model Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Base model: GPT-4o | OpenAI via Azure OpenAI Service | Y | Known-Unmonitored | None |
| Fine-tuning pipeline | DataForge GmbH | Y | Known-Unmonitored | None — DataForge holds exclusive knowledge |
| Model weights | Azure West Europe blob storage | Y | Visible | None — single copy, no replication |
Compute Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Inference compute | Azure West Europe, A100 GPU | Y | Visible | None — no reserved capacity, no secondary region |
Data Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Core banking transaction feed | Internal Kafka → Azure Event Hub | N | Visible | Kafka replay |
| Refinitiv sanctions screening | Refinitiv API (LSEG), UK-hosted | Y | Known-Unmonitored | None |
| Customer risk ratings | Internal SQL, Frankfurt | N | Visible | Cached ratings (24h stale) |
Energy Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Primary power | Dutch national grid (TenneT) | Y | Invisible | Unknown to EuroBank |
| Cooling | Azure facility management | Y | Invisible | Unknown to EuroBank |
Contractual Layer
| Dependency | SLA Detail | Risk |
|---|---|---|
| Azure OpenAI Service | 99.95% uptime (covers API availability, not output quality) | SLA scope mismatch |
| OpenAI model lifecycle | 6-month deprecation notice (policy, not contractual) | Unverifiable commitment |
| Refinitiv sanctions API | No explicit response time or data quality SLA | No contractual protection |
| DataForge consultancy | Contract expires March 2027, no knowledge transfer clause | Single-vendor knowledge lock |
Phase 1 Findings
28 dependencies mapped. 19 of them (68%) are single points of dependency, and 21 of the 28 have no fallback at all. Visibility breaks down as 12 Visible, 10 Known-Unmonitored, and 6 Invisible.
But the finding that matters most here is organisational, not technical. Ownership of this system is split three ways, and the split means nobody has a complete picture of the substrate. Compliance doesn't know what GPU type runs inference. The CTO's office doesn't know what Refinitiv's API depends on upstream. DataForge built the fine-tuning pipeline and the preprocessing logic, and that knowledge lives entirely with them — there's no documentation, no handover, no backup.
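Capturing the Substrate Map as data makes the tallies above reproducible rather than hand-counted. A minimal sketch in Python — the `Dependency` fields mirror the table columns, and the four rows shown are an illustrative subset, not the full 28-entry map:

```python
from dataclasses import dataclass
from collections import Counter
from typing import Optional

@dataclass
class Dependency:
    name: str
    layer: str                 # Model / Compute / Data / Energy / Contractual
    single_point: bool         # is this a single point of dependency?
    visibility: str            # "Visible" | "Known-Unmonitored" | "Invisible"
    fallback: Optional[str]    # None means no fallback exists

def summarise(deps):
    """Headline Phase 1 numbers from a Substrate Map."""
    total = len(deps)
    single = sum(d.single_point for d in deps)
    return {
        "total": total,
        "single_points": single,
        "single_point_pct": round(100 * single / total),
        "no_fallback": sum(d.fallback is None for d in deps),
        "visibility": dict(Counter(d.visibility for d in deps)),
    }

# Illustrative subset of the Sentinel map, not all 28 rows
deps = [
    Dependency("GPT-4o base model", "Model", True, "Known-Unmonitored", None),
    Dependency("Model weights", "Model", True, "Visible", None),
    Dependency("Core banking feed", "Data", False, "Visible", "Kafka replay"),
    Dependency("Refinitiv sanctions API", "Data", True, "Known-Unmonitored", None),
]
print(summarise(deps))
```

The point of the exercise is less the arithmetic than forcing every dependency into the same schema: a row that can't be filled in is itself a visibility finding.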
Phase 2 — Failure Surface Analysis
Seven failure scenarios were run against the Substrate Map. A full OSRA execution would cover all 28 dependencies; this example focuses on the ones that scored highest.
F1 — Base model silent update (CRITICAL)
OpenAI has pushed five significant GPT-4o behaviour updates with minimal communication to downstream users. If the base model changes, the fine-tuned fraud detection layer on top of it may behave differently — and EuroBank has no way to know. They monitor flagging volume, not model output distribution. A subtle shift in accuracy would go completely undetected.
F2 — Sanctions data staleness (CRITICAL)
Refinitiv's API keeps responding, but what if the data behind it goes stale — sanctions lists not refreshed, matching accuracy quietly degrading? There's no error signal in that scenario. EuroBank monitors whether the API is up, not whether the data is fresh or the matching is accurate.
Detection confidence: none. This is a criminal liability scenario in multiple jurisdictions.
F3 — Azure West Europe region outage (CRITICAL)
A full region outage takes everything down at once — inference, model weights, model registry, storage. All Sentinel components sit in the same region. The system goes offline entirely, and transactions either queue indefinitely or pass without screening.
This one is at least detectable; Azure Service Health alerts would fire within minutes. The harder question is what happens next. The legacy rules engine was decommissioned. There's no documented fallback for what EuroBank does when Sentinel is unavailable.
F4 — DataForge knowledge concentration (HIGH)
The DataForge contract expires March 2027. No knowledge transfer clause. If DataForge becomes unavailable for any reason — acquisition, insolvency, a key person leaving — EuroBank loses the ability to retrain, debug, or modify the model. That makes continuous model management, which the EU AI Act Art. 9 expects, impossible to deliver.
F5 — Co-located monitoring (CRITICAL)
This one is worth pausing on. The monitoring dashboards and alerting systems that are supposed to catch Sentinel failures are hosted in the same Azure region as Sentinel itself. If the region degrades — not a full outage, something subtler — monitoring degrades too, potentially masking the problem underneath.
No independent detection exists outside Azure West Europe. Detection confidence: none.
This is the most dangerous failure mode the analysis found. The system that's supposed to detect failure fails at the same time as the system it's monitoring. EuroBank would look at its dashboards and see something that looks roughly normal while Sentinel is producing degraded outputs.
F7 — GPU silent data corruption (HIGH)
Per NVIDIA and OpenCompute research, roughly 1 in 1,000 machines in hyperscaler GPU fleets experience silent data corruption — corrupted compute that produces wrong inference results with no error signal. EuroBank has no detection at the customer level. Azure may run fleet-level checks, but that's invisible to EuroBank.
Phase 2 Findings
Four of the seven scenarios are critical severity. Five involve silent failure risk. Three have no detection mechanism at all.
The pattern that emerges: the visible failures (F3, the region outage) are not the dangerous ones. They're dramatic but detectable. The dangerous ones — F1, F2, F5 — are the scenarios where EuroBank's board report keeps showing green while the system quietly produces wrong outputs. That quarterly report would stay green through all three of them.
Phase 3 — Trust Surface Audit
Six trust signals were audited. A full OSRA execution would cover more; these are the ones with the highest consequences.
T1 — Azure OpenAI uptime SLA: "99.95%"
Unverified. EuroBank takes this at face value. What the SLA actually guarantees is that the API will respond — it says nothing about whether the response is correct, consistent with previous behaviour, or timely enough for real-time transaction processing. EuroBank's continuity planning treats "uptime" and "working correctly" as the same thing. They aren't.
T2 — OpenAI model deprecation: "6 months notice"
Unverifiable. This is a policy statement on OpenAI's website, not a contractual commitment. They can change it whenever they want. And the notice period covers full model deprecation — it doesn't cover behaviour-altering updates pushed within a model version, which is EuroBank's actual risk. The trust signal covers the wrong thing.
T3 — DataForge model accuracy: "93.7% F1 score"
Partially verified. EuroBank's data science team reviewed DataForge's validation methodology but never ran an independent validation on a holdout dataset of their own. The benchmark was done on historical data that DataForge selected. No adversarial testing.
Worth noting the precedent: Epic Systems' proprietary sepsis prediction model claimed 76-83% AUC. When independently validated, actual sensitivity was 33%. 170+ hospitals had deployed it without checking.
T4 — ISO 27001 certificate: vendor security
Unverified. Accepted at face value. ISO 27001 certifies that a management system for information security exists. It doesn't certify the specific controls on EuroBank's workload, the security of the AI inference pipeline, model weight integrity in storage, or tenant isolation. EuroBank's DORA compliance evidence references this certificate as proof of adequate third-party security. The certificate doesn't cover what EuroBank thinks it covers.
T5 — Refinitiv sanctions data completeness
Unverified. EuroBank has never audited Refinitiv's data sources, update frequency, matching methodology, or coverage gaps. The contract doesn't include a right-to-audit clause. EuroBank treats "we use Refinitiv" as equivalent to "our sanctions screening is complete." Those are two very different statements, and the distance between them carries criminal liability.
T6 — Board AI risk report: "green across all metrics"
Unverifiable in its current form. The report tracks system uptime, flagging volume, false positive rate, and model accuracy based on DataForge's benchmark. None of those metrics would catch a silent model update (F1), sanctions data going stale (F2), monitoring degrading alongside the system (F5), or GPU corruption (F7).
The report measures what's easy to measure. The failure modes that Phase 2 identified as most dangerous sit entirely outside what the report covers. The board believes it is exercising oversight. On this evidence, it isn't.
Trust Chains
| EuroBank Trusts | Direct Provider | Provider Trusts | Tier 3 | Depth | Verified? |
|---|---|---|---|---|---|
| "AI fraud detection works" | Azure OpenAI Service | OpenAI base model quality | OpenAI's training data integrity | 3 | No |
| "Sanctions screening complete" | Refinitiv API | Refinitiv data sources | Original sanctions list publishers | 3 | No |
| "Model is accurate" | DataForge benchmark | DataForge's validation methodology | Historical fraud label quality | 3 | Partially |
| "Infrastructure is secure" | Azure ISO 27001 | Microsoft's internal controls | Azure supply chain (TSMC, power grid, cooling) | 4+ | No |
Phase 3 Findings
Five of six trust signals are unverified or unverifiable; only T3 was even partially verified. Five of six have a scope mismatch — the verification, where it exists, doesn't cover what the organisation assumes it covers.
The finding that cuts deepest is T6. Every other trust gap here is operational: fixable with monitoring, testing, or a better contract clause. The board report is different. It's the mechanism by which the organisation assures itself that oversight is happening, and it's structurally blind to the failure modes that actually matter. That makes it the trust signal with the highest consequence, because everything downstream of the board — resource allocation, risk acceptance, regulatory submissions — rests on it.
Phase 4 — Convergence Mapping
Convergence Matrix
| Dependency | Critical/High Severity? | Silent Failure Risk? | Unverified Trust? | Conditions | Classification |
|---|---|---|---|---|---|
| GPT-4o base model | Y (F1) | Y | Y (T2) | 3/3 | CRITICAL CONVERGENCE |
| Refinitiv sanctions API | Y (F2) | Y | Y (T5) | 3/3 | CRITICAL CONVERGENCE |
| Azure West Europe co-location | Y (F5) | Y | Y (T1, T4) | 3/3 | CRITICAL CONVERGENCE |
| DataForge GmbH | Y (F4) | N | Y (T3) | 2/3 | CONVERGENCE POINT |
| Azure GPU fleet | Y (F7) | Y | N | 2/3 | CONVERGENCE POINT |
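The Classification column follows a counting rule: all three conditions present is a critical convergence, two of three is a convergence point. A sketch of that rule (the label for fewer than two conditions is an assumption, since the matrix only contains 2/3 and 3/3 rows):

```python
def classify(severe: bool, silent: bool, unverified: bool) -> str:
    """OSRA Phase 4: count how many of the three convergence
    conditions hold for a dependency and classify accordingly."""
    score = sum([severe, silent, unverified])
    if score == 3:
        return "CRITICAL CONVERGENCE"
    if score == 2:
        return "CONVERGENCE POINT"
    return "NO CONVERGENCE"  # assumed label for 0-1 conditions

print(classify(True, True, True))    # GPT-4o base model row
print(classify(True, False, True))   # DataForge GmbH row
```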
Convergence Risk Summary
Three critical convergences and two convergence points. Ranked by weighted score.
#1 — AI Model Behaviour Change (Score: 33.0)
What converges: The base model can change without notification, EuroBank has no detection mechanism, and the lifecycle policy is unverifiable. The fine-tuned layer's accuracy hangs entirely on base model stability.
Why governance missed it: The EU AI Act Annex IV asks for documentation of "versions of relevant software." EuroBank wrote "GPT-4o." A version label doesn't guarantee behaviour. No framework currently in use requires continuous monitoring of model behaviour between version changes.
Regulatory exposure: EU AI Act Art. 9 and 15; DORA Art. 6; potential AML enforcement if the model change degrades sanctions screening.
Recommended actions:
- Immediate: implement output distribution monitoring — track flagging rate, confidence distribution, and decision boundary behaviour; alert on drift
- 30 days: set up an independent validation pipeline, running Sentinel against a fixed benchmark dataset on a monthly cycle
- 90 days: negotiate contractual model change notification with Microsoft/OpenAI; if that's not available, begin evaluating alternative base models
- Governance: add model stability metrics to the board AI risk report
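The output distribution monitoring in the immediate action can start as a small daily job. A sketch using a population stability index over Sentinel's confidence scores, in plain Python — the Beta-distributed scores, window sizes, and the 0.2 alert threshold are illustrative stand-ins, not tuned values:

```python
import math
import random

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline score distribution
    and the current one. Common rule of thumb (illustrative here):
    PSI > 0.2 signals significant drift worth an alert."""
    sb = sorted(baseline)
    # decile edges taken from the baseline distribution
    edges = [sb[int(len(sb) * i / bins)] for i in range(1, bins)]
    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x >= e for e in edges)] += 1   # bucket index
        return [max(c / len(xs), 1e-6) for c in counts]  # avoid log(0)
    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

rng = random.Random(0)
baseline = [rng.betavariate(2, 5) for _ in range(20_000)]  # yesterday's confidences
shifted  = [rng.betavariate(2, 3) for _ in range(20_000)]  # after a silent model update

print(f"same distribution: {psi(baseline, baseline[:10_000]):.3f}")
print(f"shifted distribution: {psi(baseline, shifted):.3f}")
```

A shift like the one simulated above would never appear in a flagging-volume count alone, which is exactly the gap F1 describes.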
#2 — Co-located Monitoring Failure (Score: 32.0)
What converges: Every Sentinel component and every monitoring system that would detect Sentinel failure sits in the same Azure region. A regional degradation takes out both the patient and the doctor.
Why governance missed it: DORA Art. 11 requires incident detection, and EuroBank has it. What no framework requires is verifying that monitoring infrastructure is independent from the systems it watches. The architecture was designed for efficiency — same region, lower latency — not for the scenario where efficiency and resilience pull in opposite directions.
The question a DORA auditor would ask: "How would you know if Sentinel was producing incorrect results?" Answer: "Our monitoring dashboards." Follow-up: "Where are those dashboards hosted?" The conversation ends there.
Recommended actions:
- Immediate: deploy an independent health check outside Azure West Europe — a simple external probe that validates Sentinel output against known test cases
- 30 days: stand up monitoring in a secondary region or on-premises, architecturally independent of Azure West Europe
- 90 days: run a DORA-aligned resilience test simulating regional degradation, not full outage — degradation is harder to spot and more dangerous
- Governance: make monitoring independence a standing architecture requirement
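The external probe in the immediate action can be a scheduled script on any host outside Azure West Europe that submits known-outcome canary transactions and alerts on divergence. A sketch with the network call stubbed out — the canary cases, payload shape, and decision labels are all hypothetical:

```python
# Hypothetical canary set: transactions with known correct outcomes.
CANARY_CASES = [
    {"tx": {"amount": 120.0, "counterparty": "Acme GmbH"}, "expected": "pass"},
    {"tx": {"amount": 9_800.0, "counterparty": "Listed entity"}, "expected": "flag"},
]

def run_probe(score):
    """score: callable taking a transaction dict, returning "pass"/"flag".
    In production this would POST to the Sentinel API from outside the
    region. Returns a list of failures; empty means behaviour matches."""
    failures = []
    for case in CANARY_CASES:
        try:
            decision = score(case["tx"])
        except Exception as exc:          # unreachable counts as failure too
            failures.append(f"probe error: {exc}")
            continue
        if decision != case["expected"]:
            failures.append(
                f"{case['tx']} -> {decision!r}, expected {case['expected']!r}")
    return failures

# Stubs standing in for the real scoring endpoint:
healthy = lambda tx: "flag" if tx["amount"] > 5_000 else "pass"
degraded = lambda tx: "pass"              # silently passing everything

print(run_probe(healthy))   # empty list: healthy
print(run_probe(degraded))  # one mismatch: the flag case slipped through
```

The design point is independence, not sophistication: the probe's value comes entirely from where it runs and who it pages, not from what it computes.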
#3 — Sanctions Data Integrity (Score: 31.0)
What converges: Refinitiv is a single point of dependency with no fallback. Nobody monitors data freshness or matching quality. The contract offers no SLA on either, and no right-to-audit clause. The organisation treats "we use Refinitiv" as evidence of adequate sanctions screening.
Why governance missed it: Refinitiv is a market-standard provider, and using one feels like due diligence. But DORA Art. 28-30 requires assessment of third-party ICT provider concentration risk and exit strategies. EuroBank has neither.
Recommended actions:
- Immediate: implement a daily automated comparison between Refinitiv's output and at least one independent sanctions source (OFAC SDN is freely available and can serve as a baseline check)
- 30 days: negotiate a right-to-audit and data freshness SLA with Refinitiv; if they refuse, document the refusal in the DORA third-party risk register
- 90 days: evaluate a secondary sanctions screening provider and implement dual-screening for high-value transactions
- Governance: reclassify Refinitiv as a DORA "critical third-party provider"
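The daily comparison in the immediate action reduces to set arithmetic once both sources are normalised: names the independent baseline lists but the primary screen missed are the dangerous direction. A sketch — list contents are invented, and real matching needs proper fuzzy name normalisation, which is only crudely approximated here:

```python
def normalise(name: str) -> str:
    """Crude normalisation for illustration; a real implementation needs
    the vendor's documented fuzzy rules (transliteration, weak tokens)."""
    return " ".join(name.upper().split())

def screen_gap(primary_hits, baseline_list, transactions):
    """Names present in the independent baseline (e.g. the OFAC SDN list)
    that appeared in today's traffic but the primary provider did not flag."""
    primary = {normalise(n) for n in primary_hits}
    baseline = {normalise(n) for n in baseline_list}
    seen = {normalise(t["counterparty"]) for t in transactions}
    return sorted((baseline & seen) - primary)

transactions = [{"counterparty": "Acme Trading LLC"},
                {"counterparty": "ShadowCorp Ltd"}]
primary_hits = []                    # the primary screen flagged nothing today
baseline_list = ["SHADOWCORP LTD"]   # but the independent baseline lists one name

print(screen_gap(primary_hits, baseline_list, transactions))
```

Any non-empty result is an alert condition: either the primary provider's data is stale or its matching has drifted, and both are exactly the silent failures F2 describes.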
#4 — Vendor Knowledge Concentration (Score: 20.0)
What converges: DataForge owns the fine-tuning pipeline, the preprocessing code, and the validation methodology exclusively. The contract expires March 2027 with no knowledge transfer clause. The accuracy benchmark was DataForge's own work, reviewed by EuroBank but never independently replicated.
Why governance missed it: Vendor management looked at DataForge's financial stability and data handling. The question nobody asked: "If DataForge disappears tomorrow, can we keep this system running ourselves?"
Recommended actions:
- Immediate: open knowledge transfer negotiations before the next contract renewal window
- 60 days: conduct an independent model validation on a dataset DataForge didn't select
- 6 months: build internal capability to retrain and maintain the model — this is an investment in operational independence
- Governance: add vendor knowledge concentration to the DORA third-party risk register and flag the contract expiry to the board with a remediation budget
#5 — GPU Silent Data Corruption (Score: 18.5)
What converges: Azure's GPU fleet has a documented ~1 in 1,000 machine silent data corruption rate (NVIDIA/OpenCompute research). Individual transaction decisions may come back wrong, and there's no error signal to catch it. EuroBank has no customer-level detection mechanism.
Why governance missed it: This sits below the abstraction layer governance operates at. It's a physics-level failure, and no framework currently addresses it.
Recommended actions:
- 60 days: request documentation from Microsoft on Azure's fleet-level SDC detection
- 90 days: evaluate output integrity verification — running critical transactions through inference twice and flagging discrepancies
- For low-value transactions, this may be an accepted risk; if so, document the acceptance with the rationale
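The output integrity verification in the 90-day action is plain redundancy: score the transaction twice, ideally routed to different hardware, and treat any disagreement from a deterministic pipeline as a corruption signal. A sketch (the scorer and tolerance are illustrative):

```python
def verified_score(score, tx, tolerance=1e-6):
    """Run inference twice and flag discrepancies. Assumes the pipeline
    is deterministic (fixed seed / temperature 0), so any disagreement
    beyond tolerance indicates corruption somewhere in the stack."""
    a, b = score(tx), score(tx)
    if abs(a - b) > tolerance:
        raise RuntimeError(f"inference mismatch: {a} vs {b}")
    return a

healthy = lambda tx: 0.42                  # deterministic scorer stub
flaky_results = iter([0.42, 0.97])         # one call silently corrupted
flaky = lambda tx: next(flaky_results)

print(verified_score(healthy, {"amount": 10}))
try:
    verified_score(flaky, {"amount": 10})
except RuntimeError as exc:
    print(f"caught: {exc}")
```

Doubling inference cost is why this makes sense only for critical transactions; the documented-acceptance path above covers the rest.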
Governance Integration Map
| Convergence Point | Regulation | Internal Document | Required Change |
|---|---|---|---|
| CP1 — Model behaviour | EU AI Act Art. 9, 15; DORA Art. 6 | AI Risk Assessment; Vendor Management | Add model behaviour monitoring; update risk assessment for base model instability |
| CP2 — Sanctions data | AML Directives; DORA Art. 28-30 | Third-Party Risk Register; AML Policy | Reclassify Refinitiv as critical provider; add data quality verification |
| CP3 — Co-located monitoring | DORA Art. 11, 15 | Incident Response Plan; Architecture Standards | Add monitoring independence requirement; test degradation scenarios |
| CP4 — Vendor knowledge | EU AI Act Art. 9; DORA Art. 28 | Third-Party Risk Register; Contracts | Add knowledge transfer clause; flag to board |
| CP5 — GPU corruption | DORA Art. 6; EU AI Act Art. 15 | IT Risk Register | Document accepted risk; investigate vendor detection |
What This Example Shows
EuroBank Sentinel is compliant on paper: ISO-certified, DORA-documented, reporting green to the board every quarter. Running OSRA against it produced five convergence points that existing governance can't see.
Three of the five are critical convergences — high-severity failure, silent failure risk, and unverified trust all present at the same time. Two carry criminal liability exposure (model behaviour change affecting sanctions screening, and sanctions data staleness). One means the system built to detect failure would go down with the system it's supposed to be watching.
The quarterly board report would stay green through every one of these scenarios. The metrics it tracks are real, but they don't cover the failure modes that the substrate analysis identified as most dangerous.
That gap — between what governance reports and what the substrate actually risks — is what OSRA was built to make visible.
This is a hypothetical worked example designed to demonstrate the OSRA methodology. "EuroBank" and "DataForge GmbH" are fictional. The infrastructure patterns, regulatory frameworks, and failure modes described are drawn from documented real-world evidence.
OSRA is open source under CC BY-SA 4.0. Full methodology, templates, and action catalogue: github.com/marcobrondani/OSRA