OSRA Worked Example: EuroBank Sentinel
Running the Operational Substrate Risk Audit against a DORA-regulated AI deployment
The Scenario
EuroBank is a mid-tier European bank with €45B in total assets, headquartered in Frankfurt and regulated under DORA, the EU AI Act, and ECB supervisory requirements. In Q3 2025, the bank deployed "Sentinel" — an AI-powered transaction monitoring and fraud detection system that replaced a legacy rules-based engine.
Sentinel handles roughly 2.1 million transactions a day. Flagged transactions enter a review workflow that can block payments, freeze accounts, or escalate to compliance. Unflagged transactions pass straight through with no human review.
Three teams were involved in the procurement. The CTO's office handled infrastructure. Compliance set the regulatory requirements. An external AI consultancy, DataForge GmbH, selected and fine-tuned the model. Each team owns a piece of the system; nobody owns the whole thing.
On paper, EuroBank looks solid. ISO 27001 certified. Progressing toward ISO 42001. DORA compliance procedures documented. The board gets a quarterly AI risk report, and it currently shows green across every metric.
Phase 1 — Substrate Mapping
System Boundary
| Field | Value |
|---|---|
| AI System Name | Sentinel — AI Transaction Monitoring |
| System Owner | Compliance (functional), CTO Office (infrastructure), DataForge GmbH (model) |
| Regulatory Classification | EU AI Act: High-Risk (Annex III). DORA: Critical ICT service. |
| Primary Function | Real-time transaction monitoring for fraud, sanctions, and AML compliance |
| Users / Consumers | Compliance analysts, automated payment system (pass/block), regulators (audit trail) |
| Deployment Date | September 2025 |
| Downstream Dependencies | Payment processing, regulatory reporting, customer account status |
Dependency Chain
Model Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Base model: GPT-4o | OpenAI via Azure OpenAI Service | Y | Known-Unmonitored | None |
| Fine-tuning pipeline | DataForge GmbH | Y | Known-Unmonitored | None — DataForge holds exclusive knowledge |
| Model weights | Azure West Europe blob storage | Y | Visible | None — single copy, no replication |
Compute Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Inference compute | Azure West Europe, A100 GPU | Y | Visible | None — no reserved capacity, no secondary region |
Data Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Core banking transaction feed | Internal Kafka → Azure Event Hub | N | Visible | Kafka replay |
| Refinitiv sanctions screening | Refinitiv API (LSEG), UK-hosted | Y | Known-Unmonitored | None |
| Customer risk ratings | Internal SQL, Frankfurt | N | Visible | Cached ratings (24h stale) |
Energy Layer
| Dependency | Provider | Single Point? | Visibility | Fallback |
|---|---|---|---|---|
| Primary power | Dutch national grid (TenneT) | Y | Invisible | Unknown to EuroBank |
| Cooling | Azure facility management | Y | Invisible | Unknown to EuroBank |
Contractual Layer
| Dependency | SLA Detail | Risk |
|---|---|---|
| Azure OpenAI Service | 99.95% uptime (covers API availability, not output quality) | SLA scope mismatch |
| OpenAI model lifecycle | 6-month deprecation notice (policy, not contractual) | Unverifiable commitment |
| Refinitiv sanctions API | No explicit response time or data quality SLA | No contractual protection |
| DataForge consultancy | Contract expires March 2027, no knowledge transfer clause | Single-vendor knowledge lock |
Phase 1 Findings
28 dependencies mapped. 19 of them (68%) are single points of dependency, and 21 of the 28 have no fallback at all. Visibility breaks down as 12 Visible, 10 Known-Unmonitored, and 6 Invisible.
But the finding that matters most here is organisational, not technical. Ownership of this system is split three ways, and the split means nobody has a complete picture of the substrate. Compliance doesn't know what GPU type runs inference. The CTO's office doesn't know what Refinitiv's API depends on upstream. DataForge built the fine-tuning pipeline and the preprocessing logic, and that knowledge lives entirely with them — there's no documentation, no handover, no backup.
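Capturing the Substrate Map as data makes the tallies above reproducible rather than hand-counted. A minimal sketch in Python — the `Dependency` fields mirror the table columns, and the four rows shown are an illustrative subset, not the full 28-entry map:

```python
from dataclasses import dataclass
from collections import Counter
from typing import Optional

@dataclass
class Dependency:
    name: str
    layer: str                 # Model / Compute / Data / Energy / Contractual
    single_point: bool         # is this a single point of dependency?
    visibility: str            # "Visible" | "Known-Unmonitored" | "Invisible"
    fallback: Optional[str]    # None means no fallback exists

def summarise(deps):
    """Headline Phase 1 numbers from a Substrate Map."""
    total = len(deps)
    single = sum(d.single_point for d in deps)
    return {
        "total": total,
        "single_points": single,
        "single_point_pct": round(100 * single / total),
        "no_fallback": sum(d.fallback is None for d in deps),
        "visibility": dict(Counter(d.visibility for d in deps)),
    }

# Illustrative subset of the Sentinel map, not all 28 rows
deps = [
    Dependency("GPT-4o base model", "Model", True, "Known-Unmonitored", None),
    Dependency("Model weights", "Model", True, "Visible", None),
    Dependency("Core banking feed", "Data", False, "Visible", "Kafka replay"),
    Dependency("Refinitiv sanctions API", "Data", True, "Known-Unmonitored", None),
]
print(summarise(deps))
```

The point of the exercise is less the arithmetic than forcing every dependency into the same schema: a row that can't be filled in is itself a visibility finding.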
Phase 2 — Failure Surface Analysis
Seven failure scenarios were run against the Substrate Map. A full OSRA execution would cover all 28 dependencies; this example focuses on the ones that scored highest.
F1 — Base model silent update (CRITICAL)
OpenAI has pushed five significant GPT-4o behaviour updates with minimal communication to downstream users. If the base model changes, the fine-tuned fraud detection layer on top of it may behave differently — and EuroBank has no way to know. They monitor flagging volume, not model output distribution. A subtle shift in accuracy would go completely undetected.
F2 — Sanctions data staleness (CRITICAL)
Refinitiv's API keeps responding, but what if the data behind it goes stale — sanctions lists not refreshed, matching accuracy quietly degrading? There's no error signal in that scenario. EuroBank monitors whether the API is up, not whether the data is fresh or the matching is accurate.
Detection confidence: none. This is a criminal liability scenario in multiple jurisdictions.
F3 — Azure West Europe region outage (CRITICAL)
A full region outage takes everything down at once — inference, model weights, model registry, storage. All Sentinel components sit in the same region. The system goes offline entirely, and transactions either queue indefinitely or pass without screening.
This one is at least detectable; Azure Service Health alerts would fire within minutes. The harder question is what happens next. The legacy rules engine was decommissioned. There's no documented fallback for what EuroBank does when Sentinel is unavailable.
F4 — DataForge knowledge concentration (HIGH)
The DataForge contract expires March 2027. No knowledge transfer clause. If DataForge becomes unavailable for any reason — acquisition, insolvency, a key person leaving — EuroBank loses the ability to retrain, debug, or modify the model. That makes continuous model management, which the EU AI Act Art. 9 expects, impossible to deliver.
F5 — Co-located monitoring (CRITICAL)
This one is worth pausing on. The monitoring dashboards and alerting systems that are supposed to catch Sentinel failures are hosted in the same Azure region as Sentinel itself. If the region degrades — not a full outage, something subtler — monitoring degrades too, potentially masking the problem underneath.
No independent detection exists outside Azure West Europe. Detection confidence: none.
This is the most dangerous failure mode the analysis found. The system that's supposed to detect failure fails at the same time as the system it's monitoring. EuroBank would look at its dashboards and see something that looks roughly normal while Sentinel is producing degraded outputs.
F7 — GPU silent data corruption (HIGH)
Per NVIDIA and OpenCompute research, roughly 1 in 1,000 machines in hyperscaler GPU fleets experience silent data corruption — corrupted compute that produces wrong inference results with no error signal. EuroBank has no detection at the customer level. Azure may run fleet-level checks, but that's invisible to EuroBank.
Phase 2 Findings
Four of the seven scenarios are critical severity. Five involve silent failure risk. Three have no detection mechanism at all.
The pattern that emerges: the visible failures (F3, the region outage) are not the dangerous ones. They're dramatic but detectable. The dangerous ones — F1, F2, F5 — are the scenarios where EuroBank's board report keeps showing green while the system quietly produces wrong outputs. That quarterly report would stay green through all three of them.
Phase 3 — Trust Surface Audit
Six trust signals were audited. A full OSRA execution would cover more; these are the ones with the highest consequences.
T1 — Azure OpenAI uptime SLA: "99.95%"
Unverified. EuroBank takes this at face value. What the SLA actually guarantees is that the API will respond — it says nothing about whether the response is correct, consistent with previous behaviour, or timely enough for real-time transaction processing. EuroBank's continuity planning treats "uptime" and "working correctly" as the same thing. They aren't.
T2 — OpenAI model deprecation: "6 months notice"
Unverifiable. This is a policy statement on OpenAI's website, not a contractual commitment. They can change it whenever they want. And the notice period covers full model deprecation — it doesn't cover behaviour-altering updates pushed within a model version, which is EuroBank's actual risk. The trust signal covers the wrong thing.
T3 — DataForge model accuracy: "93.7% F1 score"
Partially verified. EuroBank's data science team reviewed DataForge's validation methodology but never ran an independent validation on a holdout dataset of their own. The benchmark was done on historical data that DataForge selected. No adversarial testing.
Worth noting the precedent: Epic Systems' proprietary sepsis prediction model claimed 76-83% AUC. When independently validated, actual sensitivity was 33%. 170+ hospitals had deployed it without checking.
T4 — ISO 27001 certificate: vendor security
Unverified. Accepted at face value. ISO 27001 certifies that a management system for information security exists. It doesn't certify the specific controls on EuroBank's workload, the security of the AI inference pipeline, model weight integrity in storage, or tenant isolation. EuroBank's DORA compliance evidence references this certificate as proof of adequate third-party security. The certificate doesn't cover what EuroBank thinks it covers.
T5 — Refinitiv sanctions data completeness
Unverified. EuroBank has never audited Refinitiv's data sources, update frequency, matching methodology, or coverage gaps. The contract doesn't include a right-to-audit clause. EuroBank treats "we use Refinitiv" as equivalent to "our sanctions screening is complete." Those are two very different statements, and the distance between them carries criminal liability.
T6 — Board AI risk report: "green across all metrics"
Unverifiable in its current form. The report tracks system uptime, flagging volume, false positive rate, and model accuracy based on DataForge's benchmark. None of those metrics would catch a silent model update (F1), sanctions data going stale (F2), monitoring degrading alongside the system (F5), or GPU corruption (F7).
The report measures what's easy to measure. The failure modes that Phase 2 identified as most dangerous sit entirely outside what the report covers. The board believes it is exercising oversight. On this evidence, it isn't.
Trust Chains
| EuroBank Trusts | Direct Provider | Provider Trusts | Tier 3 | Depth | Verified? |
|---|---|---|---|---|---|
| "AI fraud detection works" | Azure OpenAI Service | OpenAI base model quality | OpenAI's training data integrity | 3 | No |
| "Sanctions screening complete" | Refinitiv API | Refinitiv data sources | Original sanctions list publishers | 3 | No |
| "Model is accurate" | DataForge benchmark | DataForge's validation methodology | Historical fraud label quality | 3 | Partially |
| "Infrastructure is secure" | Azure ISO 27001 | Microsoft's internal controls | Azure supply chain (TSMC, power grid, cooling) | 4+ | No |
Phase 3 Findings
Five of six trust signals are unverified or unverifiable; only T3 was even partially verified. Five of six have a scope mismatch — the verification, where it exists, doesn't cover what the organisation assumes it covers.
The finding that cuts deepest is T6. Every other trust gap here is operational: fixable with monitoring, testing, or a better contract clause. The board report is different. It's the mechanism by which the organisation assures itself that oversight is happening, and it's structurally blind to the failure modes that actually matter. That makes it the trust signal with the highest consequence, because everything downstream of the board — resource allocation, risk acceptance, regulatory submissions — rests on it.
Phase 4 — Convergence Mapping
Convergence Matrix
| Dependency | Critical/High Severity? | Silent Failure Risk? | Unverified Trust? | Conditions | Classification |
|---|---|---|---|---|---|
| GPT-4o base model | Y (F1) | Y | Y (T2) | 3/3 | CRITICAL CONVERGENCE |
| Refinitiv sanctions API | Y (F2) | Y | Y (T5) | 3/3 | CRITICAL CONVERGENCE |
| Azure West Europe co-location | Y (F5) | Y | Y (T1, T4) | 3/3 | CRITICAL CONVERGENCE |
| DataForge GmbH | Y (F4) | N | Y (T3) | 2/3 | CONVERGENCE POINT |
| Azure GPU fleet | Y (F7) | Y | N | 2/3 | CONVERGENCE POINT |
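The Classification column follows a counting rule: all three conditions present is a critical convergence, two of three is a convergence point. A sketch of that rule (the label for fewer than two conditions is an assumption, since the matrix only contains 2/3 and 3/3 rows):

```python
def classify(severe: bool, silent: bool, unverified: bool) -> str:
    """OSRA Phase 4: count how many of the three convergence
    conditions hold for a dependency and classify accordingly."""
    score = sum([severe, silent, unverified])
    if score == 3:
        return "CRITICAL CONVERGENCE"
    if score == 2:
        return "CONVERGENCE POINT"
    return "NO CONVERGENCE"  # assumed label for 0-1 conditions

print(classify(True, True, True))    # GPT-4o base model row
print(classify(True, False, True))   # DataForge GmbH row
```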
Convergence Risk Summary
Three critical convergences and two convergence points. Ranked by weighted score.
#1 — AI Model Behaviour Change (Score: 33.0)
What converges: The base model can change without notification, EuroBank has no detection mechanism, and the lifecycle policy is unverifiable. The fine-tuned layer's accuracy hangs entirely on base model stability.
Why governance missed it: The EU AI Act Annex IV asks for documentation of "versions of relevant software." EuroBank wrote "GPT-4o." A version label doesn't guarantee behaviour. No framework currently in use requires continuous monitoring of model behaviour between version changes.
Regulatory exposure: EU AI Act Art. 9 and 15; DORA Art. 6; potential AML enforcement if the model change degrades sanctions screening.
Recommended actions:
- Immediate: implement output distribution monitoring — track flagging rate, confidence distribution, and decision boundary behaviour; alert on drift
- 30 days: set up an independent validation pipeline, running Sentinel against a fixed benchmark dataset on a monthly cycle
- 90 days: negotiate contractual model change notification with Microsoft/OpenAI; if that's not available, begin evaluating alternative base models
- Governance: add model stability metrics to the board AI risk report
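The output distribution monitoring in the immediate action can start as a small daily job. A sketch using a population stability index over Sentinel's confidence scores, in plain Python — the Beta-distributed scores, window sizes, and the 0.2 alert threshold are illustrative stand-ins, not tuned values:

```python
import math
import random

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline score distribution
    and the current one. Common rule of thumb (illustrative here):
    PSI > 0.2 signals significant drift worth an alert."""
    sb = sorted(baseline)
    # decile edges taken from the baseline distribution
    edges = [sb[int(len(sb) * i / bins)] for i in range(1, bins)]
    def fractions(xs):
        counts = [0] * bins
        for x in xs:
            counts[sum(x >= e for e in edges)] += 1   # bucket index
        return [max(c / len(xs), 1e-6) for c in counts]  # avoid log(0)
    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

rng = random.Random(0)
baseline = [rng.betavariate(2, 5) for _ in range(20_000)]  # yesterday's confidences
shifted  = [rng.betavariate(2, 3) for _ in range(20_000)]  # after a silent model update

print(f"same distribution: {psi(baseline, baseline[:10_000]):.3f}")
print(f"shifted distribution: {psi(baseline, shifted):.3f}")
```

A shift like the one simulated above would never appear in a flagging-volume count alone, which is exactly the gap F1 describes.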
#2 — Co-located Monitoring Failure (Score: 32.0)
What converges: Every Sentinel component and every monitoring system that would detect Sentinel failure sits in the same Azure region. A regional degradation takes out both the patient and the doctor.
Why governance missed it: DORA Art. 11 requires incident detection, and EuroBank has it. What no framework requires is verifying that monitoring infrastructure is independent from the systems it watches. The architecture was designed for efficiency — same region, lower latency — not for the scenario where efficiency and resilience pull in opposite directions.
The question a DORA auditor would ask: "How would you know if Sentinel was producing incorrect results?" Answer: "Our monitoring dashboards." Follow-up: "Where are those dashboards hosted?" The conversation ends there.
Recommended actions:
- Immediate: deploy an independent health check outside Azure West Europe — a simple external probe that validates Sentinel output against known test cases
- 30 days: stand up monitoring in a secondary region or on-premises, architecturally independent of Azure West Europe
- 90 days: run a DORA-aligned resilience test simulating regional degradation, not full outage — degradation is harder to spot and more dangerous
- Governance: make monitoring independence a standing architecture requirement
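The external probe in the immediate action can be a scheduled script on any host outside Azure West Europe that submits known-outcome canary transactions and alerts on divergence. A sketch with the network call stubbed out — the canary cases, payload shape, and decision labels are all hypothetical:

```python
# Hypothetical canary set: transactions with known correct outcomes.
CANARY_CASES = [
    {"tx": {"amount": 120.0, "counterparty": "Acme GmbH"}, "expected": "pass"},
    {"tx": {"amount": 9_800.0, "counterparty": "Listed entity"}, "expected": "flag"},
]

def run_probe(score):
    """score: callable taking a transaction dict, returning "pass"/"flag".
    In production this would POST to the Sentinel API from outside the
    region. Returns a list of failures; empty means behaviour matches."""
    failures = []
    for case in CANARY_CASES:
        try:
            decision = score(case["tx"])
        except Exception as exc:          # unreachable counts as failure too
            failures.append(f"probe error: {exc}")
            continue
        if decision != case["expected"]:
            failures.append(
                f"{case['tx']} -> {decision!r}, expected {case['expected']!r}")
    return failures

# Stubs standing in for the real scoring endpoint:
healthy = lambda tx: "flag" if tx["amount"] > 5_000 else "pass"
degraded = lambda tx: "pass"              # silently passing everything

print(run_probe(healthy))   # empty list: healthy
print(run_probe(degraded))  # one mismatch: the flag case slipped through
```

The design point is independence, not sophistication: the probe's value comes entirely from where it runs and who it pages, not from what it computes.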
#3 — Sanctions Data Integrity (Score: 31.0)
What converges: Refinitiv is a single point of dependency with no fallback. Nobody monitors data freshness or matching quality. The contract offers no SLA on either, and no right-to-audit clause. The organisation treats "we use Refinitiv" as evidence of adequate sanctions screening.
Why governance missed it: Refinitiv is a market-standard provider, and using one feels like due diligence. But DORA Art. 28-30 requires assessment of third-party ICT provider concentration risk and exit strategies. EuroBank has neither.
Recommended actions:
- Immediate: implement a daily automated comparison between Refinitiv's output and at least one independent sanctions source (OFAC SDN is freely available and can serve as a baseline check)
- 30 days: negotiate a right-to-audit and data freshness SLA with Refinitiv; if they refuse, document the refusal in the DORA third-party risk register
- 90 days: evaluate a secondary sanctions screening provider and implement dual-screening for high-value transactions
- Governance: reclassify Refinitiv as a DORA "critical third-party provider"
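The daily comparison in the immediate action reduces to set arithmetic once both sources are normalised: names the independent baseline lists but the primary screen missed are the dangerous direction. A sketch — list contents are invented, and real matching needs proper fuzzy name normalisation, which is only crudely approximated here:

```python
def normalise(name: str) -> str:
    """Crude normalisation for illustration; a real implementation needs
    the vendor's documented fuzzy rules (transliteration, weak tokens)."""
    return " ".join(name.upper().split())

def screen_gap(primary_hits, baseline_list, transactions):
    """Names present in the independent baseline (e.g. the OFAC SDN list)
    that appeared in today's traffic but the primary provider did not flag."""
    primary = {normalise(n) for n in primary_hits}
    baseline = {normalise(n) for n in baseline_list}
    seen = {normalise(t["counterparty"]) for t in transactions}
    return sorted((baseline & seen) - primary)

transactions = [{"counterparty": "Acme Trading LLC"},
                {"counterparty": "ShadowCorp Ltd"}]
primary_hits = []                    # the primary screen flagged nothing today
baseline_list = ["SHADOWCORP LTD"]   # but the independent baseline lists one name

print(screen_gap(primary_hits, baseline_list, transactions))
```

Any non-empty result is an alert condition: either the primary provider's data is stale or its matching has drifted, and both are exactly the silent failures F2 describes.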
#4 — Vendor Knowledge Concentration (Score: 20.0)
What converges: DataForge owns the fine-tuning pipeline, the preprocessing code, and the validation methodology exclusively. The contract expires March 2027 with no knowledge transfer clause. The accuracy benchmark was DataForge's own work, reviewed by EuroBank but never independently replicated.
Why governance missed it: Vendor management looked at DataForge's financial stability and data handling. The question nobody asked: "If DataForge disappears tomorrow, can we keep this system running ourselves?"
Recommended actions:
- Immediate: open knowledge transfer negotiations before the next contract renewal window
- 60 days: conduct an independent model validation on a dataset DataForge didn't select
- 6 months: build internal capability to retrain and maintain the model — this is an investment in operational independence
- Governance: add vendor knowledge concentration to the DORA third-party risk register and flag the contract expiry to the board with a remediation budget
#5 — GPU Silent Data Corruption (Score: 18.5)
What converges: Azure's GPU fleet has a documented ~1 in 1,000 machine silent data corruption rate (NVIDIA/OpenCompute research). Individual transaction decisions may come back wrong, and there's no error signal to catch it. EuroBank has no customer-level detection mechanism.
Why governance missed it: This sits below the abstraction layer governance operates at. It's a physics-level failure, and no framework currently addresses it.
Recommended actions:
- 60 days: request documentation from Microsoft on Azure's fleet-level SDC detection
- 90 days: evaluate output integrity verification — running critical transactions through inference twice and flagging discrepancies
- For low-value transactions, this may be an accepted risk; if so, document the acceptance with the rationale
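The output integrity verification in the 90-day action is plain redundancy: score the transaction twice, ideally routed to different hardware, and treat any disagreement from a deterministic pipeline as a corruption signal. A sketch (the scorer and tolerance are illustrative):

```python
def verified_score(score, tx, tolerance=1e-6):
    """Run inference twice and flag discrepancies. Assumes the pipeline
    is deterministic (fixed seed / temperature 0), so any disagreement
    beyond tolerance indicates corruption somewhere in the stack."""
    a, b = score(tx), score(tx)
    if abs(a - b) > tolerance:
        raise RuntimeError(f"inference mismatch: {a} vs {b}")
    return a

healthy = lambda tx: 0.42                  # deterministic scorer stub
flaky_results = iter([0.42, 0.97])         # one call silently corrupted
flaky = lambda tx: next(flaky_results)

print(verified_score(healthy, {"amount": 10}))
try:
    verified_score(flaky, {"amount": 10})
except RuntimeError as exc:
    print(f"caught: {exc}")
```

Doubling inference cost is why this makes sense only for critical transactions; the documented-acceptance path above covers the rest.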
Governance Integration Map
| Convergence Point | Regulation | Internal Document | Required Change |
|---|---|---|---|
| CP1 — Model behaviour | EU AI Act Art. 9, 15; DORA Art. 6 | AI Risk Assessment; Vendor Management | Add model behaviour monitoring; update risk assessment for base model instability |
| CP2 — Sanctions data | AML Directives; DORA Art. 28-30 | Third-Party Risk Register; AML Policy | Reclassify Refinitiv as critical provider; add data quality verification |
| CP3 — Co-located monitoring | DORA Art. 11, 15 | Incident Response Plan; Architecture Standards | Add monitoring independence requirement; test degradation scenarios |
| CP4 — Vendor knowledge | EU AI Act Art. 9; DORA Art. 28 | Third-Party Risk Register; Contracts | Add knowledge transfer clause; flag to board |
| CP5 — GPU corruption | DORA Art. 6; EU AI Act Art. 15 | IT Risk Register | Document accepted risk; investigate vendor detection |
What This Example Shows
EuroBank Sentinel is compliant on paper: ISO-certified, DORA-documented, reporting green to the board every quarter. Running OSRA against it produced five convergence points that existing governance can't see.
Three of the five are critical convergences — high-severity failure, silent failure risk, and unverified trust all present at the same time. Two carry criminal liability exposure (model behaviour change affecting sanctions screening, and sanctions data staleness). One means the system built to detect failure would go down with the system it's supposed to be watching.
The quarterly board report would stay green through every one of these scenarios. The metrics it tracks are real, but they don't cover the failure modes that the substrate analysis identified as most dangerous.
That gap — between what governance reports and what the substrate actually risks — is what OSRA was built to make visible.
This is a hypothetical worked example designed to demonstrate the OSRA methodology. "EuroBank" and "DataForge GmbH" are fictional. The infrastructure patterns, regulatory frameworks, and failure modes described are drawn from documented real-world evidence.
OSRA is open source under CC BY-SA 4.0. Full methodology, templates, and action catalogue: github.com/marcobrondani/OSRA