Tenant isolation failures in multi-tenant SaaS: a pattern catalogue

Tenant isolation is not a database concern. It is not a framework concern. It is not something your ORM handles for you, something your JWT library enforces, or something that falls out of a correctly configured cloud environment. It is an authorization concern, and authorization is one of the hardest problems in software that the industry keeps treating as solved.

This article is a catalogue of how tenant isolation fails. Not theoretically. In production, in systems that were designed by competent engineers, reviewed by competent security teams, and running in organizations that believed they had the problem handled. The patterns here are not exotic. They are the default failure modes of systems that confused data access filtering with authorization, that conflated role assignment with boundary enforcement, and that migrated between authorization models without understanding what the original model was actually doing.

The intended audience is architects making decisions about authorization models and practitioners doing threat modelling on systems where those decisions have already been made. This is not introductory material.

§ I

The authorization model problem

What authorization models actually do

An authorization model answers one question: given a principal, an action, and a resource, is the action permitted? Every model — RBAC, ABAC, ReBAC — is a different answer to how you represent and evaluate that question at scale. The failure modes of each model are direct consequences of the tradeoffs built into that representation.

RBAC — Role-Based Access Control — assigns permissions to roles and roles to users. Its strength is administrative simplicity: you manage a bounded set of roles rather than per-user permission sets. Its structural weakness, in multi-tenant systems, is that it says nothing about resource ownership. A role grants document:read. It does not grant document:read WHERE document.tenant = user.tenant. That constraint has to come from somewhere else — and in most RBAC implementations, it comes from the data access layer, not the authorization layer. The role check and the tenancy check are different code paths, maintained separately, and that separation is where isolation failures live.

ABAC — Attribute-Based Access Control — evaluates policies against attributes of the principal, the resource, and the environment. It is more expressive than RBAC and more operationally complex. In multi-tenant systems, ABAC can express tenancy natively: a policy that requires resource.tenant_id == principal.tenant_id is a first-class authorization rule rather than a query filter. The failure mode shifts from missing constraints to policy authoring errors — incorrect attribute mappings, stale attribute values, evaluation order bugs in compound policies. ABAC gets the tenancy constraint into the right layer. It introduces new ways to get the constraint wrong.
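A minimal sketch of tenancy expressed as a first-class ABAC rule. The Principal and Resource types, their field names, and the "kind:action" role naming are assumptions made for illustration, not the vocabulary of any particular policy engine.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str
    roles: frozenset = field(default_factory=frozenset)


@dataclass(frozen=True)
class Resource:
    resource_id: str
    tenant_id: str
    kind: str


def is_permitted(principal: Principal, action: str, resource: Resource) -> bool:
    """Evaluate one illustrative policy: same tenant AND a grant for the action."""
    # Tenancy is part of the decision itself, not a filter applied elsewhere.
    same_tenant = resource.tenant_id == principal.tenant_id
    # Hypothetical grant naming: "<kind>:<action>", e.g. "document:read".
    has_permission = f"{resource.kind}:{action}" in principal.roles
    return same_tenant and has_permission
```

Because the tenant comparison sits inside the decision function, a caller cannot obtain a permit without it. The failure modes described above still apply: the rule is only as good as the tenant_id values it is handed.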

ReBAC — Relationship-Based Access Control — models authorization as a graph of relationships between principals and resources. The Zanzibar paper defines the canonical formalization: a relation tuple object#relation@user expresses that user holds relation on object. A check evaluates whether a path exists in the relationship graph from the principal to the resource via the required relation. The expressive power is significant — ReBAC can model hierarchical ownership, delegated permissions, and cross-object inheritance in ways that RBAC and ABAC cannot without significant gymnastics.

Its failure mode in multi-tenant systems is the subject of the next section.

The migration failure pattern

The most dangerous moment in a multi-tenant authorization system's lifecycle is not the initial implementation. It is the migration between authorization models — specifically, the migration from a model that encoded tenancy implicitly to one that requires tenancy to be modelled explicitly.

Consider a real failure pattern. A SaaS platform implements RBAC with a tree-shaped tenant hierarchy: a tenant is the root entity, groups exist within tenants, subgroups and users within groups. A user's position in the tree is their authorization boundary. Roles are assigned to users, permissions to roles. The segmentation logic (what data a user can see) is enforced not in the authz layer but at the ORM level: every query that touches tenant-scoped data carries an enforced filter clause equivalent to WHERE tenant_uid = {user.tenant_uid} AND group_uid = {user.group_uid}.

This is an implicit authorization boundary. It works. It is also invisible to the authorization system — the RBAC model knows about roles and permissions, not about the WHERE clause that enforces the actual boundary. The WHERE clause is infrastructure. It is not a policy. No one wrote it down in the authorization model's configuration language because it was never in the authorization model.

When the platform migrates to ReBAC, the migration team audits the RBAC model. They find roles. They find permissions. They translate them into relation tuples: resource:X#viewer@user:U. They implement the Zanzibar-style check graph. The implementation is correct — a user with the viewer relation on resource X can view resource X. They ship it.

What they did not model: tenancy. The ReBAC tuples answer "does user U have the viewer relation on resource X?" They do not answer "does resource X belong to the tenant tree that user U is a member of?" That question was never in the authz model. It was in the WHERE clause. The WHERE clause is not in the ReBAC implementation because the migration team looked at the authorization system and translated what they saw. The ORM filter looked like a data access pattern, not an authorization primitive. So it was not carried forward.

The result is a complete collapse of tenant isolation. Any user whose translated tuples grant a relation on a resource can reach that resource regardless of which tenant owns it, because the ReBAC check passes and the ORM filter that previously enforced the boundary is no longer applied on the new code paths. Where role grants were translated without a tenant qualifier, that means any authenticated user can read any tenant's data. The new model is technically correct and operationally catastrophic.
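The collapse can be reproduced in a few lines. This is a sketch under stated assumptions: the in-memory USERS and DOCUMENTS tables, the function names, and the way the migration expands a role into per-document tuples are all hypothetical stand-ins for the platform described above.

```python
# Hypothetical data: two tenants, one document each, one user per tenant.
DOCUMENTS = {"doc-1": {"tenant": "tenant-a"}, "doc-2": {"tenant": "tenant-b"}}
USERS = {
    "alice": {"tenant": "tenant-a", "roles": {"viewer"}},
    "bob": {"tenant": "tenant-b", "roles": {"viewer"}},
}


def legacy_can_view(user_id: str, doc_id: str) -> bool:
    """Old path: role check in the authz layer, tenancy in the query filter."""
    user = USERS[user_id]
    has_role = "viewer" in user["roles"]
    # Stand-in for the ORM's enforced WHERE tenant_uid = ... clause.
    visible = {d for d, row in DOCUMENTS.items() if row["tenant"] == user["tenant"]}
    return has_role and doc_id in visible


def migrate_roles_to_tuples() -> set:
    """Translate what the RBAC model expresses: the role, and nothing else."""
    tuples = set()
    for user_id, user in USERS.items():
        if "viewer" in user["roles"]:
            for doc_id in DOCUMENTS:  # the role never mentioned tenancy
                tuples.add((f"document:{doc_id}", "viewer", f"user:{user_id}"))
    return tuples


TUPLES = migrate_roles_to_tuples()


def rebac_can_view(user_id: str, doc_id: str) -> bool:
    """New path: a correct tuple lookup with no tenancy anywhere in it."""
    return (f"document:{doc_id}", "viewer", f"user:{user_id}") in TUPLES
```

Here legacy_can_view("alice", "doc-2") is denied by the filter, while rebac_can_view("alice", "doc-2") is permitted by a check that is doing exactly what it was built to do.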

The named pattern: Implicit Boundary Erasure. An authorization model migration that correctly translates explicit permissions but fails to identify and carry forward implicit constraints that were encoded outside the authorization layer. The constraint was not in the model. It was in the infrastructure. The migration inherited the model and discarded the infrastructure.

Detection is difficult precisely because the new implementation is internally consistent. Unit tests against the ReBAC model pass. The check API returns correct results for all the cases it was asked to evaluate. The missing constraint was never expressed as a check — it was expressed as a filter — so no check-level test catches its absence.

The precondition for this failure is any system where the effective authorization boundary is distributed across more than one enforcement mechanism. When tenancy is enforced in the ORM and roles are enforced in the authz service and object-level access is enforced in the API layer, a migration that touches one of those layers without a full audit of all three will produce gaps. In sufficiently large systems, the full audit never happens because no single person has the complete mental model of all three layers simultaneously.

IDOR as an authz model failure

Insecure Direct Object Reference is framed in most security literature as a vulnerability class — a scanner finding, a penetration test result. That framing is wrong in a way that matters. IDOR is not a bug. It is the observable consequence of an authorization model that grants role-level or action-level access without expressing object-level ownership constraints.

The distinction is precise. An authorization model that says "users with the viewer role can read documents" and a data layer that exposes documents by ID has not expressed whether this specific document is accessible to this specific user. The check answers "can this user read documents?" not "can this user read this document?" The object reference — the document ID — is trusted as implicit authorization. Possession of the ID is treated as evidence of entitlement to the resource it identifies.

In single-tenant systems, this is sometimes an acceptable tradeoff. In multi-tenant systems, it is a structural isolation failure. Document IDs are frequently sequential, derivable, or leaked through indirect channels. A user in tenant A who discovers or guesses a document ID belonging to tenant B submits a request that the authorization model evaluates as permitted, because the user has the viewer role and the document exists, and receives tenant B's data.

The remediation is not input validation. It is not rate limiting. It is expressing the missing constraint in the authorization model: document:X#viewer@user:U is insufficient when X can belong to any tenant. The complete check requires establishing that X belongs to a tenant T and that U is a member of T — either as a relation tuple (document:X#tenant@tenant:T, evaluated transitively) or as an explicit policy attribute in an ABAC model. The constraint must live in the authorization layer, not in a query filter that can be bypassed, not in a UI that doesn't render links to inaccessible resources, and not in the assumption that object IDs are unguessable.
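The transitive check described above might look like the following sketch. The tuple set, the identifiers, and the helper names are illustrative. A production ReBAC service would evaluate this through its own schema and check API rather than a Python set.

```python
# Relation tuples as (object, relation, subject); all identifiers are illustrative.
TUPLES = {
    ("document:doc-1", "tenant", "tenant:acme"),
    ("document:doc-2", "tenant", "tenant:globex"),
    ("tenant:acme", "member", "user:alice"),
    ("tenant:globex", "member", "user:bob"),
    ("document:doc-1", "viewer", "user:alice"),
    # A stray grant of the kind a role-only migration can produce.
    ("document:doc-2", "viewer", "user:alice"),
}


def has_relation(obj: str, relation: str, subject: str) -> bool:
    return (obj, relation, subject) in TUPLES


def owning_tenants(obj: str) -> set:
    return {s for (o, r, s) in TUPLES if o == obj and r == "tenant"}


def can_view(user: str, document: str) -> bool:
    """Permit only if the user is a viewer AND a member of the document's tenant."""
    if not has_relation(document, "viewer", user):
        return False
    # Transitive step: document -> tenant -> member.
    return any(has_relation(t, "member", user) for t in owning_tenants(document))
```

The stray viewer tuple on doc-2 no longer matters: alice is not a member of globex, so the check denies her even though the direct relation exists.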

§ II

The token and session layer

JWT tenant context failures

JSON Web Tokens are the dominant mechanism for carrying authentication and authorization context in multi-tenant SaaS APIs. They are also one of the most reliable sources of tenant isolation failures, because they sit at the boundary between the authentication layer (which produced the token) and the authorization layer (which consumes it) — and that boundary is where the hard questions about tenant context get quietly answered wrong.

The canonical JWT structure is known. What matters here is how tenant context gets encoded in it, and what happens when that encoding is trusted without verification.

The standard pattern: a token issued after successful authentication carries a claim identifying the tenant — tenant_id, org_id, workspace_id, or a domain-specific equivalent. The application extracts this claim at request time and uses it to scope data access. This works when:

The token was issued by an authorization server that verified the principal's membership in the claimed tenant
The claim is verified server-side on every request, not just decoded
The token cannot be modified post-issuance without invalidating the signature
The token is scoped to a single tenant and cannot be replayed against another tenant's resources

Each of these conditions is a place where production systems fail.

Tenant context without server-side validation

The token carries tenant_id: "acme-corp". The application decodes the token, extracts the claim, and uses it to filter queries: WHERE tenant_id = claims.tenant_id. The signature is verified — the token is cryptographically valid — but no server-side check establishes that the authenticated principal is actually a member of acme-corp. If the token was issued by a system where tenant membership is asserted by the client rather than verified by the authorization server, the claim is user-controlled input dressed as a cryptographic guarantee.

This failure mode is most common in systems that issue tokens based on user-supplied parameters at registration or login — where the tenant identifier is passed in the request body and embedded in the token without a membership lookup. The token is signed. The claim is false. The signature verifies. Every request succeeds.

The fix is not in the token validation logic. It is in the token issuance logic: tenant membership must be verified against an authoritative store at issuance time, not asserted by the client.
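A sketch of issuance-time verification, using only the standard library so it stays self-contained. The MEMBERSHIPS store, the SIGNING_KEY constant, and the issue_token name are assumptions. A real issuer would also add expiry and audience claims and would load its key from a secret store.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical authoritative membership store: user -> tenants verified out of band.
MEMBERSHIPS = {"alice": {"acme-corp"}, "bob": {"globex"}}
SIGNING_KEY = b"demo-only-secret"  # assumption: loaded from a secret store in practice


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_token(user_id: str, requested_tenant: str) -> str:
    """Sign a tenant claim only after checking membership in the store."""
    if requested_tenant not in MEMBERSHIPS.get(user_id, set()):
        # The client asked for a tenant it does not belong to: refuse to sign.
        raise PermissionError(f"{user_id} is not a member of {requested_tenant}")
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(
        json.dumps({"sub": user_id, "tenant_id": requested_tenant}).encode()
    )
    signing_input = f"{header}.{payload}".encode("ascii")
    signature = hmac.new(SIGNING_KEY, signing_input, hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"
```

The requested tenant still arrives from the client, but it is treated as a question to answer against the store rather than a fact to embed.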

Token reuse across tenant boundaries

A token issued for a principal's session in tenant A is a valid, signed JWT. Nothing in the token's structure prevents it from being presented to an endpoint that serves tenant B's resources, if that endpoint does not verify that the token's tenant_id claim matches the tenant context of the requested resource.

This is a property of the endpoint, not the token. The token cannot enforce its own scope. The endpoint must verify: not just "is this token valid?" but "is this token valid for this resource's tenant context?" In systems where tenant context is embedded in URL path parameters (/api/tenants/{tenant_id}/resources/{resource_id}), the check requires comparing the path parameter against the token claim on every request. The check is frequently absent, overlooked in code review, or implemented inconsistently across endpoints.

The pattern is particularly common in internal service-to-service calls, where a token issued for a user session is forwarded to a downstream service that trusts it without re-evaluating tenant context — because the assumption is that the upstream service already performed that check.
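The path-versus-claim comparison can be sketched as a small guard that every handler calls, including downstream services that receive a forwarded token. The function names, the TenantMismatch exception, and the dict-backed store are hypothetical, and the claims are assumed to come from a token whose signature has already been verified.

```python
class TenantMismatch(Exception):
    """Raised when the token's tenant does not match the requested tenant."""


def authorize_tenant_path(claims: dict, path_tenant_id: str) -> str:
    """Return the tenant to scope the request to, or raise on any mismatch."""
    token_tenant = claims.get("tenant_id")
    if not token_tenant:
        raise TenantMismatch("token carries no tenant claim")
    if token_tenant != path_tenant_id:
        raise TenantMismatch("token tenant does not match path tenant")
    return token_tenant


def get_resource(claims: dict, path_tenant_id: str, resource_id: str, store: dict) -> dict:
    tenant_id = authorize_tenant_path(claims, path_tenant_id)
    # Look up by (tenant, id) so a foreign resource id cannot resolve at all.
    resource = store.get((tenant_id, resource_id))
    if resource is None:
        raise KeyError(resource_id)
    return resource
```

Keying the lookup on the pair (tenant, id) means the guard and the data access agree on scope even if one of them is later edited in isolation.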

Tenant switching attacks

A variation of the above. A principal has legitimate access to tenants A and B — a user who belongs to multiple tenants is common in B2B SaaS. The application issues separate tokens per tenant session. The switching attack occurs when a token scoped to tenant A is deliberately presented in a request context that the application interprets as a tenant B session — through session cookie manipulation, through a request to an endpoint that derives tenant context from a header rather than the token, or through an endpoint that accepts both a tenant parameter and a JWT and uses the parameter rather than the claim when they conflict.

The precondition is any system where the tenant context for a request can be derived from more than one source, and those sources are not validated for consistency against each other.
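One way to remove the precondition is to name a single authoritative source and let every other source only agree with it. The sketch below assumes the verified token claim is that source. The resolve_tenant function and the exception name are illustrative.

```python
from typing import Optional


class AmbiguousTenantContext(Exception):
    """Raised when tenant sources are missing or disagree."""


def resolve_tenant(
    token_claim: Optional[str],
    header_value: Optional[str] = None,
    query_param: Optional[str] = None,
) -> str:
    """Treat the verified token claim as authoritative; other sources may only agree."""
    if not token_claim:
        raise AmbiguousTenantContext("no tenant claim in verified token")
    for source_name, value in (("header", header_value), ("query", query_param)):
        # A secondary source is optional, but a conflicting one is an error,
        # never a silent override of the claim.
        if value is not None and value != token_claim:
            raise AmbiguousTenantContext(f"{source_name} tenant conflicts with token")
    return token_claim
```

Failing closed on conflict is the design choice that matters here. An implementation that picks a winner when sources disagree has reintroduced the switching attack with a different precedence order.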

User-controlled tenant claims

The tenant_id claim is decoded from the token. The token is signed. But the signing key is a single symmetric secret shared across all tenants and held by every component that verifies tokens. Holding a valid token for tenant A does not by itself reveal that secret, but anyone who does obtain it (through a compromised verifier, a component that was handed the key for validation, or offline brute force of a weak secret against a captured token) can construct a token with tenant_id: "target-corp", sign it with the same shared secret, and present it as a valid token for a tenant they do not belong to.

This is a key management failure as much as an authz failure, but its impact is tenant isolation collapse. The remediation is per-tenant signing keys or asymmetric signing with per-tenant audience claims — not just token signature verification, but verification that the token was issued by an authority that had the right to assert membership in the claimed tenant.

Algorithm confusion and `kid` injection

JWT libraries validate token signatures, but the algorithm used for that validation is, in many implementations, specified by the token header rather than enforced by the application. The alg header parameter tells the verifier which algorithm was used to produce the signature. An alg: none attack, which instructs the verifier to accept an unsigned token, is widely known, and most modern libraries reject it by default. The subtler variant is algorithm confusion.

An API server issues tokens signed with an RSA private key (RS256). The public key is available at a JWKS endpoint. The application's token verification logic accepts both RS256 and HS256. An attacker obtains the RSA public key — which is, by design, public — and uses it as the secret for an HMAC-SHA256 signature. The resulting token, signed with HS256 using the RSA public key as the HMAC secret, passes verification on a library that selects the verification algorithm from the token header rather than enforcing it from configuration.

The kid (key ID) header parameter specifies which key in a JWKS should be used to verify the token. In implementations where the kid value is used to construct a filesystem path or a database query for key lookup, it becomes an injection vector. A kid value such as ../../dev/null points a filesystem-based key lookup at a file that reads as empty, so verification runs against an empty key, and an HMAC signature over an empty key is trivially forgeable. SQL injection via kid in a database-backed key store follows the same pattern.

Both of these are well-documented. They appear in production with regularity because the fixes — enforce the algorithm in application configuration rather than accepting it from the token header, parameterize key lookup queries, validate kid values against an allowlist — require knowing that the vulnerability class exists, and the JWT libraries that are vulnerable do not fail loudly.

In multi-tenant systems, the blast radius of an algorithm confusion or kid injection vulnerability is total isolation collapse: an attacker who can forge arbitrary tokens can assert any tenant identity. The vulnerability is in the token layer, but the consequence is in the isolation layer.

§ III

The data layer

The data layer is where tenant isolation goes to die quietly. The authorization model passes. The token validates. The ORM produces a query. The query returns data that belongs to a different tenant. Nobody notices until a penetration tester, a curious user, or an attacker does.

The missing WHERE clause

The most straightforward data layer isolation failure: a query that should be scoped to a tenant is not. This happens in three common ways.

The first is omission — a developer writes a query that retrieves by resource ID without including the tenant filter. The resource ID is validated, the query executes, the result is returned. The authorization model confirmed the user has access to resources of this type. It did not confirm that this resource belongs to this tenant.

The second is conditional application — the tenant filter is applied in the common case but bypassed in specific code paths. Administrative endpoints, bulk export operations, search functionality, and webhook handlers are the most frequent bypass locations. These code paths are written by different developers, at different times, against different requirements, and the tenant filter that is automatic in the standard CRUD layer is not automatic in the non-standard path.

The third is framework-level bypass — the application uses an ORM with tenant-scoping middleware that automatically appends the tenant filter. A developer using a raw query interface, a direct database connection, or a migration script bypasses the middleware entirely. The scoping that appeared automatic was only automatic for queries that went through the expected code path.
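The omission case is the easiest to make concrete. The sketch below uses an in-memory SQLite database. The table layout and function names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, tenant_id TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(1, "tenant-a", "A roadmap"), (2, "tenant-b", "B payroll")],
)


def get_document_unscoped(doc_id: int):
    """The omission: the id is valid, so the row comes back for any caller."""
    return conn.execute(
        "SELECT id, tenant_id, title FROM documents WHERE id = ?", (doc_id,)
    ).fetchone()


def get_document(tenant_id: str, doc_id: int):
    """Scoped lookup: the tenant is a required argument, not an optional filter."""
    return conn.execute(
        "SELECT id, tenant_id, title FROM documents WHERE id = ? AND tenant_id = ?",
        (doc_id, tenant_id),
    ).fetchone()
```

Making tenant_id a required positional argument does not prevent someone from writing the unscoped variant, but it does make the scoped call the one that is easiest to reach for.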

The mitigation that actually works is enforcing the tenant constraint at the deepest possible layer — not in middleware that can be bypassed, not in a base class that can be subclassed around, but as close to the query execution as the architecture permits. Row-level security at the database level is frequently proposed as this deep enforcement point.

It is not.

Why row-level security is not isolation

Row-level security (RLS) — available in PostgreSQL, supported in various forms in other RDBMSs — allows security policies to be defined at the table level that automatically filter rows based on session-level variables. The surface appeal is significant: the database enforces the tenant filter, not the application, and application code cannot bypass it.

The problem is that RLS is a database-level mechanism for a problem that is defined at the application level. For RLS to enforce tenant isolation, the database session must carry accurate tenant context. That context has to come from somewhere — it is set by the application before executing queries, which means the application is still responsible for correctly establishing and maintaining the tenant context that RLS enforces.

If the application sets the wrong tenant context — through a bug in session initialization, through a connection pool that reuses connections without resetting session variables, through a code path that doesn't set the context at all — RLS either enforces the wrong boundary or enforces nothing. The failure mode moves from "missing WHERE clause" to "wrong session variable," which is not an improvement.

Connection pooling is the most operationally common RLS failure in production. A connection pool maintains persistent database connections. Session-level variables set on a connection persist across queries if they are not explicitly reset. A request that runs SET app.current_tenant = 'acme-corp' and then returns the connection to the pool leaves that variable set. If the next request to acquire that connection belongs to a different tenant and the connection initialization does not reset all session variables, RLS applies acme-corp's tenant context to the other tenant's queries, which then read acme-corp's rows.

This is not a theoretical concern. It is a class of production bug that appears whenever session-scoped RLS context is deployed behind a connection pooler. PgBouncer in transaction pooling mode is particularly prone to it: the server connection is handed to another client after each transaction, while a session-level SET persists on that connection. The fix requires careful session variable management, for example scoping the setting to the transaction, and that management is itself an application-level concern.
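One form that careful management can take is binding the tenant setting to a single transaction. The sketch below assumes a DB-API connection from a psycopg-style driver (%s placeholders, implicit transaction start). The tenant_transaction name and the app.current_tenant setting are illustrative. PostgreSQL's set_config function with its third argument set to true behaves like SET LOCAL.

```python
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Bind the tenant setting to one transaction on a DB-API connection.

    With is_local=true, PostgreSQL discards the value at commit or rollback,
    so it cannot follow the server connection back into the pool.
    """
    cur = conn.cursor()
    try:
        # Assumes the driver opens a transaction implicitly on first execute.
        cur.execute(
            "SELECT set_config('app.current_tenant', %s, true)", (tenant_id,)
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Usage would look like `with tenant_transaction(conn, "acme-corp") as cur: cur.execute(...)`. This narrows the contamination window to one transaction. It does not change the underlying dependency: the application still has to pass the right tenant_id.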

The deeper issue is that RLS creates an illusion of defense-in-depth when it is more accurately described as offense-in-depth for the application. You have not added an independent enforcement layer. You have added a dependent enforcement layer that fails in new ways when the application layer fails, and in doing so, you have added operational complexity without adding genuine isolation guarantees.

Tenant isolation is an application concern. Build it in the application. Do not delegate it to a layer that cannot understand your application's authorization model.

ORM eager loading across tenant boundaries

Modern ORM frameworks — Django's select_related and prefetch_related, ActiveRecord's includes, Hibernate's fetch joins — optimize query performance by loading associated objects in fewer database round trips. The same mechanism that makes them useful makes them dangerous in multi-tenant systems: eager loading follows relationships without inheriting the tenant scope of the parent query.

A query such as Project.objects.filter(tenant=current_tenant).prefetch_related('documents') appears to be tenant-scoped. The Project filter is correctly applied. The prefetch_related issues a second query: SELECT * FROM documents WHERE project_id IN (...). The project IDs in that IN clause are correctly tenant-scoped. The documents query is not further filtered by tenant. If the document model has a foreign key to project and a separate foreign key to tenant, which is common in models built incrementally, nothing forces the two to agree. The prefetch follows project_id alone and returns every document attached to those projects regardless of the document's own tenant attribute.

This class of bug is particularly hard to find because the parent query looks correct. Code review catches the missing filter(tenant=current_tenant) on a top-level query. It does not consistently catch the missing filter on an eager-loaded association two levels deep in a query chain.

The pattern compounds in systems that use serializer-level traversal — GraphQL resolvers, DRF nested serializers, JSON:API relationships — where each level of the object graph issues its own queries and each level is a potential site for a missing tenant constraint.
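The two-query shape of an eager load can be emulated in plain SQL to show where the scope is lost. This sketch uses SQLite rather than an ORM so that it runs on its own. The schema, the data, and the scope_children flag are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, tenant_id TEXT)")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, project_id INTEGER, tenant_id TEXT)"
)
conn.execute("INSERT INTO projects VALUES (1, 'tenant-a')")
# Document 11 hangs off tenant A's project but carries tenant B's id,
# the kind of inconsistency two independent foreign keys allow.
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [(10, 1, "tenant-a"), (11, 1, "tenant-b")],
)


def load_projects_with_documents(tenant_id: str, scope_children: bool) -> dict:
    """Emulate a parent query followed by a prefetch-style IN query."""
    project_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM projects WHERE tenant_id = ?", (tenant_id,)
        )
    ]
    if not project_ids:
        return {}
    placeholders = ",".join("?" * len(project_ids))
    sql = f"SELECT id, project_id FROM documents WHERE project_id IN ({placeholders})"
    params = list(project_ids)
    if scope_children:
        # The step an eager load does not take on its own: re-apply the tenant scope.
        sql += " AND tenant_id = ?"
        params.append(tenant_id)
    result = {pid: [] for pid in project_ids}
    for doc_id, project_id in conn.execute(sql + " ORDER BY id", params):
        result[project_id].append(doc_id)
    return result
```

In Django, the scoped variant corresponds to passing a Prefetch object with an explicitly filtered queryset, for example Prefetch('documents', queryset=Document.objects.filter(tenant=current_tenant)), instead of the bare relation name.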

Cache poisoning across tenants

Caching layers — Redis, Memcached, application-level caches — introduce a third category of data layer isolation failure distinct from query construction: result contamination across tenants through incorrect cache key design.

The standard failure: a cache key is constructed from the resource identifier without including the tenant identifier. cache.get(f"document:{document_id}") returns the cached result for that document ID regardless of which tenant is requesting it. If tenant A's request populated the cache entry, tenant B's request for the same document ID — which should return a cache miss and a not-found or forbidden response — instead returns tenant A's document.

This is most common in caches that sit in front of expensive operations — full-text search results, aggregated reports, permission sets. These are the operations most likely to be cached because they are the most expensive to compute. They are also the operations most likely to contain sensitive cross-tenant data.

The second failure mode is cache invalidation scope: when tenant A modifies a resource, the cache entry is invalidated. If the cache key does not include the tenant identifier, the invalidation affects the cache entry for all tenants, causing unnecessary cache misses. More dangerously, if the invalidation logic is written to match on resource ID only and the cache contains entries for the same resource ID across multiple tenants, selective invalidation may leave stale cross-tenant entries in place.

The fix is mechanically simple — include the tenant identifier in every cache key for tenant-scoped data — and operationally underimplemented because cache key design is treated as a performance concern rather than a security concern. The review that catches a missing tenant filter in an ORM query does not consistently catch a missing tenant identifier in a cache key.
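A sketch of making the tenant a structural part of the key rather than a convention. The TenantScopedCache wrapper and its key layout are hypothetical, and the sketch assumes identifiers never contain the ":" separator. The backend here is a plain dict standing in for Redis or Memcached.

```python
class TenantScopedCache:
    """Wrap a dict-like cache so that every key is namespaced by tenant."""

    def __init__(self, backend=None):
        self._backend = backend if backend is not None else {}

    @staticmethod
    def _key(tenant_id: str, kind: str, resource_id: str) -> str:
        if not tenant_id:
            # Refuse to build a key that could be shared across tenants.
            raise ValueError("tenant_id is required for tenant-scoped cache keys")
        return f"tenant:{tenant_id}:{kind}:{resource_id}"

    def get(self, tenant_id: str, kind: str, resource_id: str):
        return self._backend.get(self._key(tenant_id, kind, resource_id))

    def set(self, tenant_id: str, kind: str, resource_id: str, value) -> None:
        self._backend[self._key(tenant_id, kind, resource_id)] = value

    def invalidate(self, tenant_id: str, kind: str, resource_id: str) -> None:
        self._backend.pop(self._key(tenant_id, kind, resource_id), None)
```

Because callers never assemble key strings themselves, a missing tenant identifier becomes a raised error at the call site instead of a silent cross-tenant hit. Invalidation is scoped the same way, so one tenant's write does not evict another tenant's entry.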

§ IV

The pattern catalogue

What follows is a structured reference of the failure patterns described above, plus additional patterns that do not warrant full section treatment but are sufficiently common to name. Each entry carries: the pattern name, the precondition that makes it possible, how it manifests, where to look for it, and the remediation class.

Implicit Boundary Erasure

Precondition: An authorization model migration where the effective isolation boundary was enforced outside the authorization layer — in ORM filters, middleware, or query construction logic — and the migration translated the explicit authorization model without auditing the implicit enforcement mechanisms.

Manifestation: Complete or partial collapse of tenant isolation on new code paths. The new authorization model is internally consistent; unit tests pass. The missing constraint was never expressed as a check, so no check-level test catches its absence.

Where to look: Any system mid-migration between authorization models. Specifically: query paths that were rewritten to use the new model, and whether the tenant-scoping filter that existed in the old code path was carried forward or silently dropped.

Remediation class: Explicit modelling of tenancy as an authorization constraint in the new model, not as a query filter. Every resource must carry a tenant relation that is evaluated as part of the authorization check, not assumed from context.

Object-Level Entitlement Bypass (IDOR)

Precondition: An authorization model that grants action-level or role-level permissions without expressing object ownership constraints. The model answers "can this user perform this action?" without answering "can this user perform this action on this specific object?"

Manifestation: A user in tenant A accesses a resource belonging to tenant B by supplying the resource's identifier directly. The authorization check passes because the user holds the required role or permission. The object-level tenancy constraint is absent from the check.

Where to look: Any endpoint that accepts a resource identifier as a path or query parameter and performs an authorization check that does not verify the relationship between the resource and the requesting user's tenant. Particularly: bulk operations, export endpoints, and any endpoint where the resource identifier is the primary input.

Remediation class: Express object-level ownership as an explicit authorization constraint. The check must establish both that the user has the required permission type and that the resource belongs to the user's tenant context.

Unverified Tenant Assertion

Precondition: A token issuance endpoint that embeds a client-supplied tenant identifier in the token without verifying the principal's membership in the claimed tenant against an authoritative store.

Manifestation: A principal presents a cryptographically valid token asserting membership in an arbitrary tenant. All downstream tenant-scoped operations execute under the false tenant context.

Where to look: Token issuance logic — specifically, whether the tenant identifier embedded in the token is verified against a membership store or passed through from the client request.

Remediation class: Verify tenant membership at issuance time. The token is a cryptographic assertion of facts the issuer verified, not a carrier for facts the client supplied.

Cross-Tenant Token Replay

Precondition: An endpoint that validates token authenticity without verifying that the token's tenant context matches the tenant context of the requested resource.

Manifestation: A token issued for tenant A is presented to an endpoint serving tenant B's resources. The token validates. The resource is returned.

Where to look: Any endpoint where the tenant context of the request is derived from a URL parameter, a header, or a session variable rather than, or in addition to, the token claim — and where those sources are not validated for consistency with each other.

Remediation class: On every authenticated request, verify that the token's tenant claim matches the tenant context of the requested resource. This check must be performed server-side, at the endpoint level, not assumed from the token alone.

Session Variable Contamination (RLS)

Precondition: Row-level security enforced via database session variables, with a connection pool that does not reset session state between tenant contexts.

Manifestation: Queries for tenant B execute under the tenant context left on the connection by tenant A's earlier request, so rows are returned or filtered according to tenant A's context rather than tenant B's.

Where to look: Connection pool configuration — specifically, whether session variables are reset on connection checkout. PgBouncer in transaction pooling mode is the highest-risk configuration.

Remediation class: If RLS is used, session variables must be reset on every connection acquisition. Where possible, prefer transaction-scoped settings (SET LOCAL, or set_config with is_local set to true) over session-scoped ones, so the tenant context cannot outlive the transaction that set it.

Eager Load Boundary Escape

Precondition: An ORM eager loading operation that follows a foreign key relationship without inheriting the tenant scope of the parent query.

Manifestation: A tenant-scoped parent query is correctly filtered. The eager-loaded association issues a secondary query that is not tenant-scoped, returning associated records across tenant boundaries.

Where to look: Any ORM query chain that uses prefetch_related, includes, eager_load, or equivalent, where the associated model carries its own tenant attribute. Serializer-level traversal in API frameworks compounds this — each level of nesting is a potential missing scope.

Remediation class: Explicitly scope eager loading operations to the tenant context. Do not assume that the tenant filter on the parent query propagates to associated queries. In frameworks that support default scopes, apply tenant scoping at the model level with care — default scopes can be explicitly unscoped, and unscoped queries in bulk operations are a common bypass.

Cache Key Tenant Omission

Precondition: A cache key constructed from resource identifiers without a tenant discriminator, in a system where the same resource identifier can exist across multiple tenants.

Manifestation: A cache entry populated by tenant A's request is returned to tenant B's request for the same resource identifier. Alternatively, cache invalidation for tenant A's modification affects the cached results for all tenants sharing the resource identifier.

Where to look: Cache key construction logic for any operation that is tenant-scoped. Search result caches, permission caches, and aggregated report caches are the highest-risk locations.

Remediation class: Include the tenant identifier as a mandatory component of every cache key for tenant-scoped data. Treat cache key design as a security concern, not a performance concern.
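
One way to make the tenant component mandatory is to route all key construction through a single helper that refuses to build a key without it. The function name and key layout below are assumptions for illustration, not a convention of any particular cache library.

```python
def tenant_cache_key(tenant_id: str, resource_type: str, resource_id: str) -> str:
    """Build a cache key that cannot be constructed without a tenant."""
    for name, value in (("tenant_id", tenant_id),
                        ("resource_type", resource_type),
                        ("resource_id", resource_id)):
        if not value:
            raise ValueError(f"{name} is required")
        if ":" in value:
            # Reject the separator so one tenant cannot craft another's key.
            raise ValueError(f"{name} must not contain ':'")
    return f"tenant:{tenant_id}:{resource_type}:{resource_id}"
```

Because the tenant is the leading component, invalidating one tenant's entries by prefix does not touch another tenant's entries for the same resource identifier.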

Algorithm Confusion / kid Injection

Precondition: A JWT verification implementation that derives the verification algorithm from the token header, or derives the key lookup path from the kid parameter, without sanitization or allowlisting.

Manifestation: An attacker forges a token asserting an arbitrary tenant identity. In algorithm confusion, the RSA public key is used as an HMAC secret. In kid injection, the key lookup is subverted to return a predictable or empty key.

Where to look: JWT library configuration — whether the accepted algorithm is enforced in application configuration or derived from the token. Key lookup implementation — whether kid values are validated against an allowlist before use.

Remediation class: Enforce the accepted signing algorithm in application configuration. Validate kid values against an explicit allowlist. Parameterize any database or filesystem key lookup.

Systemic Boundary Enforcement Failure

Precondition: An organization that has treated authorization as a feature rather than infrastructure — no unified authz layer, access control logic distributed across services, no consistent model for what constitutes a valid access decision.

Manifestation: Authentication bypasses that cascade into authorization failures. Fortinet CVE-2022-40684 is the canonical example: an authentication bypass in the administrative interface that allowed unauthenticated requests to be processed as authenticated ones, which then operated with full administrative privileges because the authorization layer assumed the authentication layer had already made the access decision. The isolation failure was not in the authorization model. It was in the assumption that a prior layer had correctly established the principal's identity and entitlements.

Where to look: Any system where authorization checks downstream assume authentication has been performed upstream, without independently verifying the authentication result. Particularly: internal service meshes, API gateways, and middleware chains where each layer trusts the prior layer without verification.

Remediation class: Treat every service boundary as an authentication and authorization boundary. Do not assume that a request arriving at an internal endpoint has been pre-authorized. Verify principal identity and tenant context at every enforcement point, not just at the perimeter.

§ V

Detection in inherited systems

The catalogue above describes failure patterns. This section describes how to find them in a system you didn't build, where the original design decisions are partially documented at best and the people who made them may no longer be available.

The challenge with tenant isolation failures is that they are silent in normal operation. A missing WHERE clause doesn't throw an exception. A misconfigured cache key doesn't produce an error. Cross-tenant data access that the authorization model permits looks identical in application logs to legitimate access. You are looking for the absence of a constraint, which produces no signal of its own — only the consequence of the constraint's absence, which may be indistinguishable from correct behavior unless you know what to look for.

Effective detection requires combining three approaches: static analysis of the authorization model, instrumentation of the data access layer, and adversarial testing against the patterns in the catalogue.

Mapping the authorization model

Before instrumenting anything, establish what the authorization model is actually doing — not what the documentation says it does, not what the engineers believe it does, but what the code does.

Start with the enforcement points. In a sufficiently large system, authorization checks are not centralized. They are distributed across middleware, service methods, ORM base classes, API decorators, and individual endpoint handlers. Map every location where an access decision is made. The gaps between those locations — code paths that reach data access without passing through a known enforcement point — are your highest-priority investigation targets.

The specific question to answer for each enforcement point: does this check establish both that the principal has the required permission type and that the requested resource belongs to the principal's tenant context? A check that answers only the first question is an IDOR precondition. Document which checks answer both and which answer only one.
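
As a reference point when reading enforcement code, a check that answers both questions has roughly the following shape. Principal, Resource, and is_allowed are hypothetical names for illustration; the point is only that permission and tenancy are evaluated in the same decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    user_id: str
    tenant_id: str
    permissions: frozenset


@dataclass(frozen=True)
class Resource:
    resource_id: str
    tenant_id: str


def is_allowed(principal: Principal, permission: str, resource: Resource) -> bool:
    # Question one: does the principal hold the required permission type?
    has_permission = permission in principal.permissions
    # Question two: does the resource belong to the principal's tenant?
    same_tenant = resource.tenant_id == principal.tenant_id
    return has_permission and same_tenant
```

An enforcement point that computes only has_permission, and leaves same_tenant to a filter somewhere in the data access layer, belongs in the "answers only one" column.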

For systems mid-migration between authorization models, map which code paths use the old model and which use the new. The boundary between them is where Implicit Boundary Erasure manifests. Any resource accessible via both old and new code paths deserves explicit verification that both paths enforce equivalent tenant constraints.

Query pattern analysis

The data layer leaves a record. The signals to look for in query logs and slow query logs:

Cross-tenant query patterns. In a correctly isolated system, queries against tenant-scoped tables should show a consistent distribution — each tenant's queries touch only that tenant's rows. A query that returns rows with tenant identifiers that don't match the session's tenant context is a hard signal. This requires query logging at a level of detail that most production systems don't enable by default, but enabling it on a staging environment against a representative data set is sufficient for an audit.

Missing filter patterns. Queries against tables that carry a tenant discriminator column but don't include that column in the WHERE clause. Query analysis tools that normalize statements (pg_stat_statements in PostgreSQL, statement digests in MySQL's Performance Schema) expose query templates that make this pattern visible without exposing actual data values; raw slow query logs record literal values and need a digest step first. A query template of SELECT * FROM documents WHERE id = $1 against a table that should always be filtered by tenant_id is an immediate finding.

Anomalous cache hit rates. A cache that is correctly keyed by tenant should show roughly uniform hit rates across tenants of similar activity levels. A cache with tenant-discriminator omission will show artificially high hit rates for tenants with lower activity — because high-activity tenants are populating cache entries that low-activity tenants are hitting. This is a weak signal and requires baseline data to interpret, but it is detectable in cache monitoring tools that expose per-key hit statistics.

Eager loading query shapes. ORM eager loading produces recognizable query patterns — a primary query followed by a secondary query with an IN clause. The secondary query should carry the same tenant filter as the primary. In query log analysis, look for IN-clause queries against associated tables that lack the tenant discriminator. These are candidates for eager load boundary escape.
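
The missing-filter and eager-load signals can both be triaged with a simple pass over normalized query templates. The sketch below is a heuristic under stated assumptions: TENANT_TABLES and TENANT_COLUMN are a hypothetical inventory you would build for your own schema, and the regular expressions do not handle subqueries, CTEs, or quoted identifiers.

```python
import re

# Hypothetical inventory of tables that carry a tenant discriminator.
TENANT_TABLES = {"documents", "comments"}
TENANT_COLUMN = "tenant_id"


def unscoped_templates(templates):
    """Return templates that read a tenant table without filtering on the tenant column."""
    findings = []
    for sql in templates:
        # Lowercase and collapse whitespace so multi-line templates match too.
        lowered = " ".join(sql.lower().split())
        tables = set(re.findall(r"\b(?:from|join)\s+([a-z_][a-z0-9_]*)", lowered))
        if not tables & TENANT_TABLES:
            continue
        _, _, where = lowered.partition(" where ")
        if not re.search(rf"\b{TENANT_COLUMN}\b", where):
            findings.append(sql)
    return findings
```

Every result is a candidate for manual review, not a confirmed finding: a template may be legitimately unscoped (an internal batch job, for example), and a template that mentions the tenant column may still apply it incorrectly.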

Token and session instrumentation

JWT claims are logged inconsistently across systems — often not at all, because logging claim contents raises privacy concerns. For an authorization audit, logging the tenant claim and the tenant context of the accessed resource on every authenticated request — not the full token, just the relevant claims — provides the instrumentation surface needed to detect cross-tenant token replay and tenant switching attacks.

The specific signal: requests where the token's tenant claim does not match the tenant context of the resource being accessed. In a correctly implemented system, this should never occur on legitimate traffic. Any occurrence is either an attack attempt, a client bug, or an authorization model gap. All three are worth investigating.
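
A sketch of what that instrumentation might look like at a single enforcement point. The record_access function, the logger name, and the event field names are assumptions; the only requirement from the text is that both tenant values are recorded together and that a mismatch is distinguishable.

```python
import json
import logging

audit_log = logging.getLogger("tenant_audit")


def record_access(token_tenant: str, resource_tenant: str, resource_id: str) -> bool:
    """Log the two tenant values for a request and return True on mismatch."""
    mismatch = token_tenant != resource_tenant
    event = {
        "token_tenant": token_tenant,
        "resource_tenant": resource_tenant,
        "resource_id": resource_id,
        "tenant_mismatch": mismatch,
    }
    # Mismatches go out at WARNING so they can be alerted on separately.
    audit_log.log(logging.WARNING if mismatch else logging.INFO, json.dumps(event))
    return mismatch
```

Only the tenant claim is logged, not the token, which keeps the audit trail useful without recording credentials.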

For systems using session-based tenant context rather than token claims, instrument session initialization to log when tenant context is set and when it changes within a session. Tenant context changes mid-session are a signal for switching attack attempts. Session context that is set once at login and never re-verified against the current request's resource context is a structural gap.

Adversarial testing methodology

Static analysis and log instrumentation find passive signals. Adversarial testing actively exercises the failure patterns.

The testing methodology for inherited systems follows the catalogue structure. For each pattern, the test asks: is the precondition present, and if so, does the failure manifest?

For object-level entitlement bypass: enumerate resource identifiers across tenants — this is usually possible with a test environment containing data for multiple tenants — and attempt to access tenant B's resource identifiers while authenticated as tenant A. A 200 response is a finding. A 403 that leaks resource metadata in the error body is a partial finding.

For cross-tenant token replay: obtain a valid token for tenant A. Construct a request for a resource known to belong to tenant B. Present tenant A's token. Observe whether the endpoint validates the token's tenant claim against the resource's tenant context or only validates the token's authenticity.

For cache key tenant omission: with two tenant sessions active simultaneously, request the same resource identifier from both sessions in sequence. Verify that the second response reflects the second tenant's data, not the first tenant's cached response. Vary the request order. Verify that cache invalidation in tenant A does not affect tenant B's cached results for the same resource identifier.

For eager load boundary escape: request a resource with associated objects from tenant A's session. Verify that the associated objects in the response all carry tenant A's identifier. In a test environment with cross-tenant foreign key relationships — which can be introduced deliberately to exercise this — verify that the ORM does not return associated objects belonging to other tenants.
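
The object-level test above reduces to a small loop once the test environment is in place. In this sketch, fetch is a hypothetical callable supplied by the tester that issues a request authenticated as tenant A and returns a status code and response body; foreign_markers are strings known to appear only in tenant B's data.

```python
def classify_probe(status: int, body: str, foreign_markers) -> str:
    """Classify one cross-tenant request made with tenant A's credentials."""
    if 200 <= status < 300:
        return "finding"
    # A denial that still echoes the other tenant's data leaks metadata.
    if any(marker in body for marker in foreign_markers):
        return "partial"
    return "ok"


def probe_cross_tenant(fetch, foreign_ids, foreign_markers):
    """Call fetch(resource_id) -> (status, body) for each id owned by tenant B."""
    results = {}
    for resource_id in foreign_ids:
        status, body = fetch(resource_id)
        results[resource_id] = classify_probe(status, body, foreign_markers)
    return results
```

The classification mirrors the triage rule stated earlier: a success response is a finding, and a denial whose body contains tenant B's data is a partial finding.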

These tests are not a substitute for a full security assessment. They are a triage methodology for quickly establishing whether the highest-risk patterns from the catalogue are present, so investigation effort can be prioritized accordingly.

§ VI

Closing

None of these patterns are obscure. The Discord 2023 message visibility incident — where users briefly received messages from servers they were not members of — was a boundary check failure in a system that had been running in production for years, reviewed by competent engineers, and scaled to hundreds of millions of users. The Fortinet authentication bypass was a systemic enforcement failure in security infrastructure. The RBAC-to-ReBAC migration failure described in Part I happened at a competent engineering organization with a dedicated authz service.

The common thread is not negligence. It is the compounding cost of implicit assumptions — that the database handles scoping, that the ORM handles tenant filtering, that the upstream service handled the auth check, that the migration carried forward all the constraints the old model enforced. Each assumption is individually defensible. Together, they produce systems where the isolation boundary exists in three places, is enforced by none of them reliably, and fails in ways that are invisible until they are not.

Tenant isolation does not emerge from a correctly configured system. It is designed, explicitly, at the authorization layer, and verified at every other layer that touches tenant-scoped data. The catalogue above is a starting point for that verification. It is not exhaustive — isolation failures are generative, and the patterns described here will manifest in new combinations as architectures evolve.

The prerequisite for getting this right is accepting that authorization is not solved. It is the hardest problem in software that the industry keeps shipping around rather than through. Every framework that promises to handle it for you, every ORM that promises to scope it for you, every database feature that promises to enforce it for you, is offering to move the problem somewhere less visible. It is still your problem. It was always your problem.