Workload Identity Request Forgery Prevention
TL;DR
- This article digs into the messy world of workload identity request forgery and how to stop it. We cover why service principals are getting hit, the risks of credentials leaked on GitHub, and how to use Continuous Access Evaluation to keep your cloud apps safe. It's a deep look at protecting non-human identities from modern spoofing attacks.
The growing threat of workload identity forgery
Ever wonder why hackers stopped chasing passwords and started chasing tokens? It's because machine identities can't use MFA, making them the perfect "skeleton key" for modern cloud environments.
The sheer scale of non-human identities (NHI) is exploding. In most enterprise clouds, service accounts outnumber human users by 5 to 1, yet they often lack a formal lifecycle. Since these identities are programmatic, spotting a "bad" sign-in is incredibly tough: it just looks like another API call.
- No MFA: Unlike your team, a service principal can't tap a notification on a phone.
- Anomalies hide in plain sight: Because workloads are automated, security teams struggle to tell a legitimate spike in traffic from credential theft.
- Over-privileged access: We often give apps "Contributor" roles just to get things working, then forget to trim them back.
Attackers aren't just guessing keys anymore; they're forging the very identity of the workload. If a dev leaves a secret in a public GitHub repo, it's game over. But even more subtle is Server-Side Request Forgery (SSRF). This is where an attacker exploits a vulnerability in your web app to make it query the local metadata service, usually at the 169.254.169.254 address. Since the request comes from the app running on the server, the metadata service thinks it's legit and hands over a valid access token. Just like that, the hacker has your identity without ever seeing a password.
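One way to blunt the SSRF path described above is to refuse any user-supplied URL that resolves to the metadata address or another internal range before your app fetches it. The sketch below is a minimal illustration, not a complete defense. The function names and the exact list of blocked ranges are assumptions made for this example.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Ranges a server-side fetcher should not reach on behalf of a user.
# 169.254.0.0/16 is link-local and contains 169.254.169.254.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("::1/128"),
    ipaddress.ip_network("fe80::/10"),
]


def is_blocked_address(address: str) -> bool:
    """Return True if the IP literal falls inside a blocked range."""
    # Drop an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(address.split("%")[0])
    # Unwrap IPv4-mapped IPv6 such as ::ffff:169.254.169.254.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return any(ip in network for network in BLOCKED_NETWORKS)


def is_safe_url(url: str) -> bool:
    """Resolve the URL's host and reject it if any address is blocked."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        infos = socket.getaddrinfo(parsed.hostname, port)
    except socket.gaierror:
        return False
    return not any(is_blocked_address(info[4][0]) for info in infos)
```

A check like this only helps if the fetch then connects to the same address that was validated. If the hostname is resolved again at request time, an attacker who controls DNS can swap in an internal address after the check passes.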
In finance or healthcare, a single leaked token from a container can lead to massive data exfiltration before anyone notices. Next, we'll look at the actual plumbing of how these tokens move.
How token exchange actually works
To stop a forgery, you gotta understand the "handshake." When a workload needs to talk to a resource, it doesn't just show up. It goes to an Identity Provider (IdP) like Microsoft Entra or Okta.
The workload presents its credentials (maybe a client secret or a certificate) and asks for a token. The IdP verifies them and issues a JSON Web Token (JWT). This token contains "claims" like who the identity is and what it's allowed to do. The workload then passes this token in the Authorization header of its API calls. The resource server checks the signature on the token to make sure it hasn't been tampered with. If the signature matches, the door opens.
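The issue-then-verify flow above can be sketched with nothing but the standard library. This is an illustration under stated assumptions: it signs with a shared HMAC key (HS256) to stay dependency-free, whereas IdPs typically sign with an asymmetric key so the resource server only needs the public half. The function names and claim values are made up for the example.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, key: bytes) -> str:
    """IdP side: encode header and claims, then sign them."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> dict:
    """Resource side: check the signature, then the expiry, then return claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    signing_input = (header_b64 + "." + payload_b64).encode("ascii")
    expected = hmac.new(key, signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if claims["exp"] <= current:
        raise ValueError("token expired")
    return claims
```

Notice that `verify_token` only looks at the signature and the `exp` claim. Nothing in it asks where the caller is or whether the caller is the workload the token was issued to.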
The problem is that if an attacker steals that JWT, they can "replay" it from anywhere until it expires. That's why the industry is moving toward more dynamic flows.
Technical strategies for prevention and detection
So, we know the "how" behind the forgery—now let's talk about actually stopping these attackers. It’s not just about stronger keys; it’s about making the entire identity environment way more hostile for a thief.
The old way of doing things—issuing a token that’s valid for an hour—is basically a lifetime in "hacker years." If a secret leaks, that token is a golden ticket until it expires. Continuous Access Evaluation (CAE) flips the script.
Because CAE allows near-instant revocation based on real-time events (like a change in location or a deleted service principal), it lets us safely use long-lived tokens (LLTs) that last 24 hours or more. We don't have to worry about the long expiration because the moment something smells fishy, the IdP signals the resource to stop accepting the token. It's way more efficient than refreshing tokens every 60 minutes.
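To make the idea concrete, here is a toy model of a resource that accepts a long-lived token only while no revocation event has arrived for its owner. It is a sketch of the concept, not the actual CAE protocol. `RevocationFeed` and `accept_token` are hypothetical names, and the in-memory feed stands in for whatever event channel the IdP uses.

```python
from dataclasses import dataclass, field


@dataclass
class RevocationFeed:
    """Hypothetical stand-in for critical events pushed by the IdP."""
    revoked_at: dict = field(default_factory=dict)  # principal id -> event time

    def publish(self, principal_id: str, event_time: float) -> None:
        self.revoked_at[principal_id] = event_time


def accept_token(claims: dict, feed: RevocationFeed, now: float) -> bool:
    """Accept a token only if it is unexpired and not revoked since issuance."""
    if now >= claims["exp"]:
        return False
    revoked = feed.revoked_at.get(claims["sub"])
    # Any token issued at or before the revocation event is rejected,
    # no matter how much lifetime it has left.
    if revoked is not None and claims["iat"] <= revoked:
        return False
    return True
```

With a 24-hour token (`iat=0`, `exp=86400`), a call at `now=3600` is accepted. If `feed.publish("app-1", 4000)` fires, the same token is rejected at `now=4001`, roughly 23 hours before it would have expired on its own.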
You can't just set a policy and walk away. You have to watch the traffic for things that don't belong.
- Baseline behavior: You need to know what "normal" looks like. If an app that usually just uploads logs to a storage bucket suddenly starts trying to list all the secrets in an Azure Key Vault, that's a massive red flag.
- Spotting the forgery: Look for "Anomalous service principal activity" detections. This might look like a service principal suddenly generating massive cross-tenant traffic that it never did before.
- Data exfiltration patterns: In industries like retail or finance, watch for high-volume API calls to sensitive endpoints that don't match the typical weekly batch processing schedule.
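The baseline idea from the list above can be sketched in a few lines: record which operations each identity normally performs, then flag anything outside that set. This is a deliberately simple model. The class name, the event shape, and the operation strings are assumptions for illustration, and a real detector would also weigh volume and timing.

```python
from collections import defaultdict


class BehaviorBaseline:
    """Remembers which operations each workload identity normally performs."""

    def __init__(self):
        self._seen = defaultdict(set)

    def learn(self, principal_id: str, operation: str) -> None:
        self._seen[principal_id].add(operation)

    def is_anomalous(self, principal_id: str, operation: str) -> bool:
        # An operation this identity has never performed is worth a look.
        return operation not in self._seen.get(principal_id, set())


def flag_events(baseline: BehaviorBaseline, events: list) -> list:
    """Return the (principal, operation) pairs that fall outside the baseline."""
    return [
        (principal, operation)
        for principal, operation in events
        if baseline.is_anomalous(principal, operation)
    ]
```

For example, after `baseline.learn("log-uploader", "storage.write")`, an event `("log-uploader", "keyvault.list_secrets")` is returned by `flag_events`, while another `storage.write` is not.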
Establishing a robust NHI framework
Managing machine identities isn't a "set it and forget it" task. It's more like tending a garden that grows way faster than you can weed. If you don't have a framework, you're basically just waiting for a credential to leak on GitHub.
- Inventory is everything: You can't protect what you don't know exists. Start by cataloging every service principal and its purpose.
- Credential hygiene: Switch to X.509 certificates instead of long-lived client secrets whenever you can. They're harder to manage but way tougher to steal.
- Community wisdom: Joining groups like the Non-Human Identity Management Group (NHIMG) helps you stay ahead of how attackers are bypassing traditional PAM tools.
In a recent chat, a peer in retail said they found that 40% of their "active" app identities hadn't signed in for six months. That's a massive, unnecessary attack surface. Using tools that detect anomalous behavior (like a service principal suddenly trying to modify directory permissions) is the only way to catch a forgery in progress.
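Once you have an inventory, finding identities like those idle ones is a simple filter. The sketch below assumes each inventory entry is a dict with a `name` and a `last_sign_in` timestamp (or `None` if it never signed in). That shape, the function name, and the 180-day cutoff are illustrative choices, not any vendor's export format.

```python
from datetime import datetime, timedelta, timezone


def find_stale_identities(inventory: list, now: datetime, max_idle_days: int = 180) -> list:
    """Return names of identities with no sign-in inside the idle window."""
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for entry in inventory:
        last_sign_in = entry.get("last_sign_in")
        # Never signed in, or last seen before the cutoff: review candidate.
        if last_sign_in is None or last_sign_in < cutoff:
            stale.append(entry["name"])
    return stale


# Made-up sample data for illustration.
NOW = datetime(2024, 6, 1, tzinfo=timezone.utc)
INVENTORY = [
    {"name": "billing-sync", "last_sign_in": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"name": "old-report-job", "last_sign_in": datetime(2023, 9, 1, tzinfo=timezone.utc)},
    {"name": "poc-demo-app", "last_sign_in": None},
]
```

Here `find_stale_identities(INVENTORY, NOW)` picks out `old-report-job` and `poc-demo-app`. Treat the result as a review list rather than a delete list, since some jobs legitimately run only once or twice a year.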
Remediation and incident response for compromised workloads
So, your service principal just got flagged for "Anomalous service principal activity." It's a gut punch, right? You realize some script in your dev environment is suddenly acting like a global admin.
Don't panic, but move fast. If an identity is pwned, you have to lock it down before you start cleaning up the mess.
- Revoke permissions first: Before you do anything else, strip the identity of its IAM roles or access to the vault. If you rotate secrets while the hacker still has "Contributor" rights, they might just steal the new ones too.
- Kill the old keys: Once access is revoked, immediately remove any compromised client secrets or certificates.
- Scrub the vault: Now that the identity can't get back in, go into your Azure Key Vault and rotate every secret that the pwned identity had access to. Consider everything it could "see" as compromised.
- Audit the logs: Check the sign-in logs for unfamiliar IPs. In healthcare, this might look like a pharmacy app suddenly calling the Microsoft Graph API from a hosting provider in a different country.
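The ordering of the steps above matters more than any single step, so here it is as a sketch. Everything in it is a hypothetical in-memory model: `WorkloadIdentity`, `contain`, and `unfamiliar_sign_ins` are names invented for this example, and in practice each logged action would be a call to your cloud provider's API. The IP ranges come from the blocks reserved for documentation.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class WorkloadIdentity:
    name: str
    roles: set = field(default_factory=set)
    credentials: set = field(default_factory=set)
    readable_secrets: set = field(default_factory=set)


def contain(identity: WorkloadIdentity) -> list:
    """Apply the containment steps in order and return an action log."""
    log = []
    # 1. Strip roles first so the attacker cannot read anything new.
    for role in sorted(identity.roles):
        log.append(f"revoke role {role}")
    identity.roles.clear()
    # 2. Remove the compromised secrets and certificates.
    for credential in sorted(identity.credentials):
        log.append(f"delete credential {credential}")
    identity.credentials.clear()
    # 3. Only now rotate what the identity could read, so the
    #    replacement values are never visible to the attacker.
    for secret in sorted(identity.readable_secrets):
        log.append(f"rotate secret {secret}")
    return log


def unfamiliar_sign_ins(sign_ins: list, known_networks: list) -> list:
    """Return sign-in records whose source IP is outside every known network."""
    networks = [ipaddress.ip_network(cidr) for cidr in known_networks]
    return [
        record
        for record in sign_ins
        if not any(ipaddress.ip_address(record["ip"]) in net for net in networks)
    ]
```

Calling `unfamiliar_sign_ins(records, ["203.0.113.0/24"])` keeps only the records that came from outside your expected range, which gives you a short list to start the investigation from.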
Future proofing your identity architecture
The future of NHI security is getting rid of secrets entirely. We're moving toward Workload Identity Federation, which is basically "secretless" auth. Instead of putting a password in a config file, your workload (like a GitHub Actions runner) presents a short-lived token from its platform's OIDC provider to prove who it is to the cloud. No secrets to leak, no secrets to rotate.
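Under the hood, federation comes down to the cloud comparing the claims in the workload's OIDC token against a trust rule you configured in advance. Here is a minimal sketch of that comparison, assuming the token's signature has already been verified against the issuer's published keys. `FederatedCredential` and `matches_trust` are hypothetical names, and the repository in the subject is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedCredential:
    """The trust rule: which external token the cloud identity will accept."""
    issuer: str
    subject: str
    audience: str


def matches_trust(claims: dict, trust: FederatedCredential) -> bool:
    """Check already-verified OIDC claims against the configured trust rule."""
    aud = claims.get("aud")
    # The aud claim may be a single string or a list of strings.
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == trust.issuer
        and claims.get("sub") == trust.subject
        and trust.audience in audiences
    )


# Example rule for a GitHub Actions workflow; the org and repo are placeholders.
TRUST = FederatedCredential(
    issuer="https://token.actions.githubusercontent.com",
    subject="repo:example-org/example-repo:ref:refs/heads/main",
    audience="api://AzureADTokenExchange",
)
```

A token minted for a different repository or branch carries a different `sub`, so `matches_trust` returns `False` and no cloud token is issued. Keeping the subject as narrow as possible is what stops the trust rule itself from becoming the new over-privileged credential.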
You should also be looking at SPIFFE/SPIRE, which provides a universal identity control plane for distributed systems. It issues short-lived, cryptographic identities to workloads regardless of where they run: on-prem, in Kubernetes, or in the cloud. By moving to these "trust-based" models and monitoring behavior constantly, you make it much harder for a forged token to do any real damage. Stop using static secrets and start treating identity as a dynamic, living part of your stack.