Akeyless uses learning agents to review PRs and compound security

By NHI Mgmt Group Editorial Team
Published 2026-05-11
Domain: Best Practices
Source: Akeyless

TL;DR: According to Akeyless, an AI-driven review pipeline can convert each investigation into a reusable skill, then apply those skills to every pull request to test automatically for regressions in authentication, secrets handling, and trust boundaries. The deeper issue is not speed but whether security knowledge can be encoded into persistent review logic without confusing detection with durable governance.

At a glance

What this is: Akeyless outlines a learning PR-review pipeline that turns each security investigation into reusable skills and applies them automatically to code changes.

Why it matters: Identity teams increasingly need to govern code paths, secrets, and trust boundaries as living controls, not one-time review events.

Context

Security review becomes harder when the system being reviewed keeps changing faster than human reviewers can retain context. In this case, the primary issue is not just code quality, but whether security knowledge about authentication, secrets handling, and trust boundaries can be carried forward from one pull request to the next without losing precision.

For identity programmes, the relevance sits in machine identity governance and secret handling rather than human authentication. The article describes a review system that reasons over token scope, request paths, and downstream blast radius, which places it squarely in NHI security and lifecycle oversight rather than general application scanning.

Key questions

Q: How should security teams govern pull requests that change authentication or secrets logic?

A: Treat those pull requests as identity control changes, not ordinary code updates. Require dedicated review, a proving test, and a merge gate that checks the full authentication path. That approach reduces the chance that a valid change in one component quietly weakens token scope, permission checks, or secrets handling elsewhere in the delivery chain.
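
The gate described in this answer can be sketched as a small check. This is a minimal illustration, not Akeyless's implementation: the path patterns, function names, and the three boolean inputs are all hypothetical, and a real repository would define its own list of identity-sensitive paths.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns; each repository would maintain its own list.
IDENTITY_SENSITIVE_PATTERNS = ["auth/*", "*/token*", "secrets/*", "*/permissions/*"]


def is_identity_sensitive(path: str) -> bool:
    """Return True if a changed file matches any identity-sensitive pattern."""
    return any(fnmatchcase(path, pattern) for pattern in IDENTITY_SENSITIVE_PATTERNS)


def merge_allowed(changed_files, has_dedicated_review, has_proving_test,
                  auth_path_check_passed) -> bool:
    """Ordinary PRs pass through; identity-sensitive PRs need all three controls."""
    if not any(is_identity_sensitive(f) for f in changed_files):
        return True
    return has_dedicated_review and has_proving_test and auth_path_check_passed
```

Under these assumptions, a pull request that touches `auth/jwt.py` is blocked until it has a dedicated review, a proving test, and a passing check of the full authentication path, while a documentation-only change merges without the extra controls.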

Q: What breaks when token validation is treated as the same thing as authorisation?

A: The control fails because a valid token can still be used outside its intended scope. If reviewers stop at signature verification, they miss whether the claims, audience, and action all match. That creates a familiar machine-identity weakness where identity is proven, but permission is assumed rather than enforced.
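
The gap between signature verification and authorisation can be shown with a toy token. The sketch below uses an HMAC over JSON claims purely for illustration; the key, claim names, and helper functions are assumptions, not a real token format or any vendor's API.

```python
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # hypothetical key, for illustration only


def sign(claims: dict) -> str:
    """Produce a hex HMAC-SHA256 signature over the canonical JSON claims."""
    payload = json.dumps(claims, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def signature_valid(claims: dict, signature: str) -> bool:
    """Authentication only: was this token issued by the holder of SECRET?"""
    return hmac.compare_digest(sign(claims), signature)


def authorised(claims: dict, signature: str, audience: str, action: str) -> bool:
    """Authorisation: signature, audience, and action scope must all match."""
    return (
        signature_valid(claims, signature)
        and claims.get("aud") == audience
        and action in claims.get("scope", [])
    )


claims = {"sub": "ci-runner", "aud": "build-service", "scope": ["artifacts:read"]}
token_sig = sign(claims)
```

Here `signature_valid(claims, token_sig)` succeeds, yet `authorised(claims, token_sig, "build-service", "secrets:write")` fails because the action is outside the token's scope. A reviewer who stops at the first call would miss exactly the case this answer describes.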

Q: How do teams know whether a learning review system is actually improving security?

A: Look for fewer repeat findings on the same auth paths, stronger tests attached to every issue, and a lower rate of regressions after code changes merge. The best signal is whether the system catches scope drift and trust-boundary failures before production, not whether it produces more findings overall.

Q: Who is accountable when machine-identity review logic becomes part of the control plane?

A: Accountability stays with the security and engineering owners who define the review logic and approve its use. Once learned reasoning is reused across pull requests, it becomes part of the organisation’s control surface and must be governed like any other security policy, including validation, exception handling, and retirement.

Technical breakdown

How skill extraction turns findings into reusable security logic

The core mechanism is not a static rule set. An investigation produces a finding, then the investigative logic is distilled into a skill that describes what to look for, where to look for it, what context matters, and how to judge failure. That skill is then loaded into persistent context for later reviews. In practice, this is closer to encoded security reasoning than pattern matching, because the agent is asked to replay the logic across future code changes. The important distinction is that the knowledge is structured enough to be reused, but still specific enough to preserve the original security intent.

Practical implication: Treat any learning review system as a knowledge control that must preserve context, not just a detection engine.
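
One way to picture a skill is as a structured record with the four elements named above: what to look for, where to look, what context matters, and how to judge failure. The sketch below is an assumption about shape only; the field names, the finding dictionary, and the path-fragment matching are hypothetical and do not reflect Akeyless's actual skill format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    what_to_look_for: str
    where_to_look: tuple      # path fragments the skill applies to
    context: str
    failure_criterion: str


def skill_from_finding(finding: dict) -> Skill:
    """Distil a completed investigation into a reusable skill record."""
    return Skill(
        name=finding["title"],
        what_to_look_for=finding["pattern"],
        where_to_look=tuple(finding["paths"]),
        context=finding["context"],
        failure_criterion=finding["failure"],
    )


def applicable_skills(skills, changed_files):
    """Select the skills whose path fragments appear in a PR's changed files."""
    return [
        s for s in skills
        if any(fragment in path for fragment in s.where_to_look for path in changed_files)
    ]


finding = {
    "title": "Token accepted outside intended scope",
    "pattern": "signature check with no audience or scope check",
    "paths": ["auth/", "gateway/"],
    "context": "tokens issued for one service are replayed against another",
    "failure": "request succeeds with a token whose audience does not match",
}
skill = skill_from_finding(finding)
```

In this sketch, `applicable_skills([skill], ["auth/verify.py"])` returns the skill, so the stored reasoning is replayed on a later change to the same area, while a documentation-only change selects nothing.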

Why PR pipelines are being used as security enforcement points

The article describes a four-stage pipeline: review, test, re-test, and final test. That structure matters because it shifts security from advisory comments into merge-gating enforcement. Review finds the issue, test proves it, re-test validates the fix, and final test checks the merged system state rather than only the diff. This is materially different from ordinary code scanning because the system evaluates combined behaviour, including adjacent changes from earlier pull requests. For identity and secrets flows, that approach is valuable because regressions often appear only when authentication logic and caching, routing, or permission checks interact.

Practical implication: Use pull requests as enforcement checkpoints for identity-sensitive code paths, not just as review inboxes.
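
The merge-gating behaviour of the four stages can be sketched as an ordered list of checks that stops at the first failure. The stage names come from the article; the check callables and the `run_pipeline` function are hypothetical placeholders for whatever each stage does in a real system.

```python
from typing import NamedTuple


class StageResult(NamedTuple):
    stage: str
    passed: bool


def run_pipeline(stages):
    """Run stages in order and stop at the first failure.

    Returns (results, merge_allowed). Each stage is a (name, check) pair,
    where check is a zero-argument callable returning True on success.
    """
    results = []
    for name, check in stages:
        ok = bool(check())
        results.append(StageResult(name, ok))
        if not ok:
            return results, False
    return results, True


# Placeholder checks: here the final test against the merged state fails.
stages = [
    ("review", lambda: True),
    ("test", lambda: True),
    ("re-test", lambda: True),
    ("final test", lambda: False),
]
results, merge_ok = run_pipeline(stages)
```

In this example the diff-level stages pass but the final test fails, so `merge_ok` is `False`. That mirrors the point that a fix can look correct in isolation and still break once it is combined with adjacent changes.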

Blast radius changes how credential and token issues should be scored

Akeyless frames every finding through downstream impact. That is the right instinct for machine identity systems, where a single credential can unlock many other systems. The article makes clear that a valid signature is not enough if the token is then trusted for actions outside its intended scope. That is a familiar NHI failure mode: authentication succeeds, authorisation is assumed, and the resulting trust expansion becomes the real risk. In distributed trust environments, the severity of a flaw depends less on the local bug than on what the compromised identity can reach next.

Practical implication: Score findings by reachable systems and token scope, not only by whether the immediate control failed.
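
Scoring by reachable systems can be approximated with a breadth-first walk over a trust graph. The graph, the identity names, and the use of a plain count as the score are all assumptions made for illustration; the source does not specify a scoring formula.

```python
from collections import deque


def reachable_systems(trust_graph: dict, start: str) -> set:
    """Return every system reachable from an identity by following trust edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in trust_graph.get(node, ()):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def blast_radius_score(trust_graph: dict, identity: str) -> int:
    """Score a finding by how many systems the affected identity can reach."""
    return len(reachable_systems(trust_graph, identity))


# Hypothetical trust graph: identity or system -> systems it can access.
TRUST_GRAPH = {
    "ci-token": ["artifact-store", "secrets-broker"],
    "secrets-broker": ["payments-db", "billing-api"],
    "docs-bot": ["wiki"],
}
```

With this graph, a flaw affecting `ci-token` scores 4 because the token reaches a secrets broker that in turn unlocks two further systems, whereas the same class of flaw affecting `docs-bot` scores 1. The local defect is identical; the downstream reach is not.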

Threat narrative

Attacker objective: The objective is not simple code tampering, but the persistence of a security blind spot that can be reused across future identity and secrets changes.

Entry: A trusted identity or token reaches the review path through a legitimate pull request or security-relevant code change, giving the system a real code path to inspect.
Escalation: The review logic examines token scope, request context, and downstream trust boundaries, then persists what it learns as reusable skill rather than treating the finding as a one-off.
Impact: If the learning loop is wrong or incomplete, the programme normalises a flawed security assumption across future reviews and allows repeated exposure in authentication and secrets handling paths.

Reviewdog GitHub Action supply chain attack: the compromised reviewdog/action-setup GitHub Action exposed secrets.
CI/CD pipeline exploitation case study: full server takeover via an exposed .git directory and mismanaged CI/CD pipeline secrets.

Read our 52 NHI Breaches Analysis report for a comprehensive view of breaches impacting Non-Human Identities including AI Agents.

NHI Mgmt Group analysis

Security review knowledge is becoming an identity control, not just an engineering convenience. When a system learns from investigations and re-applies that reasoning on every pull request, the control boundary shifts from human review to embedded governance logic. That matters because security findings about tokens, trust boundaries, and request scope are really identity decisions in disguise. Practitioners should treat this as machine identity governance inside the delivery pipeline, not as a faster scanner.

Valid authentication is not the same as valid authorisation, and learning systems only help if they preserve that distinction. The article’s token example is the classic failure mode: a signature check proves identity, but nothing proves the token is being used within its intended scope. That distinction is central to OWASP-NHI and zero-trust thinking because machine identities are often trusted too broadly once they are authenticated. The implication is that review logic must keep authorisation separate from identity validation.

Blast radius is the correct severity lens for credentialed systems. In a platform that stores or brokers other systems’ credentials, a flaw is never local. The meaningful question is what the identity can reach if trust is misplaced or scope is expanded. That is why lifecycle, scope, and downstream dependency mapping belong in the same conversation. Practitioners should weigh any identity-related finding by the systems it can unlock, not only by the immediate defect.

Persistent context creates a governance benefit, but it also creates a governance memory problem. Once a learning system accumulates skills, it becomes part of the organisation’s control surface and inherits the burden of change management, validation, and auditability. Security teams should not assume that more learning automatically equals better governance. The real task is proving that the stored reasoning remains aligned with current auth flows, trust boundaries, and exception handling.

Identity programmes should stop treating code review, secrets, and runtime trust as separate planes. The article shows they are one continuous control path. A token scope decision made in code can become a runtime access decision later, and a review finding can become policy if it is encoded into persistent logic. Practitioners should manage that chain as a single identity lifecycle problem, from code change to enforced control.

From our research:
85% of organisations lack full visibility into third-party vendors connected via OAuth apps, according to The State of Non-Human Identity Security.
That figure includes the 47% who have only partial visibility into those vendors, which leaves governance teams without a complete picture of delegated access paths.
For a broader view of how credential sprawl turns into breach exposure, see The 52 NHI breaches Report.

What this signals

Skill-based review will expose a new governance boundary for machine identities. As security logic becomes reusable and persistent, teams will need to decide which parts of that logic are policy, which parts are evidence, and which parts are simply analyst memory encoded in software. The practical challenge is not building more checks, but governing the lifecycle of the checks themselves.

With 1 in 4 organisations already investing in dedicated NHI security capabilities and 60% planning to follow, the market is moving toward controls that can handle identity behaviour across code, pipeline, and runtime. That makes lifecycle alignment more important, because a review skill that outlives the auth model it was built for can create false confidence.

Persistent review memory only works if it stays tied to current trust boundaries. That is the central programme risk here. Teams that rely on learning automation must be able to retire obsolete skills, verify scope assumptions, and keep review logic aligned with actual credential flow. Otherwise the control becomes a repository of yesterday’s security thinking.

For practitioners

Map identity-sensitive review paths end to end: Identify every pull request path that can change authentication, token validation, secrets handling, or trust-boundary logic, then require a dedicated review and test chain for those changes. Use the same ownership model for code review that you use for privileged access decisions.
Separate identity validation from authorisation checks: Require review logic to prove not only that a token or credential is valid, but also that the claims, scope, and intended request context match the action being attempted. A valid signature must never be treated as proof of permission.
Score findings by downstream blast radius: Classify issues by what a compromised credential can reach next, not just by the local code defect. In systems that broker secrets or protect other platforms, the real severity is the number of dependent systems exposed by one flaw.
Require re-test against merged system state: Do not stop at a diff-level fix. Re-run tests after adjacent changes, cached responses, or validation helpers are updated, because identity regressions often appear when old and new logic interact in the same runtime path.
Preserve and audit learned security logic: Treat every extracted skill as governed content. Track when it was created, which auth flow it applies to, and when it should be retired so that accumulated review logic does not become stale or misleading.
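
The last recommendation lends itself to a small audit routine. The sketch below tracks the three attributes named above (creation date, auth flow, retirement date) and flags skills that are past retirement or tied to an auth flow no longer in use. The record shape, skill names, and dates are hypothetical examples.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SkillRecord:
    name: str
    created: date
    auth_flow: str
    retire_on: Optional[date] = None


def stale_skills(records, active_flows, today):
    """Names of skills past their retirement date or tied to a retired auth flow."""
    return [
        r.name
        for r in records
        if (r.retire_on is not None and r.retire_on <= today)
        or r.auth_flow not in active_flows
    ]


# Illustrative inventory only.
records = [
    SkillRecord("jwt-audience-check", date(2025, 1, 10), "service-jwt"),
    SkillRecord("legacy-api-key-scope", date(2024, 3, 1), "static-api-key"),
    SkillRecord("session-cache-bypass", date(2025, 2, 1), "service-jwt",
                retire_on=date(2025, 6, 30)),
]
stale = stale_skills(records, active_flows={"service-jwt"}, today=date(2025, 7, 1))
```

Here `stale` contains `legacy-api-key-scope`, whose auth flow is no longer active, and `session-cache-bypass`, whose retirement date has passed. Running a check like this on a schedule is one way to keep accumulated review logic from outliving the auth model it was written for.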

Key takeaways

The article shows that security review itself can become a governed identity control when findings are converted into reusable logic.
The most important failure mode is confusing authentication success with authorisation correctness, especially in token-driven machine identity flows.
Practitioners should manage these systems by blast radius, scope, and lifecycle of learned logic, not by review speed alone.

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

The OWASP Non-Human Identity Top 10 addresses the attack and risk surface, while NIST CSF 2.0 and NIST Zero Trust (SP 800-207) set the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Non-Human Identity Top 10	NHI-03	The article centers on token handling, secret protection, and repeatable identity review logic.
NIST CSF 2.0	PR.AA-05 (PR.AC-4 in CSF 1.1)	Reviewing request scope and access rights aligns with least-privilege enforcement.
NIST Zero Trust (SP 800-207)	PR.AC-1	The piece focuses on continuous verification of trust boundaries and identity claims.

Map learned review skills to NHI controls and verify token scope, rotation, and validation logic on every change.

Key terms

Skill extraction: Skill extraction is the process of turning a completed investigation into reusable security logic. In this article, it means capturing what was checked, why it mattered, and how the finding was confirmed so the same reasoning can be applied to later pull requests without starting from scratch.
Blast radius: Blast radius is the downstream scope of damage that follows from a compromised identity, token, or control failure. For machine identity systems, it is the more useful severity measure because one credential can unlock many systems, workflows, or secrets rather than one isolated application.
Trust boundary: A trust boundary is the point at which one system, environment, or component stops being assumed safe by another. In identity security, it defines where a token, credential, or request must be re-validated because assumptions made on one side should not automatically apply on the other.
Persistent context: Persistent context is stored reasoning that remains available to a system across multiple executions. Here it means the agent carries prior investigative logic into future pull requests, which improves continuity but also requires governance so obsolete assumptions do not become embedded policy.

What's in the full article

Akeyless's full article covers the operational detail this post intentionally leaves for the source:

The exact skill extraction loop used to convert investigation findings into persistent review logic.
Stage-by-stage pull request pipeline behaviour, including review, test, re-test, and final merge gating.
Examples of how the agent evaluates auth flows, trust boundaries, and downstream blast radius in real code paths.
The platform-specific trust model behind distributed keys, gateway boundaries, and SaaS control plane decisions.

👉 Akeyless's full post shows how findings become skills and how those skills run on every PR

NHIMG Editorial Note
Published by the NHIMG editorial team on 2026-05-11.
NHI Mgmt Group — the independent authority on Non-Human Identity, IAM, and Agentic AI security. nhimg.org