What breaks when an agent can create new skills from user feedback?

Why This Matters for Security Teams

When an agent can turn user feedback into a new skill, the security problem shifts from access review to change control. The agent is no longer a fixed workload with a predictable entitlement set. It becomes an autonomous system that can expand its own capability surface, alter its tool use, and create fresh paths to data or actions. That is why static RBAC and periodic recertification are too slow for this class of workload. The OWASP Agentic AI Top 10 and the NIST AI Risk Management Framework both point toward runtime governance, not just pre-approved roles.

NHI teams should treat skill creation as a privileged event, because it can effectively mint new operational behaviour without going through a normal software release. That is the same reason NHI governance fails when secrets, policies, and workload identity drift apart. The OWASP NHI Top 10 and CSA MAESTRO agentic AI threat modeling framework both reinforce that agent behaviour must be scoped continuously, not assumed stable after onboarding. In practice, many security teams discover this only after an agent has already gained a new tool path through routine feedback, rather than through intentional release governance.

How It Works in Practice

The practical control model is to separate learning from authority. User feedback may help the agent propose a new skill, but promotion into production should require review, policy evaluation, and a bounded identity. That means the skill is treated like a change record, while the agent keeps a workload identity that is distinct from any one skill. For runtime enforcement, intent-based authorisation is a better fit than static role grants, because the decision can ask what the agent is trying to do, what data it needs, and whether the requested action matches current policy.
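The separation between learning and authority can be sketched as a request-time check. This is a minimal illustration, not a reference implementation: the `Intent` fields, the `POLICY` table, and the skill names are all assumptions made for the example, and a real deployment would evaluate the intent in a policy engine rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the agent declares it is trying to do (fields are illustrative)."""
    agent_id: str
    skill: str
    action: str    # e.g. "read" or "write"
    resource: str  # e.g. "tickets"


# Hypothetical policy: each promoted skill maps to the (action, resource)
# pairs it is allowed to use.
POLICY = {
    "summarise_ticket": {("read", "tickets")},
    "update_contact": {("read", "crm.contacts"), ("write", "crm.contacts")},
}


def authorise(intent, approved_skills):
    """Decide per request from the declared intent, not from a static role."""
    if intent.skill not in approved_skills:
        # The skill may have been learned, but it was never promoted,
        # so it carries no authority.
        return False
    allowed = POLICY.get(intent.skill, set())
    return (intent.action, intent.resource) in allowed
```

In this sketch a skill that was proposed from feedback but never added to `approved_skills` is denied even if a policy entry exists for it, which keeps the learning path and the authority path separate.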

For credentials, the safer pattern is just-in-time provisioning with short-lived secrets. Rather than giving the agent long-lived API keys, issue ephemeral credentials per task and revoke them when the task ends. This is where workload identity matters: the system should prove what the agent is at runtime, not just what secret it currently holds. In agentic environments, that identity can be anchored with OIDC or SPIFFE-style trust, then evaluated against policy-as-code before each sensitive action.
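A rough sketch of per-task, short-lived credentials follows. The `CredentialBroker` class, its method names, and the 300-second default TTL are assumptions for illustration only. It is an in-memory stand-in: in practice issuance would sit behind a verified workload identity (for example OIDC or SPIFFE) and a real secrets backend.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class EphemeralCredential:
    token: str
    task_id: str
    expires_at: float
    revoked: bool = False


class CredentialBroker:
    """Hypothetical in-memory broker that issues one short-lived token per task."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._issued = {}

    def issue(self, task_id):
        cred = EphemeralCredential(
            token=secrets.token_urlsafe(32),
            task_id=task_id,
            expires_at=self._clock() + self._ttl,
        )
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token):
        cred = self._issued.get(token)
        return (
            cred is not None
            and not cred.revoked
            and self._clock() < cred.expires_at
        )

    def revoke_task(self, task_id):
        """Revoke every credential issued for a task once the task ends."""
        for cred in self._issued.values():
            if cred.task_id == task_id:
                cred.revoked = True
```

The token here only proves possession. The point made above still applies: the broker should issue it only after the agent's workload identity has been verified, and policy should be evaluated again before each sensitive action.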

Gate new skills behind approval, testing, and scoped entitlement review.

Use short TTLs for secrets and rotate any material that touches new capabilities.

Evaluate authorisation at request time, not only at login or session start.

Log skill creation as an access-affecting event, not just a product feature.
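The four controls above could be combined into a single promotion gate along these lines. This is a sketch under stated assumptions: `SkillChange`, `promote`, the gate names, and the `skill_audit` logger name are hypothetical, and the audit event format is invented for the example.

```python
import json
import logging
from dataclasses import dataclass
from typing import Optional

audit_log = logging.getLogger("skill_audit")  # hypothetical logger name


@dataclass
class SkillChange:
    """A proposed skill treated as a change record (field names are illustrative)."""
    name: str
    requested_scopes: frozenset
    tests_passed: bool = False
    scopes_reviewed: bool = False
    approved_by: Optional[str] = None
    active: bool = False


def promote(change):
    """Activate the skill only when every gate is satisfied; always emit an audit event."""
    missing = []
    if not change.approved_by:
        missing.append("approval")
    if not change.tests_passed:
        missing.append("testing")
    if not change.scopes_reviewed:
        missing.append("entitlement review")

    change.active = not missing
    event = {
        "event": "skill_promotion",
        "skill": change.name,
        "scopes": sorted(change.requested_scopes),
        "approved_by": change.approved_by,
        "outcome": "activated" if change.active else "denied",
        "missing_gates": missing,
    }
    # Recorded as an access-affecting event, alongside entitlement changes.
    audit_log.info(json.dumps(event))
    return event
```

Denied attempts are logged as well as successful ones, since a run of rejected promotions for the same skill is itself a signal worth reviewing.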

That aligns with the operational direction described in the Ultimate Guide to NHIs — 2025 Outlook and Predictions and with the defensive patterns in the Anthropic — first AI-orchestrated cyber espionage campaign report, where autonomous behaviour can chain tools in ways humans did not explicitly script. These controls tend to break down when teams let the agent self-install skills in production without a separate approval path, because the entitlement graph changes faster than review cycles can catch up.

Common Variations and Edge Cases

Tighter skill governance often increases friction, requiring organisations to balance developer speed against blast-radius reduction. That tradeoff is real, especially in internal copilots, support agents, or workflow bots where teams want rapid iteration. There is no universal standard for exactly when a learned skill becomes a controlled change, but current guidance suggests treating anything that expands tool access, data reach, or execution scope as security-relevant.
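One way to make that threshold concrete is to diff the agent's capability set before and after the learned skill. The `Capability` structure and its three fields are assumptions chosen to mirror the three dimensions named above, and the scope strings are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capability:
    """Snapshot of what an agent can reach (structure is illustrative)."""
    tools: frozenset = frozenset()
    data_scopes: frozenset = frozenset()
    execution_scopes: frozenset = frozenset()


def expansion(before, after):
    """Return what the new skill adds along each of the three dimensions."""
    return {
        "tool access": after.tools - before.tools,
        "data reach": after.data_scopes - before.data_scopes,
        "execution scope": after.execution_scopes - before.execution_scopes,
    }


def is_security_relevant(before, after):
    """A change is treated as controlled if it expands any dimension."""
    return any(expansion(before, after).values())
```

A skill that only rewords responses would leave all three sets unchanged and could follow a lighter path, while one that adds a single new tool would be routed to review. Where exactly to draw that line remains a local policy decision.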

Edge cases appear when the new skill is only local to one session, when it is generated but never activated, or when it changes prompt routing without directly touching a privileged API. Those still matter, because autonomous systems can use seemingly harmless logic to reach sensitive outcomes later. The OWASP Top 10 for Agentic Applications 2026 and MITRE ATLAS adversarial AI threat matrix are useful here because they emphasise dynamic abuse paths, not just obvious prompt injection. In a mature programme, the policy question is not whether the skill was helpful, but whether the agent’s new capability can be traced, constrained, and revoked on demand. For hybrid environments, the hardest cases are agents that learn from users but execute through shared service accounts, because attribution becomes ambiguous and recertification loses meaning before the next review cycle.
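The requirement that a capability be traceable and revocable can be illustrated with a small registry keyed by the agent's own identity rather than a shared service account. Everything here is hypothetical: `SkillRecord`, `SkillRegistry`, and the `source_feedback_id` field are names introduced for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SkillRecord:
    skill: str
    agent_id: str            # the agent's own workload identity
    source_feedback_id: str  # which user feedback produced the skill
    scopes: frozenset
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    revoked: bool = False


class SkillRegistry:
    """Hypothetical registry: one record per (agent, skill) pair."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[(record.agent_id, record.skill)] = record

    def is_usable(self, agent_id, skill):
        rec = self._records.get((agent_id, skill))
        return rec is not None and not rec.revoked

    def revoke(self, agent_id, skill):
        """Revoke on demand; returns False if no such record exists."""
        rec = self._records.get((agent_id, skill))
        if rec is None:
            return False
        rec.revoked = True
        return True

    def trace(self, skill):
        """List every agent that holds the skill and where it came from."""
        return [r for r in self._records.values() if r.skill == skill]
```

Keying on `agent_id` is the design choice that matters: if several agents executed through one shared account, the `trace` result could not say which agent learned the skill, which is the attribution gap described above.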

Standards & Framework Alignment

This section maps relevant standards and security frameworks to the operational risks and controls described in this guidance.

OWASP Agentic AI Top 10 and CSA MAESTRO address the attack and risk surface, while NIST AI RMF sets the governance and control requirements practitioners need to meet.

Framework	Control / Reference	Relevance
OWASP Agentic AI Top 10		Agent skill creation is a dynamic abuse path covered by agentic security guidance.
CSA MAESTRO		MAESTRO models agent behaviour change, tool use, and policy-driven control points.
NIST AI RMF		AI RMF governance applies to autonomous systems whose behaviour evolves after deployment.

Assign ownership, monitor changes, and document controls for evolving agent capability.

