
Agentic AI for flaky tests: what does it change for teams?


(@nhi-mgmt-group)

TL;DR: According to Kong, an agentic workflow fixed 12 of Kong Gateway's 15 flakiest tests over a week and a half. Debugging that can cost an engineer a full day for each pair of failures became a 4-to-5-hour loop that ran with no human intervention. The deeper lesson is that runtime autonomy changes how teams should think about debugging, verification, and context management, not just how fast fixes land.

NHIMG editorial — based on content published by Kong: How We Used Agentic AI to Fix Kong Gateway's Flakiest Tests

Questions worth separating out

Q: How should teams govern agentic AI workflows that can branch and commit code?

A: Treat them as governed runtime identities, not as ordinary automation.

Q: Why do agentic debugging workflows create new IAM risk even when they stay inside CI?

A: Because they combine log access, code access, branch creation, and repeated verification into one delegated execution path.
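One way to make "governed runtime identity" concrete is to split that single delegated path back into explicit, deny-by-default grants. The Python sketch below is a hypothetical illustration, not Kong's setup: `AgentScope`, `is_allowed`, the action names, the `example-org/gateway` repository, and the `agent/flake-fix/` branch prefix are all invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical action names; branch-writing actions must stay inside the
# agent's own branch namespace.
BRANCH_WRITE_ACTIONS = frozenset({"create_branch", "push_commit"})


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical grant for one agent identity and one task class."""

    agent_id: str
    task_class: str
    repositories: frozenset[str]
    branch_prefix: str
    actions: frozenset[str]


def is_allowed(scope: AgentScope, action: str, repo: str, branch: str | None = None) -> bool:
    """Deny by default: an action passes only if every part of it was granted."""
    if action not in scope.actions or repo not in scope.repositories:
        return False
    if action in BRANCH_WRITE_ACTIONS:
        return branch is not None and branch.startswith(scope.branch_prefix)
    return True


# Illustrative scope for a flake-fixing agent; every name here is made up.
flake_fixer = AgentScope(
    agent_id="flake-fixer-01",
    task_class="flaky-test-remediation",
    repositories=frozenset({"example-org/gateway"}),
    branch_prefix="agent/flake-fix/",
    actions=frozenset(
        {"read_logs", "read_code", "create_branch", "push_commit", "rerun_ci", "open_pr"}
    ),
)

print(is_allowed(flake_fixer, "read_logs", "example-org/gateway"))  # True
print(is_allowed(flake_fixer, "push_commit", "example-org/gateway", "agent/flake-fix/t1"))  # True
print(is_allowed(flake_fixer, "push_commit", "example-org/gateway", "main"))  # False
print(is_allowed(flake_fixer, "merge_pr", "example-org/gateway"))  # False
```

Leaving `merge_pr` out of the grant set keeps a human reviewer between the agent's pull request and the main branch. That is a design choice in this sketch, not something the source prescribes.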

Q: What breaks when an AI agent keeps too much context across troubleshooting runs?

A: Stale hypotheses can more easily shape new actions, which inflates false confidence and raises the chance of repeated misdiagnosis.
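A bounded handoff between runs is one way to keep stale hypotheses from steering the next attempt. The sketch below assumes a hypothetical `Attempt` record and `build_handoff` helper (neither comes from Kong's post): it forwards only the last few attempts as truncated one-line summaries and never forwards the raw log.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Attempt:
    """One troubleshooting run (hypothetical record, not Kong's schema)."""

    hypothesis: str
    change: str
    outcome: str
    full_log: str = ""  # raw investigative history; deliberately never forwarded


def build_handoff(attempts: list[Attempt], max_attempts: int = 3, max_len: int = 120) -> str:
    """Return a short summary of the most recent attempts for the next run.

    Older attempts and every raw log are dropped; each summary line is truncated
    to max_len characters so one verbose attempt cannot crowd out the others.
    """
    if max_attempts <= 0:
        return ""  # guard: attempts[-0:] would otherwise return the whole list
    lines = []
    for i, a in enumerate(attempts[-max_attempts:], start=1):
        line = f"{i}. tried {a.hypothesis!r} via {a.change!r}: {a.outcome}"
        lines.append(line[:max_len])
    return "\n".join(lines)


history = [
    Attempt("timeout too short", "raise wait to 5s", "still flaky", full_log="RAW LOG " * 500),
    Attempt("tests share a port", "allocate a random port", "passed 20/20 reruns"),
]
print(build_handoff(history))
```

The limits (three attempts, 120 characters) are arbitrary placeholders; the point is that the handoff size is fixed and chosen up front rather than growing with every run.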

Practitioner guidance

  • Define agent-specific repository permissions. Give debugging agents only the repository, branch, and CI permissions they need for a single task class.
  • Require explicit stopping conditions for verification loops. Document what counts as success, how many reruns are required, and what evidence the verifier must retain before a pull request is opened.
  • Limit context retention between runs. Pass forward only a short summary of prior attempts, not the full investigative history.
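The stopping-condition guidance can be written down as a small verification gate. In this hedged Python sketch, `verify_fix`, `VerificationResult`, and the thresholds (20 consecutive passes within a 30-run budget) are illustrative assumptions, not values from Kong's workflow; `run_test` stands in for whatever reruns a single test in CI.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class VerificationResult:
    """Outcome of the verification gate (hypothetical structure)."""

    ready_for_pr: bool
    runs: int
    evidence: list[str] = field(default_factory=list)  # one line per rerun, kept for the PR


def verify_fix(
    run_test: Callable[[], bool],
    required_passes: int = 20,
    max_runs: int = 30,
) -> VerificationResult:
    """Rerun a test until it passes required_passes times in a row.

    A failure resets the streak. If max_runs is exhausted first, the fix is not
    ready and no pull request should be opened.
    """
    streak = 0
    evidence: list[str] = []
    for run in range(1, max_runs + 1):
        passed = run_test()
        evidence.append(f"run {run}: {'pass' if passed else 'FAIL'}")
        streak = streak + 1 if passed else 0
        if streak >= required_passes:
            return VerificationResult(True, run, evidence)
    return VerificationResult(False, max_runs, evidence)


# A stable fix versus a test that still fails every third run.
stable = verify_fix(lambda: True)
flaky_outcomes = itertools.cycle([True, True, False])
still_flaky = verify_fix(lambda: next(flaky_outcomes))
print(stable.ready_for_pr, stable.runs)  # True 20
print(still_flaky.ready_for_pr, still_flaky.runs)  # False 30
```

Requiring consecutive passes, rather than a pass rate, means a single failure sends the fix back for another attempt. A team could equally pick a pass-rate threshold, as long as the rule is documented before the loop starts.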

What's in the full article

Kong's full blog post covers the implementation detail this post intentionally leaves at a higher level:

  • The exact identify-fix-verify loop used to seed, run, and validate flaky-test remediation.
  • How the orchestrator and subagent roles are separated across Opus and Haiku models.
  • Why the team chose to keep each flake-fixer context small between attempts.
  • The two bug discoveries that emerged from the workflow, including the conf_loader sorting issue and the auth race condition.

👉 Read Kong's analysis of agentic AI fixing flaky tests in Kong Gateway →
