Agentic AI for flaky tests: what does it change for teams?

NHI Mgmt Group

(@nhi-mgmt-group)

Topic starter 23/06/2026 9:50 pm

TL;DR: According to Kong, an agentic workflow fixed 12 of Kong Gateway's 15 flakiest tests over a week and a half, turning a process that can take an engineer a full day per pair of failures into a 4 to 5 hour loop with no human intervention. The deeper lesson is that runtime autonomy changes how teams should think about debugging, verification, and context management, not just speed.

NHIMG editorial — based on content published by Kong: How We Used Agentic AI to Fix Kong Gateway's Flakiest Tests

Questions worth separating out

Q: How should teams govern agentic AI workflows that can branch and commit code?

A: Treat them as governed runtime identities, not as ordinary automation.

Q: Why do agentic debugging workflows create new IAM risk even when they stay inside CI?

A: Because they combine log access, code access, branch creation, and repeated verification into one delegated execution path.
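One way to picture that delegated execution path is as a single scoped identity whose every action is checked against an explicit allow-list. The sketch below is illustrative only: `AgentScope`, `is_allowed`, the repository name, the branch prefix, and the action names are all hypothetical and do not come from Kong's post or from any real CI system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical permission scope for one class of agent task."""
    repo: str
    branch_prefix: str
    allowed_actions: frozenset


def is_allowed(scope: AgentScope, repo: str, branch: str, action: str) -> bool:
    """Allow an action only inside the scoped repo, branch prefix, and action set."""
    return (
        repo == scope.repo
        and branch.startswith(scope.branch_prefix)
        and action in scope.allowed_actions
    )


# Assumed scope for a flaky-test debugging agent: it can read logs and code,
# create and push to its own branches, and rerun CI, but cannot merge.
FLAKE_FIXER = AgentScope(
    repo="example/gateway",
    branch_prefix="agent/flake-fix/",
    allowed_actions=frozenset(
        {"read_logs", "read_code", "create_branch", "push_commit", "rerun_ci"}
    ),
)
```

Keeping log access, code access, and branch creation in one declared scope makes the combined path reviewable as a unit, rather than as separate grants that are each harmless on their own.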

Q: What breaks when an AI agent keeps too much context across troubleshooting runs?

A: Stale hypotheses can more easily shape new actions, which inflates false confidence and raises the chance of repeating the same misdiagnosis.
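A minimal sketch of the alternative, assuming each troubleshooting attempt is recorded with a hypothesis, an outcome, and a full transcript: only a short, bounded summary is passed to the next run, and the transcript is dropped. The `Attempt` type, the `carry_forward` helper, and the limits are hypothetical, chosen for illustration rather than taken from Kong's workflow.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    """Hypothetical record of one troubleshooting run."""
    hypothesis: str
    outcome: str
    transcript: str  # full investigative history; never carried forward


def carry_forward(attempts: List[Attempt], max_items: int = 3, max_chars: int = 200) -> str:
    """Summarize only the most recent attempts, one truncated line each."""
    recent = attempts[-max_items:] if max_items > 0 else []
    lines = [f"- tried: {a.hypothesis} -> {a.outcome}" for a in recent]
    return "\n".join(line[:max_chars] for line in lines)
```

The next run then starts from a few lines of "what was tried and what happened" instead of inheriting every intermediate guess.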

Practitioner guidance

Define agent-specific repository permissions: Give debugging agents only the repository, branch, and CI permissions they need for a single task class.
Require explicit stopping conditions for verification loops: Document what counts as success, how many reruns are required, and what evidence the verifier must retain before a pull request is opened.
Limit context retention between runs: Pass forward only a short summary of prior attempts, not the full investigative history.
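The stopping-condition guidance above can be sketched as a small verifier. This is an illustration under assumptions, not Kong's implementation: `verify_fix`, `VerificationResult`, and the default of 20 required passes are hypothetical, and `run_test` stands in for whatever actually reruns the test in CI.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VerificationResult:
    """Hypothetical record the verifier keeps before a pull request is opened."""
    passed: bool
    runs: int
    evidence: List[str] = field(default_factory=list)


def verify_fix(run_test: Callable[[], bool], required_passes: int = 20) -> VerificationResult:
    """Rerun the test until it fails once or passes `required_passes` times in a row."""
    evidence: List[str] = []
    for i in range(1, required_passes + 1):
        ok = run_test()
        evidence.append(f"run {i}: {'pass' if ok else 'fail'}")
        if not ok:
            # Stop at the first failure: the fix is not verified.
            return VerificationResult(False, i, evidence)
    # Success is defined explicitly: every required rerun passed.
    return VerificationResult(True, required_passes, evidence)
```

Writing the success criterion and rerun count into the function signature means the loop cannot quietly decide it has done enough, and the retained `evidence` list gives reviewers something concrete to check.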

What's in the full article

Kong's full blog post covers the implementation detail this post intentionally leaves at a higher level:

The exact identify-fix-verify loop used to seed, run, and validate flaky-test remediation.
How the orchestrator and subagent roles are separated across Opus and Haiku models.
Why the team chose to keep each flake-fixer context small between attempts.
The two bug discoveries that emerged from the workflow, including the conf_loader sorting issue and the auth race condition.

👉 Read Kong's analysis of agentic AI fixing flaky tests in Kong Gateway →
