Tree comboboxes and LLM coding: where the workflow falls apart

NHI Mgmt Group

(@nhi-mgmt-group)

Topic starter 08/06/2026 4:39 pm

TL;DR: A test of Claude, Gemini, and o3 on a tree-based combobox found that LLMs can scaffold compound component APIs quickly, according to WorkOS, but they still struggle with nested behaviour, keyboard support, screen-reader semantics, and state coordination in complex UI. That makes context, tests, and manual review the real guardrails, not prompt length alone.

NHIMG editorial — based on content published by WorkOS: Vibecoding a complex combobox component

Questions worth separating out

Q: How should teams use LLMs safely for complex UI components?

A: Use LLMs for scaffolding, boilerplate, and pattern completion, but require tests and human review for behaviour, accessibility, and state coordination.

Q: Why do AI-generated components fail more often when nested interaction gets complicated?

A: Nested interaction creates competing event handlers, focus states, and render rules that are easy for a model to approximate but hard to keep consistent.
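One way to picture the "competing event handlers" problem is to look at the alternative: a single function that owns every keyboard transition. The sketch below is illustrative only and is not taken from the WorkOS article. The names `TreeNode`, `State`, `visibleIds`, `findNode`, and `onKey` are hypothetical, and the key behaviour is a simplified subset loosely modelled on the WAI-ARIA tree view conventions (for example, ArrowLeft on a child does not move focus to its parent here).

```typescript
// Hypothetical types for illustration; not from the WorkOS article.
type TreeNode = { id: string; label: string; children?: TreeNode[] };
type State = { expanded: ReadonlySet<string>; activeId: string | null };
type Key = "ArrowDown" | "ArrowUp" | "ArrowRight" | "ArrowLeft";

// Flatten the tree into the ids of the rows currently visible, in display order.
function visibleIds(nodes: TreeNode[], expanded: ReadonlySet<string>): string[] {
  const out: string[] = [];
  for (const node of nodes) {
    out.push(node.id);
    if (node.children && expanded.has(node.id)) {
      out.push(...visibleIds(node.children, expanded));
    }
  }
  return out;
}

// Depth-first lookup of a node by id.
function findNode(nodes: TreeNode[], id: string): TreeNode | undefined {
  for (const node of nodes) {
    if (node.id === id) return node;
    const hit = node.children ? findNode(node.children, id) : undefined;
    if (hit) return hit;
  }
  return undefined;
}

// One function decides what every key does, so handlers cannot disagree
// about which row is active or which branches are open.
function onKey(tree: TreeNode[], state: State, key: Key): State {
  const rows = visibleIds(tree, state.expanded);
  if (rows.length === 0) return state;
  const index = state.activeId === null ? -1 : rows.indexOf(state.activeId);

  switch (key) {
    case "ArrowDown": // move to the next visible row, clamped at the end
      return { ...state, activeId: rows[Math.min(index + 1, rows.length - 1)] ?? null };
    case "ArrowUp": // move to the previous visible row, clamped at the start
      return { ...state, activeId: rows[Math.max(index - 1, 0)] ?? null };
    case "ArrowRight": { // open the active row if it has children
      const node = state.activeId === null ? undefined : findNode(tree, state.activeId);
      if (!node || !node.children || node.children.length === 0) return state;
      const expanded = new Set(state.expanded);
      expanded.add(node.id);
      return { ...state, expanded };
    }
    case "ArrowLeft": { // close the active row if it is open
      if (state.activeId === null || !state.expanded.has(state.activeId)) return state;
      const expanded = new Set(state.expanded);
      expanded.delete(state.activeId);
      return { ...state, expanded };
    }
  }
}
```

Because `onKey` is a pure function, each keyboard flow can be checked without rendering anything, which makes regressions from a later prompt easier to spot.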

Q: What do security and platform teams get wrong about AI-assisted development?

A: They often assume that a plausible first draft means the hard part is solved.

Practitioner guidance

Start with tests for complex components: Define keyboard flows, focus transitions, filtering behaviour, and accessibility expectations before generating code.
Keep in-file context close to the code: Add comments, usage examples, and local design-system conventions directly in the files the model touches.
Audit AI output for interaction semantics: Review the generated code for event ownership, focus handling, and the correct ARIA pattern before merging.
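To make the "start with tests" point concrete, here is a minimal sketch of filtering behaviour written down as executable expectations. It is an assumption-laden illustration, not the approach from the WorkOS article: `OptionNode` and `filterTree` are hypothetical names, and the rule that a matching parent drops its non-matching children is one possible design choice among several.

```typescript
// Hypothetical node shape for illustration only.
type OptionNode = { id: string; label: string; children?: OptionNode[] };

// Keep a node if its label matches the query or any descendant matches.
// Ancestors of a match are preserved so the result is still a valid tree.
// Assumption: a matching parent keeps only its matching descendants.
function filterTree(nodes: OptionNode[], query: string): OptionNode[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return nodes; // empty query: nothing is filtered out

  const result: OptionNode[] = [];
  for (const node of nodes) {
    const children = filterTree(node.children ?? [], needle);
    const selfMatches = node.label.toLowerCase().includes(needle);
    if (selfMatches || children.length > 0) {
      const kept: OptionNode = { id: node.id, label: node.label };
      if (children.length > 0) kept.children = children;
      result.push(kept);
    }
  }
  return result;
}

// Expectations that could be written before any component code is generated.
const sample: OptionNode[] = [
  { id: "f", label: "Fruits", children: [
    { id: "f1", label: "Apple" },
    { id: "f2", label: "Banana" },
  ] },
  { id: "v", label: "Vegetables", children: [{ id: "v1", label: "Carrot" }] },
];

function expectEqual(actual: unknown, expected: unknown, message: string): void {
  if (JSON.stringify(actual) !== JSON.stringify(expected)) throw new Error(message);
}

// A match on a child keeps its parent as context.
expectEqual(
  filterTree(sample, "app"),
  [{ id: "f", label: "Fruits", children: [{ id: "f1", label: "Apple" }] }],
  "child match should keep its ancestor",
);
// No match anywhere yields an empty list.
expectEqual(filterTree(sample, "zzz"), [], "no match should return nothing");
```

Checks like these give reviewers a fixed behavioural target, so a regenerated component can be judged against the expectations instead of against how plausible it looks.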

What's in the full article

WorkOS's full article covers the implementation details this post intentionally leaves for the source:

Prompt-by-prompt comparison of Claude, Gemini, and o3 on the same tree-combobox task
Concrete examples of where the generated component API broke nested tree behaviour
The specific debugging loop used to decide when prompting stopped being efficient
Practical reflections on Cursor, monorepo context, and design-system reuse

👉 Read WorkOS's analysis of LLMs building a complex tree combobox →


Mr NHI

(@mr-nhi)

08/06/2026 6:23 pm

AI-assisted UI work is most reliable at scaffolding, not at preserving control semantics. The article shows that LLMs can produce a credible starting point for a complex component, but the deeper the interaction model goes, the more they lose fidelity. That is a design-system problem as much as a coding problem, because enterprise UI depends on stable conventions for composition, state, and accessibility. The practitioner conclusion is that AI can accelerate the first draft, but it does not replace architecture discipline.

A few things that frame the scale:

80% of organisations report that their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.

A question worth separating out:

Q: When should teams prefer manual implementation over more prompting?

A: Prefer manual implementation when repeated prompting starts changing one bug into several new ones. That is a sign the model has lost the behavioural shape of the component and is optimising for surface resemblance. At that point, rewriting the critical paths is usually faster than continuing to iterate on unstable output.

👉 Read our full editorial: LLMs can scaffold complex UI, but accessibility still breaks
