TL;DR: A WorkOS test of Claude, Gemini, and o3 on a tree-based combobox found that LLMs can scaffold compound component APIs quickly but still struggle with nested behaviour, keyboard support, screen-reader semantics, and state coordination in complex UI. The real guardrails are context, tests, and manual review, not longer prompts alone.
NHIMG editorial — based on content published by WorkOS: Vibecoding a complex combobox component
Questions worth separating out
Q: How should teams use LLMs safely for complex UI components?
A: Use LLMs for scaffolding, boilerplate, and pattern completion, but require tests and human review for behaviour, accessibility, and state coordination.
Q: Why do AI-generated components fail more often when nested interaction gets complicated?
A: Nested interaction creates competing event handlers, focus states, and render rules that are easy for a model to approximate but hard to keep consistent.
Q: What do security and platform teams get wrong about AI-assisted development?
A: They often assume that a plausible first draft means the hard part is solved.
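One way to picture the second answer above is to give focus and expansion a single owner instead of letting each nested element react to keys on its own. The sketch below is a minimal illustration, not the WorkOS implementation: the names TreeNode, NavState, visibleIds, indexTree, and onKey are hypothetical, and the arrow-key rules are an assumption modelled on the usual tree view keyboard conventions.

```typescript
// Hypothetical shapes and names, for illustration only.
interface TreeNode {
  id: string;
  label: string;
  children?: TreeNode[];
}

interface NavState {
  activeId: string | null;       // highlighted option (e.g. the aria-activedescendant target)
  expanded: ReadonlySet<string>; // ids of parent nodes that are currently open
}

type NavKey = "ArrowDown" | "ArrowUp" | "ArrowRight" | "ArrowLeft";

// Flatten the tree in visual order, skipping children of collapsed parents.
function visibleIds(nodes: TreeNode[], expanded: ReadonlySet<string>): string[] {
  const out: string[] = [];
  for (const node of nodes) {
    out.push(node.id);
    if (node.children && expanded.has(node.id)) {
      out.push(...visibleIds(node.children, expanded));
    }
  }
  return out;
}

// Map every id to its node and parent id (null for top-level nodes).
function indexTree(
  nodes: TreeNode[],
  parentId: string | null = null,
  index: Map<string, { node: TreeNode; parentId: string | null }> = new Map()
): Map<string, { node: TreeNode; parentId: string | null }> {
  for (const node of nodes) {
    index.set(node.id, { node, parentId });
    if (node.children) indexTree(node.children, node.id, index);
  }
  return index;
}

// Pure transition: every key press goes through here, so no two handlers disagree.
function onKey(tree: TreeNode[], state: NavState, key: NavKey): NavState {
  const order = visibleIds(tree, state.expanded);
  if (order.length === 0) return state;
  // With nothing highlighted yet, any arrow key highlights the first visible option.
  if (state.activeId === null) return { ...state, activeId: order[0] };

  const pos = order.indexOf(state.activeId);
  const entry = indexTree(tree).get(state.activeId);
  if (pos === -1 || !entry) return state;

  const hasChildren = (entry.node.children?.length ?? 0) > 0;
  const isOpen = state.expanded.has(state.activeId);

  switch (key) {
    case "ArrowDown": // move down, stopping at the last visible option
      return { ...state, activeId: order[Math.min(pos + 1, order.length - 1)] };
    case "ArrowUp": // move up, stopping at the first visible option
      return { ...state, activeId: order[Math.max(pos - 1, 0)] };
    case "ArrowRight": {
      if (!hasChildren) return state;
      if (!isOpen) {
        const next = new Set(state.expanded);
        next.add(state.activeId);
        return { ...state, expanded: next }; // open, highlight stays put
      }
      return { ...state, activeId: order[pos + 1] }; // already open: go to first child
    }
    case "ArrowLeft": {
      if (hasChildren && isOpen) {
        const next = new Set(state.expanded);
        next.delete(state.activeId);
        return { ...state, expanded: next }; // close, highlight stays put
      }
      // Leaf or closed node: move to the parent if there is one.
      return entry.parentId ? { ...state, activeId: entry.parentId } : state;
    }
  }
}
```

Because onKey is a pure function of the tree, the current state, and the key, each transition can be asserted in isolation before any rendering or ARIA wiring is reviewed.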
Practitioner guidance
- Start with tests for complex components: Define keyboard flows, focus transitions, filtering behaviour, and accessibility expectations before generating code.
- Keep in-file context close to the code: Add comments, usage examples, and local design-system conventions directly in the files the model touches.
- Audit AI output for interaction semantics: Review the generated code for event ownership, focus handling, and the correct ARIA pattern before merging.
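The first bullet above can be sketched as executable expectations for filtering behaviour, written down before any code is generated. This is a hedged illustration only: TreeOption, filterTree, flatIds, expectIds, and the sample catalog are hypothetical names, and the rule that a matching parent keeps its whole subtree is an assumption a real design system would need to decide for itself.

```typescript
// Hypothetical shapes and names, for illustration only.
interface TreeOption {
  id: string;
  label: string;
  children?: TreeOption[];
}

// Keep a node when its label matches the query or when any descendant matches.
// Ancestors of a match survive, so the filtered result is still a valid tree.
function filterTree(options: TreeOption[], query: string): TreeOption[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return options; // empty query shows everything
  const result: TreeOption[] = [];
  for (const option of options) {
    const keptChildren = filterTree(option.children ?? [], needle);
    const selfMatches = option.label.toLowerCase().indexOf(needle) !== -1;
    if (selfMatches) {
      // Assumption: a matching parent keeps its whole subtree for browsing.
      result.push(option);
    } else if (keptChildren.length > 0) {
      result.push({ ...option, children: keptChildren });
    }
  }
  return result;
}

// Flatten ids in display order so expectations stay short and readable.
function flatIds(options: TreeOption[]): string[] {
  const ids: string[] = [];
  for (const option of options) {
    ids.push(option.id);
    ids.push(...flatIds(option.children ?? []));
  }
  return ids;
}

function expectIds(actual: TreeOption[], expected: string[]): void {
  const got = flatIds(actual).join(",");
  if (got !== expected.join(",")) {
    throw new Error(`expected [${expected.join(",")}] but got [${got}]`);
  }
}

const catalog: TreeOption[] = [
  {
    id: "fruit",
    label: "Fruit",
    children: [
      { id: "apple", label: "Apple" },
      { id: "banana", label: "Banana" },
    ],
  },
  { id: "veg", label: "Vegetables", children: [{ id: "carrot", label: "Carrot" }] },
];

// Expectations written first; generated code is accepted only if these hold.
expectIds(filterTree(catalog, "app"), ["fruit", "apple"]);             // ancestor kept for a child match
expectIds(filterTree(catalog, "fruit"), ["fruit", "apple", "banana"]); // parent match keeps subtree
expectIds(filterTree(catalog, "  "), ["fruit", "apple", "banana", "veg", "carrot"]); // blank query
expectIds(filterTree(catalog, "zzz"), []);                             // no match, empty list
```

Keeping the expectations this small makes it easy to see when a regenerated component has quietly changed a rule, such as dropping the ancestors of a matching leaf.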
What's in the full article
WorkOS's full article covers the implementation details this post intentionally leaves for the source:
- Prompt-by-prompt comparison of Claude, Gemini, and o3 on the same tree-combobox task
- Concrete examples of where the generated component API broke nested tree behaviour
- The specific debugging loop used to decide when prompting stopped being efficient
- Practical reflections on Cursor, monorepo context, and design-system reuse
👉 Read WorkOS's analysis of LLMs building a complex tree combobox →
Tree comboboxes and LLM coding: where the workflow falls apart
Explore further
AI-assisted UI work is most reliable at scaffolding, not at preserving control semantics. The article shows that LLMs can produce a credible starting point for a complex component, but the deeper the interaction model goes, the more they lose fidelity. That is a design-system problem as much as a coding problem, because enterprise UI depends on stable conventions for composition, state, and accessibility. The practitioner conclusion is that AI can accelerate the first draft, but it does not replace architecture discipline.
A few things that frame the scale:
- 80% of organisations report their AI agents have already performed actions beyond their intended scope, including accessing unauthorised systems (39%), inappropriately sharing sensitive data (31%), and revealing access credentials (23%), according to the AI Agents: The New Attack Surface report.
- Only 52% of companies can track and audit the data their AI agents access, leaving 48% with a complete blind spot for compliance and breach investigation.
A question worth separating out:
Q: When should teams prefer manual implementation over more prompting?
A: Prefer manual implementation when repeated prompting starts changing one bug into several new ones. That is a sign the model has lost the behavioural shape of the component and is optimising for surface resemblance. At that point, rewriting the critical paths is usually faster than continuing to iterate on unstable output.
👉 Read our full editorial: LLMs can scaffold complex UI, but accessibility still breaks