fix(voice-call): suppress duplicate OpenAI initial greeting (#85846) #85932
lml2468 wants to merge 3 commits into
Codex review: needs real behavior proof before merge. Reviewed June 14, 2026, 10:06 PM ET / 02:06 UTC.
Summary
PR surface: Source +52, Tests +385. Total +437 across 6 files.
Reproducibility: yes. At source level: current main queues an explicit Voice Call greeting while OpenAI server VAD
Review metrics: 1 noteworthy metric.
Stored data model
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Risk before merge
Maintainer options:
Next step before merge
Security Review findings
Review details
Best possible solution: Keep the explicit Voice Call initial greeting as the startup owner, implement OpenAI-provider-owned first-turn/manual-response suppression without new shared API surface when possible, preserve all configured turn-detection preferences, and land only after redacted live Twilio/OpenAI proof.
Do we have a high-confidence way to reproduce the issue? Yes at source level: current main queues an explicit Voice Call greeting while OpenAI server VAD
Is this the best way to solve the issue? No, this branch is not currently the best mergeable fix because it adds shared bridge API surface and restores
Full review comments:
Overall correctness: patch is incorrect
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against 44e6caff5401.
Label changes:
Label justifications:
Evidence reviewed
PR surface: Source +52, Tests +385. Total +437 across 6 files.
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
ClawSweeper PR egg 🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.
Thanks for the PR. Before this can move forward, please add live proof from the affected surface, not just unit tests, mocked tests, or source inspection. A useful proof update should include:
Please redact secrets, tokens, phone numbers, and private message content from logs or screenshots.
Thanks @lml2468 for working on this fix. I am closing this as superseded by #86285 for the #85846 OpenAI realtime duplicate-greeting path: both PRs target the same root cause, and #86285 now carries the stronger current proof with the real behavior proof check passing. Clownfish will keep the canonical issue and candidate fix path on #85846 / #86285 so review and validation stay in one place. Credit for this source PR remains visible here and in the cluster evidence; if #86285 misses a case this PR covers, please reply and we can reopen or split that detail back out.
Fixes #85846
Summary
Outbound OpenAI realtime voice calls with an initial message could arm two first-turn response sources: OpenClaw's explicit `triggerGreeting()` path and OpenAI server VAD `create_response`. This keeps the explicit greeting as the startup owner by temporarily disabling OpenAI server-VAD auto-response for the initial greeting, then restoring auto-response after that manual greeting reaches a terminal response event.
Changes
`src/talk/provider-types.ts` - add an optional restore flag for providers that need to re-enable server-side audio auto-response after the initial greeting.
`src/talk/session-runtime.ts` - suppress OpenAI auto-response only for initial-greeting sessions where `autoRespondToAudio` was not explicitly disabled.
`extensions/openai/realtime-voice-provider.ts` - preserve `autoRespondToAudio` provider config, emit initial OpenAI session config with `create_response: false`, and restore `create_response: true` after `response.done` or `response.cancelled`.
`extensions/voice-call/src/webhook/realtime-handler.test.ts` - add the regression test that fails before the fix and a no-greeting control case.
`extensions/openai/realtime-voice-provider.test.ts` and `src/talk/session-runtime.test.ts` - cover OpenAI-only suppression, explicit false preservation, restore-on-done, restore-on-cancelled, and no-restore negative cases.
Tests
`extensions/voice-call/src/webhook/realtime-handler.test.ts`
`extensions/openai/realtime-voice-provider.test.ts`, `src/talk/session-runtime.test.ts`
Real Behavior Proof
Behavior addressed: OpenAI realtime outbound voice calls with an initial greeting no longer allow server VAD to auto-create a competing startup response before the explicit greeting path speaks.
Real environment tested: Local macOS checkout, Node 24 via `pnpm dlx node@24`, pnpm 11.2.2, mocked realtime WebSocket/unit-level voice-call path.
Exact steps or command run after this patch: `CI=true pnpm dlx pnpm@11.2.2 install --frozen-lockfile`; `pnpm dlx node@24 scripts/build-all.mjs`; `pnpm dlx pnpm@11.2.2 check`; `pnpm dlx node@24 scripts/run-vitest.mjs extensions/openai/realtime-voice-provider.test.ts extensions/voice-call/src/webhook/realtime-handler.test.ts src/talk/session-runtime.test.ts`
Evidence after fix: The red-phase test failed before the implementation with `expected undefined to be false`; on the fixed branch the focused suite passed with 3 files and 81 tests.
Observed result after fix: OpenAI bridge creation receives `autoRespondToAudio: false` plus the restore flag for outbound initial-greeting sessions, sends one explicit greeting `response.create`, and later sends one restore `session.update` with `create_response: true`.
What was not tested: A live Twilio/OpenAI phone call was not run because this review environment does not provide those credentials.
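For reviewers who want the suppress-then-restore flow in one place, here is a minimal sketch of the decision logic described above. It is not the code in this PR: `InitialGreetingGate`, `GateOptions`, and `onServerEvent` are hypothetical names chosen for illustration, and the sketch assumes the first terminal response event after startup belongs to the greeting.

```typescript
// Illustrative sketch only: InitialGreetingGate and its members are
// hypothetical names, not identifiers from this PR.

type ServerEvent = { type: string };

interface GateOptions {
  /** True when the outbound call was started with an initial message. */
  hasInitialGreeting: boolean;
  /** Caller preference; undefined means "not configured". */
  autoRespondToAudio?: boolean;
}

// Terminal response events named in the PR description.
const TERMINAL_EVENTS = new Set(["response.done", "response.cancelled"]);

class InitialGreetingGate {
  /** Value to send as create_response in the first session config. */
  readonly initialCreateResponse: boolean;
  private restorePending: boolean;

  constructor(options: GateOptions) {
    const explicitlyDisabled = options.autoRespondToAudio === false;
    // Suppress only when a greeting will be sent and the caller did not
    // already turn auto-response off.
    const suppress = options.hasInitialGreeting && !explicitlyDisabled;
    this.initialCreateResponse = !explicitlyDisabled && !suppress;
    this.restorePending = suppress;
  }

  /** Returns true exactly once: on the first terminal event after suppression. */
  onServerEvent(event: ServerEvent): boolean {
    if (!this.restorePending || !TERMINAL_EVENTS.has(event.type)) {
      return false;
    }
    this.restorePending = false;
    return true;
  }
}
```

A caller would send the restore `session.update` when `onServerEvent` returns true. An explicitly disabled `autoRespondToAudio` never triggers a restore, which matches the "explicit false preservation" case listed under Changes.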
Notes for Maintainer
The restore uses a `session.update` scoped to turn detection rather than re-sending full session config.
The restore fires on `response.done` and `response.cancelled`; generic realtime `error` events still flow through the existing error handler.
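A turn-detection-scoped restore message might look like the sketch below. The helper name `buildTurnDetectionRestore` and the payload nesting (`session.turn_detection`) are assumptions for illustration; the exact shape depends on the Realtime API version in use and is not taken from this PR's code. The sketch carries the already-configured turn-detection settings forward so that flipping `create_response` does not drop other preferences.

```typescript
// Hypothetical helper; the payload nesting is an assumption and may differ
// between Realtime API versions.

interface TurnDetection {
  type: string;
  create_response?: boolean;
  [key: string]: unknown;
}

interface SessionUpdate {
  type: "session.update";
  session: { turn_detection: TurnDetection };
}

function buildTurnDetectionRestore(configured: TurnDetection): SessionUpdate {
  return {
    type: "session.update",
    session: {
      // Only turn detection is included; other session fields are left out
      // so the update does not re-send the full session config.
      turn_detection: { ...configured, create_response: true },
    },
  };
}

// Example: re-enable auto-response while keeping a configured silence window.
const restoreMessage = JSON.stringify(
  buildTurnDetectionRestore({
    type: "server_vad",
    create_response: false,
    silence_duration_ms: 500,
  }),
);
```

Keeping the update narrow limits what the restore can accidentally overwrite, which is the trade-off the note above describes.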