fix(voice-call): suppress duplicate OpenAI initial greeting (#85846)#85932

Closed
lml2468 wants to merge 3 commits into
openclaw:mainfrom
lml2468:fix/issue-85846-openai-suppress-vad-during-greeting

Conversation

@lml2468 lml2468 commented May 24, 2026

Contributor

Fixes #85846

Summary

Outbound OpenAI realtime voice calls with an initial message could arm two first-turn response sources: OpenClaw's explicit triggerGreeting() path and OpenAI server VAD create_response. This keeps the explicit greeting as the startup owner by temporarily disabling OpenAI server-VAD auto-response for the initial greeting, then restoring auto-response after that manual greeting reaches a terminal response event.

Changes

  • src/talk/provider-types.ts - add an optional restore flag for providers that need to re-enable server-side audio auto-response after the initial greeting.
  • src/talk/session-runtime.ts - suppress OpenAI auto-response only for initial-greeting sessions where autoRespondToAudio was not explicitly disabled.
  • extensions/openai/realtime-voice-provider.ts - preserve autoRespondToAudio provider config, emit initial OpenAI session config with create_response: false, and restore create_response: true after response.done or response.cancelled.
  • extensions/voice-call/src/webhook/realtime-handler.test.ts - add the regression test that fails before the fix and a no-greeting control case.
  • extensions/openai/realtime-voice-provider.test.ts and src/talk/session-runtime.test.ts - cover OpenAI-only suppression, explicit false preservation, restore-on-done, restore-on-cancelled, and no-restore negative cases.
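
The suppression rule listed for src/talk/session-runtime.ts could be sketched roughly as below. This is a minimal illustration and not the code in this PR: the `BridgeRequest` shape and the `planGreetingSuppression` helper are hypothetical, and only the `autoRespondToAudio` and `restoreAutoRespondToAudioAfterInitialGreeting` field names, the `initialGreetingInstructions` name, and the `openai` provider id are taken from this thread.

```typescript
// Hypothetical request shape for illustration only; the real type lives in
// src/talk/provider-types.ts and may look different.
interface BridgeRequest {
  providerId: string;
  initialGreetingInstructions?: string;
  autoRespondToAudio?: boolean;
  restoreAutoRespondToAudioAfterInitialGreeting?: boolean;
}

// Suppress server-side auto-response only when all three conditions hold:
// the provider is OpenAI, an initial greeting exists, and the caller did not
// explicitly disable auto-response already.
function planGreetingSuppression(req: BridgeRequest): BridgeRequest {
  const hasGreeting = Boolean(req.initialGreetingInstructions);
  const explicitlyDisabled = req.autoRespondToAudio === false;
  if (req.providerId !== "openai" || !hasGreeting || explicitlyDisabled) {
    return req; // leave the request untouched
  }
  return {
    ...req,
    autoRespondToAudio: false,
    restoreAutoRespondToAudioAfterInitialGreeting: true,
  };
}
```

An explicit `autoRespondToAudio: false` passes through without the restore flag, so a session that never wanted auto-response does not get it switched on after the greeting.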

Tests

  • Added failing test reproducing the issue: extensions/voice-call/src/webhook/realtime-handler.test.ts
  • Additional tests: extensions/openai/realtime-voice-provider.test.ts, src/talk/session-runtime.test.ts
  • Local checks: install ok, build ok, check ok, focused regression tests ok

Real Behavior Proof

Behavior addressed: OpenAI realtime outbound voice calls with an initial greeting no longer allow server VAD to auto-create a competing startup response before the explicit greeting path speaks.

Real environment tested: Local macOS checkout, Node 24 via pnpm dlx node@24, pnpm 11.2.2, mocked realtime WebSocket/unit-level voice-call path.

Exact steps or command run after this patch: CI=true pnpm dlx pnpm@11.2.2 install --frozen-lockfile; pnpm dlx node@24 scripts/build-all.mjs; pnpm dlx pnpm@11.2.2 check; pnpm dlx node@24 scripts/run-vitest.mjs extensions/openai/realtime-voice-provider.test.ts extensions/voice-call/src/webhook/realtime-handler.test.ts src/talk/session-runtime.test.ts

Evidence after fix: The red-phase test failed before the implementation with expected undefined to be false; on the fixed branch the focused suite passed with 3 files and 81 tests.

Observed result after fix: OpenAI bridge creation receives autoRespondToAudio: false plus the restore flag for outbound initial-greeting sessions, sends one explicit greeting response.create, and later sends one restore session.update with create_response: true.

What was not tested: A live Twilio/OpenAI phone call was not run because this review environment does not provide those credentials.

Notes for Maintainer

  • The restore uses a partial session.update scoped to turn detection rather than re-sending full session config.
  • The initial-greeting restore is currently triggered on response.done and response.cancelled; generic realtime error events still flow through the existing error handler.
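
For reference, the restore trigger described in these notes might look something like the sketch below. `GreetingRestoreGate` is a hypothetical helper, not a class in the codebase, and it simplifies by treating the first terminal response event after arming as the end of the greeting; only the `response.done` and `response.cancelled` event names come from this PR.

```typescript
// Minimal event shape; real realtime events carry more fields.
type RealtimeEvent = { type: string };

// Hypothetical one-shot gate: reports that a restore should be sent exactly
// once, on the first terminal response event seen while a restore is pending.
class GreetingRestoreGate {
  private pending: boolean;

  constructor(restoreAfterGreeting: boolean) {
    this.pending = restoreAfterGreeting;
  }

  shouldRestore(event: RealtimeEvent): boolean {
    const terminal =
      event.type === "response.done" || event.type === "response.cancelled";
    if (!this.pending || !terminal) {
      return false; // error events and non-terminal events never trigger restore
    }
    this.pending = false; // disarm so later responses do not re-send the update
    return true;
  }
}
```

Because generic `error` events are not treated as terminal here, a greeting that fails with only an error would leave auto-response disabled in this sketch, which is the gap the second note points at.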

@openclaw-barnacle openclaw-barnacle Bot added channel: voice-call Channel integration: voice-call extensions: openai size: M triage: mock-only-proof Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

Codex review: needs real behavior proof before merge. Reviewed June 14, 2026, 10:06 PM ET / 02:06 UTC.

Summary
The PR suppresses OpenAI realtime server-VAD auto-response during outbound voice-call initial greetings, restores auto-response afterward, makes autoRespondToAudio provider config effective, and adds mocked regression tests.

PR surface: Source +52, Tests +385. Total +437 across 6 files.

Reproducibility: yes, at source level. Current main queues an explicit Voice Call greeting while OpenAI server VAD create_response defaults to true, matching the linked live report. I did not run a live Twilio/OpenAI call in this read-only review.

Review metrics: 1 noteworthy metric.

  • Config/API surfaces: 1 provider config field made effective, 1 bridge request flag added. Both surfaces affect provider/plugin compatibility and should be intentional before merge.

Stored data model
Persistent data-model change detected: serialized state: extensions/openai/realtime-voice-provider.test.ts, serialized state: extensions/voice-call/src/webhook/realtime-handler.test.ts, serialized state: src/talk/session-runtime.test.ts, serialized state: src/talk/session-runtime.ts, unknown-data-model-change: src/talk/session-runtime.test.ts, vector/embedding metadata: extensions/voice-call/src/webhook/realtime-handler.test.ts. Confirm migration or upgrade compatibility proof before merge.

Merge readiness
Overall: 🧂 unranked krab
Proof: 🧂 unranked krab
Patch quality: 🦪 silver shellfish
Result: blocked until real behavior proof from a real setup is added.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
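
As a rough illustration of that rule, the overall rank can be read as the minimum of the two component ranks. The sketch below is an assumption about how the combination works, not ClawSweeper's implementation: `RANKS` and `overallRank` are hypothetical names, the ordering is taken from the rank legend further down in this comment, and the off-meta tidepool rating is left out because it means the rating does not apply.

```typescript
// Ranks from weakest to strongest, following the legend in this comment.
const RANKS = [
  "unranked krab",
  "silver shellfish",
  "gold shrimp",
  "platinum hermit",
  "diamond lobster",
  "challenger crab",
] as const;

type Rank = (typeof RANKS)[number];

// Overall readiness is whichever of the two ranks sits lower in the ordering.
function overallRank(proof: Rank, patchQuality: Rank): Rank {
  return RANKS.indexOf(proof) <= RANKS.indexOf(patchQuality)
    ? proof
    : patchQuality;
}
```

With the values reported above (proof at unranked krab, patch quality at silver shellfish), this yields unranked krab, which matches the Overall line.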

Rank-up moves:

  • [P1] Add redacted live Twilio/OpenAI outbound-call proof showing one initial greeting and normal later auto-response.
  • Preserve interruptResponseOnInputAudio:false when restoring server-VAD auto-response.
  • Avoid adding shared bridge API surface unless maintainers explicitly want that contract.

Proof guidance:

  • [P1] Needs real behavior proof before merge: The PR body lists mocked WebSocket/unit tests and explicitly says no live Twilio/OpenAI phone call was run; before merge, add redacted logs, terminal output, or a recording from a real outbound call, then update the PR body or ask a maintainer to comment @clawsweeper re-review.

Risk before merge

  • [P1] The restore path flips existing interruptResponseOnInputAudio:false setups back to interrupt_response:true after the greeting.
  • [P1] The proof is mock-only; no redacted live Twilio/OpenAI outbound call shows exactly one greeting and normal later turns.
  • [P1] The PR adds shared realtime bridge request API surface and an OpenAI-specific branch in shared runtime, which needs maintainer API-boundary judgment.
  • [P1] The branch is dirty against the base, so it needs rebase or refresh before any merge review can proceed.

Maintainer options:

  1. Use provider-local suppression before merge (recommended)
    Prefer an OpenAI-provider-owned suppression path that preserves existing bridge API surface and configured interruption behavior.
  2. Accept the new bridge API deliberately
    Maintainers may keep the new shared request flag only if they explicitly want this as plugin SDK surface and add contract coverage for it.
  3. Pause behind live-call proof
    Hold this PR until a real Twilio/OpenAI outbound call proves the caller hears one greeting and later turns still auto-respond normally.

Next step before merge

  • [P1] Human handling is needed because the remaining blockers are contributor/live-provider proof, dirty-branch refresh, and maintainer API-boundary judgment rather than a safe automated repair lane.

Security
Cleared: No concrete security or supply-chain concern found; the diff changes runtime voice provider/session code and tests without dependency, workflow, permission, secret-handling, or package-resolution changes.

Review findings

  • [P1] Preserve the configured interrupt setting on restore — extensions/openai/realtime-voice-provider.ts:1135
Review details

Best possible solution:

Keep the explicit Voice Call initial greeting as the startup owner, implement OpenAI-provider-owned first-turn/manual-response suppression without new shared API surface when possible, preserve all configured turn-detection preferences, and land only after redacted live Twilio/OpenAI proof.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level: current main queues an explicit Voice Call greeting while OpenAI server VAD create_response defaults to true, matching the linked live report. I did not run a live Twilio/OpenAI call in this read-only review.

Is this the best way to solve the issue?

No, this branch is not currently the best mergeable fix because it adds shared bridge API surface and restores interrupt_response:true unconditionally. A provider-local suppression path that preserves configured turn-detection behavior is safer.

Full review comments:

  • [P1] Preserve the configured interrupt setting on restore — extensions/openai/realtime-voice-provider.ts:1135
    The restore session.update always sends interrupt_response: true. Existing OpenAI realtime config can set interruptResponseOnInputAudio: false; after this PR's initial greeting restore, that preference is flipped back on for the rest of the call. Restore the original interrupt setting and cover the false case.
    Confidence: 0.9
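
One possible shape for the requested fix is sketched below. It is illustrative only: `buildRestoreTurnDetection` and the `OpenAiRealtimeConfig` interface are hypothetical, the fallback to `true` when the option is unset is an assumption, and the surrounding session.update envelope is omitted because its exact layout is not shown in this review.

```typescript
// Hypothetical slice of the provider config; only the option name
// interruptResponseOnInputAudio comes from this review.
interface OpenAiRealtimeConfig {
  interruptResponseOnInputAudio?: boolean;
}

// Build the turn-detection fields for the restore update. Auto-response is
// re-enabled, while the interrupt setting follows the configured value
// instead of being hardcoded.
function buildRestoreTurnDetection(config: OpenAiRealtimeConfig): {
  create_response: boolean;
  interrupt_response: boolean;
} {
  return {
    create_response: true,
    // Assumption: an unset option falls back to true.
    interrupt_response: config.interruptResponseOnInputAudio ?? true,
  };
}
```

A regression test for the false case would then assert that a config with `interruptResponseOnInputAudio: false` still produces `interrupt_response: false` after the greeting restore.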

Overall correctness: patch is incorrect
Overall confidence: 0.86

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against 44e6caff5401.

Label changes

  • add merge-risk: 🚨 message-delivery: The PR changes OpenAI realtime response timing and restoration, which could duplicate, suppress, or misorder caller-visible speech without live-provider proof.

Label justifications:

  • P2: This is a normal-priority user-visible OpenAI voice-call bugfix with limited blast radius.
  • merge-risk: 🚨 compatibility: The PR adds a new exported realtime bridge request flag and can overwrite an existing interruption preference after the initial greeting.
  • merge-risk: 🚨 message-delivery: The PR changes OpenAI realtime response timing and restoration, which could duplicate, suppress, or misorder caller-visible speech without live-provider proof.
  • rating: 🧂 unranked krab: Overall readiness is 🧂 unranked krab; proof is 🧂 unranked krab and patch quality is 🦪 silver shellfish.
  • status: 📣 needs proof: The PR needs real behavior proof before ClawSweeper can clear the contributor ask. Needs real behavior proof before merge: The PR body lists mocked WebSocket/unit tests and explicitly says no live Twilio/OpenAI phone call was run; before merge, add redacted logs, terminal output, or a recording from a real outbound call, then update the PR body or ask a maintainer to comment @clawsweeper re-review.
Evidence reviewed

PR surface:

Source +52, Tests +385. Total +437 across 6 files.

Area       Files  Added  Removed  Net
Source     3      53     1        +52
Tests      3      385    0        +385
Docs       0      0      0        0
Config     0      0      0        0
Generated  0      0      0        0
Other      0      0      0        0
Total      6      438    1        +437

What I checked:

  • Repository policy applied: Root AGENTS.md and extensions/AGENTS.md were read; provider routing, plugin API surface, config/default changes, and fallback behavior are compatibility-sensitive review surfaces. (AGENTS.md:31, 44e6caff5401)
  • Current main still has two response sources: Voice Call passes initialGreetingInstructions and sets triggerGreetingOnReady when an initial greeting exists, so the shared session runtime calls triggerGreeting. (extensions/voice-call/src/webhook/realtime-handler.ts:653, 44e6caff5401)
  • OpenAI auto-response defaults on: Current main builds OpenAI GA realtime turn detection with autoRespondToAudio defaulting true and sends that as turn_detection.create_response. (extensions/openai/realtime-voice-provider.ts:826, 44e6caff5401)
  • Dependency contract checked: OpenAI's official Realtime API reference exposes create_response and interrupt_response as turn-detection controls, and notes their interaction when interruption is disabled. (developers.openai.com)
  • PR adds shared bridge API surface: The PR head adds restoreAutoRespondToAudioAfterInitialGreeting to the exported realtime voice bridge request type and sets it from shared session runtime based on provider id openai. (src/talk/provider-types.ts:109, 3414023d9d25)
  • Compatibility defect on PR head: The restore session.update hardcodes interrupt_response: true, while adjacent tests preserve interruptResponseOnInputAudio:false as interrupt_response:false. (extensions/openai/realtime-voice-provider.ts:1135, 3414023d9d25)

Likely related people:

  • steipete: Recent history ties this handle to the talk session runtime, OpenAI realtime voice config, disabled interruption behavior, and voice-call realtime maintenance. (role: feature owner and recent area contributor; confidence: high; commits: f1636d5e2831, 997edf66a160, 3533297cd9cf; files: src/talk/session-runtime.ts, extensions/openai/realtime-voice-provider.ts, extensions/voice-call/src/webhook/realtime-handler.ts)
  • Solvely-Colin: Recent merged OpenAI realtime OAuth and stability commits touched the same provider/session family, with steipete as committer on several entries. (role: recent OpenAI realtime contributor; confidence: medium; commits: 7a2a31dede80, d5893d99d07b, 6e8029407919; files: extensions/openai/realtime-voice-provider.ts, src/talk/session-runtime.ts)
  • fuller-stack-dev: Recent merged work changed the Voice Call realtime handler and tests near the same call flow, though not the OpenAI server-VAD behavior itself. (role: recent adjacent contributor; confidence: medium; commits: f603fa58fec7; files: extensions/voice-call/src/webhook/realtime-handler.ts, extensions/voice-call/src/webhook/realtime-handler.test.ts)
  • vincentkoc: Recent history includes voice model/provider config work touching OpenAI realtime provider config, and shallow blame on the checked-out snapshot currently points to this author. (role: recent adjacent provider contributor; confidence: medium; commits: 27b15a19e84c, fc6d448138fc; files: extensions/openai/realtime-voice-provider.ts, src/talk/session-runtime.ts, extensions/voice-call/src/webhook/realtime-handler.ts)
  • joshavant: Recent merged work changed OpenAI realtime voice authentication in the same provider file, relevant for routing review but not the greeting bug itself. (role: recent adjacent OpenAI provider contributor; confidence: low; commits: 9fdd56da2106; files: extensions/openai/realtime-voice-provider.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. labels May 24, 2026
@clawsweeper

clawsweeper Bot commented May 24, 2026

Contributor

ClawSweeper PR egg

🎁 Pass real behavior proof to wake the egg and unlock a hatchable treat.

Where did the egg go?
  • The egg game starts only after the PR passes the real-behavior proof check.
  • Before that, no creature or rarity is rolled. The treat waits for real proof.
  • This is still just collectible flavor: proof affects review readiness, not creature quality.

@omarshahine

Contributor

Thanks for the PR. Before this can move forward, please add live proof from the affected surface, not just unit tests, mocked tests, or source inspection.

A useful proof update should include:

  • the exact build/SHA tested
  • the real environment used
  • the command, UI flow, channel flow, provider request, or other live path exercised
  • before/after symptom evidence where applicable
  • the observed result after the patch
  • any remaining proof gaps

Please redact secrets, tokens, phone numbers, and private message content from logs or screenshots.

@vincentkoc

Member

Thanks @lml2468 for working on this fix. I am closing this as superseded by #86285 for the #85846 OpenAI realtime duplicate-greeting path: both PRs target the same root cause, and #86285 now carries the stronger current proof with the real behavior proof check passing.

Clownfish will keep the canonical issue and candidate fix path on #85846 / #86285 so review and validation stay in one place. Credit for this source PR remains visible here and in the cluster evidence; if #86285 misses a case this PR covers, please reply and we can reopen or split that detail back out.

@vincentkoc vincentkoc closed this Jun 16, 2026
@vincentkoc vincentkoc added the clownfish Tracked by Clownfish automation label Jun 16, 2026

Labels

  • channel: voice-call - Channel integration: voice-call
  • clownfish - Tracked by Clownfish automation
  • extensions: openai
  • merge-risk: 🚨 compatibility - 🚨 May break existing users, config, migrations, defaults, or upgrade paths.
  • merge-risk: 🚨 message-delivery - 🚨 May drop, duplicate, misroute, suppress, or wrongly target messages.
  • P2 - Normal backlog priority with limited blast radius.
  • rating: 🧂 unranked krab - Not merge-ready due to missing proof or serious correctness/safety concerns.
  • size: M
  • status: 📣 needs proof - The PR needs real behavior proof before ClawSweeper can clear the contributor ask.
  • triage: mock-only-proof - Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[voice-call] OpenAI realtime: outbound calls greet caller twice — triggerGreeting + server_vad.create_response both fire response.create

3 participants