fix(agents): keep BOOTSTRAP.md pending on preseeded managed workspaces [AI-assisted]#91955
luyao618 wants to merge 1 commit
When OpenClaw runs in a managed / GitOps / operator-style deployment
(for example Kubernetes with a PVC-backed workspace), a fresh workspace
can be preseeded with custom SOUL.md, IDENTITY.md, USER.md, and a
user-provided BOOTSTRAP.md before OpenClaw ever runs. The bootstrap
completion reconciler treated those profile-file diffs against built-in
templates as evidence that the human onboarding flow had completed and
deleted the user-provided BOOTSTRAP.md before it could run, leaving
SKILL_USAGE.md uninitialized and onboarding cron jobs uncreated.
The fix splits stale-completion evidence into two kinds:
* Real user content (memory/, MEMORY.md, populated SKILL.md under
skills/) is always a legitimate signal that a previous onboarding
flow ran but did not persist completion, so legacy / local stale
BOOTSTRAP.md recovery keeps working.
* SOUL / IDENTITY / USER diffs against built-in templates are only
accepted as completion evidence when `bootstrapSeededAt` was already
persisted to disk by a prior process lifecycle (captured before
`ensureAgentWorkspace` mutates state in memory).
A fresh preseeded workspace therefore keeps BOOTSTRAP.md, leaves
setupCompletedAt unset, and still records bootstrapSeededAt so a future
lifecycle can repair an orphan BOOTSTRAP.md the normal way.
Closes openclaw#91931
[AI-assisted]
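The gating rule described above can be sketched as a small pure function. This is a minimal illustration only: the type and function names (`CompletionEvidence`, `shouldTreatBootstrapAsCompleted`) are hypothetical and are not the actual helpers in `src/agents/workspace.ts`.

```typescript
// Hypothetical names throughout; this is not the code from src/agents/workspace.ts.
type CompletionEvidence = {
  // memory/, MEMORY.md, or a populated SKILL.md under skills/
  hasUserContent: boolean;
  // SOUL / IDENTITY / USER differ from the built-in templates
  hasProfileDiff: boolean;
};

function shouldTreatBootstrapAsCompleted(
  evidence: CompletionEvidence,
  seededInPriorLifecycle: boolean,
): boolean {
  // Real user content always counts as stale-completion evidence.
  if (evidence.hasUserContent) {
    return true;
  }
  // Profile-file diffs count only if bootstrapSeededAt was already on disk
  // before the current process lifecycle started.
  return evidence.hasProfileDiff && seededInPriorLifecycle;
}
```

Under this sketch, a fresh preseeded workspace (profile diffs only, nothing persisted yet) stays pending, while the same workspace on a later lifecycle is eligible for orphan `BOOTSTRAP.md` repair.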
Thanks for the context here. I swept through the related work, and this is now duplicate or superseded.

Close this PR as superseded: it targets a real P1 bootstrap-state bug, but the branch still allows profile-only preseeded workspaces to delete BOOTSTRAP.md after a restart, while the open sibling PR at #91988 now covers that restart path with sufficient proof and a cleaner landing route.

Canonical path: Use #91988 as the canonical fix, then close this branch so only one implementation proceeds for the linked bootstrap-state bug.

So I’m closing this here and keeping the remaining discussion on #91988.

Review details

Best possible solution: Use #91988 as the canonical fix, then close this branch so only one implementation proceeds for the linked bootstrap-state bug.

Do we have a high-confidence way to reproduce the issue? Yes. Source inspection on current main shows profile-file diffs can still drive setupCompletedAt and BOOTSTRAP.md removal, and the linked issue/PR discussion provides concrete Kubernetes and macOS evidence.

Is this the best way to solve the issue? No. The branch is a plausible first-lifecycle mitigation, but it is not the best fix because the sibling PR preserves BOOTSTRAP.md across restart-before-onboarding and has positive proof.

Security review: Cleared. The diff is limited to local workspace bootstrap-state reconciliation and tests, with no new dependency, credential, network, workflow, or code-execution surface.

AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against 8ded75628437.
Thanks @luyao618 for the careful work on this BOOTSTRAP.md fix. I am closing this PR as superseded by #91988 because both branches target the same preseeded-workspace bootstrap-state bug, and the sibling PR now has the cleaner canonical path for the restart-before-onboarding case. Clownfish will keep the canonical discussion and validation on #91988 so only one implementation proceeds for #91931. Your source PR and proof remain part of the credited context for that path. If this branch still covers a distinct reproduction path after #91988 is updated or lands, please reply and maintainers can reopen or split the work back out.
Summary
* A fresh workspace preseeded with `SOUL.md` / `IDENTITY.md` / `USER.md` plus a user-provided `BOOTSTRAP.md` had its `BOOTSTRAP.md` silently deleted and bootstrap marked complete before the first onboarding flow could run.
* Stale-completion evidence is split into real user content (`memory/`, `MEMORY.md`, populated `SKILL.md` under `skills/`) versus profile-file diffs (SOUL/IDENTITY/USER differing from built-in templates); only accept profile-file diffs as completion evidence when a prior process lifecycle has already persisted `bootstrapSeededAt`.
* `src/agents/workspace.ts` (+78/−8): split `workspaceHasBootstrapCompletionEvidence` into a classified `workspaceBootstrapCompletionEvidence` helper, snapshot `bootstrapSeededInPriorLifecycle` in `ensureAgentWorkspace` before mutating state, gate profile-diff evidence in `reconcileWorkspaceBootstrapCompletionState`.
* `src/agents/workspace.test.ts` (+153/−0): new `describe("preseeded managed workspace keeps bootstrap pending")` with `it.each` coverage for profile-only preseed, `ensureAgentWorkspace` pod-start path, second-lifecycle repair, and a separate `it.each` for the user-content branch (`memory/`, `MEMORY.md`, `skills/local-skill/SKILL.md`).
* Not changed: `workspaceProfileLooksConfigured` callers in legacy-migration paths (`ensureAgentWorkspace` lines ~1000+, `recentAttestationPath` branch), `hasWorkspaceUserContentEvidence`, `workspaceRequiredBootstrapLooksCustomized`, channel/plugin/provider code, config schema, or any CLI surface. The change is local to the workspace bootstrap completion reconciler.
Motivation
Issue #91931 documents two independent confirmed reproductions (Kubernetes/PVC operator deployment, and macOS dev install) where OpenClaw silently skips first-run onboarding on workspaces preseeded by a platform. In that mode `SOUL.md` / `IDENTITY.md` / `USER.md` are platform defaults, not user-completed onboarding output, so the existing heuristic that treats their diffs as completion evidence produces a silent first-run skip: `BOOTSTRAP.md` disappears, `SKILL_USAGE.md` is never initialized, and onboarding cron jobs are never created. The first-run onboarding contract becomes silently unenforceable in managed deployments.
Change Type (select all)
Scope (select all touched areas)
(The change is in agent workspace bootstrap reconciliation — neither side of the scope list cleanly applies; closest area is workspace setup / startup.)
Linked Issue/PR
Real behavior proof (required for external PRs)
* `reconcileWorkspaceBootstrapCompletion` / `ensureAgentWorkspace` silently deleted `BOOTSTRAP.md` and wrote `setupCompletedAt` before the onboarding flow could run, while legacy local stale-bootstrap recovery on `memory/` / `MEMORY.md` / populated `skills/*/SKILL.md` must keep working.
* Environment: at `upstream/main` bb6e47729c, macOS, Node v22.15.0, pnpm 11.2.2. Repro drives the real `reconcileWorkspaceBootstrapCompletion` and `ensureAgentWorkspace` source from `src/agents/workspace.ts` via `tsx`; no mocks, no Kubernetes runtime, no channel/provider involvement.
* Setup: a workspace with a user-provided `BOOTSTRAP.md` plus custom `SOUL.md` / `IDENTITY.md` / `USER.md` matching the issue repro.
* Scenario A: call `reconcileWorkspaceBootstrapCompletion(dir)` directly.
* Scenario B: call `ensureAgentWorkspace({ dir, ensureBootstrapFiles: true })` (matches K8s pod start).
* Scenario C: add a `memory/2026-05-01.md` user-content file; call `reconcileWorkspaceBootstrapCompletion(dir)` to confirm legacy stale-bootstrap recovery still triggers.
* Run `node_modules/.bin/tsx repro-91931.mjs` on the current branch (AFTER) and on `upstream/main` (BEFORE).
* After the fix, a fresh preseeded workspace keeps `BOOTSTRAP.md` and leaves `setupCompletedAt` unset, while still persisting `bootstrapSeededAt` so a future lifecycle can repair an orphan `BOOTSTRAP.md` normally. Legacy local stale-bootstrap recovery on real user content (`memory/`, `MEMORY.md`, populated `SKILL.md`) is unaffected and still triggers on the first lifecycle.
* `upstream/main` bb6e47729c shows ❌ for Scenarios A and B (`BOOTSTRAP.md` deleted + `setupCompletedAt` written). Captured above in the `=== BEFORE the fix ===` block.
Root Cause (if applicable)
* `workspaceHasBootstrapCompletionEvidence()` was a thin wrapper around `workspaceProfileLooksConfigured()`, which OR-ed together "profile-file diffs from built-in templates" and "real user-content evidence". `reconcileWorkspaceBootstrapCompletionState()` then trusted any of those signals as completion evidence and deleted `BOOTSTRAP.md`. The heuristic implicitly assumed profile-file diffs only happen because a user manually edited them: true for local installs, false for managed deployments where a platform seeds profile files from templates before first run.
* `src/agents/workspace.test.ts:624` ("uses SOUL.md customization as stale bootstrap completion evidence") covers the heuristic this PR narrows; the new `describe("preseeded managed workspace keeps bootstrap pending")` block locks in the desired distinction.
* See `docs/concepts/agent.md` line 42; the heuristic was designed for the legacy local case where a user might have a `BOOTSTRAP.md` left over from a partially completed first run. The managed/preseeded deployment shape (issue #91931) and the related persistent-bootstrap discussion in #84132 (feature request for a `bootstrap.mode: persistent` option to prevent `BOOTSTRAP.md` deletion) were not in the original design space.
Regression Test Plan (if applicable)
* `src/agents/workspace.test.ts`: new `describe("preseeded managed workspace keeps bootstrap pending")` block (2 parametrized `it.each` cases + 2 lifecycle-specific cases).
* Fresh workspace (no persisted `bootstrapSeededAt`) + profile-file diffs alone → reconciler keeps `BOOTSTRAP.md`, leaves `setupCompletedAt` unset (4 parametrized variants: SOUL only, IDENTITY only, USER only, all three).
* `ensureAgentWorkspace` on the same shape → matches K8s pod-start lifecycle; `BOOTSTRAP.md` is preserved and `bootstrapSeededAt` is persisted.
* Once `bootstrapSeededAt` is on disk, a follow-up reconcile run does treat profile-file diffs as legitimate stale-completion evidence and cleans up an orphan `BOOTSTRAP.md`.
* Real user content (`memory/`, `MEMORY.md`, populated `skills/local-skill/SKILL.md`) still triggers the repair (3 parametrized variants), so the existing legacy recovery contract is not regressed.
* `it.each` keeps the parametrization tight and asserts both the bug shape and the legacy regression contract simultaneously.
* `src/agents/workspace.test.ts:624` covered the heuristic this PR refines; no test covered the preseeded-workspace case.
User-visible / Behavior Changes
Workspaces preseeded with `SOUL.md` / `IDENTITY.md` / `USER.md` plus a user-provided `BOOTSTRAP.md` now keep `BOOTSTRAP.md` on first start instead of silently deleting it, so the intended first-run onboarding flow can actually run. Legacy local behavior on workspaces with real user content (`memory/`, `MEMORY.md`, populated `skills/*/SKILL.md`) is unchanged. No config flag is introduced; the heuristic is narrowed in place.
Diagram (if applicable)
Security Impact (required)
Repro + Verification
Environment
Steps
* `git fetch upstream && git checkout fix/workspace-bootstrap-preseeded-profile`
* `node_modules/.bin/vitest run src/agents/workspace.test.ts` → `Test Files 2 passed (2)`, `Tests 110 passed (110)`
* `pnpm exec oxfmt --check src/agents/workspace.ts src/agents/workspace.test.ts` → clean
* `node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --pretty false --incremental` → no `error TS`
* Repro script (BEFORE on `upstream/main` → ❌ for Scenarios A/B; AFTER on this branch → ✅ for A/B/C; see Real behavior proof block above)
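For readers who want to recreate the preseeded-workspace shape locally, the setup can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of `repro-91931.mjs`: the helper names `createPreseededWorkspace` and `bootstrapStillPending` and the file contents are hypothetical, and the call into the real reconciler is left as a comment.

```typescript
import { existsSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: builds a temp workspace with custom profile files
// and a user-provided BOOTSTRAP.md, as a platform preseed would.
function createPreseededWorkspace(): string {
  const dir = mkdtempSync(join(tmpdir(), "preseeded-workspace-"));
  for (const name of ["SOUL.md", "IDENTITY.md", "USER.md"]) {
    // Placeholder content; any text that differs from the built-in templates.
    writeFileSync(join(dir, name), `# ${name}\nplatform-provided content\n`);
  }
  writeFileSync(join(dir, "BOOTSTRAP.md"), "# user-provided bootstrap\n");
  return dir;
}

// Hypothetical check: bootstrap is still pending while BOOTSTRAP.md exists.
function bootstrapStillPending(dir: string): boolean {
  return existsSync(join(dir, "BOOTSTRAP.md"));
}

const workspaceDir = createPreseededWorkspace();
// A real repro would call reconcileWorkspaceBootstrapCompletion(workspaceDir)
// or ensureAgentWorkspace({ dir: workspaceDir, ensureBootstrapFiles: true })
// at this point, then re-check the workspace.
console.log(bootstrapStillPending(workspaceDir));
```

Running the check again after the reconciler call is what distinguishes the BEFORE and AFTER outcomes for Scenarios A and B.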