
fix(agents): keep BOOTSTRAP.md pending on preseeded managed workspaces [AI-assisted] #91955

Closed
luyao618 wants to merge 1 commit into openclaw:main from luyao618:fix/workspace-bootstrap-preseeded-profile

Conversation

@luyao618

Contributor

🤖 AI-assisted (built with Hermes orchestration; reviewer = 麻酱, code review = Codex CLI). Test level: fully tested. Prompt summary available on request.

Summary

  • Problem: On a managed / GitOps / operator-style deployment (e.g. Kubernetes with a PVC-backed workspace) a fresh workspace preseeded with custom SOUL.md / IDENTITY.md / USER.md plus a user-provided BOOTSTRAP.md had its BOOTSTRAP.md silently deleted and bootstrap marked complete before the first onboarding flow could run.
  • Solution: Split stale-completion evidence into user-content evidence (memory/, MEMORY.md, populated SKILL.md under skills/) versus profile-file diffs (SOUL/IDENTITY/USER differing from built-in templates); only accept profile-file diffs as completion evidence when a prior process lifecycle has already persisted bootstrapSeededAt.
  • What changed: src/agents/workspace.ts (+78/−8) — split workspaceHasBootstrapCompletionEvidence into a classified workspaceBootstrapCompletionEvidence helper, snapshot bootstrapSeededInPriorLifecycle in ensureAgentWorkspace before mutating state, gate profile-diff evidence in reconcileWorkspaceBootstrapCompletionState. src/agents/workspace.test.ts (+153/−0) — new describe("preseeded managed workspace keeps bootstrap pending") with it.each coverage for profile-only preseed, ensureAgentWorkspace pod-start path, second-lifecycle repair, and a separate it.each for the user-content branch (memory/, MEMORY.md, skills/local-skill/SKILL.md).
  • What did NOT change (scope boundary): No changes to workspaceProfileLooksConfigured callers in legacy-migration paths (ensureAgentWorkspace lines ~1000+, recentAttestationPath branch), hasWorkspaceUserContentEvidence, workspaceRequiredBootstrapLooksCustomized, channel/plugin/provider code, config schema, or any CLI surface. The change is local to the workspace bootstrap completion reconciler.
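The evidence split described in the summary can be illustrated with a minimal sketch. This is not the code in src/agents/workspace.ts: `classifyEvidence`, the in-memory `WorkspaceFiles` map, and the `templates` argument are hypothetical stand-ins, and "populated SKILL.md" is assumed here to mean "non-empty after trimming". Only the two result field names (`hasUserContent`, `profileFilesDiffer`) are taken from the PR's diagram.

```typescript
// Hypothetical in-memory view of a workspace: relative path -> file contents.
type WorkspaceFiles = ReadonlyMap<string, string>;

// Field names follow the PR's diagram; the interface itself is illustrative.
interface CompletionEvidence {
  hasUserContent: boolean;
  profileFilesDiffer: boolean;
}

const PROFILE_FILES = ["SOUL.md", "IDENTITY.md", "USER.md"] as const;

// Classify evidence instead of OR-ing it into a single boolean.
function classifyEvidence(
  files: WorkspaceFiles,
  templates: ReadonlyMap<string, string>, // assumed: built-in template text per profile file
): CompletionEvidence {
  const paths = Array.from(files.keys());

  // User-content evidence: memory/, MEMORY.md, or a populated skills/*/SKILL.md.
  const hasUserContent =
    paths.some((p) => p.startsWith("memory/")) ||
    files.has("MEMORY.md") ||
    paths.some(
      (p) =>
        /^skills\/[^/]+\/SKILL\.md$/.test(p) &&
        (files.get(p) ?? "").trim().length > 0, // assumption: "populated" = non-empty
    );

  // Profile-diff evidence: a profile file exists and differs from its template.
  const profileFilesDiffer = PROFILE_FILES.some((name) => {
    const content = files.get(name);
    return content !== undefined && content !== templates.get(name);
  });

  return { hasUserContent, profileFilesDiffer };
}
```

Keeping the two signals separate lets the caller decide how much to trust each one, rather than baking that policy into the detector.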

Motivation

Issue #91931 documents two independent confirmed reproductions (Kubernetes/PVC operator deployment, and macOS dev install) where OpenClaw silently skips first-run onboarding on workspaces preseeded by a platform. In that mode SOUL.md / IDENTITY.md / USER.md are platform defaults, not user-completed onboarding output, so the existing heuristic that treats their diffs as completion evidence produces a silent first-run skip: BOOTSTRAP.md disappears, SKILL_USAGE.md is never initialized, and onboarding cron jobs are never created. The first-run onboarding contract becomes silently unenforceable in managed deployments.

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

(The change is in agent workspace bootstrap reconciliation; none of the scope areas above cleanly applies. The closest area is workspace setup / startup.)

Linked Issue/PR

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: On a fresh preseeded managed workspace, reconcileWorkspaceBootstrapCompletion / ensureAgentWorkspace silently deleted BOOTSTRAP.md and wrote setupCompletedAt before the onboarding flow could run, while legacy local stale-bootstrap recovery on memory/ / MEMORY.md / populated skills/*/SKILL.md must keep working.
  • Real environment tested: local OpenClaw checkout @ upstream/main bb6e47729c, macOS, Node v22.15.0, pnpm 11.2.2. Repro drives the real reconcileWorkspaceBootstrapCompletion and ensureAgentWorkspace source from src/agents/workspace.ts via tsx; no mocks, no Kubernetes runtime, no channel/provider involvement.
  • Exact steps or command run after this patch:
    1. Create three temp workspaces, each preseeded with a user-provided BOOTSTRAP.md plus custom SOUL.md / IDENTITY.md / USER.md matching the issue repro.
    2. Scenario A: call reconcileWorkspaceBootstrapCompletion(dir) directly.
    3. Scenario B: call ensureAgentWorkspace({ dir, ensureBootstrapFiles: true }) (matches K8s pod start).
    4. Scenario C: same preseed + a real memory/2026-05-01.md user-content file; call reconcileWorkspaceBootstrapCompletion(dir) to confirm legacy stale-bootstrap recovery still triggers.
    5. Run the script with node_modules/.bin/tsx repro-91931.mjs on the current branch (AFTER) and on upstream/main (BEFORE).
  • Evidence after fix (redacted terminal capture):
    === BEFORE the fix (current upstream/main) ===
    === Scenario A: reconcileWorkspaceBootstrapCompletion on preseeded managed workspace ===
    Before: BOOTSTRAP.md exists: true
    After: repaired=true bootstrapExists=false setupCompletedAt=2026-06-10T14:03:01.655Z
    After: BOOTSTRAP.md still on disk: false
    ❌ Bug #91931 reproduced (Scenario A): BOOTSTRAP.md deleted or completion recorded
    
    === Scenario B: ensureAgentWorkspace on preseeded managed workspace (matches K8s pod start) ===
    Before: BOOTSTRAP.md exists: true
    After: BOOTSTRAP.md still on disk: false
    After: state.setupCompletedAt: 2026-06-10T14:03:01.659Z
    After: state.bootstrapSeededAt: 2026-06-10T14:03:01.659Z
    ❌ Bug #91931 reproduced (Scenario B): BOOTSTRAP.md deleted or completion recorded
    
    === Scenario C: legacy local stale-bootstrap recovery with user content (must keep working) ===
    After: repaired=true bootstrapExists=false setupCompletedAt=2026-06-10T14:03:01.662Z
    ✅ Scenario C: legacy stale-bootstrap recovery still triggers on real user content
    
    2 scenario(s) failed
    
    === AFTER the fix (this branch) ===
    === Scenario A: reconcileWorkspaceBootstrapCompletion on preseeded managed workspace ===
    Before: BOOTSTRAP.md exists: true
    After: repaired=false bootstrapExists=true setupCompletedAt=<undefined>
    After: BOOTSTRAP.md still on disk: true
    ✅ Scenario A: bootstrap kept pending; BOOTSTRAP.md preserved
    
    === Scenario B: ensureAgentWorkspace on preseeded managed workspace (matches K8s pod start) ===
    Before: BOOTSTRAP.md exists: true
    After: BOOTSTRAP.md still on disk: true
    After: state.setupCompletedAt: <undefined>
    After: state.bootstrapSeededAt: 2026-06-10T14:02:51.430Z
    ✅ Scenario B: bootstrap kept pending; BOOTSTRAP.md preserved
    
    === Scenario C: legacy local stale-bootstrap recovery with user content (must keep working) ===
    After: repaired=true bootstrapExists=false setupCompletedAt=2026-06-10T14:02:51.434Z
    ✅ Scenario C: legacy stale-bootstrap recovery still triggers on real user content
    
    All scenarios passed
    
  • Observed result after fix: A managed-preseeded workspace keeps BOOTSTRAP.md and leaves setupCompletedAt unset, while still persisting bootstrapSeededAt so a future lifecycle can repair an orphan BOOTSTRAP.md normally. Legacy local stale-bootstrap recovery on real user content (memory/, MEMORY.md, populated SKILL.md) is unaffected and still triggers on the first lifecycle.
  • What was not tested: Did not run a full end-to-end Kubernetes / PVC pod-start with the live runtime — repro drives the real source code paths but not the operator/StatefulSet shell around them. No CLI binary build/install run.
  • Before evidence: The same script run against upstream/main bb6e47729c fails Scenarios A and B (BOOTSTRAP.md deleted + setupCompletedAt written). Captured above in the === BEFORE the fix === block.
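The repro steps above can be approximated with a small harness. The script repro-91931.mjs itself is not shown in this PR, so the following is only a sketch of its likely shape: `preseedWorkspace`, `bootstrapSurvives`, and the `Reconciler` type are hypothetical names, and the reconciler is injected as a synchronous callback for brevity, whereas the real reconcileWorkspaceBootstrapCompletion and ensureAgentWorkspace may well be async and have different signatures.

```typescript
import { existsSync, mkdirSync, mkdtempSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical: whatever function is under test, reduced to "mutate this dir".
type Reconciler = (dir: string) => void;

// Build a temp workspace preseeded like the issue repro:
// a user-provided BOOTSTRAP.md plus custom SOUL/IDENTITY/USER files.
function preseedWorkspace(withUserContent: boolean): string {
  const dir = mkdtempSync(join(tmpdir(), "preseeded-ws-"));
  for (const name of ["BOOTSTRAP.md", "SOUL.md", "IDENTITY.md", "USER.md"]) {
    writeFileSync(join(dir, name), `# custom ${name}\n`);
  }
  if (withUserContent) {
    // Scenario C: real user content under memory/.
    mkdirSync(join(dir, "memory"));
    writeFileSync(join(dir, "memory", "2026-05-01.md"), "note\n");
  }
  return dir;
}

// Run the reconciler once and report whether BOOTSTRAP.md is still on disk.
function bootstrapSurvives(reconcile: Reconciler, withUserContent: boolean): boolean {
  const dir = preseedWorkspace(withUserContent);
  try {
    reconcile(dir);
    return existsSync(join(dir, "BOOTSTRAP.md"));
  } finally {
    rmSync(dir, { recursive: true, force: true }); // always clean up the temp dir
  }
}
```

With the real reconciler plugged in, Scenarios A and B would expect `bootstrapSurvives(reconcile, false)` to be true after the fix, and Scenario C would expect `bootstrapSurvives(reconcile, true)` to be false.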

Root Cause (if applicable)

  • Root cause: workspaceHasBootstrapCompletionEvidence() was a thin wrapper around workspaceProfileLooksConfigured(), which OR-ed together "profile-file diffs from built-in templates" and "real user-content evidence". reconcileWorkspaceBootstrapCompletionState() then trusted any of those signals as completion evidence and deleted BOOTSTRAP.md. The heuristic implicitly assumed profile-file diffs only happen because a user manually edited them — true for local installs, false for managed deployments where a platform seeds profile files from templates before first run.
  • Missing detection / guardrail: There was no test that distinguished "profile-file diffs in a fresh preseeded workspace" from "profile-file diffs after a real onboarding completed". The existing test at src/agents/workspace.test.ts:624 ("uses SOUL.md customization as stale bootstrap completion evidence") was the heuristic this PR narrows; the new describe("preseeded managed workspace keeps bootstrap pending") block locks in the desired distinction.
  • Contributing context: The OpenClaw workspace contract is one-time bootstrap (per docs/concepts/agent.md line 42); the heuristic was designed for the legacy local case where a user might have a BOOTSTRAP.md left over from a partially completed first run. The managed/preseeded deployment shape (issue #91931, "[Bug]: Preseeded SOUL.md/IDENTITY.md/USER.md make OpenClaw auto-complete bootstrap and delete user-provided BOOTSTRAP.md before first run") and the related persistent-bootstrap discussion in #84132 ("[Feature] Add bootstrap.mode: persistent option to prevent BOOTSTRAP.md deletion") were not in the original design space.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/workspace.test.ts — new describe("preseeded managed workspace keeps bootstrap pending") block (2 parametrized it.each cases + 2 lifecycle-specific cases).
  • Scenario the test should lock in:
    1. Fresh preseeded workspace (no prior bootstrapSeededAt) + profile-file diffs alone → reconciler keeps BOOTSTRAP.md, leaves setupCompletedAt unset (4 parametrized variants: SOUL only, IDENTITY only, USER only, all three).
    2. ensureAgentWorkspace on the same shape → matches K8s pod-start lifecycle; BOOTSTRAP.md is preserved and bootstrapSeededAt is persisted.
    3. Second-lifecycle repair: once bootstrapSeededAt is on disk, a follow-up reconcile run does treat profile-file diffs as legitimate stale-completion evidence and cleans up an orphan BOOTSTRAP.md.
    4. Legacy / local stale-bootstrap recovery: a fresh-lifecycle workspace with real user content (memory/, MEMORY.md, populated skills/local-skill/SKILL.md) still triggers the repair (3 parametrized variants), so the existing legacy recovery contract is not regressed.
  • Why this is the smallest reliable guardrail: The bug is a state-machine + heuristic boundary issue, fully testable at the workspace-reconciler unit level without needing a live container/PVC. it.each keeps the parametrization tight and asserts both the bug shape and the legacy regression contract simultaneously.
  • Existing test that already covers this (if any): N/A — src/agents/workspace.test.ts:624 covered the heuristic this PR refines; no test covered the preseeded-workspace case.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • Managed / GitOps / operator-style deployments with preseeded custom SOUL.md / IDENTITY.md / USER.md plus a user-provided BOOTSTRAP.md now keep BOOTSTRAP.md on first start instead of silently deleting it, so the intended first-run onboarding flow can actually run. Legacy local behavior on workspaces with real user content (memory/, MEMORY.md, populated skills/*/SKILL.md) is unchanged. No config flag is introduced; the heuristic is narrowed in place.

Diagram (if applicable)

Before:
  fresh preseeded workspace (BOOTSTRAP.md + custom SOUL/IDENTITY/USER)
    -> workspaceProfileLooksConfigured() == true (profile diffs)
    -> reconciler writes setupCompletedAt + deletes BOOTSTRAP.md
    -> onboarding flow silently skipped

After:
  fresh preseeded workspace (no prior bootstrapSeededAt on disk)
    -> evidence.profileFilesDiffer == true, evidence.hasUserContent == false
    -> reconciler keeps BOOTSTRAP.md, persists bootstrapSeededAt only
  next lifecycle (prior bootstrapSeededAt on disk)
    -> evidence.profileFilesDiffer == true is now legitimate completion evidence
    -> reconciler writes setupCompletedAt + removes orphan BOOTSTRAP.md
  fresh workspace with real user content (memory/, MEMORY.md, skills/*/SKILL.md)
    -> evidence.hasUserContent == true (unconditional)
    -> reconciler repairs as before
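
The "After" branches of the diagram reduce to a small pure decision function. The sketch below is illustrative only: `decideReconcile` and `ReconcileAction` are hypothetical names, and the real reconcileWorkspaceBootstrapCompletionState presumably also performs the file and state writes. The evidence field names and the `bootstrapSeededInPriorLifecycle` flag follow the PR text.

```typescript
// Field names follow the PR's diagram; the interface itself is illustrative.
interface CompletionEvidence {
  hasUserContent: boolean;
  profileFilesDiffer: boolean;
}

// Hypothetical action labels for the two outcomes in the diagram.
type ReconcileAction = "keep-pending" | "complete-and-remove-bootstrap";

function decideReconcile(
  evidence: CompletionEvidence,
  bootstrapSeededInPriorLifecycle: boolean,
): ReconcileAction {
  // Real user content is unconditional completion evidence.
  if (evidence.hasUserContent) {
    return "complete-and-remove-bootstrap";
  }
  // Profile diffs only count once a prior lifecycle persisted bootstrapSeededAt.
  // (The review later in this thread flags this branch as the
  // restart-before-onboarding gap.)
  if (evidence.profileFilesDiffer && bootstrapSeededInPriorLifecycle) {
    return "complete-and-remove-bootstrap";
  }
  // Fresh preseeded workspace: keep BOOTSTRAP.md, leave setupCompletedAt unset.
  return "keep-pending";
}
```

Expressing the gate as a pure function makes each diagram branch a one-line table-driven test case.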

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No (still local workspace filesystem only)
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node v22.15.0, pnpm 11.2.2
  • Model/provider: N/A (the bug fires before any model call)
  • Integration/channel (if any): N/A
  • Relevant config (redacted): default workspace layout under a temp dir; no custom config required to reproduce.

Steps

  1. git fetch upstream && git checkout fix/workspace-bootstrap-preseeded-profile
  2. node_modules/.bin/vitest run src/agents/workspace.test.ts → Test Files 2 passed (2), Tests 110 passed (110)
  3. pnpm exec oxfmt --check src/agents/workspace.ts src/agents/workspace.test.ts → clean
  4. node scripts/run-tsgo.mjs -p test/tsconfig/tsconfig.core.test.json --pretty false --incremental → no error TS
  5. Repro driver (BEFORE on upstream/main → Scenarios A/B fail; AFTER on this branch → Scenarios A/B/C pass; see Real behavior proof block above)

When OpenClaw runs in a managed / GitOps / operator-style deployment
(for example Kubernetes with a PVC-backed workspace), a fresh workspace
can be preseeded with custom SOUL.md, IDENTITY.md, USER.md, and a
user-provided BOOTSTRAP.md before OpenClaw ever runs. The bootstrap
completion reconciler treated those profile-file diffs against built-in
templates as evidence that the human onboarding flow had completed and
deleted the user-provided BOOTSTRAP.md before it could run, leaving
SKILL_USAGE.md uninitialized and onboarding cron jobs uncreated.

The fix splits stale-completion evidence into two kinds:

  * Real user content (memory/, MEMORY.md, populated SKILL.md under
    skills/) is always a legitimate signal that a previous onboarding
    flow ran but did not persist completion, so legacy / local stale
    BOOTSTRAP.md recovery keeps working.
  * SOUL / IDENTITY / USER diffs against built-in templates are only
    accepted as completion evidence when `bootstrapSeededAt` was already
    persisted to disk by a prior process lifecycle (captured before
    `ensureAgentWorkspace` mutates state in memory).

A fresh preseeded workspace therefore keeps BOOTSTRAP.md, leaves
setupCompletedAt unset, and still records bootstrapSeededAt so a future
lifecycle can repair an orphan BOOTSTRAP.md the normal way.
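
The "captured before mutation" ordering matters because the same process both reads and writes bootstrapSeededAt. A minimal sketch of that ordering follows; `StateStore` is a hypothetical in-memory stand-in for the on-disk state file, and `runLifecycle` is an illustrative name, not the body of ensureAgentWorkspace.

```typescript
// Only the two state fields named in this PR are modeled.
interface WorkspaceState {
  bootstrapSeededAt?: string;
  setupCompletedAt?: string;
}

// Hypothetical stand-in for the persisted state file.
class StateStore {
  private persisted: WorkspaceState = {};
  load(): WorkspaceState {
    return { ...this.persisted };
  }
  save(state: WorkspaceState): void {
    this.persisted = { ...state };
  }
}

function runLifecycle(
  store: StateStore,
  now: string,
): { seededInPriorLifecycle: boolean; state: WorkspaceState } {
  const state = store.load();
  // Snapshot first: was the seed marker already on disk when this process started?
  const seededInPriorLifecycle = state.bootstrapSeededAt !== undefined;
  // Only then mutate and persist, so this lifecycle never sees its own write
  // as evidence of a prior one.
  if (!seededInPriorLifecycle) {
    state.bootstrapSeededAt = now;
    store.save(state);
  }
  return { seededInPriorLifecycle, state };
}
```

Taking the snapshot after the write would make every first start look like a second lifecycle, which would defeat the gate on profile-file diffs.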

Closes openclaw#91931
[AI-assisted]
@clawsweeper

clawsweeper Bot commented Jun 10, 2026

Contributor

Thanks for the context here. I swept through the related work, and this PR is now a duplicate or superseded.

Close this PR as superseded: it targets a real P1 bootstrap-state bug, but the branch still allows profile-only preseeded workspaces to delete BOOTSTRAP.md after a restart, while the open sibling PR at #91988 now covers that restart path with sufficient proof and a cleaner landing route.

Canonical path: Use #91988 as the canonical fix, then close this branch so only one implementation proceeds for the linked bootstrap-state bug.

So I’m closing this here and keeping the remaining discussion on #91988.

Review details

Best possible solution:

Use #91988 as the canonical fix, then close this branch so only one implementation proceeds for the linked bootstrap-state bug.

Do we have a high-confidence way to reproduce the issue?

Yes. Source inspection on current main shows profile-file diffs can still drive setupCompletedAt and BOOTSTRAP.md removal, and the linked issue/PR discussion provides concrete Kubernetes and macOS evidence.

Is this the best way to solve the issue?

No. The branch is a plausible first-lifecycle mitigation, but it is not the best fix because the sibling PR preserves BOOTSTRAP.md across restart-before-onboarding and has positive proof.

Security review:

Security review cleared: The diff is limited to local workspace bootstrap-state reconciliation and tests, with no new dependency, credential, network, workflow, or code-execution surface.

AGENTS.md: found and applied where relevant.

What I checked:

  • Root policy read: Read the full root AGENTS.md and applied the ClawSweeper guidance for PR reviews, compatibility-sensitive startup/state changes, scoped AGENTS.md, and read-only GitHub inspection. (AGENTS.md:1, 8ded75628437)
  • Scoped agents policy read: Read src/agents/AGENTS.md; its agent-test performance guidance did not block the review, while root startup/session-state policy did apply. (src/agents/AGENTS.md:1, 8ded75628437)
  • Current main still has the reported source path: Current main still delegates bootstrap completion evidence to workspaceProfileLooksConfigured(), which treats SOUL.md, IDENTITY.md, and USER.md template diffs as completion evidence; the reconciler can then write setupCompletedAt and remove BOOTSTRAP.md. (src/agents/workspace.ts:398, 8ded75628437)
  • Latest release has the same behavior: The latest release tag v2026.6.6 shows the same workspaceHasBootstrapCompletionEvidence and deletion path, so the central bug is not already shipped fixed. (src/agents/workspace.ts:398, 8c802aa68351)
  • This PR still repairs profile-only bootstrap after prior lifecycle: At this PR head, the condition only protects fresh lifecycles; once bootstrapSeededAt was persisted, profile-file diffs can still fall through to setupCompletedAt and BOOTSTRAP.md deletion while BOOTSTRAP.md exists. (src/agents/workspace.ts:501, 9d65bcfe9002)
  • This PR's tests encode the unsafe second-lifecycle repair: The added test explicitly expects a second lifecycle with prior bootstrapSeededAt to repair by deleting BOOTSTRAP.md, which is the restart-before-onboarding gap flagged by prior review. (src/agents/workspace.test.ts:739, 9d65bcfe9002)

Likely related people:

  • Patrick-Erichsen: Authored and merged the stale bootstrap completion repair that added profile-file differences as completion evidence, which this bug-fix cluster narrows. (role: stale completion repair contributor; confidence: high; commits: 137f5c3a8b3c; files: src/agents/workspace.ts, src/agents/workspace.test.ts)
  • Takhoffman: Authored and merged workspace bootstrap prompt routing that consumes pending/completed bootstrap state in runtime paths. (role: bootstrap routing contributor; confidence: high; commits: 62703d84308a; files: src/agents/workspace.ts, src/agents/bootstrap-files.ts, src/auto-reply/reply/session-reset-prompt.ts)
  • Peter Steinberger: Introduced the original BOOTSTRAP.md workspace ritual behavior that this state machine protects. (role: introduced workspace bootstrap ritual; confidence: high; commits: 3876c1679ac9; files: src/agents/workspace.ts, src/agents/workspace.test.ts, docs/reference/templates/BOOTSTRAP.md)
  • Gustavo Madeira Santana: Added persisted bootstrap onboarding state semantics central to bootstrapSeededAt and setupCompletedAt behavior. (role: onboarding-state contributor; confidence: medium; commits: 28b78b25b721; files: src/agents/workspace.ts)

Codex review notes: model internal, reasoning high; reviewed against 8ded75628437.

@clawsweeper clawsweeper Bot added rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. P1 High-priority user-facing bug, regression, or broken workflow. merge-risk: 🚨 compatibility 🚨 May break existing users, config, migrations, defaults, or upgrade paths. merge-risk: 🚨 session-state 🚨 May lose, corrupt, stale, or mis-associate session, agent, or context state. and removed rating: 🌊 off-meta tidepool PR readiness rating does not apply to this item. labels Jun 10, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the triage: needs-real-behavior-proof Candidate: external PR needs after-fix proof from a real setup. label Jun 13, 2026
@clawsweeper clawsweeper Bot added rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🦪 silver shellfish Thin PR readiness signal; proof, validation, or implementation needs work. labels Jun 13, 2026
@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. labels Jun 15, 2026
@openclaw-clownfish

Contributor

Thanks @luyao618 for the careful work on this BOOTSTRAP.md fix. I am closing this PR as superseded by #91988 because both branches target the same preseeded-workspace bootstrap-state bug, and the sibling PR now has the cleaner canonical path for the restart-before-onboarding case.

Clownfish will keep the canonical discussion and validation on #91988 so only one implementation proceeds for #91931. Your source PR and proof remain part of the credited context for that path. If this branch still covers a distinct reproduction path after #91988 is updated or lands, please reply and maintainers can reopen or split the work back out.

@openclaw-clownfish openclaw-clownfish Bot added the clownfish Tracked by Clownfish automation label Jun 15, 2026

Labels

  • agents (Agent runtime and tooling)
  • clownfish (Tracked by Clownfish automation)
  • merge-risk: 🚨 compatibility 🚨 (May break existing users, config, migrations, defaults, or upgrade paths.)
  • merge-risk: 🚨 session-state 🚨 (May lose, corrupt, stale, or mis-associate session, agent, or context state.)
  • P1 (High-priority user-facing bug, regression, or broken workflow.)
  • proof: sufficient (ClawSweeper judged the real behavior proof convincing.)
  • rating: 🧂 unranked krab (Not merge-ready due to missing proof or serious correctness/safety concerns.)
  • size: M
  • status: ⏳ waiting on author (ClawSweeper has contributor-facing work open and is waiting for author action.)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Preseeded SOUL.md/IDENTITY.md/USER.md make OpenClaw auto-complete bootstrap and delete user-provided BOOTSTRAP.md before first run

1 participant