fix(models): managed first-party tier routing + pricing by xjdr-noumena · Pull Request #27 · Noumena-Network/code

xjdr-noumena · 2026-06-21T00:29:17Z

Two commits land the managed first-party tier contract and matching pricing.

fix(models): correct managed first-party tier routing
- Token estimation on first-party routes through the main-loop model so the reported usage comes back on the serving model's tokenizer (GLM/Kimi/DeepSeek V4 Flash each have provider-specific tokenizers).
- opusplan alias resolves to Opus (GLM 5.2) instead of Flash (Kimi). Plan mode wants Opus-tier reasoning, not the previous Flash-tier cost optimization.
- getContextWindowForModel consults the managed model profile before the [1m] regex, so a [1m] tag attached to a model that does not support 1M (Kimi today) cannot silently inflate the reported context window. Kimi + [1m] now correctly reports 200K.
- Adds context.1m-tier-contract.test.ts pinning the tier-tag contract.
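The lookup order described for getContextWindowForModel can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `ManagedProfile` shape, the model ID strings, the function name `contextWindowSketch`, and the 200K default for unmanaged models are all assumptions. Only the ordering (profile first, then the `[1m]` tag) and the per-model limits come from this PR.

```typescript
// Hypothetical profile shape; the real managed profile fields are not shown in this PR.
interface ManagedProfile {
  contextWindow: number; // maximum context window in tokens
}

// Illustrative model IDs. Limits follow the PR text: Kimi is 200K, GLM and DeepSeek V4 Flash are 1M.
const MANAGED_PROFILES: Record<string, ManagedProfile> = {
  "glm-5.2": { contextWindow: 1_000_000 },
  "kimi-k2.7": { contextWindow: 200_000 },
  "deepseek-v4-flash": { contextWindow: 1_000_000 },
};

const ONE_M_TAG = /\[1m\]$/i;
const DEFAULT_WINDOW = 200_000; // assumed default for unmanaged, untagged models

function contextWindowSketch(model: string): number {
  const trimmed = model.trim();
  const base = trimmed.replace(ONE_M_TAG, "").trim().toLowerCase();

  // 1. The managed profile wins: a [1m] tag cannot raise a profile's limit.
  const profile = MANAGED_PROFILES[base];
  if (profile !== undefined) {
    return profile.contextWindow;
  }

  // 2. Only models without a managed profile fall back to the tag check.
  return ONE_M_TAG.test(trimmed) ? 1_000_000 : DEFAULT_WINDOW;
}
```

With this ordering, `contextWindowSketch("kimi-k2.7[1m]")` resolves through the profile and yields the 200K limit, which is the behavior the tier-contract test is described as pinning.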
feat(models): set managed model pricing
- GLM 5.2 - input $1.40 / output $4.40 / cached input $0.26 per Mtok
- Kimi K2.7 - input $0.95 / output $4.00 / cached input $0.19 per Mtok
- DeepSeek V4 Flash - input $0.14 / output $0.28 / cached input $0.0028 per Mtok
- Rates match current competitor pricing for the equivalent upstream models. Cache-write rates default to the input rate where no separate cache-write tier is published.
- Adds modelCost.managed.test.ts to pin the managed-tier rates.
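To make the per-Mtok rates concrete, here is a small sketch of how a request cost could be derived from them. The `Rates` and `Usage` shapes, the model ID strings, and the function name `estimateCostUsd` are assumptions for illustration. Only the rates and the cache-write fallback to the input rate come from this PR.

```typescript
// Rates in USD per million tokens (Mtok), as listed in this PR.
interface Rates {
  input: number;
  output: number;
  cachedInput: number;
  cacheWrite?: number; // optional; falls back to the input rate when absent
}

// Hypothetical usage record; field names are illustrative.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
  cacheWriteTokens?: number;
}

// Illustrative model IDs keyed to the published rates.
const MANAGED_RATES: Record<string, Rates> = {
  "glm-5.2": { input: 1.4, output: 4.4, cachedInput: 0.26 },
  "kimi-k2.7": { input: 0.95, output: 4.0, cachedInput: 0.19 },
  "deepseek-v4-flash": { input: 0.14, output: 0.28, cachedInput: 0.0028 },
};

function estimateCostUsd(model: string, usage: Usage): number {
  const rates = MANAGED_RATES[model];
  if (rates === undefined) {
    throw new Error(`no managed rates for model: ${model}`);
  }
  // No separate cache-write tier is published, so default to the input rate.
  const cacheWriteRate = rates.cacheWrite ?? rates.input;

  const total =
    usage.inputTokens * rates.input +
    usage.outputTokens * rates.output +
    (usage.cachedInputTokens ?? 0) * rates.cachedInput +
    (usage.cacheWriteTokens ?? 0) * cacheWriteRate;

  return total / 1_000_000; // convert from per-Mtok to per-token pricing
}
```

For example, 2M input tokens, 0.5M output tokens, and 1M cached input tokens on Kimi K2.7 come to roughly $1.90 + $2.00 + $0.19 = $4.09 under these rates.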

Validation: 30/30 tests pass across the touched files; build passes.

- Token estimation routes through the main-loop model on first-party so the reported usage comes back on the serving model's tokenizer (GLM, Kimi, and DeepSeek V4 Flash each have provider-specific tokenizers).
- opusplan alias resolves to Opus (GLM 5.2) instead of Flash (Kimi). Plan mode wants Opus-tier reasoning; the previous Flash fallback was an Anthropic-era cost optimization.
- getContextWindowForModel consults the managed model profile before the [1m] regex so a [1m] tag attached to a model that does not support 1M (Kimi today) cannot silently inflate the reported context window. Kimi + [1m] now correctly reports 200K; GLM and DeepSeek V4 Flash stay at their native 1M.
- Adds context.1m-tier-contract.test.ts pinning the tier-tag contract for all three managed profiles.

- GLM 5.2 - input $1.40 / output $4.40 / cached input $0.26 per Mtok
- Kimi K2.7 - input $0.95 / output $4.00 / cached input $0.19 per Mtok
- DeepSeek V4 Flash - input $0.14 / output $0.28 / cached input $0.0028 per Mtok
- Rates are set to match current competitor pricing for the equivalent upstream models. Cache-write rates default to the input rate where no separate cache-write tier is published.
- Adds modelCost.managed.test.ts to pin the managed-tier rates so future drift is caught.

xjdr-noumena added 2 commits June 21, 2026 00:21

xjdr-noumena merged commit f919da9 into main Jun 21, 2026

xjdr-noumena deleted the feat/managed-tier-routing branch June 21, 2026 00:32

xjdr-noumena mentioned this pull request Jun 21, 2026

fix(launcher): stop forcing all tiers to the default model #29

Merged
