fix(models): managed first-party tier routing + pricing #27

Merged: xjdr-noumena merged 2 commits into main from feat/managed-tier-routing on Jun 21, 2026
@xjdr-noumena

Two commits land the managed first-party tier contract and matching pricing.

  1. fix(models): correct managed first-party tier routing

    • On first-party, token estimation now routes through the main-loop model, so reported usage is counted with the serving model's tokenizer (GLM, Kimi, and DeepSeek V4 Flash each have a provider-specific tokenizer).
    • The opusplan alias now resolves to Opus (GLM 5.2) instead of Flash (Kimi). Plan mode wants Opus-tier reasoning; the previous Flash target was a cost optimization.
    • getContextWindowForModel now consults the managed model profile before the [1m] regex, so a [1m] tag on a model that does not support 1M (Kimi today) cannot silently inflate the reported context window. Kimi + [1m] now correctly reports 200K.
    • Adds context.1m-tier-contract.test.ts pinning the tier-tag contract.
  2. feat(models): set managed model pricing

    • GLM 5.2 - input $1.40 / output $4.40 / cached input $0.26 per Mtok
    • Kimi K2.7 - input $0.95 / output $4.00 / cached input $0.19 per Mtok
    • DeepSeek V4 Flash - input $0.14 / output $0.28 / cached input $0.0028 per Mtok
    • Rates match current competitor pricing for the equivalent upstream models. Cache-write rates default to the input rate where no separate cache-write tier is published.
    • Adds modelCost.managed.test.ts to pin the managed-tier rates.

Validation: 30/30 tests pass across the touched files; build passes.

- Token estimation routes through the main-loop model on first-party so
  the reported usage comes back on the serving model's tokenizer (GLM,
  Kimi, and DeepSeek V4 Flash each have provider-specific tokenizers).
- opusplan alias resolves to Opus (GLM 5.2) instead of Flash (Kimi).
  Plan mode wants Opus-tier reasoning; the previous Flash fallback was an
  Anthropic-era cost optimization.
- getContextWindowForModel consults the managed model profile before the
  [1m] regex so a [1m] tag attached to a model that does not support 1M
  (Kimi today) cannot silently inflate the reported context window. Kimi
  + [1m] now correctly reports 200K; GLM and DeepSeek V4 Flash stay at
  their native 1M.
- Adds context.1m-tier-contract.test.ts pinning the tier-tag contract
  for all three managed profiles.
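
The lookup order described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ManagedProfile` shape, the model ID strings, the `resolveContextWindow` name, the 200K default for unmanaged models, and the choice to return the profile's window whether or not the tag is present are all assumptions. Only the precedence (profile first, `[1m]` regex second) and the 200K / 1M figures come from the description.

```typescript
// Hypothetical profile shape; the real managed model profile is not shown in this PR.
interface ManagedProfile {
  maxContextTokens: number;
}

// Illustrative IDs. The limits follow the PR text: Kimi reports 200K,
// GLM and DeepSeek V4 Flash stay at their native 1M.
const MANAGED_PROFILES = new Map<string, ManagedProfile>([
  ["glm-5.2", { maxContextTokens: 1_000_000 }],
  ["kimi-k2.7", { maxContextTokens: 200_000 }],
  ["deepseek-v4-flash", { maxContextTokens: 1_000_000 }],
]);

const ONE_M_TAG = /\[1m\]$/i;
const DEFAULT_WINDOW = 200_000; // assumed default for untagged, unmanaged models

// Stand-in for getContextWindowForModel: the profile wins over the tag.
function resolveContextWindow(model: string): number {
  const baseId = model.replace(ONE_M_TAG, "").trim().toLowerCase();
  const profile = MANAGED_PROFILES.get(baseId);
  if (profile !== undefined) {
    // A managed profile is authoritative, so a stray [1m] tag cannot inflate it.
    return profile.maxContextTokens;
  }
  // Only unmanaged models fall through to the tag check.
  return ONE_M_TAG.test(model) ? 1_000_000 : DEFAULT_WINDOW;
}
```

Under these assumptions, `resolveContextWindow("kimi-k2.7[1m]")` takes the profile branch and never reaches the regex check, which is the behavior the tier-tag contract test is meant to pin.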

- GLM 5.2 - input $1.40 / output $4.40 / cached input $0.26 per Mtok
- Kimi K2.7 - input $0.95 / output $4.00 / cached input $0.19 per Mtok
- DeepSeek V4 Flash - input $0.14 / output $0.28 / cached input $0.0028 per Mtok

Rates are set to match current competitor pricing for the equivalent
upstream models. Cache-write rates default to the input rate where no
separate cache-write tier is published.

Adds modelCost.managed.test.ts to pin the managed-tier rates so future
drift is caught.
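
To make the per-Mtok arithmetic and the cache-write default concrete, here is a small sketch. The rates are the ones listed above; everything else (the `ModelRates` and `Usage` shapes, the model ID strings, and the `estimateCostUsd` name) is hypothetical and not taken from the PR's code.

```typescript
// Hypothetical rate shape; all rates are USD per million tokens (Mtok).
interface ModelRates {
  inputPerMtok: number;
  outputPerMtok: number;
  cachedInputPerMtok: number;
  cacheWritePerMtok?: number; // left unset when no separate tier is published
}

// Rates as listed in this PR; the map keys are illustrative IDs.
const MANAGED_RATES = new Map<string, ModelRates>([
  ["glm-5.2", { inputPerMtok: 1.4, outputPerMtok: 4.4, cachedInputPerMtok: 0.26 }],
  ["kimi-k2.7", { inputPerMtok: 0.95, outputPerMtok: 4.0, cachedInputPerMtok: 0.19 }],
  ["deepseek-v4-flash", { inputPerMtok: 0.14, outputPerMtok: 0.28, cachedInputPerMtok: 0.0028 }],
]);

// Hypothetical usage record for a single request.
interface Usage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  cacheWriteTokens: number;
}

function estimateCostUsd(modelId: string, usage: Usage): number {
  const rates = MANAGED_RATES.get(modelId);
  if (rates === undefined) {
    throw new Error(`no managed rates for ${modelId}`);
  }
  // Cache writes fall back to the input rate when no separate rate exists.
  const cacheWritePerMtok = rates.cacheWritePerMtok ?? rates.inputPerMtok;
  const totalPerMtokUnits =
    usage.inputTokens * rates.inputPerMtok +
    usage.outputTokens * rates.outputPerMtok +
    usage.cachedInputTokens * rates.cachedInputPerMtok +
    usage.cacheWriteTokens * cacheWritePerMtok;
  return totalPerMtokUnits / 1_000_000;
}
```

As a sanity check on the units, one million input tokens plus one million output tokens on GLM 5.2 comes to about $1.40 + $4.40 = $5.80, and one million cache-write tokens on Kimi K2.7 is billed at the $0.95 input rate under the stated default.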

xjdr-noumena merged commit f919da9 into main on Jun 21, 2026
xjdr-noumena deleted the feat/managed-tier-routing branch on June 21, 2026 00:32