[BUG] Rebuilt (post-June-5) Gemma-4 GGUFs still carry broken text-tower weights — verified at the tensor level, engine-independent. PPL 192.9 vs true 4.68 #6056

@nihilistau

Hi — we spent several days root-causing the Gemma-4-12B perplexity anomalies
and want to share results + reproduction instruments, because the June-5
rebuild did not fix the underlying problem.

Method (engine-independent). We wrote a from-scratch reference forward
for gemma-4 directly off the official safetensors + config (no llama.cpp, no
transformers — ~130 lines of numpy/torch). On a verified token fixture
(identical to HF tokenizer.json, 5431/5431) it measures the TRUE
full-precision wikitext chunk-0 PPL at 4.6776, with targets at max-logit
(NLL ≈ 0.001). The same script can dequantize any GGUF's tensors and run the
identical arithmetic over them — which removes the inference engine as a
variable entirely.
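To pin down what "teacher-forced PPL on a fixed fixture" means here, a minimal sketch follows. It assumes the per-position logits have already been produced by some forward pass. It is not the code in _t2_manual_forward.py: the function names are hypothetical, and plain Python lists stand in for numpy/torch tensors.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def teacher_forced_ppl(logits_per_pos, targets):
    """Perplexity = exp(mean negative log-likelihood of the true next tokens).

    logits_per_pos[i] are the logits that predict targets[i]; the model is
    always fed the ground-truth prefix (teacher forcing), never its own samples.
    """
    if not targets or len(logits_per_pos) != len(targets):
        raise ValueError("need one logit row per target token")
    nll = [-log_softmax(row)[t] for row, t in zip(logits_per_pos, targets)]
    return math.exp(sum(nll) / len(nll))
```

Because the token fixture and the arithmetic are fixed, only the weights that produced the logits can move the number, which is the property the comparison above relies on.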

Results (same fixture, same protocol, only the weight bytes change):

| weights | PPL (our forward) | PPL (llama.cpp) |
|---|---|---|
| official bf16 safetensors | 4.68 | |
| pre-fix Q4_K_M GGUF | 271.2 | 505.9 |
| pre-fix QAT-Q4_0 GGUF | 364.3 | 397.5 |
| rebuilt (post-June-5) gemma-4-12B-it-qat-UD-Q4_K_XL | 192.9 | |
Two independent engines show the same severe degradation per artifact, so
llama.cpp's forward is NOT the problem; the artifacts are.
PR ggml-org/llama.cpp#24118 fixed vision/audio projector config handling, but
the text-tower weight damage predates and survives it.

Damage anatomy (forensics scripts included):

  • No layer permutation: the blk↔layer mapping is exactly diagonal
    (cross-layer cosines ≈ 0).
  • In-place damage with a period-6 signature: vs the official checkpoint,
    layers ≡ 0,1 (mod 6) sit at cos 0.93–0.97 while the other four sit at
    0.24–0.70 (measured on the pre-fix K_M, which shares the bf16 source).
  • The per-layer layer_output_scale class is independently defective:
    restoring ONLY those scalars from the checkpoint takes the QAT artifact
    from 364 → 97. Restoring norms or embeddings makes it worse (they are
    coherent with the damaged weights — so the matmul tensors are damaged too).
  • Generation looks deceptively OK (confident positions stay correct), which
    is why this slipped through smoke tests. PPL on a fixed fixture catches it.
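The period-6 observation above can be checked with a few lines of arithmetic. The sketch below is an illustration of the idea rather than the contents of _t2g_perm_hunt.py: it assumes each layer's reference and dequantized weights have already been flattened to equal-length lists of floats, and the function names are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two flattened weight tensors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cosine_by_residue(ref_layers, test_layers, period=6):
    """Mean same-index cosine, grouped by layer index modulo `period`.

    A healthy artifact should give near-identical values for every residue;
    a periodic defect shows up as some residues sitting far below the others.
    """
    groups = defaultdict(list)
    for i, (ref, test) in enumerate(zip(ref_layers, test_layers)):
        groups[i % period].append(cosine(ref, test))
    return {r: sum(v) / len(v) for r, v in sorted(groups.items())}
```

Comparing layer i of the reference against every layer j of the artifact with the same `cosine` helper gives the cross-layer matrix used to rule out a permutation.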

Reproduction: all instruments + receipts (MIT) are here:
https://github.com/nihilistau/shannon-prime-lattice/tree/main/tests/gemma4_gold

  • _t2_manual_forward.py (gold)
  • _t2c_gold_on_gguf.py (grade any GGUF)
  • _t2g_perm_hunt.py (cosine forensics)

A step-by-step verification + fix write-up:
https://github.com/nihilistau/Position_Is_Arithmetic/blob/main/GEMMA4-QUANT-FIX.md

Suggested fix on your side: re-convert from the official safetensors and
verify at the weight level before publishing — per-layer cosine vs the
checkpoint (should be >0.99 for ≥8-bit tensors, uniform across layers) and a
fixed-fixture teacher-forced PPL within a few percent of 4.68. Happy to help
validate a candidate rebuild with the instruments above.
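The two checks suggested above could be wired into a pre-publish gate along these lines. This is a sketch under stated assumptions: `check_rebuild` is a hypothetical helper, the 5% PPL tolerance and the 0.01 cosine spread are placeholder readings of "a few percent" and "uniform across layers", and the 0.99 floor only applies to tensors stored at 8 bits or more.

```python
def check_rebuild(layer_cosines, ppl, ref_ppl,
                  min_cos=0.99, max_spread=0.01, ppl_tol=0.05):
    """Return a list of human-readable problems; an empty list means pass.

    layer_cosines: per-layer cosine vs the official checkpoint (>=8-bit tensors).
    ppl / ref_ppl: fixed-fixture teacher-forced PPL of the candidate / reference.
    max_spread and ppl_tol are assumed thresholds, not values from the report.
    """
    problems = []
    low = [(i, c) for i, c in enumerate(layer_cosines) if c < min_cos]
    if low:
        problems.append(f"layers below cos {min_cos}: {low}")
    if layer_cosines and max(layer_cosines) - min(layer_cosines) > max_spread:
        problems.append("cosine is not uniform across layers")
    if abs(ppl - ref_ppl) / ref_ppl > ppl_tol:
        problems.append(f"PPL {ppl} deviates more than {ppl_tol:.0%} from {ref_ppl}")
    return problems

if __name__ == "__main__":
    # Illustrative inputs only; the cosines are made up, the PPLs are from the table.
    print(check_rebuild([0.95, 0.30, 0.996], ppl=192.9, ref_ppl=4.68))
```

Returning every failed check instead of stopping at the first makes it easier to tell a uniformly lossy quantization from a layer-specific defect.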

Also FYI: gemma-4-12B is unusually PTQ-hostile — we measured naive
all-tensor symmetric int4 at +45% PPL even from clean weights; keeping
attention/down-proj/embed at 8-bit and quantizing only FFN gate/up to 4-bit
lands at +9.6%. Recipe table in the write-up.
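For orientation, a minimal sketch of what naive symmetric quantization and the mixed 4/8-bit split look like. This is not the recipe from the write-up: it uses one scale per tensor (the actual grouping is not stated here), pure-Python lists instead of arrays, and `bits_for` matches on llama.cpp-style tensor names purely as an illustration.

```python
def quantize_symmetric(values, bits):
    """Round-trip values through symmetric signed integers with one shared scale."""
    qmax = 2 ** (bits - 1) - 1              # 7 for int4, 127 for int8
    amax = max(abs(v) for v in values)
    if amax == 0:
        return list(values)
    scale = amax / qmax
    # Quantize, clamp to the signed range, then dequantize back to floats.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in values]

def bits_for(tensor_name):
    """Assumed mixed-precision rule: only FFN gate/up go to 4-bit, the rest stay 8-bit."""
    if "ffn_gate" in tensor_name or "ffn_up" in tensor_name:
        return 4
    return 8

if __name__ == "__main__":
    weights = [0.7, 0.33, -0.12, 0.05]      # toy values, not model weights
    for name in ("blk.0.ffn_gate.weight", "blk.0.ffn_down.weight"):
        print(name, bits_for(name), quantize_symmetric(weights, bits_for(name)))
```

With a single shared scale, one large-magnitude weight coarsens the grid for every other value in the tensor, which is one plausible reason an all-tensor int4 pass costs more than the selective split.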
