All HF Hub posts

prithivMLmods
posted an update 3 days ago
PiD (Pixel Diffusion Decoder), an all-in-one demo for image-edit upscaling and image-generation upscaling, is now live on Spaces! FLUX.2-Klein powers the realism-focused image generation and editing, image generation is also paired with Z-Image, and upscaling is enabled by default!

🤗 Space: prithivMLmods/PiD-Image-Upscaler
🔗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 To learn more, visit the app page or the respective model pages.
RiverRider
posted an update 3 days ago
SRT-introspect: Live Token-by-Token Readout of LLM Internal Reasoning

I have released SRT-introspect, a new public demonstration that makes the hidden reasoning process of a frozen large language model visible in real time.

The interface runs a Qwen-2.5-7B backbone equipped with the SRT Adapter and Activation Verbalizer. As the model generates each token, the system continuously measures divergence across attention heads, identifies high-signal moments, and translates the corresponding hidden-state object representations into natural-language verbalizations. You see exactly what the model is internally representing at the precise points where its computation is most active, complete with divergence scores, reflexivity estimates, and per-layer traces.
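
For readers who want a feel for what "divergence across attention heads" could mean, here is a minimal sketch. It is not the SRT-introspect implementation: the choice of Jensen-Shannon divergence, the function names, and the threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations

def js_divergence(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def head_divergence(heads):
    """Mean pairwise JS divergence across the attention distributions of all heads."""
    pairs = list(combinations(heads, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)

def high_signal_steps(per_step_heads, threshold):
    """Indices of generation steps whose head divergence exceeds the (assumed) threshold."""
    return [i for i, heads in enumerate(per_step_heads)
            if head_divergence(heads) > threshold]
```

Under this assumption, a step where every head attends to the same positions scores 0, and a step where heads attend to disjoint positions scores ln 2, so only the second kind would be flagged for verbalization.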

This is not a summary of the final output. It is a direct window into the model's latent conceptual landscape, showing the dominant training-data attractors that activate even when the prompt asks for first-principles reasoning. The adaptive scheduler concentrates verbalizations precisely where the real internal work occurs, turning what used to be opaque black-box generation into observable, analyzable data.

The result is the clearest public demonstration yet that modern LLMs possess a rich, structured semiotic infrastructure that can now be audited without retraining or fine-tuning.

Try it:
RiverRider/srt-introspect
pankajpandey-dev
posted an update about 14 hours ago
🇮🇳 Qwen3-4B Hindi Instruct v2: a Hindi LLM that runs on your own machine
Most strong Hindi-capable models are either huge or cloud-only. I wanted one that's small enough to run locally but actually follows instructions in Hindi, so I fine-tuned Qwen3-4B on 10K Hindi instruction pairs and shipped it with a full GGUF quant ladder.
✅ Fine-tune (16-bit): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2
✅ GGUF (Q4/Q5/Q8): huggingface.co/pankajpandey-dev/Qwen3-4B-Hindi-Instruct-v2-GGUF
Runs in Ollama, llama.cpp, and LM Studio. The Q4_K_M is just 2.5 GB, so it fits comfortably on a laptop, CPU or GPU.
Part of my Hindi LLM Series: building openly-licensed Indic models for local and edge use. More coming (Gemma next). Feedback welcome 🙏
#Hindi #IndicNLP #GGUF #LocalLLM #Qwen
pankajpandey-dev
posted an update 6 days ago
🧬 Just uploaded K-quants of Carbon-3B for llama.cpp users!
@HuggingFaceBio released the original GGUF in bf16 only, so I added the full quant ladder for CPU/edge inference:
• Q2_K → 1.4 GB
• Q3_K_M → 1.8 GB
• Q4_K_M → 2.1 GB ⭐
• Q5_K_M → 2.4 GB
• Q6_K → 2.7 GB
• Q8_0 → 3.5 GB
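
A rough way to compare the rungs of the ladder above is effective bits per weight. The sketch below is a back-of-envelope estimate only: it assumes exactly 3e9 parameters (read off the "3B" in the model name) and 1 GB = 1e9 bytes, neither of which is confirmed here.

```python
# Assumption: parameter count taken from the "3B" in the name, not from the model card.
NOMINAL_PARAMS = 3e9

# File sizes in GB as listed in the post.
QUANT_SIZES_GB = {
    "Q2_K": 1.4,
    "Q3_K_M": 1.8,
    "Q4_K_M": 2.1,
    "Q5_K_M": 2.4,
    "Q6_K": 2.7,
    "Q8_0": 3.5,
}

def bits_per_weight(size_gb, n_params=NOMINAL_PARAMS):
    """Effective bits per weight, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / n_params

def bpw_table(sizes=QUANT_SIZES_GB):
    """Map each quant name to its estimated bits per weight, rounded to 2 decimals."""
    return {name: round(bits_per_weight(gb), 2) for name, gb in sizes.items()}
```

With these assumptions Q4_K_M works out to about 5.6 bits per weight and Q8_0 to about 9.3. Both are above the nominal widths of those formats, which probably means the real parameter count is above 3e9 or that some tensors are stored at higher precision.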
🔗 pankajpandey-dev/Carbon-3B-GGUF
Now you can generate DNA sequences on your laptop. Needs a llama.cpp build with PR #23410 (HybridDNATokenizer support).
Huge thanks to the HuggingFaceBio team for the original model 🙏
#GGUF #llamacpp #genomics #DNA

kanaria007
posted an update 2 days ago
✅ Article highlight: *World Economy Governance & Anti-Manipulation* (art-60-161, v0.1)

TL;DR:
This article treats a world economy as a governance surface, not just a price simulator.

If you want to say “prices were fair,” “there was no manipulation,” or “this market intervention was legitimate,” you need more than dashboards. You need pinned measurement semantics, receipted adversary monitoring, and receipted institutional intervention. In this framing, markets are not vibes. They are policies with receipts.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• turns economy claims into auditable claims instead of economist-flavored storytelling
• treats bot farms, market manipulation, and propaganda as adversarial operations with receipts
• makes “no manipulation” a stronger claim that must be monitoring-backed
• shows how freezes, rollbacks, tax changes, and price-band interventions need explicit policy hooks and authority

What’s inside:
• *economy observability contracts* and *metrics profiles* for pinned measurement semantics
• *economy monitoring profiles/receipts* anchored to 148 adversary monitoring
• oracle-backed economy events such as *MARKET_REGIME_SHIFT*
• receipted institutional interventions: freezes, rollback trades, tax changes, and price-band updates
• the idea of *safe-mode economics* when integrity or coverage becomes uncertain

Key idea:
Do not say:

*“the market looked healthy.”*

Say:

*“this economy claim is backed by pinned observability and metrics profiles, monitoring receipts, and receipted institutional actions under declared policy and authority.”*
SeaWolf-AI
posted an update 3 days ago
Darwin-60B-DUO: Two SOTAs, One Endpoint, 88.38% on GPQA Diamond 🚀

We're excited to release Darwin-60B-DUO, the Darwin family's first DUO model. Take two domain-verified specialists, hide them behind a single OpenAI-compatible endpoint, and let a router decide which one (or both) answers. You see one model, one API, but get the best of both.

The number that matters: on the full 198-question GPQA Diamond, Darwin-60B-DUO hits 88.38%. The constituents alone land at 69.70% (Darwin-28B-REASON) and 77.27% (AWAXIS-Think-31B); a naive cascade only reaches 83.84%. The DUO clears them all. Two small specialists, intelligently routed, beat one big generalist on cost and quality. Both specialists are independently verified: Darwin-28B-REASON is #3 on the HF GPQA Diamond leaderboard, and AWAXIS-Think-31B is #1 on Korea's national K-AI Leaderboard (MSIT).
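
To put those percentages in terms of questions, the reported accuracies can be converted back to correct-answer counts out of 198. This is plain arithmetic on the figures quoted above; the dictionary keys are shorthand labels chosen for this sketch, and rounding to the nearest whole question is an assumption about how the scores were computed.

```python
TOTAL_QUESTIONS = 198  # full GPQA Diamond set, as stated in the post

# Reported accuracies in percent; the keys are shorthand for this sketch only.
REPORTED_ACCURACY = {
    "darwin_60b_duo": 88.38,
    "darwin_28b_reason": 69.70,
    "awaxis_think_31b": 77.27,
    "naive_cascade": 83.84,
}

def implied_correct(accuracy_pct, total=TOTAL_QUESTIONS):
    """Number of correct answers implied by a percentage, rounded to a whole question."""
    return round(accuracy_pct / 100 * total)

def implied_counts(reported=REPORTED_ACCURACY):
    """Map each system to its implied number of correct answers."""
    return {name: implied_correct(pct) for name, pct in reported.items()}
```

On these numbers the DUO answers 175 of 198 questions correctly, against 166 for the naive cascade and 153 for the stronger constituent, so the routing gain over the cascade is 9 questions.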

The brain is a Hybrid-A router that picks one of five strategies on the fly. Korean → AWAXIS, English/STEM → Darwin (single-backend, ~70% of traffic at 1× cost). When a Korean answer needs rigorous English reasoning, split_refine fires: Darwin drafts, AWAXIS polishes. MCQ/short-answer prompts run both backends with self-consistency + cross-verify. Net effective cost: only ~1.3× a single 30B model.
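
A toy version of that routing logic might look like the sketch below. It is not the project's router.py: the regex heuristics, the needs_english_reasoning flag, the backend labels, and every strategy name except split_refine are hypothetical, and only the four behaviours described above are covered.

```python
import re

# Hypothetical heuristics: Hangul syllables mark Korean, "A) ..." style lines mark MCQs.
HANGUL = re.compile(r"[\uac00-\ud7a3]")
MCQ_OPTION = re.compile(r"^\s*\(?[A-D][\).]\s", re.MULTILINE)

def route(prompt, needs_english_reasoning=False):
    """Pick a strategy and backend list for a prompt (illustrative only)."""
    is_korean = bool(HANGUL.search(prompt))
    if MCQ_OPTION.search(prompt):
        # Both models answer; agreement is checked downstream.
        return {"strategy": "both_cross_verify", "backends": ["darwin", "awaxis"]}
    if is_korean and needs_english_reasoning:
        # Darwin drafts, AWAXIS polishes.
        return {"strategy": "split_refine", "backends": ["darwin", "awaxis"]}
    if is_korean:
        return {"strategy": "single", "backends": ["awaxis"]}
    return {"strategy": "single", "backends": ["darwin"]}
```

Returning the decision as a small dict makes it easy to attach to the response for transparency, which is roughly the role the post gives to _duo_route.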

The part the community will care about: the gateway is model-agnostic and Apache-2.0. Point it at any two OpenAI-compatible backends and you've got a DUO in minutes. Teach router.py when to use which; parallel calls, response merging, and routing transparency via _duo_route are handled for you. Fork it and tell us what you built.

Painless deploy: docker compose up for both vLLM backends + gateway; FP8 ~30GB colocates on a single B200/H100. One git clone (~120GB). Text-only for now, streaming in v1.1.
Two SOTAs, one endpoint. Come build your own on the Community tab.

👇
🔗 FINAL-Bench/Darwin-60B-DUO
codelion
posted an update 4 days ago
Inspired by the Nemotron Diffusion recipe, check out dhara-250m: a 250M experimental language model that supports three decoding modes from one set of weights: autoregressive, block-diffusion, and self-speculation.

It is small, easy to try, and meant for exploring diffusion-style decoding and latency tradeoffs in compact LMs.

Model: codelion/dhara-250m

Try the chat demo here: codelion/dhara-chat
pankajpandey-dev
posted an update 5 days ago
🇮🇳 Just shipped: MiniCPM5-1B-Hindi-Instruct (+ GGUF quants)

First Hindi instruction-tuned fine-tune of OpenBMB's brand-new MiniCPM5-1B (released this week).

Trained with Unsloth + LoRA (r=32) on AI4Bharat's anudesh + dolly Hindi splits: ~4k high-quality examples, 2 epochs on a single T4 in 60 minutes.
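
For a sense of why rank-32 LoRA fits on a single T4: a LoRA adapter on a weight matrix of shape d_out x d_in trains two small matrices, A (r x d_in) and B (d_out x r), instead of the full matrix. The sketch below just counts parameters; the 2048 x 2048 layer shape in the usage note is a made-up example, not MiniCPM5-1B's actual configuration.

```python
def lora_trainable_params(d_in, d_out, r):
    """Parameters in one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)

def full_params(d_in, d_out):
    """Parameters in the frozen weight matrix the adapter is attached to."""
    return d_in * d_out

def trainable_fraction(d_in, d_out, r):
    """Share of the layer's weights that LoRA actually trains."""
    return lora_trainable_params(d_in, d_out, r) / full_params(d_in, d_out)
```

For a hypothetical 2048 x 2048 projection with r=32, that is 131,072 trainable parameters against 4,194,304 frozen ones, or about 3.1% of the layer.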

🔗 Model (16-bit + LoRA adapter):
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct

📦 GGUF quants for llama.cpp / Ollama / LM Studio:
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF

5 quant levels, from Q3_K_M (~560 MB, runs on a Raspberry Pi) to Q8_0 (~1.2 GB, near-lossless). Q4_K_M is the recommended default.

Part of my ongoing 🇮🇳 Hindi LLM Series, bringing strong open-source LLMs to Indian languages.

#Hindi #IndicNLP #MiniCPM5 #LoRA #Unsloth #GGUF #llamacpp #Ollama #LocalLLM
pankajpandey-dev
posted an update 8 days ago
Just released Qwen3-0.6B fine-tuned on Hindi instruction data 🇮🇳

✅ Full model: pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1
✅ GGUF versions (Q2/Q4/Q5/Q8): pankajpandey-dev/Qwen3-0.6B-Hindi-Instruct-v1-GGUF

Smallest Hindi-capable GGUF: runs on any laptop at 0.37GB.
Next: v2 with more data, better responses.

#Hindi #LLM #GGUF #OpenSource
tomaarsen
posted an update 12 days ago
🤗 Announcing the Ettin Reranker family: six new state-of-the-art CrossEncoder rerankers for search from 17M to 1B parameters, plus the full training data and the ~150-line recipe. Built on the Ettin ModernBERT encoders, Apache 2.0. Details:

All six were trained with the same single-stage pointwise MSE distillation recipe, with mixedbread-ai/mxbai-rerank-large-v2 (1.54B) as the teacher. Only the learning rate and per-device batch size change between sizes. The 1B student matches the teacher within 0.0001 NDCG@10 on MTEB(eng, v2) Retrieval, the 150M is the strongest reranker I tested in the under-600M range, and the 17M beats the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 at roughly half the parameter count.
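
In case "pointwise MSE distillation" is unfamiliar: each (query, document) pair is scored on its own, and the student is trained so that its score matches the teacher's score for the same pair. The sketch below shows only the loss in plain Python; it is not the training script from the blog post, and the function name is made up for illustration.

```python
def mse_distillation_loss(student_scores, teacher_scores):
    """Mean squared error between student and teacher relevance scores.

    Each position corresponds to one (query, document) pair, scored independently.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("student and teacher must score the same pairs")
    n = len(student_scores)
    return sum((s - t) ** 2 for s, t in zip(student_scores, teacher_scores)) / n
```

A student that reproduces the teacher's scores exactly gets a loss of 0, so the teacher's ranking is recovered without any pairwise or listwise objective.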

Speed matters as much as quality for a reranker, since it determines whether the model fits the latency budget between retrieval and showing results. Our 17M is the fastest reranker in the whole comparison at 7517 pairs/sec on an H100. Our 150M runs 2.3x faster than the two other 150M ModernBERT-base rerankers (gte-reranker-modernbert-base and granite-embedding-reranker-english-r2) because the modular Transformer module propagates unpadded inputs through every layer rather than just the FA2 attention kernel. And our 1B is 2.4x faster than its 1.5B teacher while matching it on quality.
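
To make the latency-budget point concrete, here is a small arithmetic sketch that turns a throughput figure into a per-query reranking time. It assumes throughput scales linearly with the number of pairs and ignores batching and transfer overhead, so treat the results as rough lower bounds; the 100-candidate and 50 ms figures are illustrative, not from the benchmark.

```python
def rerank_latency_ms(num_candidates, pairs_per_sec):
    """Time in milliseconds to score num_candidates pairs at a given throughput."""
    return num_candidates / pairs_per_sec * 1000

def max_candidates(budget_ms, pairs_per_sec):
    """Largest number of candidates that fits inside a latency budget."""
    return int(budget_ms / 1000 * pairs_per_sec)
```

At the reported 7517 pairs/sec, reranking 100 candidates would take roughly 13.3 ms, and a 50 ms budget would cover about 375 candidates.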

I bootstrapped the training recipe with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with hf skills add train-sentence-transformers --claude and ask Claude Code (or Codex / Cursor / Gemini CLI) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

I wrote a blog post walking through usage, results across six embedder pairings, the speed story, and the complete training script. Check it out, or just point your Agent to the URL:

https://huggingface.co/blog/ettin-reranker

Collection: https://huggingface.co/collections/cross-encoder/ettin-rerankers