It Keeps Refining The Error
Supernova makes big models dramatically smaller so they are cheaper to store, easier to fit, and more practical to run. The point is not a clever quant name. The point is a real compression system: iterative residual refinement, resident execution layouts, true ternary packing, and fused kernels aimed at keeping the runtime story intact.
At A Glance
- Base: fp16. The original checkpoint, and the quality reference point.
- Parity: 2.47 BPW. The safe cut. Smaller and fast, quality first.
- Frontier High: 1.85 BPW. The flagship tier. Much smaller, still serious.
- Frontier Low: 1.23 BPW. The extreme tier. Maximum pressure, smaller still.
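To put those numbers in memory terms: weights-only footprint scales linearly with BPW. A back-of-envelope using an illustrative 32B-parameter model (the parameter count is an assumption, and activations, KV cache, and artifact metadata are ignored):

```python
# Weight-only footprint per tier. The 32B parameter count is illustrative,
# not a claim about any specific checkpoint; activations, KV cache, and
# artifact metadata are ignored.
PARAMS = 32e9
GIB = 2**30

for tier, bpw in [("base (fp16)", 16.00), ("parity", 2.47),
                  ("frontier_high", 1.85), ("frontier_low", 1.23)]:
    print(f"{tier:>13}: {PARAMS * bpw / 8 / GIB:5.1f} GiB")
```

Under those assumptions, fp16 needs roughly 60 GiB of weights while Frontier High lands under 7 GiB: the difference between a multi-GPU node and a single consumer card.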
Why It Matters
More compression means more model fits on the same machine. Doing that without giving the win back in reconstruction overhead is the whole game.
Much Lower Weight Footprint
Supernova pushes model weights far below fp16 so serious checkpoints fit where they normally would not.
More Model Per Machine
Less memory pressure means more room to run larger models, denser batches, or more useful deployments on the same hardware.
Compression You Can Understand
Parity, Frontier High, and Frontier Low are distinct operating points with different promises, not a wall of random quant names.
Current Read
`Parity` is the safe default. `Frontier High` is the main event. `Frontier Low` is the aggressive tier and should be treated that way.
Why It Is Different
Supernova does not stop at one static rotation. It keeps fitting the remaining error tile by tile, step by step, so compression stays expressive even when bits per weight get extremely low.
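As a mental model for that loop (a minimal sketch, not Supernova's actual fitting code; the whole-matrix fit and threshold heuristic are simplifications, since the real system works tile by tile), each pass quantizes whatever error the previous passes left behind:

```python
import numpy as np

def ternary_fit(residual):
    """Greedily fit one scaled ternary layer to the current residual.
    The 0.7 threshold heuristic and whole-matrix fit are simplifications."""
    thresh = 0.7 * np.mean(np.abs(residual))
    q = np.sign(residual) * (np.abs(residual) > thresh)  # values in {-1, 0, 1}
    scale = np.abs(residual[q != 0]).mean() if q.any() else 0.0
    return scale, q

def refine(weights, steps=4):
    """Iterative residual refinement: each step fits what is still wrong."""
    approx = np.zeros_like(weights)
    layers = []
    for _ in range(steps):
        scale, q = ternary_fit(weights - approx)  # refit the remaining error
        approx += scale * q
        layers.append((scale, q))
    return layers, approx

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
layers, w_hat = refine(w)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Each step's scale is the least-squares optimum for its support, so the reconstruction error strictly decreases as layers accumulate; that is what keeps quality alive as the bit budget shrinks.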
Resident buckets group work by step, seed, and row block so the runtime can stay inside the transformed space instead of decompressing back to a standard dense matrix first.
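A toy illustration of that layout (hypothetical schema; the real key and payload formats are not public here): file each packed tile under a `(step, seed, row_block)` key, so tiles that share a transform can run as one group:

```python
from collections import defaultdict

# Hypothetical resident layout: every packed tile is filed under
# (refinement step, transform seed, row block). Tiles sharing a key
# can reuse one transform of the activations and land their partial
# results at a known row offset, with no dense de-quantized matrix
# ever materialized.
buckets = defaultdict(list)

def register(step, seed, row_block, scale, packed_tile):
    buckets[(step, seed, row_block)].append((scale, packed_tile))

register(0, 42, 0, 0.11, b"\x07\x21")
register(0, 42, 1, 0.09, b"\x13\x00")
register(1, 7, 0, 0.04, b"\x02\x1f")

# The runtime sweeps buckets in key order: one launch group per key.
for key in sorted(buckets):
    print(key, "->", len(buckets[key]), "tile(s)")
```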
True base-3 ternary packing and fused gather-plus-linear kernels are built to keep bytes and launches down. The point is not just smaller files. The point is compression that still wants to move fast.
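The "true" in true ternary packing is the gap between spending 2 bits per weight (the naive encoding of three states) and packing five trits per byte in base 3, since 3^5 = 243 fits in 256: 1.6 bits per weight before any other tricks. A self-contained sketch of that encoding (illustrative, not Supernova's on-disk format):

```python
import numpy as np

def pack_trits(trits):
    """Pack ternary values {-1, 0, 1} five per byte in true base 3."""
    t = np.asarray(trits, dtype=np.int16) + 1      # map {-1,0,1} -> {0,1,2}
    t = np.pad(t, (0, (-len(t)) % 5))              # pad to a multiple of 5
    groups = t.reshape(-1, 5)
    powers = 3 ** np.arange(5)                     # [1, 3, 9, 27, 81]
    return (groups * powers).sum(axis=1).astype(np.uint8), len(trits)

def unpack_trits(packed, n):
    b = packed.astype(np.int16)
    digits = []
    for _ in range(5):                             # peel off base-3 digits
        digits.append(b % 3)
        b //= 3
    t = np.stack(digits, axis=1).reshape(-1)[:n]
    return (t - 1).astype(np.int8)                 # back to {-1, 0, 1}

w = np.random.default_rng(0).integers(-1, 2, size=23)
packed, n = pack_trits(w)
assert np.array_equal(unpack_trits(packed, n), w)
print(f"{n} trits -> {packed.nbytes} bytes")       # 23 trits -> 5 bytes
```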
All of that machinery still lands in a simple ladder: `base`, `parity`, `frontier_high`, and `frontier_low`. The technical depth stays under the hood while the tradeoff stays visible.
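In configuration terms, that ladder is just four ids. A hypothetical sketch of the surface (the tier names come from the ladder above; the `Tier` wrapper is invented for illustration):

```python
from enum import Enum

# Hypothetical naming only: these four tier ids are the documented ladder,
# but the Python/CLI surface that consumes them is assumed here.
class Tier(str, Enum):
    BASE = "base"                    # fp16 reference
    PARITY = "parity"                # ~2.47 BPW
    FRONTIER_HIGH = "frontier_high"  # ~1.85 BPW
    FRONTIER_LOW = "frontier_low"    # ~1.23 BPW

print([t.value for t in Tier])
```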
Model Focus
The first public surface is centered on Qwen3.5 so the results, the artifact flow, and the tier story stay legible instead of getting buried under a support matrix nobody believes.
- Small enough to iterate quickly. Big enough to show whether the compression story is real.
- The first model where "smaller and still useful" starts to become a real product argument.
- The place where memory ceilings, bandwidth costs, and deployment pain become impossible to ignore.
Tiers
The goal is not to drown people in quant names. The goal is to make the tradeoff obvious before they download anything.
- Parity: the safety-first tier. Built to stay close to fp16 while still cutting the weight footprint hard.
- Frontier High: the main compression tier. This is where Supernova gets much more aggressive while still trying to stay useful.
- Frontier Low: the aggressive tier. Maximum compression pressure, with a much tighter quality envelope.
Reality Check
The strongest current surface is artifact build, inspection, reload, and runtime-mode switching in `transformers`. `vLLM` and `llama.cpp` are important, but they are still integration lanes, not finished runtime stories.
Use Now
Build artifacts, inspect them, reload them, and run them through a working public surface.
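As a sketch of what that loop could look like (every Supernova-side name here is hypothetical and shown only as comments; the `transformers` calls are real, documented APIs, while the artifact path is a placeholder):

```python
# Hypothetical sketch of the build -> inspect -> reload -> run loop.
from transformers import AutoModelForCausalLM, AutoTokenizer

ARTIFACT = "path/to/qwen3.5-frontier-high"  # placeholder path

# 1. build + inspect happen through Supernova's own tooling, e.g.
#    something shaped like (names invented for illustration):
#    artifact = supernova.build("Qwen3.5", tier="frontier_high")
#    print(artifact.summary())

# 2. reload and run through the working transformers surface
model = AutoModelForCausalLM.from_pretrained(ARTIFACT)
tok = AutoTokenizer.from_pretrained(ARTIFACT)
ids = tok("Compression that still wants to move fast:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=32)[0]))
```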
Coming Next
`vLLM` and `llama.cpp` package/export bridges are in place, but they are not the finished story yet.
Why It Matters
The compression is exciting now. The broader serving story gets more valuable as the backend support catches up.