It Keeps Refining The Error
Supernova makes big models dramatically smaller so they are cheaper to store, easier to fit, and more practical to run. The point is not a clever quant name. The point is a real compression system: iterative residual refinement, resident execution layouts, true ternary packing, and fused kernels aimed at keeping the runtime story intact.
At A Glance
- Base: fp16. The original checkpoint, and the quality reference point.
- Parity: 2.47 BPW. The safe cut. Smaller and fast, quality first.
- Frontier High: 1.85 BPW. The flagship tier. Much smaller, still serious.
- Frontier Low: 1.23 BPW. The extreme tier. Maximum pressure, smaller still.
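To put those numbers in memory terms: weights-only footprint scales linearly with BPW. A back-of-envelope using an illustrative 32B-parameter model (the parameter count is an assumption, and activations, KV cache, and artifact metadata are ignored):

```python
# Weight-only footprint per tier. The 32B parameter count is illustrative,
# not a claim about any specific checkpoint; activations, KV cache, and
# artifact metadata are ignored.
PARAMS = 32e9
GIB = 2**30

for tier, bpw in [("base (fp16)", 16.00), ("parity", 2.47),
                  ("frontier_high", 1.85), ("frontier_low", 1.23)]:
    print(f"{tier:>13}: {PARAMS * bpw / 8 / GIB:5.1f} GiB")
```

Under those assumptions, fp16 needs roughly 60 GiB of weights while Frontier High lands under 7 GiB: the difference between a multi-GPU node and a single consumer card.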
Why It Matters
More compression means more model fits on the same machine. Doing that without giving the win back in reconstruction overhead is the whole game.
Much Lower Weight Footprint
Supernova pushes model weights far below fp16 so serious checkpoints fit where they normally would not.
More Model Per Machine
Less memory pressure means more room to run larger models, denser batches, or more useful deployments on the same hardware.
Compression You Can Understand
Parity, Frontier High, and Frontier Low are distinct operating points with different promises, not a wall of random quant names.
Current Read
`Parity` is the safe default. `Frontier High` is the main event. `Frontier Low` is the aggressive tier and should be treated that way.
Why It Is Different
Supernova does not stop at one static rotation. It keeps fitting the remaining error tile by tile, step by step, so compression stays expressive even when bits per weight get extremely low.
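As a mental model for that loop (a minimal sketch, not Supernova's actual fitting code; the whole-matrix fit and threshold heuristic are simplifications, since the real system works tile by tile), each pass quantizes whatever error the previous passes left behind:

```python
import numpy as np

def ternary_fit(residual):
    """Greedily fit one scaled ternary layer to the current residual.
    The 0.7 threshold heuristic and whole-matrix fit are simplifications."""
    thresh = 0.7 * np.mean(np.abs(residual))
    q = np.sign(residual) * (np.abs(residual) > thresh)  # values in {-1, 0, 1}
    scale = np.abs(residual[q != 0]).mean() if q.any() else 0.0
    return scale, q

def refine(weights, steps=4):
    """Iterative residual refinement: each step fits what is still wrong."""
    approx = np.zeros_like(weights)
    layers = []
    for _ in range(steps):
        scale, q = ternary_fit(weights - approx)  # refit the remaining error
        approx += scale * q
        layers.append((scale, q))
    return layers, approx

w = np.random.default_rng(0).standard_normal((64, 64)).astype(np.float32)
layers, w_hat = refine(w)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```

Each step's scale is the least-squares optimum for its support, so the reconstruction error strictly decreases as layers accumulate; that is what keeps quality alive as the bit budget shrinks.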
Resident buckets group work by step, seed, and row block so the runtime can stay inside the transformed space instead of decompressing back to a standard dense matrix first.
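A toy illustration of that layout (hypothetical schema; the real key and payload formats are not public here): file each packed tile under a `(step, seed, row_block)` key, so tiles that share a transform can run as one group:

```python
from collections import defaultdict

# Hypothetical resident layout: every packed tile is filed under
# (refinement step, transform seed, row block). Tiles sharing a key
# can reuse one transform of the activations and land their partial
# results at a known row offset, with no dense de-quantized matrix
# ever materialized.
buckets = defaultdict(list)

def register(step, seed, row_block, scale, packed_tile):
    buckets[(step, seed, row_block)].append((scale, packed_tile))

register(0, 42, 0, 0.11, b"\x07\x21")
register(0, 42, 1, 0.09, b"\x13\x00")
register(1, 7, 0, 0.04, b"\x02\x1f")

# The runtime sweeps buckets in key order: one launch group per key.
for key in sorted(buckets):
    print(key, "->", len(buckets[key]), "tile(s)")
```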
True base-3 ternary packing and fused gather-plus-linear kernels are built to keep bytes and launches down. The point is not just smaller files. The point is compression that still wants to move fast.
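The "true" in true ternary packing is the gap between spending 2 bits per weight (the naive encoding of three states) and packing five trits per byte in base 3, since 3^5 = 243 fits in 256: 1.6 bits per weight before any other tricks. A self-contained sketch of that encoding (illustrative, not Supernova's on-disk format):

```python
import numpy as np

def pack_trits(trits):
    """Pack ternary values {-1, 0, 1} five per byte in true base 3."""
    t = np.asarray(trits, dtype=np.int16) + 1      # map {-1,0,1} -> {0,1,2}
    t = np.pad(t, (0, (-len(t)) % 5))              # pad to a multiple of 5
    groups = t.reshape(-1, 5)
    powers = 3 ** np.arange(5)                     # [1, 3, 9, 27, 81]
    return (groups * powers).sum(axis=1).astype(np.uint8), len(trits)

def unpack_trits(packed, n):
    b = packed.astype(np.int16)
    digits = []
    for _ in range(5):                             # peel off base-3 digits
        digits.append(b % 3)
        b //= 3
    t = np.stack(digits, axis=1).reshape(-1)[:n]
    return (t - 1).astype(np.int8)                 # back to {-1, 0, 1}

w = np.random.default_rng(0).integers(-1, 2, size=23)
packed, n = pack_trits(w)
assert np.array_equal(unpack_trits(packed, n), w)
print(f"{n} trits -> {packed.nbytes} bytes")       # 23 trits -> 5 bytes
```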
All of that machinery still lands in a simple ladder: `base`, `parity`, `frontier_high`, and `frontier_low`. The technical depth stays under the hood while the tradeoff stays visible.
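In configuration terms, that ladder is just four ids. A hypothetical sketch of the surface (the tier names come from the ladder above; the `Tier` wrapper is invented for illustration):

```python
from enum import Enum

# Hypothetical naming only: these four tier ids are the documented ladder,
# but the Python/CLI surface that consumes them is assumed here.
class Tier(str, Enum):
    BASE = "base"                    # fp16 reference
    PARITY = "parity"                # ~2.47 BPW
    FRONTIER_HIGH = "frontier_high"  # ~1.85 BPW
    FRONTIER_LOW = "frontier_low"    # ~1.23 BPW

print([t.value for t in Tier])
```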
Model Focus
The first public surface is centered on Qwen3.5 so the results, the artifact flow, and the tier story stay legible instead of getting buried under a support matrix nobody believes.
- Small enough to iterate quickly. Big enough to show whether the compression story is real.
- The first model where "smaller and still useful" starts to become a real product argument.
- The place where memory ceilings, bandwidth costs, and deployment pain become impossible to ignore.
Tiers
The goal is not to drown people in quant names. The goal is to make the tradeoff obvious before they download anything.
- Parity: the safety-first tier. Built to stay close to fp16 while still cutting the weight footprint hard.
- Frontier High: the main compression tier. This is where Supernova gets much more aggressive while still trying to stay useful.
- Frontier Low: the aggressive tier. Maximum compression pressure, with a much tighter quality envelope.
Reality Check
The strongest current surface is artifact build, inspection, reload, and runtime-mode switching in `transformers`. `vLLM` and `llama.cpp` are important, but they are still integration lanes, not finished runtime stories.
Use Now
Build artifacts, inspect them, reload them, and run them through a working public surface.
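As a sketch of what that loop could look like (every Supernova-side name here is hypothetical and shown only as comments; the `transformers` calls are real, documented APIs, while the artifact path is a placeholder):

```python
# Hypothetical sketch of the build -> inspect -> reload -> run loop.
from transformers import AutoModelForCausalLM, AutoTokenizer

ARTIFACT = "path/to/qwen3.5-frontier-high"  # placeholder path

# 1. build + inspect happen through Supernova's own tooling, e.g.
#    something shaped like (names invented for illustration):
#    artifact = supernova.build("Qwen3.5", tier="frontier_high")
#    print(artifact.summary())

# 2. reload and run through the working transformers surface
model = AutoModelForCausalLM.from_pretrained(ARTIFACT)
tok = AutoTokenizer.from_pretrained(ARTIFACT)
ids = tok("Compression that still wants to move fast:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=32)[0]))
```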
Coming Next
`vLLM` and `llama.cpp` package/export bridges are in place, but they are not the finished story yet.
Why It Matters
The compression is exciting now. The broader serving story gets more valuable as the backend support catches up.