MiniMax has hinted at upcoming features:
Standard transformers suffer from quadratic complexity: the cost of self-attention grows with the square of the sequence length. Sparse attention helps, but Interstellar-V3 introduces Nebula Attention, a dynamic graph-based attention system. Instead of attending to every token, the model builds a dynamic "gravity model" of the input, in which important tokens (high mass) attract more attention bandwidth. This allows the model to process the entire text of War and Peace 500 times over in a single forward pass.
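The internals of Nebula Attention are not described here, so the following is only a toy sketch of the general idea, under loud assumptions: each token carries a precomputed "mass" score, and every query attends only to the `k` heaviest tokens instead of all of them. The function names (`mass_topk_attention`, `score_counts`), the top-k selection rule, and the idea of passing masses in as plain numbers are all hypothetical illustrations, not the model's actual mechanism.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mass_topk_attention(queries, keys, values, masses, k):
    """Scaled dot-product attention restricted to the k highest-mass tokens.

    `masses` is a hypothetical per-token importance score (an assumption);
    a real system would have to learn or compute it somehow.
    """
    # Indices of the k "heaviest" tokens; only these receive attention.
    heavy = sorted(range(len(keys)), key=lambda i: masses[i], reverse=True)[:k]
    scale = 1.0 / math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([dot(q, keys[i]) * scale for i in heavy])
        out = [
            sum(w * values[i][d] for w, i in zip(weights, heavy))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def score_counts(n, k):
    """Number of query-key scores: dense attention vs. top-k attention."""
    return n * n, n * k


if __name__ == "__main__":
    queries = [[1.0, 0.0]]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0], [2.0], [3.0]]
    masses = [0.9, 0.1, 0.5]
    print(mass_topk_attention(queries, keys, values, masses, k=2))
    print(score_counts(10_000_000, 1024))
```

With `k` fixed, the number of scores grows linearly in the sequence length rather than quadratically, which is the only property this sketch is meant to show. It says nothing about how masses would be chosen or how well such a scheme preserves quality.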
Early independent benchmarks (via the Artificial Analysis index) reveal staggering results. Please note these are aggregated from leaked pre-release data.
| Benchmark | GPT-4 Turbo | Claude 3.5 Sonnet | Interstellar-V3 |
| :--- | :--- | :--- | :--- |
| MMLU (5-shot) | 86.4% | 88.7% | 91.2% |
| GSM8K (Math) | 92.0% | 95.4% | 98.6% |
| HumanEval (Coding) | 84.2% | 92.0% | 96.5% |
| Long Context (1M tokens, accuracy) | 65% | 78% | 94% |
| Vibe-Eval (Video) | N/A | 32% | 87% |
The 10-Million Context Window

The standout feature is memory retention across 10 million tokens. In a stress test, researchers fed Interstellar-V3 the entire "Three-Body Problem" trilogy, asked it to identify continuity errors between books 1 and 3, and then had it rewrite the final chapter in the style of Ursula K. Le Guin. The result was coherent, stylistically accurate, and mathematically consistent with the fictional physics.
The Interstellar-V3 design philosophy pivots from brute force to intelligent resilience. It is not a single engine type but a hybrid system of four breakthrough technologies: