Subquadratic releases SubQ 1M with 12M token context at 1/5 frontier cost
Subquadratic launched SubQ 1M-Preview on May 5 with a native 12 million token context window, using sparse, subquadratic attention instead of the dense attention found in standard transformers. The model costs roughly one-fifth as much as frontier alternatives and achieves up to 52x faster attention at scale.
Subquadratic, backed by $29M in seed funding, released SubQ 1M-Preview on May 5. The model breaks the O(n²) cost ceiling that has constrained long-context inference across the industry by replacing standard quadratic attention with a sparse, subquadratic architecture. Its native 12 million token context window enables use cases that frontier models either cannot handle or can handle only at prohibitive inference cost.
Architectural shift away from quadratic attention
Sparse, subquadratic attention avoids the quadratic scaling that makes long-context inference expensive. Standard transformer attention scores every token against every other token, so its cost grows in proportion to n² for a sequence of n tokens. At n = 12 million, that is on the order of 1.44 × 10^14 query-key pairs per attention head per layer. Subquadratic's sparse attention computes only a subset of those pairs, so its cost grows more slowly than n², lowering both memory and compute requirements. Benchmarks show up to 52x faster attention computation at scale compared to dense attention baselines.
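To make the scaling difference concrete, the sketch below counts query-key score computations for dense attention and for one simple sparse pattern. The article does not describe SubQ's actual sparsity mechanism, so the fixed attention window, the WINDOW value of 4096, and both function names are assumptions chosen purely for illustration.

```python
def dense_pairs(n: int) -> int:
    """Query-key scores computed when every token attends to every token."""
    return n * n


def windowed_pairs(n: int, window: int) -> int:
    """Scores computed when each token attends to at most `window` keys.

    Illustrative sparse pattern only; not a description of SubQ's mechanism.
    """
    return n * min(window, n)


WINDOW = 4096  # hypothetical per-token attention budget

for n in (1_000_000, 12_000_000):
    dense = dense_pairs(n)
    sparse = windowed_pairs(n, WINDOW)
    print(f"n={n:>10,}  dense={dense:.3e}  windowed={sparse:.3e}")
```

Going from 1M to 12M tokens multiplies the dense count by 144 but the windowed count by only 12. These are operation counts rather than timings, and they are not meant to reproduce the reported 52x figure.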
Economics and market positioning
Pricing at one-fifth the cost of frontier models (Claude 3.5 Sonnet, GPT-4o) shifts developer economics. For applications that require extended context (long document analysis, code repositories, video transcripts), SubQ 1M makes long-context inference viable for cost-sensitive workloads. The $29M seed round signals investor confidence that architectural innovation, not just scale, can compete with frontier labs.
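As a rough illustration of what a one-fifth price ratio means for a full-context prompt, the sketch below works through the per-token arithmetic. The article gives no actual prices, so FRONTIER_PRICE is a placeholder rather than a quoted rate, and the helper name prompt_cost is hypothetical. Only the 1/5 ratio and the 12M token count come from the text.

```python
from decimal import Decimal

# Placeholder input price in USD per 1M tokens; NOT a real quoted rate.
FRONTIER_PRICE = Decimal("3.00")
# The article's claim: roughly one-fifth the frontier price.
SUBQ_PRICE = FRONTIER_PRICE / 5


def prompt_cost(tokens: int, price_per_million: Decimal) -> Decimal:
    """Cost in USD of sending `tokens` input tokens at the given rate."""
    return Decimal(tokens) / Decimal(1_000_000) * price_per_million


tokens = 12_000_000  # one full-context prompt
print("frontier-priced:", prompt_cost(tokens, FRONTIER_PRICE))
print("one-fifth-priced:", prompt_cost(tokens, SUBQ_PRICE))
```

This is per-token arithmetic only. It ignores output tokens, caching discounts, and the fact that a model with a smaller context window could not accept a 12M token prompt in a single call.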