Arcee AI Releases Trinity Large Thinking: An Apache 2.0 Open Reasoning Model for Long-Horizon Agents and Tool Use
The landscape of open-source artificial intelligence has shifted from purely generative models toward systems capable of complex, multi-step reasoning. While proprietary ‘reasoning’ models have dominated the conversation, Arcee AI has released Trinity Large Thinking.
This release is an open-weight reasoning model distributed under the Apache 2.0 license, positioning it as a transparent alternative for developers building autonomous agents. Unlike models optimized solely for conversational chat, Trinity Large Thinking is specifically developed for long-horizon agents, multi-turn tool calling, and maintaining context coherence over extended workflows.
Architecture: Sparse MoE at Frontier Scale
Trinity Large Thinking is the reasoning-oriented iteration of Arcee’s Trinity Large series. Technically, it is a sparse Mixture-of-Experts (MoE) model with 400 billion total parameters. However, its architecture is designed for inference efficiency; it activates only 13 billion parameters per token using a 4-of-256 expert routing strategy.
This sparsity provides the world-knowledge density of a massive model without the prohibitive latency typical of dense 400B architectures. Key technical innovations in the Trinity Large family include:
- SMEBU (Soft-clamped Momentum Expert Bias Updates): A new MoE load balancing strategy that prevents expert collapse and ensures more uniform utilization of the model’s specialized pathways.
- Muon Optimizer: Arcee utilized the Muon optimizer during the model's 17-trillion-token pre-training phase, which allows for higher capital and sample efficiency than standard AdamW implementations.
- Attention Mechanism: The model features interleaved local and global attention alongside gated attention to enhance its ability to comprehend and recall details within large contexts.
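The 4-of-256 routing described above can be illustrated with a minimal sketch (the hidden dimension and router weights here are toy values, not Arcee's actual implementation): a router scores all 256 experts for each token, and only the top 4 are activated, so only a small fraction of the 400B total parameters participate in any single forward step.

```python
import numpy as np

NUM_EXPERTS = 256   # total experts in the MoE layer
TOP_K = 4           # experts activated per token (4-of-256 routing)

def route_token(hidden: np.ndarray, router_w: np.ndarray):
    """Score all experts for one token and keep only the top-k.

    hidden:   (d_model,) token representation
    router_w: (NUM_EXPERTS, d_model) router projection
    Returns the chosen expert indices and their normalized gate weights.
    """
    logits = router_w @ hidden                       # (NUM_EXPERTS,) scores
    top = np.argsort(logits)[-TOP_K:]                # indices of the 4 best experts
    gates = np.exp(logits[top] - logits[top].max())  # softmax over selected experts
    gates /= gates.sum()
    return top, gates

rng = np.random.default_rng(0)
experts, gates = route_token(rng.standard_normal(64),
                             rng.standard_normal((NUM_EXPERTS, 64)))
# Only TOP_K experts fire; their gate weights mix the expert outputs.
```

Each token's output is then a gate-weighted sum of just those four experts' outputs, which is what keeps per-token compute near the 13B-active-parameter level despite the 400B total.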
Reasoning
A core differentiator of Trinity Large Thinking is its behavior during the inference phase. Arcee's documentation states that the model runs a 'thinking' process prior to delivering its final response. This internal reasoning allows the model to plan multi-step tasks and verify its logic before generating an answer.
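Many open reasoning models emit this internal chain of thought inside delimiter tags, commonly `<think>...</think>`. Assuming Trinity Large Thinking follows that convention (the exact delimiters are not confirmed here), downstream agent code typically strips the reasoning before surfacing the final answer:

```python
import re

def split_reasoning(raw: str):
    """Separate a model's internal 'thinking' from its final answer.

    Assumes the common <think>...</think> delimiter convention used by
    many open reasoning models; Trinity's exact format may differ.
    """
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if match is None:
        return "", raw.strip()          # no reasoning block found
    thinking = match.group(1).strip()
    answer = raw[match.end():].strip()  # everything after the reasoning
    return thinking, answer

raw = "<think>Plan: call the weather API, then summarize.</think>It is sunny."
thinking, answer = split_reasoning(raw)
```

Keeping the reasoning trace accessible (rather than discarding it) is useful for debugging agent failures, since the plan the model formed is visible separately from the answer it gave.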
Performance: Agents, Tools, and Context
Trinity Large Thinking is optimized for the ‘Agentic’ era. Rather than competing purely on general-knowledge trivia, its performance is measured by its reliability in complex software environments.

Benchmarks and Rankings
The model has demonstrated strong performance on PinchBench, a benchmark designed to evaluate model capability in environments relevant to autonomous agents. Currently, Trinity Large Thinking holds the #2 spot on PinchBench, trailing only Claude Opus-4.6.
Technical Specifications
- Context Window: The model supports a 262,144-token context window (as listed on OpenRouter), making it capable of processing massive datasets or long conversational histories for agentic loops.
- Multi-Turn Reliability: The training focused heavily on multi-turn tool use and structured outputs, ensuring that the model can call APIs and extract parameters with high precision over many turns.
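Multi-turn tool calling of the kind described above is typically driven through an OpenAI-compatible chat endpoint such as OpenRouter's. Below is a minimal sketch of such a request payload; the model slug and the tool schema are illustrative assumptions, not confirmed values.

```python
import json

# Hypothetical model slug and tool definition, for illustration only.
payload = {
    "model": "arcee-ai/trinity-large-thinking",  # assumed OpenRouter slug
    "messages": [
        {"role": "system", "content": "You are an agent that may call tools."},
        {"role": "user", "content": "What is the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

body = json.dumps(payload)  # POST this body to the chat completions endpoint
```

In an agentic loop, the model's tool-call responses (function name plus extracted parameters) are executed and appended back into `messages`, and the long context window is what lets this loop run for many turns without losing early instructions.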
Key Takeaways
- High-Efficiency Sparse MoE Architecture: Trinity Large Thinking is a 400B-parameter sparse Mixture-of-Experts (MoE) model. It utilizes a 4-of-256 routing strategy, activating only 13B parameters per token during inference to provide frontier-scale intelligence with the speed and throughput of a much smaller model.
- Optimized for Agentic Workflows: Unlike standard chat models, this release is specifically tuned for long-horizon tasks, multi-turn tool calling, and high instruction-following accuracy. It currently ranks #2 on PinchBench, a benchmark for autonomous agent capabilities, trailing only Claude Opus-4.6.
- Expanded Context Window: The model supports an extensive context window of 262,144 tokens (on OpenRouter). This allows it to maintain coherence across massive technical documents, complex codebases, and extended multi-step reasoning chains without losing track of early instructions.
- True Open Ownership: Distributed under the Apache 2.0 license, Trinity Large Thinking offers ‘True Open’ weights available on Hugging Face. This permits enterprises to audit, fine-tune, and self-host the model within their own infrastructure, ensuring data sovereignty and regulatory compliance.
- Advanced Training Stability: To achieve frontier-class performance with high capital efficiency, Arcee employed the Muon optimizer and a proprietary load-balancing technique called SMEBU (Soft-clamped Momentum Expert Bias Updates), which ensures stable expert utilization and prevents performance degradation during complex reasoning tasks.
Check out the technical details and model weights.