MoE - F4u.in

Build Recurrent-Depth Transformers with OpenMythos for MLA, GQA, Sparse MoE, and Loop-Scaled Reasoning

By adminMay 22, 2026

def build_model(attn_type: str = “mla”, max_loop_iters: int = 8) -> tuple: “””Build a small OpenMythos model. Two attention variants supported. MLA — Multi-Latent Attention (compressed KV…

Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs

By adminMay 21, 2026

Cohere just released Command A+, as an open-source model targeting enterprise agentic workflows. Available under an Apache 2.0 license, Command A+ is a mixture-of-experts (MoE) model…

Zyphra Releases ZAYA1-8B-Diffusion-Preview: The First MoE Diffusion Model Converted From an Autoregressive LLM With Up to 7.7x Speedup

By adminMay 16, 2026

Zyphra, the San Francisco-based AI lab behind the ZAYA1 model family, released ZAYA1-8B-Diffusion-Preview — a preview of its early work in diffusion-language models. The release demonstrates…

Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

By adminMay 13, 2026

A team researchers from China have released AntAngelMed, a large open-source medical language model that the team describes as the largest and most capable of its…

Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

By adminMay 7, 2026

Zyphra AI has released ZAYA1-8B, a small Mixture of Experts (MoE) language model with 760 million active parameters and 8.4 billion total parameters. Trained end-to-end on…

Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks

By adminApril 22, 2026

Alibaba’s Qwen Team has released Qwen3.6-27B, the first dense open-weight model in the Qwen3.6 family — and arguably the most capable 27-billion-parameter model available today for…

A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence

By adminApril 21, 2026

class QwenChat: def __init__(self, model, processor, system=None, tools=None): self.model, self.processor = model, processor self.tokenizer = processor.tokenizer self.history: list[dict] = [] if system: self.history.append({“role”: “system”, “content”: system})…

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

By adminApril 17, 2026

The open-source AI landscape has a new entry worth paying attention to. The Qwen team at Alibaba has released Qwen3.6-35B-A3B, the first open-weight model from the…

NVIDIA Releases Nemotron-Cascade 2: An Open 30B MoE with 3B Active Parameters, Delivering Better Reasoning and Strong Agentic Capabilities

By adminMarch 20, 2026

NVIDIA has announced the release of Nemotron-Cascade 2, an open-weight 30B Mixture-of-Experts (MoE) model with 3B activated parameters. The model focuses on maximizing ‘intelligence density,’ delivering…

Mistral AI Releases Mistral Small 4: A 119B-Parameter MoE Model that Unifies Instruct, Reasoning, and Multimodal Workloads

By adminMarch 16, 2026

Mistral AI has released Mistral Small 4, a new model in the Mistral Small family designed to consolidate several previously separate capabilities into a single deployment…

What's Hot

6 red flags that tell you to avoid a Linux distro before you install it

Fujifilm’s Instax Mini 13 got me interested in instant cameras again

Forget the Toyota RAV4—the Kia Seltos just proved it’s the smarter buy

Browsing: MoE

Build Recurrent-Depth Transformers with OpenMythos for MLA, GQA, Sparse MoE, and Loop-Scaled Reasoning

Cohere Releases Command A+: A 218B Sparse MoE Model for Agentic Workflows That Runs on as Few as Two H100 GPUs

Zyphra Releases ZAYA1-8B-Diffusion-Preview: The First MoE Diffusion Model Converted From an Autoregressive LLM With Up to 7.7x Speedup

Meet AntAngelMed: A 103B-Parameter Open-Source Medical Language Model Built on a 1/32 Activation-Ratio MoE Architecture

Zyphra Releases ZAYA1-8B: A Reasoning MoE Trained on AMD Hardware That Punches Far Above Its Weight Class

Alibaba Qwen Team Releases Qwen3.6-27B: A Dense Open-Weight Model Outperforming 397B MoE on Agentic Coding Benchmarks

A Coding Implementation on Qwen 3.6-35B-A3B Covering Multimodal Inference, Thinking Control, Tool Calling, MoE Routing, RAG, and Session Persistence

Qwen Team Open-Sources Qwen3.6-35B-A3B: A Sparse MoE Vision-Language Model with 3B Active Parameters and Agentic Coding Capabilities

NVIDIA Releases Nemotron-Cascade 2: An Open 30B MoE with 3B Active Parameters, Delivering Better Reasoning and Strong Agentic Capabilities

Mistral AI Releases Mistral Small 4: A 119B-Parameter MoE Model that Unifies Instruct, Reasoning, and Multimodal Workloads

6 red flags that tell you to avoid a Linux distro before you install it

Fujifilm’s Instax Mini 13 got me interested in instant cameras again

Forget the Toyota RAV4—the Kia Seltos just proved it’s the smarter buy

6 red flags that tell you to avoid a Linux distro before you install it

Fujifilm’s Instax Mini 13 got me interested in instant cameras again

Forget the Toyota RAV4—the Kia Seltos just proved it’s the smarter buy

Usefull link

categories