Llama 4's 10M Context Window, Mistral's Agent Coder, and OpenAI Price Cuts

Alex Rivers
Senior AI Journalist • AI Stack Digest

April 17, 2026 • 4 min read

Friday’s AI headlines are loaded. Models, money, policy, and infrastructure all moved this week. Here’s what matters before the weekend.

Meta Drops Llama 4 Scout and Maverick: Open Weights, 10M Token Context

Meta released two new Llama 4 models this week that represent the most significant open-weight leap in months. Llama 4 Scout packs 17B active parameters across a 16-expert mixture-of-experts architecture and supports a 10 million token context window — the longest ever available on an open-weight model. Llama 4 Maverick scales to 128 experts and benchmarks favourably against GPT-4o and Claude 3.5 Sonnet on several reasoning and coding tasks.

Both models are available as open weights under Meta’s custom Llama 4 community licence, which carries usage restrictions and is not a standard open-source licence. Scout’s 10M context is the headline feature for developers working on RAG pipelines, long-document summarisation, or code-repository analysis. Maverick targets enterprise workloads where frontier-class reasoning is needed without frontier-class API costs. Early evals show strong results on MMLU, GPQA, and HumanEval.
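For a rough sense of what a 10 million token window holds, the sketch below estimates whether a body of text would fit. The four-characters-per-token ratio is a common rule of thumb for English text, not a property of Llama 4's tokenizer, and the function names and the 30 MB repository are hypothetical. A real check would count tokens with the model's own tokenizer.

```python
import math

CONTEXT_WINDOW_TOKENS = 10_000_000  # Scout's advertised context window
CHARS_PER_TOKEN = 4.0  # assumption: rough heuristic, not Llama 4's real ratio


def estimate_tokens(num_chars: int, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Estimate a token count from a character count, rounding up."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(num_chars: int, reserve_for_output: int = 0) -> bool:
    """True if the estimated prompt plus reserved output tokens fits the window."""
    return estimate_tokens(num_chars) + reserve_for_output <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    repo_chars = 30_000_000  # hypothetical: about 30 MB of mostly ASCII source
    print(estimate_tokens(repo_chars))  # 7500000
    print(fits_in_context(repo_chars))  # True
```

Code and non-English text often tokenize less efficiently than this heuristic assumes, so treat the result as an upper bound on how much fits.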

Source: Meta AI Blog

Mistral Launches Devstral: A Coding Model Built for AI Agents

Mistral AI released Devstral, a 24B model designed specifically for agentic coding workflows. Unlike standard code-completion models, Devstral is trained to handle multi-step tool use, bash execution, file editing, and reasoning over entire codebases — the exact capabilities required by coding agents like Devin and OpenHands. On SWE-Bench Verified, Devstral outperforms GPT-4o and Claude 3.5 Haiku on repository-level tasks.

The model is available via Mistral’s API and as an open-weight release on Hugging Face. It runs on a single A100 GPU, making it viable for self-hosted agentic pipelines. For teams building autonomous coding workflows, Devstral is worth a serious look this weekend.
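The single-GPU claim can be sanity-checked with back-of-envelope arithmetic on weight memory. The sketch below is an estimate under stated assumptions, not Mistral's published sizing: the announcement as reported here does not say which precision or A100 variant is meant (the A100 ships in 40 GB and 80 GB versions), and the figures cover weights only, ignoring KV cache, activations, and framework overhead. The function names are hypothetical.

```python
DEVSTRAL_PARAMS = 24e9  # 24B parameters, as reported in the article

# Bytes per parameter at common precisions (assumed options, not Mistral's spec).
PRECISIONS = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

A100_VARIANTS_GB = (40, 80)  # the two A100 memory configurations


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def a100_variants_that_fit(need_gb: float) -> list[int]:
    """A100 variants whose total memory exceeds the weight footprint."""
    return [gb for gb in A100_VARIANTS_GB if need_gb < gb]


if __name__ == "__main__":
    for name, bytes_per_param in PRECISIONS.items():
        need = weight_memory_gb(DEVSTRAL_PARAMS, bytes_per_param)
        print(f"{name}: {need:.0f} GB weights, fits A100 {a100_variants_that_fit(need)}")
```

At bf16 the weights alone come to 48 GB, so only the 80 GB card has room, with the remainder shared by the KV cache and activations. Long agentic sessions over large repositories would eat into that headroom quickly.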

Source: Mistral AI Blog

OpenAI Cuts o4-mini Prices ~25%

OpenAI updated its API pricing, reducing o4-mini input token costs by roughly 25%. The cut follows sustained competitive pressure from Google’s Gemini 2.5 Flash — currently the price-performance benchmark for mid-tier reasoning — and Meta’s newly released Llama 4 Maverick. OpenAI also raised default rate limits for tier-2 API customers on GPT-4o.

No changes were made to GPT-4.1 or full o3 pricing. This is OpenAI’s third price reduction in twelve months, continuing the industry-wide inference commoditisation trend that is quietly making frontier AI affordable for smaller teams.
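To see what a roughly 25% input-price cut means for a monthly bill, here is a minimal sketch. The $1.00 per million tokens starting price and the 2 billion tokens per month volume are placeholders chosen for easy arithmetic, not OpenAI's actual rates or anyone's real usage. Substitute the figures from the pricing page and your own metering. The function names are hypothetical.

```python
def price_after_cut(old_price_per_mtok: float, cut_fraction: float) -> float:
    """Price per million tokens after a fractional cut (0.25 means 25% off)."""
    return old_price_per_mtok * (1.0 - cut_fraction)


def monthly_input_cost(tokens_per_month: int, price_per_mtok: float) -> float:
    """Monthly spend on input tokens at a given price per million tokens."""
    return tokens_per_month / 1_000_000 * price_per_mtok


if __name__ == "__main__":
    old_price = 1.00  # placeholder USD per 1M input tokens, NOT the real rate
    new_price = price_after_cut(old_price, 0.25)
    volume = 2_000_000_000  # hypothetical: 2B input tokens per month

    before = monthly_input_cost(volume, old_price)
    after = monthly_input_cost(volume, new_price)
    print(before, after, before - after)  # 2000.0 1500.0 500.0
```

The cut described applies to input tokens, so a workload dominated by output tokens would see a smaller change in its total bill than this figure suggests.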

Source: OpenAI API changelog

Anthropic and Palantir Partner for Government AI Deployments

Anthropic announced a partnership with Palantir to deploy Claude models inside U.S. government and defence environments, including classified and top-secret network tiers. The deal makes Anthropic the first major frontier AI lab with access to those environments. Claude will be available through Palantir’s AIP (Artificial Intelligence Platform).

The partnership is a significant commercial milestone for Anthropic, which has positioned Constitutional AI training as a trust differentiator for high-stakes deployments. Pricing and deployment scale were not disclosed, but this signals that safety-focused AI labs can win government contracts by leading on alignment, not just benchmarks.

Source: Reuters, Anthropic newsroom

EU AI Act: August 2026 High-Risk Deadline Is 4 Months Away

For developers building or shipping AI products into EU markets: the EU AI Act’s high-risk AI system requirements become enforceable in August 2026 — now under four months away. High-risk categories include AI used in medical devices, employment decisions, credit scoring, law enforcement, and critical infrastructure.

The European AI Office has been releasing guidance documents, and first formal enforcement actions are expected in Q3 2026. If you haven’t started a compliance review for EU-facing AI products, the window is closing fast.

Source: EU AI Office, EUR-Lex

Google DeepMind Releases Full Gemma 3 Technical Report

Google DeepMind published the complete technical paper for Gemma 3, covering models from 1B to 27B parameters. Key details: training on up to 14 trillion tokens (for the 27B model), knowledge distillation from Gemini, an attention architecture that interleaves local and global layers, and a vision encoder that condenses each image into 256 tokens. The 27B variant scores best-in-class on MT-Bench and HumanEval for its parameter range.

The paper also introduces a “knowledge distillation at scale” method that could influence how the next generation of open-weight models is trained, making it required reading for anyone building on small-to-mid-size open models.

Source: Google DeepMind

That’s your Friday briefing. The through-line this week: open models are closing the gap with frontier proprietary models faster than most predicted at the start of 2026. Llama 4 Scout’s 10M context and Devstral’s agentic performance are the clearest signals yet. See you Monday.

— Alex Rivers, Senior AI Journalist

This article was produced with the assistance of AI tools and reviewed by the AIStackDigest editorial team.
