News & Analysis

Mistral Small 4 brings multimodal reasoning to open-source AI, and it changes the sovereign AI calculus

Based on: Mistral AI

Mistral just released Small 4, a 119B-parameter open-source model that combines reasoning, coding, and image understanding in one package, with a 256k context window and Apache 2.0 license. For European businesses that want capable AI without sending data to US cloud providers, this is a significant development.

Mistral AI released Mistral Small 4 this week, a model that consolidates reasoning, multimodal understanding, and agentic coding into a single open-source package. Released under the Apache 2.0 license, the 119-billion parameter model uses a Mixture-of-Experts architecture with 128 experts (4 active per token), which keeps inference costs manageable even at scale. It supports a 256,000-token context window and accepts both text and image inputs natively.

Until now, organizations needed one model for top-tier reasoning, another for multimodal document processing, and yet another for agentic coding. Small 4 merges all three into a single deployable unit. The model also includes a configurable reasoning effort parameter, letting you dial between fast low-latency responses and deep step-by-step reasoning depending on the task. Mistral reports a 40% reduction in end-to-end completion time and 3x more requests per second compared to Small 3.

The model can run on four NVIDIA H100 GPUs as a minimum setup, or two H200s, or a single DGX B200. It is already available on vLLM, llama.cpp, SGLang, Transformers, and Hugging Face. For organizations that already have GPU capacity, deploying Small 4 is straightforward. For those without, cloud-hosted options are available through Mistral's API and several third-party providers.
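A rough back-of-envelope check makes the quoted hardware floor plausible. The assumptions here are ours, not Mistral's: weights served in FP8 (1 byte per parameter) and roughly 20% headroom for KV cache and activations.

```python
# Back-of-envelope VRAM estimate for the quoted minimum setups.
# Assumptions (not from the announcement): FP8 weights at 1 byte per
# parameter, plus ~20% headroom for KV cache and activations.
PARAMS_B = 119           # model size in billions of parameters
BYTES_PER_PARAM = 1      # FP8 quantization assumption
OVERHEAD = 1.2           # KV cache / activation headroom assumption

weights_gb = PARAMS_B * BYTES_PER_PARAM   # ~119 GB of weights
needed_gb = weights_gb * OVERHEAD         # ~143 GB total

setups = {
    "4x H100 (80 GB)": 4 * 80,    # 320 GB aggregate VRAM
    "2x H200 (141 GB)": 2 * 141,  # 282 GB aggregate VRAM
}
for name, vram in setups.items():
    print(f"{name}: {vram} GB available, fits={vram >= needed_gb}")
```

Both configurations clear the estimate with room to spare, which is consistent with them being named as minimum setups; longer contexts eat into that margin via the KV cache.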

Why this matters for European businesses

The Apache 2.0 license is not a minor detail. It means any organization can deploy Small 4 on their own infrastructure, fine-tune it on proprietary data, and integrate it into commercial products, all without paying per-token fees to a US cloud provider and without routing sensitive documents through external APIs. For sectors handling personal data under GDPR, or organizations in financial services, healthcare, or the public sector with data residency requirements, this matters enormously.

The 256k context window changes what is possible with document-heavy workflows. Most enterprise AI use cases involve long documents: contracts, audit reports, policy manuals, procurement specifications. Models with short context windows force developers to chunk documents into pieces and stitch answers back together, which introduces errors and complexity. With 256k tokens, a typical 200-page contract fits comfortably in a single context, and the model can reason across the entire document without losing thread.
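The 200-page claim holds up to a quick estimate. Using common rules of thumb (our assumptions, not a measured tokenization): roughly 500 words per page and about 1.3 tokens per English word.

```python
# Token budget for the 200-page contract example.
# Assumptions: ~500 words/page, ~1.3 tokens/word (rule of thumb).
PAGES = 200
WORDS_PER_PAGE = 500
TOKENS_PER_WORD = 1.3
CONTEXT_WINDOW = 256_000

doc_tokens = int(PAGES * WORDS_PER_PAGE * TOKENS_PER_WORD)  # ~130,000
headroom = CONTEXT_WINDOW - doc_tokens                      # ~126,000
print(f"document: {doc_tokens} tokens, headroom: {headroom} tokens")
```

Around 130k tokens for the contract leaves roughly half the window free for the system prompt, retrieved reference material, and the model's answer.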

The native multimodal support is equally significant. Many documents that businesses need to process are not clean text files: they are scanned invoices, photographed delivery notes, PDFs with mixed layouts, or spreadsheets exported as images. Until recently, handling these required a separate vision model in the pipeline. Small 4 collapses that into a single model call, reducing architecture complexity and operational overhead.
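In practice a single multimodal call means packing the scanned document and the question into one chat message. The sketch below uses the OpenAI-style content-part schema that vLLM's OpenAI-compatible server accepts for vision models; exact field names can vary by serving stack, so treat this as illustrative.

```python
import base64

def build_invoice_message(image_bytes: bytes, question: str) -> list[dict]:
    """Combine a scanned-invoice image and a text question into one
    OpenAI-style chat message (the content-part schema vLLM's
    OpenAI-compatible server uses for vision models; illustrative)."""
    b64 = base64.b64encode(image_bytes).decode()
    return [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
            {"type": "text", "text": question},
        ],
    }]
```

One message, one model call: no separate OCR or vision stage in the pipeline, which is the architectural simplification the paragraph above describes.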

Laava's perspective

Laava has been advising clients on sovereign AI deployment since before it became a mainstream conversation. The core argument has always been: European organizations should not build critical AI workflows on infrastructure they do not control, especially when the underlying models can be self-hosted without significant capability trade-offs. Mistral Small 4 strengthens that argument considerably. A year ago, choosing open-source meant accepting lower quality. That trade-off is largely gone.

The practical implications for document processing are direct. An organization running invoice extraction, contract review, or report generation on a self-hosted Small 4 instance gets a model that can handle scanned documents natively, reason across long documents without chunking, and operate entirely within their own network perimeter. Combined with fine-tuning on company-specific document formats, the accuracy on specialized tasks routinely exceeds what general-purpose cloud APIs deliver on the same inputs.

The configurable reasoning effort is also worth noting for workflow automation. Not every task in a backoffice pipeline requires deep reasoning. A routing decision on an incoming email does not need the same compute as drafting a commercial response to a contract dispute. Being able to set reasoning_effort per task, rather than paying for full reasoning power on trivial steps, directly reduces operational cost without degrading quality where it counts.
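Such a policy can be as simple as a lookup table from task type to effort level. The parameter name `reasoning_effort`, the effort values, and the model id below are assumptions for illustration; check them against your serving stack before use.

```python
# Per-task effort policy for a configurable reasoning_effort parameter.
# Parameter name, effort values, and model id are illustrative assumptions.
TASK_EFFORT = {
    "email_routing": "low",             # simple classification: speed wins
    "invoice_extraction": "medium",
    "contract_dispute_draft": "high",   # deep reasoning is justified here
}

def effort_for(task: str) -> str:
    # Default unknown tasks to low so trivial steps stay cheap.
    return TASK_EFFORT.get(task, "low")

def build_request(task: str, prompt: str) -> dict:
    # Payload sketch for an OpenAI-compatible chat endpoint.
    return {
        "model": "mistral-small-4",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort_for(task),
    }
```

Defaulting to low effort keeps the cost floor down; only the tasks you explicitly mark as high pay for full reasoning.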

What you can do now

If your organization currently routes documents through a cloud-hosted AI API and you have concerns about data residency or long-term vendor dependency, now is a good moment to evaluate what a self-hosted model can do for your specific workflows. The combination of Mistral Small 4's capability, the Apache 2.0 license, and the broad compatibility with open-source inference frameworks (vLLM, llama.cpp) means the technical barrier is lower than it has ever been.

Laava can help you run a focused four-week pilot: define one document-heavy process, deploy a self-hosted model, measure accuracy and throughput, and hand you a clear picture of cost and capability before you commit to any infrastructure investment. If you want to understand what sovereign AI looks like in practice for your business, start there.
