Microsoft Research has unveiled Fara-7B, a compact 7-billion-parameter AI model designed to run “computer use” agents directly on local devices.
By processing screen pixels entirely on-device, the new model aims to establish “pixel sovereignty,” allowing enterprises to automate sensitive workflows without exposing data to the cloud.
Released today under an MIT license, Fara-7B reportedly outperforms massive cloud-based rivals like OpenAI’s GPT-4o on key navigation benchmarks while slashing inference costs by over 90%.
Pixel Sovereignty: The Shift to Local Agents
Breaking from the industry trend of centralized processing, Microsoft Research’s release of Fara-7B marks a strategic pivot from cloud-dependent AI to what they call “pixel sovereignty,” ensuring sensitive data never leaves the user’s device.
Under the hood, the architecture relies on Alibaba’s Qwen2.5-VL-7B base model, processing visual data directly from screenshots rather than relying on accessibility trees or underlying code structures.
Adopting a “vision-first” strategy, the agent interacts with any application interface just as a human would, bypassing the need for custom API integrations.
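Conceptually, a vision-first agent of this kind runs a simple observe-act loop: capture a screenshot, feed it to the model along with the user's goal and the action history, and execute whatever action comes back. The sketch below illustrates that loop in Python; the `Action` schema, `agent_step` function, and `model.predict` interface are illustrative assumptions, not Fara-7B's actual API.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A single UI action predicted from raw pixels (hypothetical schema)."""
    kind: str     # e.g. "click", "type", "scroll"
    x: int = 0    # pixel coordinates on the screenshot
    y: int = 0
    text: str = ""  # payload for "type" actions

def agent_step(model, goal: str, screenshot: bytes, history: list) -> Action:
    """One iteration of the observe-act loop: the model sees only pixels,
    the user goal, and prior actions -- no DOM or accessibility tree."""
    prompt = {"goal": goal, "image": screenshot, "history": list(history)}
    return model.predict(prompt)  # the model returns the next Action
```

Because the loop consumes nothing but pixels, it works on any application that can be screenshotted, which is exactly the trade-off Microsoft describes: broad reach, but sensitivity to UI changes.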
Local execution addresses critical enterprise concerns regarding data privacy, particularly for regulated industries handling financial or healthcare data. By keeping all inference on the local machine, organizations can deploy autonomous agents without exposing proprietary workflows or customer information to third-party servers. Microsoft says:
“Fara-7B’s small size now makes it possible to run CUA models directly on devices. This results in reduced latency and improved privacy, as user data remains local.”
By removing the latency of round-trip cloud requests, on-device agents can react faster to UI changes, creating a smoother user experience. Such agility proves critical for complex, multi-step workflows where delays can compound into significant productivity losses. According to Microsoft:
“A pixel-only agent can work across many applications without alignment or integration, which is a big advantage. But if the UI changes, the agent may struggle. It is powerful, but also fragile.”
Optimized for consumer hardware, the compact 7-billion-parameter architecture targets the NPU capabilities of Copilot+ PCs. Because it runs without expensive infrastructure, advanced agentic features stay within reach for standard enterprise deployments.
Efficiency & Benchmarks: The Cost of Autonomy
In a direct challenge to proprietary giants, Fara-7B achieves a 73.5% success rate on the WebVoyager benchmark, surpassing the 65.1% score of OpenAI’s GPT-4o (SoM). Such results suggest that smaller, specialized models can outperform larger, general-purpose models on specific tasks.
According to the technical documentation, Fara-7B functions as a multimodal decoder-only model built upon Alibaba’s Qwen2.5-VL-7B architecture. The system processes user goals, browser screenshots, and action history within a 128,000-token context window.
Local AI agents just hit a massive turning point. 🚨
Microsoft dropped Fara-7B, and it’s beating GPT-4o at web navigation while running entirely locally.
The tech is clever: Instead of scraping code (DOM) like old-school scripts, it uses visual recognition to “see” your screen…
— Yi (@imhaoyi) November 25, 2025
Microsoft Research specifies that the model’s toolset aligns with the Magentic-UI interface, enabling actions such as typing, clicking, and scrolling, while predicting coordinates directly as pixel positions on the screen.
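A toolset like this can be pictured as a dispatcher that maps the model's predicted action to a concrete browser call, with coordinates arriving as raw pixel positions. The sketch below is a minimal illustration of that pattern; the handler names and signatures are assumptions for this article, not Magentic-UI's actual interface.

```python
# Hypothetical dispatcher mapping model-predicted actions to browser calls.
# Tool names mirror the action kinds the article lists (click, type, scroll);
# the handlers here just describe what a real browser driver would do.

def click(x: int, y: int) -> str:
    return f"click at ({x}, {y})"

def type_text(x: int, y: int, text: str) -> str:
    return f"type {text!r} at ({x}, {y})"

def scroll(x: int, y: int, dy: int) -> str:
    return f"scroll by {dy} at ({x}, {y})"

TOOLS = {"click": click, "type": type_text, "scroll": scroll}

def execute(action: dict) -> str:
    """Dispatch one predicted action; coordinates are raw pixel positions."""
    name = action.pop("kind")
    return TOOLS[name](**action)
```

The appeal of predicting pixel coordinates directly is that the same small action vocabulary covers any on-screen interface, with no per-application integration work.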
Independent testing by Browserbase validates the model’s “state-of-the-art” status for its size class, though it reported a slightly lower success rate of 62% in real-world conditions. Despite this variance, the model remains highly competitive, offering a viable alternative to more resource-intensive solutions.
Cost efficiency is a major differentiator, with Microsoft estimating an average cost of $0.025 per task compared to ~$0.30 for models like GPT-5 or o3. Lowering the barrier to entry, this cost structure could significantly accelerate widespread agent deployment.
As detailed in the official announcement:
“On WebVoyager, Fara-7B uses on average 124,000 input tokens and 1,100 output tokens per task, with about 16.5 actions. Using market token prices, the research team estimate an average cost of 0.025 dollars per task, versus around 0.30 dollars for SoM agents backed by proprietary reasoning models such as GPT-5 and o3.”
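The quoted per-task figure is easy to sanity-check from the token counts. The announcement does not state which market prices were used, so the per-million-token rates below are assumptions chosen purely for illustration:

```python
# Reproducing the ~$0.025-per-task estimate from the quoted token counts.
# The per-token prices are assumed for illustration; the announcement does
# not specify which market prices the research team used.
INPUT_TOKENS = 124_000
OUTPUT_TOKENS = 1_100
PRICE_IN = 0.20 / 1_000_000   # assumed $ per input token
PRICE_OUT = 0.60 / 1_000_000  # assumed $ per output token

cost = INPUT_TOKENS * PRICE_IN + OUTPUT_TOKENS * PRICE_OUT
print(f"${cost:.4f} per task")  # ~$0.0255 with these assumed prices
```

At roughly a tenth the per-token price of frontier reasoning models, even the large 124K-token inputs stay cheap, which is where the order-of-magnitude gap against the ~$0.30 SoM agents comes from.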
Speed benchmarks show significant advantages, with the model completing tasks in approximately 154 seconds versus 254 seconds for the competing UI-TARS-1.5-7B model, according to Browserbase.
Combined with low operational costs, the rapid execution makes Fara-7B an attractive option for high-volume automation tasks.
Despite its small size, Fara-7B maintains a substantial 128,000-token context window, allowing it to retain history across long, multi-step workflows, as noted in the official announcement.
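Even a 128,000-token window fills up over long workflows once screenshots and action history accumulate, so an agent runtime typically trims the oldest steps to stay within budget. The sketch below shows one common trimming strategy; the 4-characters-per-token heuristic and the function names are illustrative assumptions, not details from Microsoft's implementation.

```python
# Sketch of keeping a long action history within a fixed token budget:
# drop the oldest steps first so the most recent context always fits.
# The 4-chars-per-token estimate is a rough illustrative heuristic.
CONTEXT_BUDGET = 128_000  # tokens

def est_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def trim_history(steps: list, reserved: int = 8_000) -> list:
    """Keep the most recent steps that fit after reserving room for the
    goal, system prompt, and current screenshot."""
    budget = CONTEXT_BUDGET - reserved
    kept = []
    for step in reversed(steps):
        cost = est_tokens(step)
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(step)
    return list(reversed(kept))
```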
“Moving forward, we’ll strive to maintain the small size of our models. Our ongoing research is focused on making agentic models smarter and safer, not just larger,” says Microsoft.
The company acknowledges the model is experimental, pointing to limitations:
“You can freely experiment and prototype with Fara‑7B under the MIT license, but it’s best suited for pilots and proofs‑of‑concept rather than mission‑critical deployments.”
The Agentic Ecosystem: Safety & Competition
To train the model without expensive human annotation, Microsoft developed “FaraGen,” a synthetic data pipeline that generated over 145,000 verified task trajectories.
Rapidly scaling training data, this method addresses a key bottleneck in agent development.
Safety is enforced through a “Critical Point” mechanism, which pauses the agent and demands user approval before irreversible actions like purchases or sending emails. According to the model repository:
“A Critical Point is defined as any situation requiring a user’s personal data or consent before an irreversible action occurs, such as sending an email or completing a financial transaction. Upon reaching such a juncture, Fara-7B is designed to pause and explicitly request user approval before proceeding.” […] “This approach helps organizations meet strict requirements in regulated sectors, including HIPAA and GLBA.”
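The gating pattern the repository describes can be sketched as a thin wrapper around action execution: anything flagged as irreversible is held until the user explicitly consents. The `IRREVERSIBLE` set and function names below are illustrative assumptions, not Fara-7B's actual safety implementation.

```python
# Sketch of the Critical Point approval gate: irreversible actions are
# paused until the user explicitly consents. The action names and the
# callback interfaces here are illustrative assumptions.
IRREVERSIBLE = {"send_email", "submit_payment", "delete_account"}

def guarded_execute(action: str, execute, ask_user) -> str:
    """Run `execute(action)` only after user approval for critical actions."""
    if action in IRREVERSIBLE:
        # Critical Point: surface the action and wait for explicit consent.
        if not ask_user(f"Approve irreversible action '{action}'?"):
            return "paused: user declined"
    return execute(action)
```

Keeping the gate outside the model, in deterministic host code, is what makes this pattern auditable: approval cannot be skipped by a model mistake.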
Intensifying the “agentic AI” arms race, the release directly competes with Anthropic’s Computer Use feature, the ChatGPT Agent launch from OpenAI, and the Gemini 2.5 Computer Use preview from Google.
While those rivals focus on cloud-based solutions, Fara-7B fills the gap for local, privacy-focused alternatives.
Unlike competitors that often require cloud connectivity, Fara-7B’s open-weight nature allows developers to fine-tune and deploy the model in fully air-gapped environments.
Microsoft has released the model under the permissive MIT license on Hugging Face and Azure Foundry, encouraging broad community adoption and iteration. Contrasting with the closed ecosystems of its primary rivals, this open approach potentially accelerates innovation in the local agent space.

