Issue 07: Fable 5 for Everyone — Mythos Goes Public, GitHub Agents Land in CI, and LangGraph Gets Patched

Five developments from June 5–11: Anthropic makes Claude Fable 5 (Mythos-class) publicly available at less than half the price of Mythos Preview; GitHub ships Agentic Workflows in public preview for natural-language CI automation; LangSmith Sandboxes GA brings hardware-isolated microVMs to agent workflows; Windsurf rebrands as Devin Desktop with a multi-agent command center and ACP integration; and Check Point Research discloses a three-CVE SQLi-to-RCE chain in LangGraph's checkpointer layer.

AI Product Engineer Day by Day
June 12, 2026 · 8:02

Research at a glance

Five items this issue, mostly from the week of June 9–11, with one from June 5 that slipped past our last issue: Anthropic opens its Mythos-class model to everyone with a price cut that changes the calculus on frontier-capable agents; GitHub ships natural-language CI workflows in public preview; LangSmith makes hardware-isolated agent sandboxes generally available; Windsurf rebrands into a multi-agent command center; and Check Point Research discloses a three-CVE chain in LangGraph that can turn a filter parameter into a shell.

Claude Fable 5: the first Mythos-class model anyone can call

Anthropic launched Claude Fable 5 on June 9, making its first Mythos-tier model publicly available through both claude.ai and the Claude API.[1] The headline capability claim is long-horizon autonomy: the model can plan across stages, delegate to sub-agents, check its own work, and sustain a session for days, all things Anthropic says previous models couldn't hold together.
The pricing is the concrete change to track. Fable 5 is $10 per million input tokens and $50 per million output tokens, with the existing 90% caching discount. Anthropic says that's less than half the price of Claude Mythos Preview.[2] Early benchmark claims are strong across the board: highest-scoring model on CursorBench, #1 on Cognition's FrontierBench, and the first model to break 90% on Hebbia's finance analytics benchmark, a 10-point jump over Opus 4.8.
One production detail that matters: Fable 5 ships with safety classifiers that auto-route cybersecurity and biology queries to Opus 4.8 instead. Anthropic says fewer than 5% of sessions trigger a fallback, and that the classifiers survived 1,000+ hours of external red-teaming without a universal jailbreak. The arrangement is framed as temporary — they're working to reduce false positives as more capable models arrive.
Alongside it, Claude Mythos 5 launched in restricted access for a small set of cyberdefenders and infrastructure providers via Project Glasswing, with cybersecurity safeguards lifted. Same underlying model, tighter distribution.[1] GitHub Copilot separately confirmed Fable 5 is now generally available inside Copilot as of June 9.[3]
The practical question for teams is whether the token efficiency gains hold at scale. Fable 5 is described as finishing Cognition's FrontierCode tasks at medium effort while still top-scoring — meaning you might get equivalent work done with fewer tokens than Opus. That's testable, and worth running before committing agentic workflows to it at volume.
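One way to run that comparison is to price logged token counts from a trial run on each model. The sketch below is only an estimate under stated assumptions: it uses the Fable 5 list prices quoted above, assumes the 90% caching discount applies to cached input tokens only, and treats the baseline prices and all token counts as placeholders you would replace with your own figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float          # USD per million uncached input tokens
    output_per_mtok: float         # USD per million output tokens
    cache_discount: float = 0.90   # assumption: applies to cached input only


def run_cost(p: Pricing, input_tokens: int, output_tokens: int,
             cached_input_tokens: int = 0) -> float:
    """Estimated USD cost of one run from its logged token counts."""
    uncached = input_tokens - cached_input_tokens
    if uncached < 0:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    cached_rate = p.input_per_mtok * (1 - p.cache_discount)
    total = (uncached * p.input_per_mtok
             + cached_input_tokens * cached_rate
             + output_tokens * p.output_per_mtok)
    return total / 1_000_000


# Prices quoted in this issue for Fable 5.
FABLE_5 = Pricing(input_per_mtok=10.0, output_per_mtok=50.0)

if __name__ == "__main__":
    # Placeholder baseline: substitute the real prices of the model you use today.
    baseline = Pricing(input_per_mtok=5.0, output_per_mtok=25.0)
    # Placeholder token counts: substitute numbers from your own task logs.
    fable = run_cost(FABLE_5, input_tokens=400_000, output_tokens=30_000,
                     cached_input_tokens=300_000)
    other = run_cost(baseline, input_tokens=900_000, output_tokens=80_000,
                     cached_input_tokens=300_000)
    print(f"Fable 5: ${fable:.2f}  baseline: ${other:.2f}")
```

Comparing cost per completed task, rather than cost per token, is what captures an efficiency claim like this one: a pricier model can still come out ahead if it finishes in fewer tokens.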
Figure: Benchmark comparison of Claude Fable 5 and Mythos 5 against other frontier models across coding, knowledge work, and vision tasks. [1]

GitHub Agentic Workflows: CI/CD with a reasoning layer

On June 11, GitHub pushed GitHub Agentic Workflows into public preview.[4] The pitch: write your automation intent in natural-language Markdown files, and the system compiles them into standard GitHub Actions YAML. Because the output is just Actions, it inherits your existing runner groups, approval policies, and secrets configuration.
The intended use cases are reasoning-heavy tasks that don't fit rule-based Actions: issue triage, CI failure analysis, documentation updates, security remediation across multiple repositories. Carvana's SVP of Engineering said in the launch post that their workflows can now span multiple repositories — the kind of cross-repo coordination that previously required human wiring.
Figure: GitHub Agentic Workflows run showing agent-coordinated CI tasks (public preview, June 11). Agents run inside sandboxed containers with read-only permissions by default. [4]
The security design deserves attention. Agents run with read-only permissions by default, inside a sandboxed container behind an Agent Workflow Firewall. Outputs go through a safe-outputs validation step. A dedicated threat-detection job scans every proposed change before application. A separate June 11 update removed the requirement for a personal access token; workflows now run under the built-in GITHUB_TOKEN.[5]
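The default-deny idea behind a validated output step can be illustrated generically. The sketch below is not GitHub's implementation: the output kinds, the per-run limits, and the size cap are all hypothetical, chosen only to show a read-only agent proposing writes that a separate gate accepts or rejects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedOutput:
    kind: str   # hypothetical output kind, e.g. "add_comment"
    body: str


# Hypothetical policy: output kind -> maximum number permitted per run.
ALLOWED = {"add_comment": 1, "create_issue": 2}
MAX_BODY_CHARS = 4000  # hypothetical size cap


def validate_outputs(proposed, allowed=ALLOWED):
    """Split proposed outputs into (accepted, rejected) under a default-deny policy."""
    accepted, rejected = [], []
    counts = {}
    for out in proposed:
        limit = allowed.get(out.kind, 0)  # unknown kinds get a limit of zero
        used = counts.get(out.kind, 0)
        if used >= limit or len(out.body) > MAX_BODY_CHARS:
            rejected.append(out)
            continue
        counts[out.kind] = used + 1
        accepted.append(out)
    return accepted, rejected


if __name__ == "__main__":
    proposals = [
        ProposedOutput("add_comment", "Likely duplicate of an earlier report."),
        ProposedOutput("push_commit", "rewrite history"),  # not in the policy
    ]
    ok, blocked = validate_outputs(proposals)
    print([o.kind for o in ok], [o.kind for o in blocked])
```

The design point is that the agent never holds write permissions itself; only outputs that pass the gate are applied by a separate, narrowly scoped step.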
The quickstart is at gh.io/gh-aw-quickstart. Prebuilt workflows covering triage, reporting, and compliance are available in the githubnext/agentics repository.

LangSmith Sandboxes GA: hardware-isolated compute for agents

Published June 5 — picked up after our last issue.
LangChain made LangSmith Sandboxes generally available on June 5.[6] The core argument in the announcement: containers aren't enough for agents that install arbitrary dependencies from model-generated code, because containers share a kernel with the host. Each LangSmith Sandbox is a hardware-virtualized microVM with its own kernel. The agent gets a full machine (filesystem, shell, package manager, network access, persistent state) that disappears when the session ends.
The GA release ships several production-relevant primitives:
  • Snapshots and forks: capture a sandbox mid-session and spin up parallel branches via copy-on-write. Ten branches cost roughly the same as one.
  • Blueprints: pre-warm an environment (repo cloned, deps installed, config in place) so new sandboxes boot in seconds, not minutes.
  • Auth proxy: outbound requests flow through a proxy that injects credentials at the network layer. Secrets never touch the agent runtime.
  • Service URLs: if the agent starts a local web server, you get an authenticated URL you can open or share — no port forwarding.
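The claim that ten branches cost roughly the same as one follows from copy-on-write: forks share the snapshot's data and store only what they change. The toy model below illustrates that accounting in memory. It is not the LangSmith API; the Snapshot and Fork classes are hypothetical stand-ins for a real block-level mechanism.

```python
from collections import ChainMap
from types import MappingProxyType


class Snapshot:
    """Immutable base layer mapping file paths to contents."""

    def __init__(self, files: dict):
        self._files = MappingProxyType(dict(files))

    def fork(self) -> "Fork":
        return Fork(self._files)


class Fork:
    """A branch that reads through to the snapshot and writes to a private layer."""

    def __init__(self, base):
        self._delta = {}
        # ChainMap looks up keys in _delta first, then in the shared base;
        # writes always land in the first mapping, so the base is never touched.
        self._view = ChainMap(self._delta, base)

    def read(self, path: str) -> str:
        return self._view[path]

    def write(self, path: str, data: str) -> None:
        self._view[path] = data

    def private_bytes(self) -> int:
        """Storage unique to this fork (characters, as a proxy for bytes)."""
        return sum(len(v) for v in self._delta.values())


if __name__ == "__main__":
    snap = Snapshot({"repo/main.py": "x" * 1000})
    forks = [snap.fork() for _ in range(10)]
    forks[0].write("repo/main.py", "changed")
    print(sum(f.private_bytes() for f in forks))  # 7
```

Ten forks of a 1,000-character snapshot add only the seven characters that one branch actually wrote, which is the same reason parallel branches are cheap until they diverge.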
The security case is concrete. The announcement cites Copy Fail (CVE-2026-31431), a 732-byte Python script that roots major Linux distributions through the kernel crypto API, discovered by AI tooling in about an hour.[6] The point: containers don't isolate against kernel exploits. Monday.com is using Sandboxes to give their Sidekick AI assistant a secure environment for running data analysis and multimedia generation workflows.
The API entry point is client.create_sandbox() via the existing LangSmith SDK. Docs at docs.langchain.com/langsmith/sandboxes.

Windsurf becomes Devin Desktop

Cognition rebranded Windsurf as Devin Desktop in the past week, positioning it explicitly as a multi-agent command center rather than an AI code editor.[7] The reframe is meaningful: the product now leads with an Agent Command Center (a Kanban-style view of agent sessions across projects), a Spaces feature that shares codebase context and Git worktrees across multiple agents, and multi-agent coordination support under Cognition's Agent Client Protocol (ACP).
Existing functionality carries over as an over-the-air update. Plans, pricing, extensions, and settings are unchanged. The free tier remains, Pro is still $20/month, and the new Max tier is $200/month. Enterprise customers on legacy Windsurf plans keep their existing terms.
The ACP integration matters for teams building mixed-agent workflows: Devin can now work alongside other ACP-compatible agents in a shared session, with context passed between them. NVIDIA joined the research preview for multi-agent support, and Ramp and Modal are listed as design partners. Harvey built an internal background agent (Spectre) on Devin Desktop that carries organizational context across engineering, product, and design teams.
SWE-1.6 — described as the fastest coding model in the world — is included at no additional charge. JetBrains support (IntelliJ, PyCharm, WebStorm) continues as a separate download.

LangGraph checkpointer: SQL injection chains to RCE

On June 11, Check Point Research published a three-CVE vulnerability chain in LangGraph's persistence layer.[8] LangGraph has over 50 million monthly PyPI downloads. Exposed deployments are those that self-host LangGraph with the SQLite or Redis checkpointer and pass a user-controlled filter parameter to get_state_history().
The chain works as follows:
  1. CVE-2025-67644 (SQLite checkpointer): the filter parameter in list() passes dictionary keys directly into a SQL json_extract() call without parameterization. An attacker who controls the filter keys can inject a UNION SELECT that appends a fake checkpoint row — with attacker-controlled serialized data in the checkpoint BLOB column.
  2. CVE-2026-28277 (msgpack deserialization): when that fake row is processed, loads_typed() deserializes the BLOB. The msgpack handler uses a custom _msgpack_ext_hook that calls getattr(importlib.import_module(module), name)(arg). An attacker who controls the serialized data controls which module, function, and argument are called — os.system("...") is the canonical demo.
  3. CVE-2026-27022 (Redis checkpointer): the same injection class affects the Redis checkpointer via unparameterized filter key interpolation.
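The bug class in step 1 can be reproduced in miniature with the standard-library sqlite3 module. This is a simplified sketch, not LangGraph's code: the table schema and function names are hypothetical, and the injected key here only widens the WHERE clause rather than appending a UNION SELECT row as in the write-up. The safe variant assumes filter keys are plain identifiers and binds the JSON path as a parameter.

```python
import re
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for a checkpoint table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkpoints (thread_id TEXT, metadata TEXT)")
    conn.executemany(
        "INSERT INTO checkpoints VALUES (?, ?)",
        [("alice", '{"source": "input"}'), ("bob", '{"source": "loop"}')],
    )
    return conn


def list_unsafe(conn, flt: dict) -> list:
    # Vulnerable pattern: values are bound, but keys are interpolated into SQL.
    clauses = [f"json_extract(metadata, '$.{key}') = ?" for key in flt]
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT thread_id FROM checkpoints WHERE {where} ORDER BY thread_id"
    return [row[0] for row in conn.execute(sql, list(flt.values()))]


_SAFE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def list_safe(conn, flt: dict) -> list:
    # Safer pattern: validate each key, then bind both the JSON path and the value.
    clauses, params = [], []
    for key, value in flt.items():
        if not _SAFE_KEY.fullmatch(key):
            raise ValueError(f"illegal filter key: {key!r}")
        clauses.append("json_extract(metadata, ?) = ?")
        params += ["$." + key, value]
    where = " AND ".join(clauses) or "1=1"
    sql = f"SELECT thread_id FROM checkpoints WHERE {where} ORDER BY thread_id"
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    conn = make_db()
    evil = {"source') = 'nope' OR ('1": "1"}
    print(list_unsafe(conn, {"source": "input"}))  # ['alice']
    print(list_unsafe(conn, evil))                 # ['alice', 'bob']
```

The malicious key closes the JSON path string early and adds an always-true condition, so the query returns every row. Once an attacker can shape the query this way, the step from leaking rows to injecting one is what connects the SQL flaw to the deserialization flaw in step 2.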
Patches are available and should be applied now:
Package                        Fixed version
langgraph-checkpoint-sqlite    3.0.1+
langgraph                      1.0.10+
langgraph-checkpoint-redis     1.0.2+
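A quick way to check an environment against the fixed versions listed above is to read installed package metadata. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the parser deliberately ignores pre-release suffixes, so a stricter check would use a full version parser.

```python
from importlib import metadata

# Minimum fixed versions as listed in this issue.
FIXED = {
    "langgraph-checkpoint-sqlite": (3, 0, 1),
    "langgraph": (1, 0, 10),
    "langgraph-checkpoint-redis": (1, 0, 2),
}


def parse_release(version: str) -> tuple:
    """Parse the leading numeric release segment, e.g. '1.0.10rc1' -> (1, 0, 10)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # stop at the first non-numeric suffix
            break
    return tuple(parts)


def is_patched(installed: str, fixed: tuple) -> bool:
    # Integer tuples compare numerically, so 1.0.9 < 1.0.10 (unlike string comparison).
    return parse_release(installed) >= fixed


def audit() -> dict:
    """Return a status string for each affected package in this environment."""
    report = {}
    for name, fixed in FIXED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "patched" if is_patched(installed, fixed) else "VULNERABLE"
        report[name] = f"{installed} ({status})"
    return report


if __name__ == "__main__":
    for package, status in audit().items():
        print(f"{package}: {status}")
```

Running this in each deployed environment, not just locally, matters because a pinned lockfile or container image can keep an old checkpointer in place long after the development machine has been upgraded.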
Figure: The full SQLi-to-RCE attack chain in LangGraph's checkpointer: SQL injection in the filter parameter → UNION SELECT injects a fake checkpoint row → unsafe msgpack deserialization executes arbitrary code. [8]
LangChain's managed cloud service (LangSmith Deployment, formerly LangGraph Platform) runs PostgreSQL and is not vulnerable.[8] The disclosures were made on November 19, 2025, and patches shipped between December 2025 and March 2026; the public write-up followed on June 11.

Next issue: June 18.