Opening

Anthropic held its Code with Claude developer event today. Simon Willison live-blogged it in real time. That is now the publication of record for developer announcements: one independent blogger, a live text feed, faster and more specific than any press release.

Today's Signals

  • Anthropic ran its Code with Claude developer event today. Simon Willison live-blogged the session, covering new Claude Code features, MCP updates, and developer tooling announcements as they landed. His notes are the fastest technical read on what shipped. (simonwillison.net)

  • Anthropic and OpenAI are both building enterprise services divisions, according to Latent Space reporting. The shift: from selling API access to providing hands-on deployment support. Labs are betting that operators need help wiring agents into production, not just access to the model. (latent.space)

  • OpenAI published research on how enterprises are scaling agentic workflows in production. The findings cover how frontier organizations structure human-in-the-loop oversight, handle failure modes, and measure output quality at scale. Worth reading before your next internal agent pitch. (openai.com/blog)

  • Google updated its AI Search methodology to pull expert content from forums including Reddit and Stack Overflow. The move prioritizes community-sourced specialist knowledge over general web content for technical queries. For developers, this changes what ranks when the answer you need is buried in a 2019 Stack Overflow thread. (techcrunch.com)

  • Genesis showed GENE-26.5, a full-stack embodied agent model, at a public demo. The system runs perception, planning, and manipulation in a single model rather than chaining separate modules. TechCrunch covered the demo. Whether it holds up outside the demo environment is the open question. (techcrunch.com)

What's Moving

[TOOL] fluffypony/dothething — A local agent that takes a plain-English task description and works until the task is done. No prompt engineering required. Tools it can call: web search via SearXNG, browser automation via Notte and Camoufox (captcha solving included), file operations, shell commands, HTTP requests, MCP server integration, and email via AgentMail. Practical features: cost tracking with a configurable spending limit, thread persistence so you can resume a task mid-run, multi-agent orchestrator mode for parallel execution, and live mid-task input so you can redirect without restarting. BSD-3 license. 1,500 stars. The spending limit alone makes this worth looking at. Most local agent tools run until the budget is gone. dothething lets you set a ceiling and stop. (github.com/fluffypony/dothething)
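The ceiling-and-stop pattern is simple enough to sketch. This is a generic illustration of the idea, not dothething's actual API; the class name and per-call costs below are invented for the example.

```typescript
// Generic cost-ceiling wrapper: stop dispatching work once spend would
// exceed the cap. Illustrative only — not dothething's real interface.
class CostTracker {
  private spent = 0;
  constructor(private ceilingUsd: number) {}

  // Record a model call's cost; returns false when it would breach the ceiling.
  charge(costUsd: number): boolean {
    if (this.spent + costUsd > this.ceilingUsd) return false;
    this.spent += costUsd;
    return true;
  }

  get totalSpent(): number {
    return this.spent;
  }
}

const tracker = new CostTracker(1.0); // $1.00 ceiling
const calls = [0.3, 0.3, 0.3, 0.3];   // hypothetical per-call costs
let completed = 0;
for (const cost of calls) {
  if (!tracker.charge(cost)) break;   // stop before exceeding the budget
  completed++;
}
// completed ends at 3; the fourth call would push spend past the ceiling
```

The point of checking before spending, rather than after, is that the agent halts with budget remaining instead of discovering the overrun on the invoice.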

Partner

Your reach is rented. And landlords evict.

One algorithm update. One policy change. One bad quarter for a platform that isn't yours. The audience you spent years building disappears overnight.

beehiiv is what happens when you stop renting and start owning. A list that's yours. Revenue that compounds. Growth tools built in from day one.

30% off your first 3 months with code LIST30. Start building today.

The Frame

Models are converging. Every frontier lab ships fast inference, long context, and strong coding benchmarks. The capability gap between the top four providers has narrowed to the point where most production operators could swap one for another and not notice the difference on routine tasks.

That is the part Anthropic named plainly today at Code with Claude: the competition has moved up the stack. The question is not which model scores highest on HumanEval. The question is which model is most deeply wired into the environment where developers actually work. Claude Code is Anthropic’s answer to that question. It edits files, runs tests, manages git state, and orchestrates sub-agents, all from inside the terminal. The IDE becomes the interface. The model becomes infrastructure. Anthropic is not pitching a smarter assistant. It is pitching an operator that runs inside your existing workflow without requiring a new tool.

OpenAI sees the same thing. The enterprise services push is not a support play. It is an integration play. Labs that help operators wire agents into their production systems get stickier than labs that sell API access and walk away. The margin is not in the token. The margin is in the configuration work no one else has done yet. The operators paying attention to this are not benchmarking models. They are counting integrations.

On the Radar

[REPO] msitarzewski/agency-agents — 90-plus pre-built AI agent personas organized into 25 divisions: engineering, design, marketing, sales, and more. Each agent ships with a distinct personality, communication style, and workflow spec. Compatible with Claude Code, GitHub Copilot, Cursor, Gemini CLI, and Aider. The value is not any single persona. It is the division structure: you drop the right agent into the right context without writing the system prompt from scratch. 291 commits, 49 open issues, 74 PRs. Active project with real community weight. 94,300 stars. (github.com/msitarzewski/agency-agents)

[REPO] wrg32786/titus-os — An operating system for Claude Code built entirely in markdown. Fifteen files encode identity, authority, memory, delegation, and session lifecycle. No database, no server. The vault architecture uses Obsidian-compatible markdown files, which means your agent’s memory lives in a folder you can open and read. v0.5 shipped May 3 with a somatic layer: internal-state reflection, context capsules, and agent fitness tracking. For founders running parallel work streams, this is the closest thing to a persistent chief of staff that does not require a backend. (github.com/wrg32786/titus-os)

[SKILL] agent-sh/agnix — A linter and language server for AI coding assistant configuration files. Drop it into your IDE and it catches config errors in CLAUDE.md, Cursor rules, and Copilot instruction files before they hit runtime. 227 stars. If you have ever shipped a broken agent config that silently degraded output for three sessions before you noticed, this is the preventive measure. (github.com/agent-sh/agnix)

The Onboard

Project context with CLAUDE.md: set it once, stop repeating yourself.

Claude Code reads a file called CLAUDE.md from your project root every time it starts. Anything in that file becomes the session’s standing context. You never type it again.

What belongs in it: your project details, your preferences, and explicit rules. Example: "Always ask before deleting any file." Claude will follow that every session without being reminded.

Create the file in your project root: touch CLAUDE.md. Then add your context. Three categories cover most projects: Stack (your tech), Conventions (naming, exports), and Rules (test commands, migration constraints). Claude reads this at startup and carries it through the session. No re-prompting. The bigger the project, the more this matters.
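A minimal CLAUDE.md following those three categories might look like this. The stack, paths, and commands are placeholders; substitute your own project's details.

```markdown
# CLAUDE.md

## Stack
- Next.js 14, TypeScript, Postgres via Prisma

## Conventions
- Named exports only; kebab-case file names
- Components live in src/components, one per file

## Rules
- Run `npm test` before declaring any task done
- Never edit files in prisma/migrations
- Always ask before deleting any file
```

Keep it short. The file is injected into every session, so every line in it costs context on every run.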

Builder’s Brief

Local SEO Audit Tool: $99 cold, $49/month warm

The premise. Most local businesses have a Google Business Profile full of obvious gaps. Missing service categories, incomplete hours, sparse reviews, photos from 2021. Scrape the profile, compare it to the top three competitors in the same zip code, hand Claude the diff, and charge $99 for the one-page PDF report.

The stack. Claude Code handles the analysis and report generation. Puppeteer scrapes the Google Business Profile and competitor listings. A PDF export library formats the output. Total moving parts: three. No database needed for the audit tier. Add a lightweight SQLite store for the monitoring upsell.
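The core of the audit is the diff step between the target profile and its competitors. Here is a sketch of that comparison, assuming the Puppeteer scrape has already normalized each listing into a plain object; the interface and field names are invented for illustration.

```typescript
// Hypothetical normalized shape for a scraped Google Business Profile.
interface Listing {
  name: string;
  categories: string[];
  hoursComplete: boolean;
  reviewCount: number;
  photoCount: number;
}

// Gaps: things the top competitors have that the target listing lacks.
// The output strings feed straight into the Claude prompt for the report.
function findGaps(target: Listing, competitors: Listing[]): string[] {
  const gaps: string[] = [];

  // Service categories any competitor lists that the target does not.
  const competitorCategories = new Set(competitors.flatMap(c => c.categories));
  for (const cat of competitorCategories) {
    if (!target.categories.includes(cat)) gaps.push(`Missing category: ${cat}`);
  }

  // Incomplete hours only matter if the competition has them filled in.
  if (!target.hoursComplete && competitors.every(c => c.hoursComplete)) {
    gaps.push("Hours incomplete; all competitors list full hours");
  }

  // Compare review count against the competitor median.
  const sorted = competitors.map(c => c.reviewCount).sort((a, b) => a - b);
  const medianReviews = sorted[Math.floor(sorted.length / 2)];
  if (target.reviewCount < medianReviews) {
    gaps.push(`Review count ${target.reviewCount} vs competitor median ${medianReviews}`);
  }

  return gaps;
}
```

Handing Claude this structured gap list, rather than raw scraped HTML, keeps the analysis prompt small and the report output consistent across customers.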

The pricing. $99 one-time audit. The deliverable is a PDF with five highest-impact fixes, ranked by estimated traffic recovery. Monitoring upsell at $49/month re-runs the audit weekly and emails a delta: what changed in your profile, what changed in competitors’, what to do about it.

The first customers. Cold email 20 restaurants or home-service businesses with under 50 reviews and an incomplete profile. The pitch is one sentence: “I found three gaps in your Google listing that your top competitor doesn’t have. I can show you what they are for $99.” No deck, no call. Send the email, attach a partial screenshot showing an obvious gap, link to a Stripe payment page.

The kill risk. Google Maps DOM changes quarterly and the scraper breaks; budget a maintenance pass each time it does. Rate limits and IP blocks are the separate failure mode; a proxy budget covers those on the monitoring tier.
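For the proxy side, rotation can be as simple as cycling launch options per run. The pool entries below are placeholders; --proxy-server is a standard Chromium flag that Puppeteer passes through to the browser.

```typescript
// Rotate through a proxy pool, one proxy per scrape run.
// Pool URLs are placeholders for whatever proxy service you budget for.
const proxyPool = [
  "http://proxy-a.example.com:8000",
  "http://proxy-b.example.com:8000",
  "http://proxy-c.example.com:8000",
];

let runCounter = 0;
function nextLaunchOptions(): { headless: boolean; args: string[] } {
  const proxy = proxyPool[runCounter % proxyPool.length];
  runCounter++;
  return { headless: true, args: [`--proxy-server=${proxy}`] };
}

// Each weekly monitoring run would call puppeteer.launch(nextLaunchOptions()).
```

Round-robin is the simplest policy; a production version might drop proxies that fail and weight toward the ones that succeed.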

Before You Go

dothething sets a spending limit so the agent stops before the budget runs out. If you had to put a spending limit on human delegation, where would you draw the line between giving someone a task and giving them a problem?

Forward this to a builder who needs it.

Keep Reading