
AI Working for You: MCP, Canvas, and Agentic Workflows - Part 2

In our previous post in our series on observability for the agent era, we looked at how Honeycomb provides unique visibility into LLMs operating in your production environment. Now, let’s flip it around and explore how Honeycomb provides observability insights uniquely suited to helping your AI agents rapidly diagnose and fix production issues, and build production feedback into the next round of development.

Your AI coding agent can now query Honeycomb, find the bug in your source code, and generate the fix, all without leaving your IDE.

The Honeycomb MCP exposes the full capabilities of Honeycomb’s query engine. It’s a full observability interface: traces, metrics, logs, BubbleUp, query history, SLOs, and boards, all accessible to any agent that supports the Model Context Protocol. That includes Claude Code, Cursor, Windsurf, and any custom SRE agent your team has built. When your coding agent can see what's actually happening in production, it stops guessing and starts solving.
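Under the hood, any MCP-capable agent reaches these tools through the protocol's standard `tools/call` JSON-RPC request. The tool name and arguments below are illustrative stand-ins, not Honeycomb's actual tool schema; only the envelope is the real MCP wire shape:

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a standard MCP `tools/call` JSON-RPC request.

    The tool name and argument keys are hypothetical; consult the
    Honeycomb MCP server's tool listing for the real schema.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical query: P99 duration for the checkout service over 2 hours.
payload = make_tool_call(1, "run_query", {
    "dataset": "checkout",
    "calculation": "P99(duration_ms)",
    "time_range_seconds": 7200,
})
```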

In this demo, an engineer asks their coding agent to assess current application performance. The agent calls Honeycomb MCP tools to get a live overview, identifies that the checkout service is exhibiting performance issues, retrieves the relevant source code paths from the span data, and then proposes and applies a fix, all without the engineer switching context. The agent does the observability work, the code archaeology, and the remediation in a single workflow.

Customers build their own internal SRE agents on top of the Honeycomb MCP, agents suited to their specific infrastructure and business context. Honeycomb provides the observability substrate. What your agents do with it is up to you.

Canvas: The AI agent that investigates like your best SRE and shows its work

Canvas is a collaborative investigative workspace powered by an AI agent that has access to the same query engine you do. When an alert fires, Canvas does more than summarize. It forms hypotheses, runs queries, executes BubbleUp, reads traces, and builds a structured investigation in a shared canvas that your whole team can see and contribute to in real time.
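One way to picture the structured investigation Canvas assembles is as a set of hypotheses, each carrying a status and links to supporting evidence. The field names below are illustrative, not Canvas's internal schema:

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    statement: str
    status: str = "open"          # open | supported | rejected
    evidence: list = field(default_factory=list)  # links to queries, traces, BubbleUp results

@dataclass
class Investigation:
    trigger: str
    hypotheses: list = field(default_factory=list)

    def add_evidence(self, idx: int, link: str, supports: bool) -> None:
        # Attach an evidence link and resolve the hypothesis accordingly.
        h = self.hypotheses[idx]
        h.evidence.append(link)
        h.status = "supported" if supports else "rejected"

inv = Investigation(trigger="checkout latency alert")
inv.hypotheses.append(Hypothesis("cache is undersized"))
inv.add_evidence(0, "query://p99-by-cache-hit", supports=True)
```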

In this demo, Canvas is triggered by an alert on a checkout service slowdown. It gathers context from the trigger, identifies the measurement that caused the alert, forms three hypotheses, and attempts to spawn sub-agents to investigate each in parallel. When those sub-agents return unexpected output, Canvas adapts, pulling all hypothesis threads into its main reasoning loop. Every query it runs, every trace it views, every BubbleUp result it interprets shows up in the canvas for your team to review.
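The fan-out-then-fall-back pattern described above can be sketched generically: investigate hypotheses in parallel, and fold any thread that misbehaves back into a sequential main loop. The `investigate` function is a hypothetical stand-in for a sub-agent:

```python
from concurrent.futures import ThreadPoolExecutor

def investigate(hypothesis: str) -> str:
    # Stand-in for a sub-agent running queries against one hypothesis.
    if "unparseable" in hypothesis:
        raise ValueError("unexpected sub-agent output")
    return f"evidence for: {hypothesis}"

def run_investigation(hypotheses):
    results, fallback = {}, []
    with ThreadPoolExecutor() as pool:
        futures = {h: pool.submit(investigate, h) for h in hypotheses}
        for h, fut in futures.items():
            try:
                results[h] = fut.result()
            except Exception:
                fallback.append(h)  # pull this thread into the main loop
    for h in fallback:
        results[h] = f"main-loop review needed: {h}"
    return results
```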

The result: a structured investigation with hypotheses, evidence, and conclusions, assembled faster than any individual human or agent could manage alone. Canvas finds the answer and shows you exactly how it got there, with links to every piece of evidence. That's useful in the moment for the on-call engineer, and it doubles as the artifact your incident review is built on.

You don’t have to write SLOs from gut feel. Let your agent research the service and create them in 90 seconds.

A good SLO targets the most important endpoints, reflects baseline latency and error rate distributions, is backed by precise query logic, and sets thresholds that are challenging but achievable.

With the Honeycomb MCP and our published agent skills, a coding agent can do all of that research automatically and create SLOs that fit the real behavior of your service.
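The baseline-research step can be pictured in miniature: derive a latency threshold from observed samples, with enough headroom that the target is achievable from day one. The percentile and margin here are illustrative defaults, not a Honeycomb recommendation:

```python
import statistics

def propose_latency_threshold(samples_ms, quantile=0.99, headroom=1.2):
    """Set the SLO threshold a bit above the observed quantile so the
    target is challenging but achievable.

    quantile and headroom are illustrative defaults; real agents would
    tune them per service.
    """
    qs = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    baseline = qs[int(quantile * 100) - 1]        # e.g. index 98 for p99
    return round(baseline * headroom, 1)
```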

In this demo, an engineer asks their agent to create SLOs for the checkout service. The agent uses Honeycomb MCP to inspect the dataset, identify the most critical endpoints, check historical latency distributions, and write the query logic. In under two minutes, it has recommendations: two SLOs, both referencing PlaceOrder, the endpoint that matters most. Twenty seconds after approval, both SLOs exist in Honeycomb and are aligned with actual service performance.

Human-in-the-loop by design: Canvas investigates autonomously, but humans authorize every action

Autonomous remediation is powerful, but dangerous without the right guardrails.

Canvas has eyes everywhere, but no fingers outside its workspace unless a human says otherwise. Autonomous investigation with guardrails for action is exactly what production environments need from AI right now.
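That guardrail reduces to a simple invariant: the agent may propose a remediation, but executing it requires explicit human approval. A minimal sketch, with hypothetical names that are not a Canvas API:

```python
class RemediationGate:
    """Holds a proposed action until a human explicitly approves it."""

    def __init__(self, action: str):
        self.action = action
        self.approved = False
        self.reviewer = None

    def approve(self, reviewer: str) -> None:
        # Record who authorized the action, for the evidence trail.
        self.approved = True
        self.reviewer = reviewer

    def execute(self) -> str:
        if not self.approved:
            raise PermissionError("human approval required before acting")
        return f"executed: {self.action}"

gate = RemediationGate("restart checkout deployment")
# Calling gate.execute() at this point would raise PermissionError.
gate.approve("on-call SRE")
result = gate.execute()
```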

In this demo, Canvas is triggered by a checkout service slowdown, autonomously assembles the investigation context, identifies the root cause (a cache size issue), and surfaces a specific remediation recommendation: restart the deployment. But it stops there. A human on-call SRE reviews the evidence directly in the Canvas workspace, marks their approval, and authorizes the restart. After the action is taken, the engineer returns to Canvas and asks, "Did the fix work?" Canvas runs new queries and adds the results live to the investigation, revealing, in this case, that the incident is not resolved yet.

The full evidence trail, including the initial investigation, the human approval, the post-fix assessment, and the still-open incident, lives in the Canvas workspace, ready for the incident review.

Next up: the fundamental things apply

The past two posts have looked at some of the use cases Honeycomb customers are implementing to observe LLMs in production and power agentic observability workflows. In the third and final post in this series, we’ll take it back to basics and look at how Honeycomb’s fundamental capabilities and infrastructure provide the comprehensive data and fast performance that make these use cases work at production scale.