<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://amodelandme.dev/feed.xml" rel="self" type="application/atom+xml" /><link href="https://amodelandme.dev/" rel="alternate" type="text/html" /><updated>2026-05-27T22:41:40+00:00</updated><id>https://amodelandme.dev/feed.xml</id><title type="html">a model and me</title><subtitle>Notes on harness engineering, multi-agent systems, .NET, and what AI is doing to the craft. Mostly C#, mostly in public.</subtitle><author><name>a model and me</name><email></email></author><entry><title type="html">Why your AI harness needs a circuit breaker</title><link href="https://amodelandme.dev/2026/05/circuit-breaker/" rel="alternate" type="text/html" title="Why your AI harness needs a circuit breaker" /><published>2026-05-22T00:00:00+00:00</published><updated>2026-05-22T00:00:00+00:00</updated><id>https://amodelandme.dev/2026/05/circuit-breaker</id><content type="html" xml:base="https://amodelandme.dev/2026/05/circuit-breaker/"><![CDATA[<p class="post-lede">Multi-agent systems fail in correlated ways. One bad tool call cascades into
twenty retries, and the bill arrives Monday. A pattern from electrical
engineering, ported badly into software.</p>

<p>The first time I shipped a multi-agent harness to anything resembling
production, it took out our OpenAI quota in ninety seconds. Not because the
prompts were expensive — they weren't — but because one agent's tool call
returned malformed JSON, the planner agent retried, and the supervisor
cheerfully scheduled the same broken plan eight more times.</p>

<p>The fix is older than I am. It's borrowed wholesale from electrical
engineering: a circuit breaker. When the same failure pattern repeats,
stop. Open the circuit. Let the system cool down before you try again.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup></p>

<h2 id="a-naive-first-attempt">A naive first attempt</h2>

<p>My first cut was an <code class="language-plaintext highlighter-rouge">int _failures</code> field on the harness root. After three
failures, throw. It worked for about a day.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// don't do this</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">NaiveBreaker</span>
<span class="p">{</span>
    <span class="k">private</span> <span class="kt">int</span> <span class="n">_failures</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span>

    <span class="k">public</span> <span class="k">async</span> <span class="n">Task</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;</span> <span class="n">InvokeAsync</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;(</span><span class="n">Func</span><span class="p">&lt;</span><span class="n">Task</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;&gt;</span> <span class="n">op</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">_failures</span> <span class="p">&gt;=</span> <span class="m">3</span><span class="p">)</span>
            <span class="k">throw</span> <span class="k">new</span> <span class="nf">CircuitOpenException</span><span class="p">();</span>
        <span class="k">try</span> <span class="p">{</span> <span class="k">return</span> <span class="k">await</span> <span class="nf">op</span><span class="p">();</span> <span class="p">}</span>
        <span class="k">catch</span> <span class="p">{</span> <span class="n">_failures</span><span class="p">++;</span> <span class="k">throw</span><span class="p">;</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<blockquote>
  <p><strong>note</strong>
A counter without a half-open state isn't a circuit breaker — it's a kill
switch. The system never recovers on its own.</p>
</blockquote>

<p>The real pattern has three states: <strong>closed</strong> (normal), <strong>open</strong> (failing
fast), and <strong>half-open</strong> (cautiously probing). The transitions between them
are the entire trick.</p>

<h2 id="the-three-states">The three states</h2>

<table>
  <thead>
    <tr>
      <th style="text-align: left">State</th>
      <th style="text-align: left">What's happening</th>
      <th style="text-align: left">Transition out</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Closed</td>
      <td style="text-align: left">Calls flow through. Failures are counted.</td>
      <td style="text-align: left">Failure threshold → <strong>Open</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">Open</td>
      <td style="text-align: left">Calls fail fast without invoking the dependency.</td>
      <td style="text-align: left">Cooldown elapses → <strong>Half-Open</strong></td>
    </tr>
    <tr>
      <td style="text-align: left">Half-Open</td>
      <td style="text-align: left">One probe call is allowed.</td>
      <td style="text-align: left">Probe succeeds → <strong>Closed</strong>, fails → <strong>Open</strong></td>
    </tr>
  </tbody>
</table>

<p>The math for the cooldown matters more than I expected. Linear backoff is
wrong for AI harnesses — model providers tend to recover in bursts. I've
been running an exponential with jitter:</p>

\[t_{cooldown} = \min\left(t_{max},\; t_{base} \cdot 2^{n}\right) \cdot \big(1 + U(-0.2, 0.2)\big)\]

<p>where \(n\) is the number of consecutive open transitions. The jitter prevents
a thundering herd when many breakers reopen simultaneously — a real failure
mode if you're running parallel agents that share a downstream tool.</p>
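
<p>A minimal sketch of that formula, not the exact code I run. The
<code class="language-plaintext highlighter-rouge">Cooldown</code> class, its parameter names, and the exponent clamp are
assumptions for illustration. Because the jitter is applied after the cap, a
result can overshoot the maximum by up to 20%.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

public static class Cooldown
{
    // Computes min(max, base * 2^n) * (1 + U(-0.2, 0.2)).
    public static TimeSpan Next(
        TimeSpan baseDelay, TimeSpan maxDelay, int consecutiveOpens, Random rng)
    {
        // Clamp the exponent so 2^n stays a sane number before the cap is applied.
        var exponent = Math.Clamp(consecutiveOpens, 0, 30);
        var cappedSeconds = Math.Min(
            maxDelay.TotalSeconds,
            baseDelay.TotalSeconds * Math.Pow(2, exponent));

        // NextDouble() is in [0, 1), so jitter is uniform in [-0.2, 0.2).
        var jitter = (rng.NextDouble() * 0.4) - 0.2;
        return TimeSpan.FromSeconds(cappedSeconds * (1 + jitter));
    }
}
</code></pre></div></div>

<p>With a 10-second base, a 5-minute cap, and three consecutive opens, the capped
value is 80 seconds, so the result lands somewhere between 64 and 96 seconds.
Passing the <code class="language-plaintext highlighter-rouge">Random</code> in keeps the jitter seedable in tests.</p>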

<h2 id="implementation-in-c-13">Implementation in C# 13</h2>

<p>A working version, with the half-open state and an injectable clock for
testing:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">CircuitBreaker</span><span class="p">(</span>
    <span class="kt">int</span> <span class="n">failureThreshold</span> <span class="p">=</span> <span class="m">5</span><span class="p">,</span>
    <span class="n">TimeSpan</span><span class="p">?</span> <span class="n">cooldown</span> <span class="p">=</span> <span class="k">null</span><span class="p">,</span>
    <span class="n">TimeProvider</span><span class="p">?</span> <span class="n">time</span> <span class="p">=</span> <span class="k">null</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">private</span> <span class="k">readonly</span> <span class="n">TimeSpan</span> <span class="n">_cooldown</span> <span class="p">=</span> <span class="n">cooldown</span> <span class="p">??</span> <span class="n">TimeSpan</span><span class="p">.</span><span class="nf">FromSeconds</span><span class="p">(</span><span class="m">30</span><span class="p">);</span>
    <span class="k">private</span> <span class="k">readonly</span> <span class="n">TimeProvider</span> <span class="n">_time</span> <span class="p">=</span> <span class="n">time</span> <span class="p">??</span> <span class="n">TimeProvider</span><span class="p">.</span><span class="n">System</span><span class="p">;</span>
    <span class="k">private</span> <span class="k">readonly</span> <span class="n">Lock</span> <span class="n">_gate</span> <span class="p">=</span> <span class="k">new</span><span class="p">();</span>

    <span class="k">private</span> <span class="kt">int</span> <span class="n">_failures</span><span class="p">;</span>
    <span class="k">private</span> <span class="n">CircuitState</span> <span class="n">_state</span> <span class="p">=</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">Closed</span><span class="p">;</span>
    <span class="k">private</span> <span class="n">DateTimeOffset</span> <span class="n">_openedAt</span><span class="p">;</span>

    <span class="k">public</span> <span class="k">async</span> <span class="n">Task</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;</span> <span class="n">InvokeAsync</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;(</span>
        <span class="n">Func</span><span class="p">&lt;</span><span class="n">CancellationToken</span><span class="p">,</span> <span class="n">Task</span><span class="p">&lt;</span><span class="n">T</span><span class="p">&gt;&gt;</span> <span class="n">op</span><span class="p">,</span>
        <span class="n">CancellationToken</span> <span class="n">ct</span> <span class="p">=</span> <span class="k">default</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="nf">EnsureCanProceed</span><span class="p">();</span>
        <span class="k">try</span>
        <span class="p">{</span>
            <span class="kt">var</span> <span class="n">result</span> <span class="p">=</span> <span class="k">await</span> <span class="nf">op</span><span class="p">(</span><span class="n">ct</span><span class="p">).</span><span class="nf">ConfigureAwait</span><span class="p">(</span><span class="k">false</span><span class="p">);</span>
            <span class="nf">OnSuccess</span><span class="p">();</span>
            <span class="k">return</span> <span class="n">result</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">catch</span> <span class="nf">when</span> <span class="p">(!</span><span class="n">ct</span><span class="p">.</span><span class="n">IsCancellationRequested</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="nf">OnFailure</span><span class="p">();</span>
            <span class="k">throw</span><span class="p">;</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">private</span> <span class="k">void</span> <span class="nf">EnsureCanProceed</span><span class="p">()</span>
    <span class="p">{</span>
        <span class="k">lock</span> <span class="p">(</span><span class="n">_gate</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">_state</span> <span class="k">is</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">Open</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">_time</span><span class="p">.</span><span class="nf">GetUtcNow</span><span class="p">()</span> <span class="p">-</span> <span class="n">_openedAt</span> <span class="p">&lt;</span> <span class="n">_cooldown</span><span class="p">)</span>
                    <span class="k">throw</span> <span class="k">new</span> <span class="nf">CircuitOpenException</span><span class="p">();</span>
                <span class="n">_state</span> <span class="p">=</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">HalfOpen</span><span class="p">;</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>

    <span class="k">private</span> <span class="k">void</span> <span class="nf">OnSuccess</span><span class="p">()</span>  <span class="p">{</span> <span class="k">lock</span> <span class="p">(</span><span class="n">_gate</span><span class="p">)</span> <span class="p">{</span> <span class="n">_failures</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">_state</span> <span class="p">=</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">Closed</span><span class="p">;</span> <span class="p">}</span> <span class="p">}</span>
    <span class="k">private</span> <span class="k">void</span> <span class="nf">OnFailure</span><span class="p">()</span>
    <span class="p">{</span>
        <span class="k">lock</span> <span class="p">(</span><span class="n">_gate</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">_failures</span><span class="p">++;</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">_failures</span> <span class="p">&gt;=</span> <span class="n">failureThreshold</span> <span class="p">||</span> <span class="n">_state</span> <span class="k">is</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">HalfOpen</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="n">_state</span> <span class="p">=</span> <span class="n">CircuitState</span><span class="p">.</span><span class="n">Open</span><span class="p">;</span>
                <span class="n">_openedAt</span> <span class="p">=</span> <span class="n">_time</span><span class="p">.</span><span class="nf">GetUtcNow</span><span class="p">();</span>
            <span class="p">}</span>
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="k">public</span> <span class="k">enum</span> <span class="n">CircuitState</span> <span class="p">{</span> <span class="n">Closed</span><span class="p">,</span> <span class="n">Open</span><span class="p">,</span> <span class="n">HalfOpen</span> <span class="p">}</span>
<span class="k">public</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">CircuitOpenException</span> <span class="p">:</span> <span class="n">Exception</span><span class="p">;</span>
</code></pre></div></div>

<p>A few notes on this version:</p>

<ul>
  <li>The <code class="language-plaintext highlighter-rouge">Lock</code> type is <code class="language-plaintext highlighter-rouge">System.Threading.Lock</code>, new in .NET 9 and C# 13. Reads and writes to the state
fields are short, so a single lock is fine; if you're seeing contention,
you've got bigger problems than the breaker.</li>
  <li><code class="language-plaintext highlighter-rouge">TimeProvider</code> is injectable so the test suite can advance time
deterministically. Don't use <code class="language-plaintext highlighter-rouge">DateTime.UtcNow</code> directly — you'll regret it.</li>
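  <li><code class="language-plaintext highlighter-rouge">ConfigureAwait(false)</code> because this is library-ish code.</li>
  <li>Half-open here is looser than the table above: once the cooldown elapses,
every concurrent caller gets through until one of them succeeds or fails, not
just a single probe. The cooldown is also fixed rather than exponential, to
keep the listing short.</li>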
</ul>
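
<p>The injectable clock from the second note can be very small. A minimal sketch,
where <code class="language-plaintext highlighter-rouge">ManualTimeProvider</code> is a hypothetical name of my own; if you would
rather not hand-roll one, Microsoft publishes a <code class="language-plaintext highlighter-rouge">FakeTimeProvider</code> in the
<code class="language-plaintext highlighter-rouge">Microsoft.Extensions.TimeProvider.Testing</code> package.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;

// Hypothetical test clock: time only moves when the test calls Advance.
// Not thread-safe, which is fine for single-threaded tests.
public sealed class ManualTimeProvider(DateTimeOffset start) : TimeProvider
{
    private DateTimeOffset _now = start;

    public override DateTimeOffset GetUtcNow() =&gt; _now;

    public void Advance(TimeSpan delta) =&gt; _now += delta;
}
</code></pre></div></div>

<p>Pass an instance as the <code class="language-plaintext highlighter-rouge">time</code> argument, trip the breaker, then call
<code class="language-plaintext highlighter-rouge">Advance(TimeSpan.FromSeconds(30))</code> to step past the default cooldown without
sleeping in the test.</p>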

<blockquote>
  <p><strong>tip</strong>
In production, prefer <strong>Polly's</strong> <code class="language-plaintext highlighter-rouge">ResiliencePipelineBuilder</code> with
<code class="language-plaintext highlighter-rouge">AddCircuitBreaker</code>. The above is for teaching — Polly handles the edges
(timeouts inside the breaker, isolation between breakers, telemetry) that
a hand-rolled version misses.</p>
</blockquote>
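
<p>A rough sketch of what that looks like with Polly v8, assuming the
<code class="language-plaintext highlighter-rouge">Polly.Core</code> package is referenced. The thresholds are illustrative and
<code class="language-plaintext highlighter-rouge">CallToolAsync</code> is a hypothetical placeholder. One difference worth knowing:
Polly's breaker trips on a failure ratio across a sampling window rather than
on a consecutive count.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

// Illustrative values only; tune them per dependency.
ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        FailureRatio = 0.5,                          // open at 50% failures...
        MinimumThroughput = 5,                       // ...once at least 5 calls were sampled
        SamplingDuration = TimeSpan.FromSeconds(30), // window the ratio is measured over
        BreakDuration = TimeSpan.FromSeconds(30),    // how long the circuit stays open
        ShouldHandle = new PredicateBuilder().Handle&lt;HttpRequestException&gt;(),
    })
    .Build();

try
{
    var reply = await pipeline.ExecuteAsync(
        async ct =&gt; await CallToolAsync(ct), CancellationToken.None);
    Console.WriteLine(reply);
}
catch (BrokenCircuitException)
{
    // The breaker is open: fail fast instead of calling the provider.
}

// Hypothetical stand-in for a real tool or model call.
static Task&lt;string&gt; CallToolAsync(CancellationToken ct) =&gt; Task.FromResult("ok");
</code></pre></div></div>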

<h2 id="tuning-the-thresholds">Tuning the thresholds</h2>

<p>Three knobs, in order of how often I touch them:</p>

<ol>
  <li><strong>Failure threshold.</strong> Start at 5 for chatty providers, 3 for ones you
pay per call. Lower for cold paths.</li>
  <li><strong>Cooldown base.</strong> 10s is fine for most providers; 30s if you're seeing
rate-limit-and-recover patterns.</li>
  <li><strong>Sliding window vs. consecutive count.</strong> Consecutive is simpler and
surprisingly good. Switch to a sliding window only if you're seeing
intermittent failures that should trip the breaker but don't.</li>
</ol>

<blockquote>
  <p><strong>warn</strong>
Don't share a single breaker across logically distinct dependencies. One
bad tool shouldn't blackhole the entire agent. Scope breakers to the
narrowest unit that makes sense — usually <code class="language-plaintext highlighter-rouge">(tool_id, provider)</code>.</p>
</blockquote>
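
<p>One way to get that scoping is a small registry keyed by the pair. This is a
sketch under my own naming: <code class="language-plaintext highlighter-rouge">BreakerRegistry</code> and <code class="language-plaintext highlighter-rouge">Get</code> are hypothetical, and
the type is generic only so the example stands alone.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Concurrent;

// Hypothetical registry: one breaker per (toolId, provider) pair.
public sealed class BreakerRegistry&lt;TBreaker&gt;(Func&lt;TBreaker&gt; factory)
    where TBreaker : class
{
    private readonly ConcurrentDictionary&lt;(string ToolId, string Provider), TBreaker&gt; _breakers = new();

    // GetOrAdd may run the factory more than once under a race,
    // but only one instance is stored and returned per key.
    public TBreaker Get(string toolId, string provider) =&gt;
        _breakers.GetOrAdd((toolId, provider), _ =&gt; factory());
}
</code></pre></div></div>

<p>With the breaker from this post that would be
<code class="language-plaintext highlighter-rouge">new BreakerRegistry&lt;CircuitBreaker&gt;(() =&gt; new CircuitBreaker(failureThreshold: 3))</code>,
followed by <code class="language-plaintext highlighter-rouge">registry.Get("search", "openai")</code> at each call site.</p>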

<h2 id="what-it-doesnt-fix">What it doesn't fix</h2>

<p>Circuit breakers stop cascades; they don't stop bad plans. If your agent is
asking the wrong question, the breaker will dutifully stop you from asking
it twenty times — and then the agent will pick the next-most-confident
question and keep going. That's a separate problem, and a more interesting
one. I'll write it up next.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>Michael Nygard, <em>Release It!</em> (2007). The book that put this pattern
  in front of a generation of services engineers. The original Hystrix
  docs at Netflix are also worth reading; the project itself is retired
  but the concepts hold. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>a model and me</name></author><category term="harness" /><category term="ai-engineering" /><category term="patterns" /><summary type="html"><![CDATA[Multi-agent systems fail in correlated ways. One bad tool call cascades into twenty retries, and the bill arrives Monday. A pattern from electrical engineering, ported badly into software.]]></summary></entry><entry><title type="html">Multi-agent orchestration is just distributed systems with worse error messages</title><link href="https://amodelandme.dev/2026/05/multi-agent-orchestration/" rel="alternate" type="text/html" title="Multi-agent orchestration is just distributed systems with worse error messages" /><published>2026-05-14T00:00:00+00:00</published><updated>2026-05-14T00:00:00+00:00</updated><id>https://amodelandme.dev/2026/05/multi-agent-orchestration</id><content type="html" xml:base="https://amodelandme.dev/2026/05/multi-agent-orchestration/"><![CDATA[<p class="post-lede">Every "multi-agent framework" I've used eventually rediscovers a paper from
1978. Sometimes badly. Here's a tour of what the field has been calling new
that isn't.</p>

<p>I spent last weekend reading the changelogs of four popular multi-agent
frameworks. Three of them shipped, in the same quarter, a feature they each
called something different but which was, structurally, a vector clock.<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>
One of them had a tutorial blog post explaining the design as "novel." It is
not novel. Lamport's logical clocks are from 1978, and the vector form followed in 1988.</p>

<p>This isn't a complaint. It's an observation about where the field is: we are
collectively re-deriving the entire distributed systems literature, with
language-model latency added.</p>

<h2 id="the-same-problems-slightly-relabelled">The same problems, slightly relabelled</h2>

<p>A short table of correspondences I keep on my desk:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Multi-agent thing</th>
      <th style="text-align: left">Distributed systems thing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">Agent loop</td>
      <td style="text-align: left">Event loop / actor</td>
    </tr>
    <tr>
      <td style="text-align: left">Tool dispatch</td>
      <td style="text-align: left">RPC</td>
    </tr>
    <tr>
      <td style="text-align: left">Shared memory / blackboard</td>
      <td style="text-align: left">Distributed cache</td>
    </tr>
    <tr>
      <td style="text-align: left">"Plan revision"</td>
      <td style="text-align: left">Optimistic concurrency control</td>
    </tr>
    <tr>
      <td style="text-align: left">Supervisor agent</td>
      <td style="text-align: left">Cluster coordinator (Raft, Zab, …)</td>
    </tr>
    <tr>
      <td style="text-align: left">"Long-term memory"</td>
      <td style="text-align: left">Replicated log + materialised views</td>
    </tr>
    <tr>
      <td style="text-align: left">Hand-off</td>
      <td style="text-align: left">Message passing with explicit channels</td>
    </tr>
  </tbody>
</table>

<p>The mapping is not 1:1. Some of these have genuinely new properties because
agents are non-deterministic in ways processes are not. But the <em>shape</em> of
the problems — ordering, consistency, recovery, partial failure — is
identical.</p>

<h2 id="three-orderings">Three orderings</h2>

<p>A worked example: ordering. Suppose two agents both update the shared
blackboard concurrently. Which write wins?</p>

<p>Three answers, each with a long history outside AI:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1. Last-write-wins                   ← lossy; fine for caches, never for plans
2. Lamport timestamps                ← logical clock; orders causally-related events
3. CRDT (e.g., G-Counter, OR-Set)    ← order doesn't matter; merge is deterministic
</code></pre></div></div>

<p>I've watched a popular framework reinvent option 1 (silent overwrites), get
burned, ship option 2 (calling it "agent-aware versioning"), get burned
again on concurrent merges, and finally land on option 3 (calling it
"convergent state"). The cycle took eight months. The original paper is from
<em>1986</em>.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
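
<p>To make option 3 concrete, here is about the smallest CRDT there is: a
grow-only counter. This is a teaching sketch, not any framework's
implementation, and the agent ids are made up.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Linq;

// Grow-only counter: each agent increments its own slot; merge takes the per-slot max.
public sealed class GCounter
{
    private readonly Dictionary&lt;string, long&gt; _counts = new();

    public long Value =&gt; _counts.Values.Sum();

    public void Increment(string agentId, long by = 1)
    {
        if (by &lt; 0) throw new ArgumentOutOfRangeException(nameof(by));
        _counts[agentId] = _counts.GetValueOrDefault(agentId) + by;
    }

    // Commutative, associative, and idempotent, so replicas can merge in any order.
    public void Merge(GCounter other)
    {
        foreach (var (agentId, count) in other._counts)
            _counts[agentId] = Math.Max(_counts.GetValueOrDefault(agentId), count);
    }
}
</code></pre></div></div>

<p>If a planner replica counts 2 and an executor replica counts 3, merging in
either direction leaves both at 5, and merging again changes nothing. That
last property is what last-write-wins cannot give you.</p>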

<blockquote>
  <p><strong>note</strong>
If you're building a harness, read the CRDT survey by Shapiro et al.
<em>before</em> you implement your shared-memory layer. It is genuinely much
harder to retrofit consistency than to start with it.</p>
</blockquote>

<h2 id="whats-actually-new">What's actually new</h2>

<p>Some things really are new:</p>

<ul>
  <li><strong>The dependency is non-deterministic.</strong> RPC to a service returns the
same answer for the same input (modulo state). RPC to an LLM does not.
This breaks retry semantics in subtle ways — a "deterministic retry"
isn't.</li>
  <li><strong>The cost model is bizarre.</strong> Latency you can model; per-token cost
with caching makes the "is this retry free?" question much harder.</li>
  <li><strong>State is partially in natural language.</strong> You cannot diff two
agents' worldviews with <code class="language-plaintext highlighter-rouge">git diff</code>. You can with embeddings,
approximately, but the tooling is bad.</li>
</ul>

<p>These deserve their own literature. But the other 80% is just systems
engineering done in a louder room.</p>

<h2 id="what-to-read">What to read</h2>

<p>If you only read three things before writing your next harness:</p>

<ul>
  <li>Lamport, <em>Time, Clocks, and the Ordering of Events in a Distributed
System</em> (1978). The original.</li>
  <li>Shapiro et al., <em>A comprehensive study of Convergent and Commutative
Replicated Data Types</em> (2011). The CRDT survey.</li>
  <li>Nygard, <em>Release It!</em> (2007). Failure modes. The circuit-breaker
chapter alone is worth the cover price.</li>
</ul>

<p>I'm not against frameworks. I'm against pretending we're inventing what
we're rediscovering.</p>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:1" role="doc-endnote">
      <p>A vector clock is a list of per-process counters that lets you
  establish a partial ordering of events in a distributed system. If
  that sentence sounds like every "agent context" feature you've seen
  shipped in 2025, you understand my point. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:2" role="doc-endnote">
      <p>Strictly: the foundational CRDT-ish work is from 1986 (Wuu &amp;
  Bernstein on a replicated dictionary); the modern formalization is
  Shapiro et al. (2011). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>]]></content><author><name>a model and me</name></author><category term="multi-agent" /><category term="complexity" /><category term="distributed-systems" /><summary type="html"><![CDATA[Every "multi-agent framework" I've used eventually rediscovers a paper from 1978. Sometimes badly. Here's a tour of what the field has been calling new that isn't.]]></summary></entry><entry><title type="html">Building a deterministic agent loop in C# 13</title><link href="https://amodelandme.dev/2026/04/deterministic-agent-loop/" rel="alternate" type="text/html" title="Building a deterministic agent loop in C# 13" /><published>2026-04-29T00:00:00+00:00</published><updated>2026-04-29T00:00:00+00:00</updated><id>https://amodelandme.dev/2026/04/deterministic-agent-loop</id><content type="html" xml:base="https://amodelandme.dev/2026/04/deterministic-agent-loop/"><![CDATA[<p class="post-lede">A walk-through of the agent loop I've been running in production, with the
parts that surprised me marked.</p>

<p>Primary constructors (C# 12) and the <code class="language-plaintext highlighter-rouge">field</code> keyword (a preview feature in C# 13) make this kind of code
shorter than it used to be. Here's the minimum viable shape of an agent
loop, fully deterministic given a fixed model temperature and a seedable
tool layer.</p>

<h2 id="the-contract">The contract</h2>

<p>An agent loop, at minimum, is:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">interface</span> <span class="nc">IAgent</span>
<span class="p">{</span>
    <span class="n">Task</span><span class="p">&lt;</span><span class="n">AgentResult</span><span class="p">&gt;</span> <span class="nf">RunAsync</span><span class="p">(</span>
        <span class="n">AgentRequest</span> <span class="n">request</span><span class="p">,</span>
        <span class="n">CancellationToken</span> <span class="n">ct</span> <span class="p">=</span> <span class="k">default</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>

<p>Everything else is implementation detail. The request carries the seed
context; the result carries the final output plus a trace.</p>
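
<p>One way to give those two types a concrete shape. Everything here is an
assumption on my part: the loop only needs a request with an
<code class="language-plaintext highlighter-rouge">Id</code> and <code class="language-plaintext highlighter-rouge">SeedMessages</code>, and a result built via <code class="language-plaintext highlighter-rouge">Final</code> or
<code class="language-plaintext highlighter-rouge">Exhausted</code>. <code class="language-plaintext highlighter-rouge">ChatMessage</code> and <code class="language-plaintext highlighter-rouge">HitStepLimit</code> are hypothetical names.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;

// Hypothetical message shape; real clients usually carry more (tool calls, ids).
public sealed record ChatMessage(string Role, string Content);

public sealed record AgentRequest(Guid Id, IReadOnlyList&lt;ChatMessage&gt; SeedMessages);

// Stand-in for the trace type described later in this post.
public sealed record AgentTrace(Guid RunId);

public sealed record AgentResult(string? Output, AgentTrace Trace, bool HitStepLimit)
{
    // The model answered without requesting more tool calls.
    public static AgentResult Final(string? output, AgentTrace trace) =&gt;
        new(output, trace, HitStepLimit: false);

    // The step budget ran out before the model produced a final answer.
    public static AgentResult Exhausted(AgentTrace trace) =&gt;
        new(null, trace, HitStepLimit: true);
}
</code></pre></div></div>

<p>Keeping "ran out of steps" as data rather than an exception means callers can
still inspect the trace from a run that never finished.</p>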

<h2 id="the-loop">The loop</h2>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">sealed</span> <span class="k">class</span> <span class="nc">Agent</span><span class="p">(</span>
    <span class="n">IModelClient</span> <span class="n">model</span><span class="p">,</span>
    <span class="n">IToolRegistry</span> <span class="n">tools</span><span class="p">,</span>
    <span class="n">AgentPolicy</span> <span class="n">policy</span><span class="p">)</span> <span class="p">:</span> <span class="n">IAgent</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="k">async</span> <span class="n">Task</span><span class="p">&lt;</span><span class="n">AgentResult</span><span class="p">&gt;</span> <span class="nf">RunAsync</span><span class="p">(</span>
        <span class="n">AgentRequest</span> <span class="n">request</span><span class="p">,</span>
        <span class="n">CancellationToken</span> <span class="n">ct</span> <span class="p">=</span> <span class="k">default</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="kt">var</span> <span class="n">trace</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">AgentTrace</span><span class="p">(</span><span class="n">request</span><span class="p">.</span><span class="n">Id</span><span class="p">);</span>
        <span class="kt">var</span> <span class="n">conversation</span> <span class="p">=</span> <span class="n">request</span><span class="p">.</span><span class="n">SeedMessages</span><span class="p">.</span><span class="nf">ToList</span><span class="p">();</span>

        <span class="k">for</span> <span class="p">(</span><span class="kt">var</span> <span class="n">step</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">step</span> <span class="p">&lt;</span> <span class="n">policy</span><span class="p">.</span><span class="n">MaxSteps</span><span class="p">;</span> <span class="n">step</span><span class="p">++)</span>
        <span class="p">{</span>
            <span class="n">ct</span><span class="p">.</span><span class="nf">ThrowIfCancellationRequested</span><span class="p">();</span>

            <span class="kt">var</span> <span class="n">response</span> <span class="p">=</span> <span class="k">await</span> <span class="n">model</span><span class="p">.</span><span class="nf">ChatAsync</span><span class="p">(</span><span class="n">conversation</span><span class="p">,</span> <span class="n">tools</span><span class="p">.</span><span class="n">Schema</span><span class="p">,</span> <span class="n">ct</span><span class="p">);</span>
            <span class="n">conversation</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">Message</span><span class="p">);</span>
            <span class="n">trace</span><span class="p">.</span><span class="nf">Record</span><span class="p">(</span><span class="n">step</span><span class="p">,</span> <span class="n">response</span><span class="p">);</span>

            <span class="k">if</span> <span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">Message</span><span class="p">.</span><span class="n">ToolCalls</span> <span class="k">is</span> <span class="n">not</span> <span class="p">{</span> <span class="n">Count</span><span class="p">:</span> <span class="p">&gt;</span> <span class="m">0</span> <span class="p">}</span> <span class="n">calls</span><span class="p">)</span>
                <span class="k">return</span> <span class="n">AgentResult</span><span class="p">.</span><span class="nf">Final</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">Message</span><span class="p">.</span><span class="n">Content</span><span class="p">,</span> <span class="n">trace</span><span class="p">);</span>

            <span class="k">foreach</span> <span class="p">(</span><span class="kt">var</span> <span class="n">call</span> <span class="k">in</span> <span class="n">calls</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="kt">var</span> <span class="n">observation</span> <span class="p">=</span> <span class="k">await</span> <span class="n">tools</span><span class="p">.</span><span class="nf">InvokeAsync</span><span class="p">(</span><span class="n">call</span><span class="p">,</span> <span class="n">ct</span><span class="p">);</span>
                <span class="n">conversation</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">ToolMessage</span><span class="p">.</span><span class="nf">From</span><span class="p">(</span><span class="n">call</span><span class="p">,</span> <span class="n">observation</span><span class="p">));</span>
                <span class="n">trace</span><span class="p">.</span><span class="nf">Record</span><span class="p">(</span><span class="n">call</span><span class="p">,</span> <span class="n">observation</span><span class="p">);</span>
            <span class="p">}</span>
        <span class="p">}</span>

        <span class="k">return</span> <span class="n">AgentResult</span><span class="p">.</span><span class="nf">Exhausted</span><span class="p">(</span><span class="n">trace</span><span class="p">);</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<blockquote>
  <p><strong>note</strong>
Determinism comes from three places: (1) <code class="language-plaintext highlighter-rouge">temperature=0</code> on the model
client, (2) <code class="language-plaintext highlighter-rouge">tools</code> deterministic for a given input, and (3)
<code class="language-plaintext highlighter-rouge">policy.MaxSteps</code> finite, so every run is bounded. Drop any one and you've got nondeterminism
back. Hosted models can still vary slightly between calls at temperature 0, so
(1) is necessary rather than sufficient.</p>
</blockquote>

<h2 id="what-surprised-me">What surprised me</h2>

<p>Three things, in increasing order of how long it took me to notice.</p>

<h3 id="1-tool-ordering-inside-a-single-turn-matters">1. Tool ordering inside a single turn matters</h3>

<p>The loop above iterates tool calls in the order the model returned them.
If a model returns <code class="language-plaintext highlighter-rouge">[search, calculator]</code> in one turn and the search depends
on a value the calculator produces, you've lost. I've been splitting these
across turns (one tool per response) and getting better behaviour, at the
cost of latency.</p>

<h3 id="2-the-trace-is-the-product">2. The trace is the product</h3>

<p>The thing I wish I'd known on day one: the trace — not the final answer
— is what you debug from, what you replay from, what you cache against.
Make it a first-class object from the start. Mine has shape:</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">public</span> <span class="k">sealed</span> <span class="n">record</span> <span class="nf">AgentTrace</span><span class="p">(</span><span class="n">Guid</span> <span class="n">RunId</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">public</span> <span class="n">ImmutableList</span><span class="p">&lt;</span><span class="n">TraceEntry</span><span class="p">&gt;</span> <span class="n">Entries</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="n">init</span><span class="p">;</span> <span class="p">}</span> <span class="p">=</span> <span class="p">[];</span>
    <span class="k">public</span> <span class="n">DateTimeOffset</span> <span class="n">StartedAt</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="n">init</span><span class="p">;</span> <span class="p">}</span> <span class="p">=</span> <span class="n">DateTimeOffset</span><span class="p">.</span><span class="n">UtcNow</span><span class="p">;</span>
    <span class="k">public</span> <span class="n">DateTimeOffset</span><span class="p">?</span> <span class="n">FinishedAt</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="n">init</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>

<p>I serialize one of these per run. They're cheap, they replay exactly, and
they're the only way I've found to debug a failed run two weeks later.</p>
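
<p>A sketch of the round trip with <code class="language-plaintext highlighter-rouge">System.Text.Json</code>. The <code class="language-plaintext highlighter-rouge">AgentTrace</code> here is
a trimmed copy of the record above, and <code class="language-plaintext highlighter-rouge">TraceEntry</code> and <code class="language-plaintext highlighter-rouge">TraceStore</code> are
hypothetical, since this post doesn't show the entry shape.</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Immutable;
using System.Text.Json;

// Hypothetical entry shape, kept flat so it serializes without custom converters.
public sealed record TraceEntry(int Step, string Kind, string Payload);

// Trimmed copy of the trace record, enough to show the round trip.
public sealed record AgentTrace(Guid RunId)
{
    public ImmutableList&lt;TraceEntry&gt; Entries { get; init; } = ImmutableList&lt;TraceEntry&gt;.Empty;
}

public static class TraceStore
{
    public static string Save(AgentTrace trace) =&gt; JsonSerializer.Serialize(trace);

    public static AgentTrace Load(string json) =&gt;
        JsonSerializer.Deserialize&lt;AgentTrace&gt;(json)
        ?? throw new JsonException("Trace payload was null.");
}
</code></pre></div></div>

<p>One thing to watch when asserting on a reloaded trace: record equality compares
the <code class="language-plaintext highlighter-rouge">Entries</code> list by reference, so compare the entries with
<code class="language-plaintext highlighter-rouge">SequenceEqual</code> rather than comparing two traces with <code class="language-plaintext highlighter-rouge">==</code>.</p>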

<h3 id="3-iasyncenumerable-is-right-but-later">3. <code class="language-plaintext highlighter-rouge">IAsyncEnumerable</code> is right, but later</h3>

<p>You'll be tempted to make <code class="language-plaintext highlighter-rouge">RunAsync</code> an <code class="language-plaintext highlighter-rouge">IAsyncEnumerable&lt;TraceEntry&gt;</code> so
callers can stream. Resist for v1. Streaming complicates cancellation,
cancellation complicates the breaker (see the <a href="/2026/05/circuit-breaker/">circuit breaker post</a>), the
breaker complicates everything. Ship the <code class="language-plaintext highlighter-rouge">Task&lt;AgentResult&gt;</code> version first.
Stream when the product team asks for it twice.</p>

<blockquote>
  <p><strong>tip</strong>
If you're going to expose streaming, do it as a separate <code class="language-plaintext highlighter-rouge">RunStreamingAsync</code>
method that internally calls into the same primitives. Keep the simple
path simple.</p>
</blockquote>
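
<p>Roughly what that split can look like. This is a sketch, not my production
loop: <code class="language-plaintext highlighter-rouge">StepEntry</code>, <code class="language-plaintext highlighter-rouge">StreamingAgent</code>, and the <code class="language-plaintext highlighter-rouge">stepAsync</code> delegate are
hypothetical stand-ins for "one model turn plus its tool calls".</p>

<div class="language-csharp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical entry shape; not the post's real TraceEntry.
public sealed record StepEntry(int Step, string Summary);

public sealed class StreamingAgent(
    Func&lt;int, CancellationToken, Task&lt;StepEntry?&gt;&gt; stepAsync,
    int maxSteps)
{
    // Simple path: collect every entry, return once.
    public async Task&lt;IReadOnlyList&lt;StepEntry&gt;&gt; RunAsync(CancellationToken ct = default)
    {
        var entries = new List&lt;StepEntry&gt;();
        for (var step = 0; step &lt; maxSteps; step++)
        {
            ct.ThrowIfCancellationRequested();
            if (await stepAsync(step, ct) is not { } entry) break; // null means "done"
            entries.Add(entry);
        }
        return entries;
    }

    // Streaming path: same primitive, but each entry is yielded as it arrives.
    public async IAsyncEnumerable&lt;StepEntry&gt; RunStreamingAsync(
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        for (var step = 0; step &lt; maxSteps; step++)
        {
            ct.ThrowIfCancellationRequested();
            if (await stepAsync(step, ct) is not { } entry) yield break;
            yield return entry;
        }
    }
}
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">[EnumeratorCancellation]</code> attribute is what lets a token passed through
<code class="language-plaintext highlighter-rouge">WithCancellation</code> on the consumer side reach the loop. The two methods
duplicate a few lines on purpose, so the non-streaming path never depends on
the iterator machinery.</p>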

<h2 id="where-this-goes">Where this goes</h2>

<p>Next post: tool dispatch under contention — what happens when two agents
in the same harness want the same tool at the same time, and how
deterministic ordering survives.</p>]]></content><author><name>a model and me</name></author><category term=".net" /><category term="harness" /><category term="code" /><summary type="html"><![CDATA[A walk-through of the agent loop I've been running in production, with the parts that surprised me marked.]]></summary></entry></feed>