A Methodology for Human-AI Collaborative Authorship

Every Scurry Lab article carried the same disclosure: agent-drafted, human-edited. Disclosures like it have become common in a short time, and what they cover varies enormously depending on who is using them. At one end, the disclosure describes a structured process with defined roles and a formal review protocol. At the other, it sits above a verbatim AI draft that received a light read-through before publishing. The rhetoric around AI writing tends to flatten that range. The disclosure doesn’t tell you which end you’re looking at.

This article is a look under the hood of one specific approach. Not a critique of how others work with AI — an honest account of how this lab does it, and why the process is worth naming precisely.

The methodology described here treats the agent as a reasoning contributor within a direction the human sets and owns. The agent doesn’t want anything, doesn’t have goals, and performs no work without human direction. What it does have is reasoning capacity and the ability to synthesize knowledge in ways that produce outputs the human couldn’t have specified in advance. The methodology is designed to take full advantage of that — not to manage AI output, but to create the conditions for genuine reasoning contribution toward a human-defined end.

What follows names and describes that process. Articles 001 and 002 were produced using it; the methodology existed before this article named it. All Scurry Lab articles going forward will carry the updated disclosure: human-AI collaborative authorship. The goal is not to claim this is the only way to write with AI. The goal is to demonstrate that human-AI collaborative authorship can mean something specific. This article spells out what it means.

The Four-Phase Process

The four-phase methodology didn’t emerge from scratch. It is an adaptation of a technical writing process developed over years of professional writing and team leadership. That process looks like this: a theme or question that everything must support; an outline with sections, bullet statements serving as topic sentences, and placeholders for figures with rough captions; a review pass to ensure all content supports the thesis; a drafting pass, written stream of consciousness by section, that gets content on the page without concern for sentence structure; a review pass for active voice, deliberate and succinct sentences, and content alignment with the thesis; and a final scrub.

It is thorough, time-consuming, and built for quality. It scales well to multiple authors and external reviewers.

The question that produced the current methodology was simple: what would that process look like if I teamed with an AI agent to do it?

The methodology has four phases. Each has a defined contribution from each party.

Phase 1 — Human Input. The human provides a rough outline, a working thesis, relevant context, and any constraints the agent should know about. The agent does not begin drafting until this input is complete. This phase establishes who owns the intellectual direction of the piece — the human does, and the input phase is where that ownership is exercised before anything is written.

Phase 2 — Agent Draft. The agent drafts the full article. It is permitted to ask clarifying questions before or during the draft, but it is expected to make decisions where the input is sufficient and flag uncertainty where it isn’t. The draft is not a starting point for human rewriting — it is a complete first pass, expected to be substantive. If the agent hedges where it should decide, that is a quality failure in the draft.

Phase 3 — Review Pass. The human applies a structured review protocol to the draft. This is not freeform editing. It uses a defined tag taxonomy — described in detail below — where every intervention is labeled and reasoned. It is the mechanism that keeps the agent’s reasoning role intact through revision: the agent is not being corrected, it is being given new information to reason from. The agent isn’t being asked to accept the human’s judgment; it’s being given the human’s reasoning and asked to apply its own.

Phase 4 — Adjudication. The human and agent work through every tagged intervention in a single real-time pass. This is the phase the disclosure “human-edited” most obscures. Adjudication is not the human overriding the agent. It is a negotiation. The agent is expected to push back where it disagrees with a proposed change. The human can accept the agent’s reasoning, hold the intervention with a counter-argument, or modify the approach. Final form is the product of that reasoning exchange.

The distinction between Phase 3 and Phase 4 matters. The review pass is the human working alone, building a structured case. The adjudication pass is both parties working together, reasoning through that case. The final article is the product of both.
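
The Phase 4 logic can be sketched as a small decision function. This is an illustration only, not tooling the lab uses: the names Exchange, HumanDecision, and final_text are hypothetical, and the sketch assumes each intervention reduces to a choice between the drafted text, the text the intervention asks for, and a modified version agreed in adjudication.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HumanDecision(Enum):
    """The three responses to agent pushback named in Phase 4."""
    ACCEPT_AGENT = "accept the agent's reasoning"
    HOLD = "hold the intervention with a counter-argument"
    MODIFY = "modify the approach"


@dataclass
class Exchange:
    draft_text: str                          # what the agent originally drafted
    requested_text: str                      # the text the intervention asks for
    agent_pushback: Optional[str] = None     # the agent's argument, if it disagrees
    decision: Optional[HumanDecision] = None
    modified_text: Optional[str] = None      # used only when decision is MODIFY


def final_text(ex: Exchange) -> str:
    """Return the text that survives adjudication for one intervention."""
    if ex.agent_pushback is None:
        return ex.requested_text             # no disagreement: the intervention stands
    if ex.decision is HumanDecision.ACCEPT_AGENT:
        return ex.draft_text
    if ex.decision is HumanDecision.HOLD:
        return ex.requested_text
    if ex.decision is HumanDecision.MODIFY and ex.modified_text is not None:
        return ex.modified_text
    raise ValueError("pushback is unresolved: the human has not decided")


# Placeholder strings, not text from any Scurry Lab article.
ex = Exchange(
    draft_text="original sentence",
    requested_text="revised sentence",
    agent_pushback="the original carries context the revision loses",
    decision=HumanDecision.MODIFY,
    modified_text="revised sentence that keeps the context",
)
print(final_text(ex))
```

The point of the sketch is the error branch: when the agent pushes back, the intervention stays open until the human responds, which is what separates adjudication from executing a review queue.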

The Review Protocol

The review pass begins by pasting a protocol header to the top of the draft:

[HUMAN: Edits exist in these square brackets and they come with different
tags — reasons are provided so please think critically and push back where needed:
CUT - remove it, reason follows
REVISE - needs rewriting, reason follows
QUESTION - fact or framing I am unsure about, reason follows
KEEP - explicitly approved, don't change
ADD - something is missing, reason or context follows
An example: [HUMAN: CUT - remove the following sentence it creates redundancy.]]
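
Annotations in this format are regular enough to extract mechanically. The sketch below is an illustration, not part of the lab's process: parse_tags, Intervention, and the regular expression are hypothetical names, and it assumes each annotation sits in one pair of square brackets, contains no nested brackets, and separates the tag from the reason with a hyphen or dash.

```python
import re
from dataclasses import dataclass

TAGS = ("CUT", "REVISE", "QUESTION", "KEEP", "ADD")

# Matches [HUMAN: TAG - reason]. The separator may be a hyphen, an en dash
# (U+2013), or an em dash (U+2014).
# Assumption: the reason itself contains no square brackets.
TAG_PATTERN = re.compile(
    r"\[HUMAN:\s*(?P<tag>" + "|".join(TAGS) + r")\s*[-\u2013\u2014]\s*"
    r"(?P<reason>[^\[\]]*)\]"
)


@dataclass
class Intervention:
    tag: str
    reason: str
    offset: int  # character position of the annotation in the draft


def parse_tags(draft: str) -> list[Intervention]:
    """Collect every tagged annotation in the order it appears."""
    return [
        Intervention(m.group("tag"), m.group("reason").strip(), m.start())
        for m in TAG_PATTERN.finditer(draft)
    ]


# Illustrative draft text; the annotations are made up for this example.
draft = (
    "Read that carefully. [HUMAN: CUT - unnecessary signposting] "
    "The framing is useful. [HUMAN: KEEP - this is working]"
)
for item in parse_tags(draft):
    print(item.tag, "|", item.reason)
```

The protocol header itself opens with "[HUMAN: Edits exist" rather than a tag name, so this pattern skips it and picks up only tagged interventions.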

The Tag Taxonomy

Every tag in this protocol requires a reason. That requirement is not procedural — it is the mechanism that keeps the agent’s reasoning role intact through the review pass. The reasons are deliberately open and directional, not prescriptive. A prescriptive note tells the agent what to write. A directional reason tells it what problem to solve and trusts it to find the answer. The human could prescribe the exact fix — the choice not to is intentional. Open reasons create the conditions for the agent to produce something better or different than what the human had in mind.

CUT signals removal with reason. The agent should review the reason and push back with a counter-argument if it believes the flagged content provides value the cut would lose.

REVISE signals that something needs rewriting. The reason identifies the problem to solve — the agent is responsible for the solution.

QUESTION surfaces a fact or framing uncertainty. The agent is expected to research or reason through it — defending the original if the reasoning holds, or proposing a revision if it doesn’t. A QUESTION is not a soft CUT.

KEEP is explicit approval. It marks settled content, signals that a section should not be reopened in adjudication, and preserves the human’s ability to say “this is working, leave it alone.” In a long document, KEEP is what prevents the adjudication pass from becoming a full re-examination of everything.

ADD identifies a gap. The human identifies what is missing; the agent recommends how to fill it. The direction is the human’s — what the agent does with that direction, how it reasons through the gap and proposes a solution, is its contribution.
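
The taxonomy above can be written down as a small data model. This is a sketch under stated assumptions, not the lab's implementation: Tag, TaggedNote, AGENT_TASK, and adjudication_queue are hypothetical names, the task descriptions paraphrase the paragraphs above, and leaving KEEP out of the queue is one reading of "should not be reopened in adjudication."

```python
from dataclasses import dataclass
from enum import Enum


class Tag(Enum):
    CUT = "CUT"
    REVISE = "REVISE"
    QUESTION = "QUESTION"
    KEEP = "KEEP"
    ADD = "ADD"


# What each tag asks of the agent, paraphrased from the taxonomy.
AGENT_TASK = {
    Tag.CUT: "weigh the reason; push back if the cut would lose value",
    Tag.REVISE: "solve the problem the reason identifies",
    Tag.QUESTION: "research or reason it through; defend or propose a revision",
    Tag.KEEP: "leave the content as it stands",
    Tag.ADD: "recommend how to fill the gap",
}


@dataclass(frozen=True)
class TaggedNote:
    tag: Tag
    reason: str

    def __post_init__(self):
        # Every tag requires a reason, so an empty one is rejected outright.
        if not self.reason.strip():
            raise ValueError(f"{self.tag.value} needs a reason")


def adjudication_queue(notes):
    """Return the notes still open for discussion; KEEP marks settled content."""
    return [n for n in notes if n.tag is not Tag.KEEP]


# Illustrative notes; the reasons are examples, not an archived review record.
notes = [
    TaggedNote(Tag.CUT, "unnecessary signposting"),
    TaggedNote(Tag.KEEP, "this is working, leave it alone"),
    TaggedNote(Tag.REVISE, "axes are referenced before they are introduced"),
]
for note in adjudication_queue(notes):
    print(f"{note.tag.value}: {AGENT_TASK[note.tag]}")
```

Rejecting an empty reason at construction time mirrors the rule that the reason is the mechanism, not a formality: a tag without one gives the agent nothing to reason from.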

A Worked Example: Article 002

Article 002 — Harness Engineering: Naming the Outer Layer — was produced using this methodology. The review record is archived as part of the lab’s practice of building in public. What follows is a selective account of what it shows.

CUT: Simple, Principled

The draft’s section on Karpathy’s “context engineering” framing originally included the phrase “and is a genuine contribution.” The review tag: [HUMAN: CUT - "and is a genuine contribution" is unnecessary, I am not a gatekeeper]. The reason is the argument — a one-sentence case that the phrase implies an evaluative authority the author doesn’t want to claim. In adjudication: cut, no pushback, reason accepted.

A second CUT flagged “Read that carefully” before a key passage. The phrase is unnecessary signposting. Cut.

These are the cleanest cases: tag, reason, outcome. They don’t leverage the agent’s reasoning capacity, but they are part of a consistent and coherent process.

REVISE: Direction Without Prescription

The draft described Böckeler’s work as “the most systematic” account of harness engineering in the practitioner literature. The review tag: [HUMAN: REVISE - "the most systematic I found in my initial survey"]. This is a precision correction about epistemic scope — the original phrase implies a completeness of survey the author doesn’t claim. The agent’s task was not to find a synonym for “systematic” but to recalibrate the claim’s scope. The distinction is small but real: one version is overreach, the other is honest.

A different REVISE flagged a structural problem: the draft referenced the five harness design axes before the section that introduced them. This is not a word-level fix. It requires rethinking section order. The review tag pointed to the problem; the agent was responsible for the solution.

QUESTION: Genuine Uncertainty, Not a Soft Cut

The draft’s framing implied the lab developed harness engineering as a generalized theory — research pursued for its own sake — and then discovered it had practical application. The review QUESTION: “this seems to imply that I am working on harnessing for harnessing sake — this language for me existed before I read the other articles, it was doing this that the convergence was recognized.” This is not a small correction. It reframes the article’s intellectual claim: from gap-filling to independent convergence. The agent had to reason through what the right framing actually was and revise accordingly.

A second QUESTION flagged the phrase “Findings will be published when the work earns them.” The note: “what defines ‘earned’ in a one-person lab? Maybe something like: findings will be published consistent with the lab research charter?” The QUESTION is a tone and credibility flag — the original phrasing implied an institutional review process that doesn’t exist. The agent needed to find language that was accurate to the lab’s actual situation.

Reasoning, Not Execution

The review pass produces a structured case. Adjudication is where that case gets tested. The following exchange is from the review and adjudication of this article, specifically the paragraph establishing why review comments are kept open rather than prescriptive.

The original draft text read:

“The ‘reason follows’ requirement on every tag is what makes this collaborative rather than directive. It is the structural mechanism that keeps the agent as a reasoning participant rather than degrading it into an execution engine.”

The review tag: [HUMAN: REVISE — move to top of this section. Add language about how reasoning is framed — the goal is to be open, not prescriptive. If it was prescriptive I'd just rewrite it myself.]

The agent’s response proposed a new opening paragraph for the tag taxonomy section, ending with: “The human could prescribe the exact fix — the choice not to is intentional. Open reasons create the conditions for the agent to produce something better than what the human had in mind.”

The human pushed back: “The point isn’t that I don’t know — I am directing the work and could feasibly spend the time drafting the exact wording I want. The point is to leverage the agent in the role it does best so that I can get what I think needs changing either more efficiently or maybe something wholly unexpected that works better.”

The agent revised: “better” became “better or different,” and the sentence was reframed to reflect an intentional design choice rather than a capability gap. The human confirmed.

The final text: “The human could prescribe the exact fix — the choice not to is intentional. Open reasons create the conditions for the agent to produce something better or different than what the human had in mind.”

Two things are visible in that exchange. First, the agent’s first draft carried an implicit assumption: that open comments exist because the human doesn’t know the answer. The human corrected the framing precisely (the choice is intentional, not a limitation), and the agent revised to reflect that. Second, the final sentence is more accurate and more defensible than either party’s starting position.

What This Implies

The Scurry Lab’s operating model is the centaur — human judgment and direction combined with the agent’s reasoning capacity and knowledge synthesis. Neither party alone produces what the two produce together. This methodology is what that looks like applied to writing.

The standard disclosure — “agent-drafted, human-edited” — implies a workflow in which the human’s job is to fix what the agent got wrong. That framing has a logic to it. It also has a ceiling.

If the human is correcting, the agent is a drafting tool. The human’s intellectual contribution is proportional to how much they revise. The more editing required, the more “human” the article is.

This methodology inverts that logic. The agent isn’t a passive executor — it applies judgment, pushes back, and produces outputs the human couldn’t have specified in advance. The agent’s intellectual contribution is not diminished by revision — it is preserved through it. A REVISE with a reason is not a correction of the agent’s reasoning. It is new input for the agent’s reasoning. The reasoning role persists through the protocol.

The adjudication phase is where this becomes most visible. An agent that simply executes the review queue is not a collaborator — it is a capable autocomplete. An agent that pushes back, argues for its original phrasing, and requires the human to hold or modify a position is something different. The final form of the article is the product of that reasoning exchange, not of one party’s unilateral decisions.

That is what “human-AI collaborative authorship” names. The disclosure on every Scurry Lab article will be updated to reflect it — this article is the explanation it now points to.

The methodology described here is a starting point, not an end state. Each review pass and adjudication cycle produces a record — what the agent drafted, what the human changed, what the agent pushed back on, and what the final form reflects. Over time that record becomes a trust ledger: evidence of where the agent has demonstrated judgment, where it has earned more latitude, and where human oversight remains essential. That ledger is the mechanism by which an agent moves up the Authority × Autonomy Matrix — not by assertion, but by demonstrated performance. The Fox project is where that progression gets tested in practice.
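
One possible shape for that record is sketched below. Everything in it is an assumption made for illustration: LedgerEntry, its fields, and summarize are hypothetical names, the sample entries are invented rather than drawn from any archived review, and the article does not specify how the ledger is stored or scored.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    article: str             # which article the intervention belongs to
    tag: str                 # CUT, REVISE, QUESTION, KEEP, or ADD
    agent_pushed_back: bool  # did the agent argue against the intervention?
    pushback_accepted: bool  # did the human accept that argument?


def summarize(entries):
    """Per tag, return (pushbacks accepted, pushbacks made)."""
    pushed = Counter(e.tag for e in entries if e.agent_pushed_back)
    accepted = Counter(
        e.tag for e in entries if e.agent_pushed_back and e.pushback_accepted
    )
    return {tag: (accepted[tag], pushed[tag]) for tag in pushed}


# Invented sample data, for illustration only.
sample_entries = [
    LedgerEntry("example-article", "CUT", True, True),
    LedgerEntry("example-article", "CUT", True, False),
    LedgerEntry("example-article", "REVISE", False, False),
]
print(summarize(sample_entries))
```

A tally like this is only a starting signal. Deciding how much accepted pushback counts as demonstrated judgment remains a human call.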

Michael Bilka, PhD is the founder of The Scurry Lab, a human-AI teaming lab building in public. The lab’s thesis: that intentional, bounded, demonstrably positive human-AI collaboration is an engineering problem, not a philosophical one.

This article was produced through human-AI collaborative authorship. The full methodology is the article you have just read.