Over-document your decisions, or future-you will resent past-you

About a year and a half ago I picked SQS over Kafka for a job queue. There were good reasons. We had two services, modest throughput, an existing AWS footprint, no streaming team, and a product team that wanted the feature shipped that quarter. SQS was the boring, correct choice, and it has been quietly working ever since.

Last month a junior engineer joined the team, looked at the architecture diagram, and asked the question I knew was coming: “shouldn’t we just use Kafka?”. I opened my mouth to explain, and realised I could remember the decision but not the reasoning. The numbers were gone. The constraints had shifted. The thread where we had hashed it out was in a Slack channel that had been archived during the org redesign, and Slack’s search returned nothing useful.

It took me two hours to reconstruct the argument from memory and a few git logs. Two hours that I did not have, talking past a smart engineer who had every right to push back, because I could not show him the work. The whole thing would have taken ten minutes if I had written an ADR when I made the original decision.

That was the week I stopped pretending I would remember.

What an ADR is, and why the format matters

An Architecture Decision Record is a short document that captures one decision, with four sections, in a fixed order. The fixed order is the whole point. It forces you to separate “what we decided” from “why” from “what we are giving up”, and it makes the document scannable a year later when you have forgotten everything.

The format I use, lifted almost directly from Michael Nygard’s original 2011 essay:

Status. Proposed, accepted, deprecated, superseded by ADR-N. One word. This is the first thing future-you will look at, because the most important question about an old document is whether it is still in force.

Context. What was the situation when we made this decision? What constraints, what pressures, what was on fire? Two paragraphs at most. The point is not to describe the entire system, just the slice that mattered.

Decision. What did we choose? One or two sentences. Specific enough that someone could implement it.

Consequences. What follows from this choice? The good, the bad, and the things we will need to revisit later. This is the section that ages best, because it tells future-you what to watch for.

That is it. Four sections. A good ADR is usually under a page. A great one is sometimes three paragraphs. The discipline is in the brevity, not the depth.

For the SQS-vs-Kafka decision, the ADR I should have written would have been about two hundred words. It would have said: status accepted; context was two services and forty messages per second peak with no streaming expertise on the team; decision was SQS with FIFO queues and a dead letter queue; consequences were that we accept up to one second of latency, we will need to revisit if throughput exceeds five thousand messages per second, and we explicitly chose not to invest in Kafka tooling at this stage. Two hundred words. Two hours saved. A junior engineer who could read the doc, agree or disagree on the merits, and move on.

Where to keep them

In the repo. Not in Confluence, not in Notion, not in a shared drive, not in the head of the engineer who made the decision. In the repo, next to the code they describe.

The convention I have seen work best is a docs/adr/ folder with files named 0001-use-sqs-for-job-queue.md, 0002-postgres-for-tenant-metadata.md, and so on. Sequential numbers, kebab-case slugs, and one decision per file. They get reviewed as part of the pull request that implements the decision, which means they are reviewed by the people who care, at the moment they care, and they cannot drift out of sync with the code.

I have tried the alternatives. Confluence is where documents go to die: the search is bad, the URLs change, and nobody opens it without being asked. Notion is better but lives outside the engineering workflow; an ADR you cannot find via git grep is an ADR that does not exist for the people debugging at 2am. Slack threads are the worst of all: they have all the context, no structure, and they vanish when the channel gets archived.

The repo wins because it makes the documentation part of the code. New engineer joins, clones the repo, runs ls docs/adr, and reads the history of the system in twenty minutes.

What deserves an ADR

The instinct most engineers have is “ADRs are for the big stuff”. This is wrong. Big decisions get RFCs and meetings and slide decks. They are well-documented by construction. The ones that bite you are the small ones that nobody thought were a big deal at the time.

A non-exhaustive list of things I now write ADRs for:

We tried Postgres full-text search, switched to Elasticsearch because of language stemming. ADR explains why a future engineer should not rip out Elasticsearch and “simplify”.
We use UTC for all timestamps in the warehouse, even though the company is European. ADR explains why daylight savings would have made dashboards lie.
We chose dbt incremental models over snapshots for a specific table because the data was append-only. ADR explains why the obvious-looking refactor to snapshots would lose information.
We put the staging schema in a separate database, not a separate schema in the same database. ADR explains a billing/quota constraint that no longer exists today, but the migration cost would be enormous, so the decision stands.

Every one of those is a single-paragraph debate that has happened more than once on my team. Without the ADR, it happens again every six months, with a different engineer, and sometimes the answer flips because the original reasoning is gone.

The rule of thumb I use: if I can imagine someone asking “why did we do this?” and the answer being non-obvious, I write the ADR. If the answer is obvious from the code, I do not.

The “I’ll remember” fallacy

Here is the part where I have to be honest with myself: I do not remember.

I used to think I would. I have a decent memory for code and architecture, I think about systems for a living, and I have a strong narrative sense for why decisions got made. None of that survives eighteen months and three context switches. By the time the question comes back, the precise constraints have faded, the numbers are gone, and what I am left with is a vibe. “I think we did SQS because of cost.” Maybe. But also throughput, and team skills, and a deadline, and an existing IAM setup, and a comment from someone in another team who has since left the company.

A vibe is not a defence in a technical argument. A vibe loses to a confident junior engineer with a fresh idea and a Kafka tutorial open in another tab. And losing that argument can mean a six-week migration to a system that turns out to be worse for your specific use case, because nobody could remember why you did not pick it the first time.

The thing I have come to believe is that “I’ll remember” is a form of optimism that costs about ten minutes to debunk and weeks of work to indulge. Future-you is a stranger. Treat them like one. Leave them notes.

How to write one in ten minutes

The biggest objection to ADRs is “I do not have time”. Fair. Almost nobody has time for documentation when the deadline is tomorrow. Here is the version of the ritual I actually do.

When the decision is made, before I close the tab on whatever discussion produced it, I do this:

Open the repo, create docs/adr/NNNN-short-slug.md.
Status: accepted. Date.
Two sentences for context. The minimum viable description of the situation.
One sentence for the decision.
Three to five bullet points for consequences. Good and bad. What we are accepting, what we are giving up, what would make us revisit.
Commit it as part of the PR that implements the decision, or in a follow-up PR within twenty-four hours.

That is ten minutes of work. Most of it is typing. The thinking has already been done; the document is just exhaust from the decision. The hard part is the discipline of doing it before the context fades, because two days later you will have to reconstruct half of it, and a week later you will have to reconstruct all of it.

If I am being honest, I still skip ADRs sometimes. The ones I skip are always the ones I regret. There is a strong correlation between “I’ll just remember this one” and “wait, why did we do this?” eight months later.

A trick that has helped: pair the ADR with the postmortem of any decision-related incident. If you ever find yourself in an outage caused by a forgotten constraint, the postmortem action item should include “write the ADR we should have written”. You only have to do that twice before you start writing them up front.

The first ADR you write is the one that pays for all the others, because the work is in the habit, not the document. The format does not matter as much as the act: status, context, decision, consequences, in a markdown file in the repo, written when the context is fresh. The cost is ten minutes. The savings are measured in arguments you do not have to relitigate, migrations you do not accidentally undo, and the quiet relief of being able to point at a doc when a junior engineer asks the obvious question and deserves a real answer.