Trunk-based development: why most modern teams converged here

Lesson 49 introduced the three branching strategies that account for almost all professional practice: gitflow with its formal release branches, GitHub flow with its short-lived feature branches, and trunk-based development, where everyone commits straight to main and in-progress work hides behind feature flags. The lesson noted that trunk-based is the pattern most modern teams converge on as they grow, and it is the pattern Google, Facebook, and Microsoft publicly run at the largest scales. This lesson explains why.

The argument is not that trunk-based is universally correct. It has prerequisites. It needs feature flags. It needs strong CI. It demands a cultural shift in how developers think about “ready”. The point is that once the prerequisites exist, trunk-based is unusually well-aligned with how software actually gets built at speed, and the alternatives start to feel like overhead rather than safety. Understanding the pattern in depth is also useful even if the team chooses GitHub flow, because every refinement a GitHub-flow team makes (shorter branches, more feature flags, faster CI) is a step toward trunk-based, and recognising that helps the team make incremental improvements without committing to a full migration.

The pattern in one paragraph

Every commit goes to main within hours. Branches still exist, but only for the duration of code review, and the review is on small changes (tens to a few hundred lines, not thousands). Tests run on every push and on every merge. CI is fast enough that a developer can push a change and see green within a few minutes. Code that is not yet ready for users to see is still on main, but it is wrapped in a feature flag: a runtime switch that controls whether the code path executes. The flag stays off until the feature is finished, tested, and ready to release. Then the flag turns on, in production, often gradually. Releasing becomes flipping a flag, not deploying a build.

That is the whole pattern. The depth is in the prerequisites and the cultural shift.

Why large teams converged here

Three forces push organisations toward trunk-based as they grow.

Conflict avoidance dominates at scale. In a 200-engineer monorepo, the cost of merge conflicts is not theoretical. Two-week-old branches conflict with thousands of commits’ worth of changes, and the conflict resolution is dangerous: the developer resolving the merge has limited context on what those thousands of commits did. Hours-old branches barely conflict, because the surface area of code changed in a few hours is small. A team that works on hundreds of branches simultaneously, each living a week, can end up spending more engineering time on merge resolution than on the work itself. Keeping branches to a few hours all but removes that cost.

CI scales with commits, not branches. The throughput of a CI system is measured in builds per unit time. A team landing 100 small commits on main per day is asking CI to run 100 builds, each on a small change. A team merging 10 long-lived branches per day, each with weeks of accumulated work, is also asking CI to run builds, but each build covers a larger surface, takes longer, fails more often, and produces less actionable signal when it fails. Trunk-based development naturally produces the workload pattern CI is best at: many small changes, each individually tractable.

Continuous deployment becomes possible. When every commit on main is integration-ready, the deploy pipeline can be triggered on every merge without human intervention. Some organisations do that literally; others batch deploys hourly or daily but still treat every main commit as deploy-eligible. The batch-versus-continuous-deploy choice becomes operational rather than architectural, because the architecture (every commit is releasable) supports either.

The combination is what makes the pattern work at scale. Conflicts disappear because branches are short. CI handles small changes well, so feedback is fast. Continuous deployment becomes routine because every commit is ready. The flywheel reinforces itself: faster feedback encourages smaller changes, smaller changes reduce conflicts further, fewer conflicts encourage developers to commit even more often.

The prerequisites

The pattern works only if four pieces are in place. Skip any one of them and trunk-based becomes painful or dangerous.

Feature flags at scale. A feature flag is a runtime switch controlling whether code runs. The simplest implementation is a configuration value read at request time: if config.features.new_checkout is set, call handle_with_new_logic(); otherwise call handle_with_old_logic(). Production-grade implementations are richer: per-user targeting, percentage rollouts, kill switches, scheduled enables, audit logs of who flipped what when. Hosted and open-source offerings include LaunchDarkly, Unleash, ConfigCat, GrowthBook, and Statsig. The standardisation effort is OpenFeature (https://openfeature.dev/, retrieved 2026-05-01), which is a CNCF project providing a vendor-neutral SDK so teams can switch flag providers without rewriting application code. Many teams build a homegrown system, especially in larger organisations where the requirements are specific.
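
The simplest form of the configuration-value check described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: FeatureConfig, handle_checkout, and the string return values are hypothetical names chosen for the example, and a real system would load the flag values from a configuration service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureConfig:
    """Hypothetical in-memory flag store, standing in for a config service."""

    flags: dict = field(default_factory=dict)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so code that is not ready stays dark.
        return self.flags.get(name, False)


def handle_with_old_logic(cart_total: float) -> str:
    return f"old checkout: {cart_total:.2f}"


def handle_with_new_logic(cart_total: float) -> str:
    return f"new checkout: {cart_total:.2f}"


def handle_checkout(config: FeatureConfig, cart_total: float) -> str:
    # The flag is read on every request, so flipping it needs no redeploy.
    if config.is_enabled("new_checkout"):
        return handle_with_new_logic(cart_total)
    return handle_with_old_logic(cart_total)


config = FeatureConfig()
print(handle_checkout(config, 19.5))  # flag absent, old path runs

config.flags["new_checkout"] = True
print(handle_checkout(config, 19.5))  # flag on, new path runs
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a typo in a flag name then fails safe, leaving the old code path in place.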

The capability that matters is dynamic control at runtime. Code merged to main ships, but the flag stays off until the team is ready. The flag can be turned on for a small percentage of users first, monitored, expanded, and rolled back instantly if something goes wrong. This is the mechanism that makes “every commit deploys” safe, because the deploy is not the release: the flag flip is the release.
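
One common way to implement a gradual rollout is to hash each user into a stable bucket and compare the bucket with the current rollout percentage. The sketch below assumes that approach; bucket_for, is_enabled, and the kill_switch parameter are hypothetical names, and production flag services add targeting rules and audit logging on top of something like this.

```python
import hashlib


def bucket_for(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag_name: str, user_id: str, rollout_percent: int,
               kill_switch: bool = False) -> bool:
    """Return True if the flag is on for this user at the given rollout level."""
    if kill_switch:
        # Instant rollback: the kill switch overrides any rollout percentage.
        return False
    return bucket_for(flag_name, user_id) < rollout_percent


users = [f"user-{i}" for i in range(1000)]
for percent in (5, 50, 100):
    enabled = sum(is_enabled("new_checkout", u, percent) for u in users)
    print(f"{percent}% rollout: {enabled} of {len(users)} users enabled")
```

Because a user's bucket never changes, raising the percentage only adds users: anyone who saw the feature at 5% still sees it at 50%. Including the flag name in the hash keeps different flags from always selecting the same early cohort.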

Strong CI. Every push to a branch and every merge to main runs the test suite. The bar is high: the CI pipeline must be fast enough that developers wait minutes, not hours, for feedback, and reliable enough that a red build means a real failure rather than a flake. A team that lets red builds linger on main has lost the property that makes trunk-based safe, because suddenly nobody knows which commits broke things and the integration line stops being a stable foundation.

Several mechanical practices support this. Mandatory pre-merge CI means no commit lands on main without a green build on the proposed change; branch protection rules in GitHub or GitLab enforce this without relying on developer discipline. Test parallelism keeps the suite finishing in minutes even as it grows. Aggressive flake elimination matters because flaky tests train developers to ignore CI failures, which destroys the integration line.
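
Flake elimination starts with finding the flakes. One simple heuristic, sketched here under the assumption that CI results can be exported as (commit, test, outcome) records, is to flag any test that has both passed and failed on the same commit: the code did not change, so the test itself is unstable. RunResult and find_flaky_tests are hypothetical names for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class RunResult(NamedTuple):
    """One test outcome from one CI run (hypothetical record shape)."""

    commit: str
    test_name: str
    passed: bool


def find_flaky_tests(runs: list) -> set:
    """Return names of tests that both passed and failed on the same commit."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.commit, run.test_name)].add(run.passed)
    # Same commit, different outcomes: the test, not the code, is unstable.
    return {test for (_, test), seen in outcomes.items() if seen == {True, False}}


history = [
    RunResult("abc123", "test_login", True),
    RunResult("abc123", "test_login", False),    # retry on the same commit failed
    RunResult("abc123", "test_payment", False),
    RunResult("abc123", "test_payment", False),  # fails consistently: a real failure
    RunResult("def456", "test_search", True),
]
print(find_flaky_tests(history))
```

A test that fails consistently on a commit is not reported, which is the point: a consistent failure is a real signal, and only the inconsistent ones erode trust in a red build.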

Code review on small changes. Pull requests are hours old, not weeks old. The diff is small enough that a reviewer can read it in fifteen minutes. The author has not invested two weeks of work in it, so feedback is cheap to apply. This is a cultural property as much as a technical one: the team has to internalise that small, frequent PRs are better than large, infrequent ones, and the tooling has to support it. GitHub, GitLab, Gerrit, and Phabricator all support chained or stacked changes, natively or through add-on tooling, so a developer can split a logically large change into a sequence of small reviewable pieces.

Test infrastructure that catches issues at PR time. The test suite is not just unit tests. It includes integration tests against ephemeral environments, contract tests between services, regression suites for known failure modes, and (for data work) the kinds of pipeline-level tests lesson 51 covers. The investment is significant. The payoff is that issues surface on the PR rather than on main, which means they cost a few minutes to fix rather than a production incident to recover from.

The cultural shift

The technical prerequisites are the easier part. The harder part is the cultural shift in how developers think about “ready”.

In gitflow or long-branch GitHub flow, the developer’s mental model is “I will merge this when it is done”. The branch is private workspace; the merge is the moment the work becomes public. The branch can stay private for as long as the work takes. Mistakes are recoverable on the branch, because nothing has been integrated yet.

In trunk-based, the developer’s mental model is “I will land this dark and finish it on main”. The work becomes public on the first commit. The first commit is small (a stub function, a flag default of off, a table that nothing yet writes to) and goes to main. Subsequent commits flesh out the implementation. The flag stays off the whole time, so users see no behaviour change, but the code is on main, integrating with everything else, getting the benefit of CI and review on every increment.

The shift is uncomfortable for developers used to the older model. Three specific objections come up.

“What if my code is not ready for production?” Trunk-based does not require the code to be ready for production. It requires the code to be safe to deploy. A function that exists, is tested, and is unreachable (because nothing calls it, or because the flag is off) is safe to deploy. The user-facing readiness is decoupled from the deploy.

“What if my code is half-finished?” A half-finished function still merges if it has a test and a flag. The test verifies that the half-finished function behaves the way it currently claims to behave. The flag ensures it is not used. The next PR adds more functionality, more tests, and the flag stays off until the whole thing works. Each PR is small, individually mergeable, and individually safe.
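
A half-finished function that is safe to merge might look like the sketch below. Everything here is hypothetical (the pricing functions, the coupon shape, the NEW_PRICING_ENABLED flag); the point is only the structure: the new code states plainly what it does not handle yet, the tests pin down what it does handle, and the flag default keeps it unreachable.

```python
from typing import Optional

NEW_PRICING_ENABLED = False  # hypothetical flag, default off while work is in progress


def legacy_price(subtotal_cents: int, coupon: Optional[dict]) -> int:
    """Existing behaviour in this sketch: coupons are ignored."""
    return subtotal_cents


def new_price(subtotal_cents: int, coupon: Optional[dict]) -> int:
    """Half-finished replacement: handles percentage coupons only so far."""
    if coupon is None:
        return subtotal_cents
    if coupon["kind"] == "percent":
        return subtotal_cents - subtotal_cents * coupon["value"] // 100
    # Fixed-amount coupons are planned for a later PR.
    raise NotImplementedError(f"coupon kind {coupon['kind']!r} not supported yet")


def price(subtotal_cents: int, coupon: Optional[dict],
          flag_enabled: bool = NEW_PRICING_ENABLED) -> int:
    # With the flag off, the new code is deployed but never reached.
    if flag_enabled:
        return new_price(subtotal_cents, coupon)
    return legacy_price(subtotal_cents, coupon)


def test_new_price_applies_percent_coupon():
    assert new_price(1000, {"kind": "percent", "value": 10}) == 900


def test_new_price_rejects_unfinished_coupon_kinds():
    try:
        new_price(1000, {"kind": "fixed", "value": 100})
    except NotImplementedError:
        return
    raise AssertionError("expected NotImplementedError")


def test_flag_off_keeps_legacy_behaviour():
    assert price(1000, {"kind": "percent", "value": 10}) == 1000
```

The second test is the interesting one: it asserts the current limitation, so the next PR that adds fixed-amount coupons has to change that test deliberately rather than by accident.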

“What about big architectural changes?” The answer is the parallel-implementation strategy. The new architecture is built next to the old one. Both ship to production. The flag controls which one runs for which users. The migration is a gradual flag flip from old to new, with the option to flip back if anything goes wrong. The old code is deleted only when the migration is complete and stable. This is more work than a one-shot rewrite, but it is dramatically safer, and it is the dominant pattern for migrations at large engineering organisations precisely because the safety properties matter at that scale.

flowchart TB
    subgraph build ["Build phase: small commits to main"]
        direction LR
        c1["init"] --> c2["flag<br/>stub off"] --> c3["small<br/>fix"] --> c4["step 1"] --> c5["step 2"] --> c6["step 3"] --> c7["tests"]
    end

    subgraph rollout ["Rollout phase: flip the flag"]
        direction LR
        c8["flag on<br/>5%"] --> c9["flag on<br/>50%"] --> c10["flag on<br/>100%"] --> c11["remove<br/>flag"]
    end

    build --> rollout
    rollout --> next["next feature, same loop"]

    classDef commit fill:#0d9488,stroke:#0d9488,color:#ffffff
    classDef flag fill:#fff5e6,stroke:#c89200,color:#5a3e00
    classDef boundary fill:transparent,stroke:#0d9488,stroke-dasharray: 5 5
    class c1,c2,c3,c4,c5,c6,c7,next commit
    class c8,c9,c10,c11 flag
    class build,rollout boundary

Figure: trunk-based development as a continuous sequence of small commits to main, followed by a gradual rollout in which the flag goes to 5%, 50%, and 100% before it is removed. The visual point is that there are no branches in the diagram, the work is a sequence of small commits, and the release is a series of flag flips rather than a deploy.

The case studies

The pattern’s credibility comes partly from the public case studies. Google’s monorepo with thousands of engineers committing to one repository, described in Rachel Potvin and Josh Levenberg’s 2016 ACM paper “Why Google Stores Billions of Lines of Code in a Single Repository”, is the canonical example: trunk-based development at the largest scale ever publicly documented. Facebook’s monorepo is similar in shape and was described in the Facebook engineering blog over the 2014 to 2018 period. Microsoft’s move of the Azure DevOps codebase to trunk-based was documented in detail in a 2018 series of posts on the company’s engineering blog.

Mid-scale examples include Etsy’s deployment pipeline (made famous through Etsy’s engineering blog and the talks of John Allspaw and Ian Malpass over 2010 to 2015) and the publicly documented practices of Spotify, Booking.com, and many others. The pattern is not exotic; it is the operational baseline at most engineering organisations of more than a couple of hundred engineers, and at many smaller ones too.

The trunk-based site (https://trunkbaseddevelopment.com, retrieved 2026-05-01) maintains a longer list of case studies and a substantial body of practical guidance. The book “Accelerate” by Forsgren, Humble, and Kim (IT Revolution, 2018) draws on the State of DevOps research and identifies trunk-based development as one of the practices statistically associated with high-performing engineering organisations. The correlation does not prove causation, but the evidence is consistent enough that the pattern’s defenders treat it as established.

When trunk-based does not fit

The pattern is not universal. Three populations are better served by something else.

Small teams without feature-flag infrastructure. A four-person team working on an internal tool does not need the overhead of feature flags, ephemeral environments, and parallel implementations. The complexity of trunk-based exceeds its benefit at this scale. GitHub flow with one-day branches gets the team most of the way to the same place, with much less infrastructure investment.

Versioned-release products. Mobile apps, libraries, on-prem software, and embedded firmware all have a release process where “what we shipped” is meaningfully different from “what we are working on”. Gitflow’s branching model captures that difference explicitly. Trunk-based can be made to work for these products (especially with feature flags), but the natural fit is gitflow, and the team should resist the urge to use the trendier pattern for the sake of trendiness.

Regulatory environments where every release is explicit. Medical devices, banking core systems, aerospace software, and other industries with formal regulatory approval for each release cannot deploy on every commit. The regulator wants to know what is in this release, signed off by whom, validated against which test suite. Trunk-based development is incompatible with that requirement, because the cadence is wrong: there is no “this release”, there is only “what is on main now”. The pattern that fits here is closer to gitflow with formal release branches, formal sign-offs, and a deploy pipeline that includes regulatory review as a gate.

The data-pipeline angle

Data pipelines can use trunk-based development if the integration test infrastructure (lesson 51) is good enough and the jobs are idempotent enough (lesson 38) to handle re-running on bad data.

The principle is the same as for service code: the change ships to main, runs in production behind a flag, and is validated against real data before being promoted. For batch pipelines this typically means writing the new transformation alongside the old one, into a separate output table, and comparing the two outputs before cutting over. For streaming pipelines it means running both the old and new processors against the same input stream and comparing their outputs. Either way, the flag’s role is the same as in a service: it decouples deploy from release, so the team can deploy frequently without committing to a behaviour change every time.
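
For the batch case, the comparison step can be sketched as a keyed diff between the two output tables. The sketch assumes both outputs fit in memory as lists of dictionaries sharing an id column; transform_old, transform_new, and compare_outputs are hypothetical names, and a real pipeline would more likely run the equivalent comparison as a query in the warehouse.

```python
def transform_old(rows: list) -> list:
    """Existing transformation: one output row per input row."""
    return [{"id": r["id"], "total_cents": r["qty"] * r["unit_price_cents"]} for r in rows]


def transform_new(rows: list) -> list:
    """Candidate transformation: additionally drops zero-quantity rows."""
    return [
        {"id": r["id"], "total_cents": r["qty"] * r["unit_price_cents"]}
        for r in rows
        if r["qty"] > 0
    ]


def compare_outputs(old_rows: list, new_rows: list, key: str = "id") -> dict:
    """Report rows missing from, added to, or changed in the new output."""
    old_by_key = {r[key]: r for r in old_rows}
    new_by_key = {r[key]: r for r in new_rows}
    shared = old_by_key.keys() & new_by_key.keys()
    return {
        "missing_in_new": sorted(old_by_key.keys() - new_by_key.keys()),
        "extra_in_new": sorted(new_by_key.keys() - old_by_key.keys()),
        "mismatched": sorted(k for k in shared if old_by_key[k] != new_by_key[k]),
    }


source = [
    {"id": 1, "qty": 2, "unit_price_cents": 500},
    {"id": 2, "qty": 0, "unit_price_cents": 300},
]
report = compare_outputs(transform_old(source), transform_new(source))
print(report)
```

Here the report shows id 2 missing from the new output. Whether that difference is the intended behaviour change or a bug is a judgment for the team; the comparison only makes the difference visible before the cutover rather than after it.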

The next lesson goes deep on the testing infrastructure that makes any of this safe for data work, because trunk-based development without confidence in the tests is just chaos with a fancier name.

Citations and further reading

Paul Hammant and contributors, “Trunk-Based Development”, https://trunkbaseddevelopment.com (retrieved 2026-05-01). The reference site, including detailed practical guidance and case studies.
Rachel Potvin and Josh Levenberg, “Why Google Stores Billions of Lines of Code in a Single Repository”, Communications of the ACM, July 2016. The canonical description of monorepo-plus-trunk-based at Google scale.
Nicole Forsgren, Jez Humble, and Gene Kim, “Accelerate: The Science of Lean Software and DevOps” (IT Revolution, 2018). The empirical case for trunk-based development as a high-performance practice.
OpenFeature project, https://openfeature.dev/ (retrieved 2026-05-01). The vendor-neutral feature-flag SDK and specification.
LaunchDarkly engineering blog and Unleash documentation, both linked from the OpenFeature site, for production-grade examples of feature-flag systems and the patterns built around them.