Notes on the Monorepo Pattern

12/04/2022 15:00:00

Monorepo (meaning "a single repository") is a term, often associated with Facebook, for a single repository that contains all the code for a project.

It is a pattern that has been used by many large companies, including Google, Facebook, Twitter, and Microsoft. It is also used by many smaller companies, including GitHub, and by many open-source projects, including the Linux kernel.

It is frequently misinterpreted to mean "all of the software that we build", and I want to share some notes that clarify where monorepos succeed, and fail, in organisations of various sizes.

Where do monorepos work?

Monorepo productivity follows an inverse bell curve relative to the size of your software and teams. At the small end, monorepos work well:

  • when your repo is just really one app and a "component library" (...just the bits of the app in some bad directory layout)
  • when you have a very low number of apps you are coupling together via source control
  • when you have apps that either change super infrequently, or all share fast-churning dependencies that must move in lockstep.
  • when you've really just got "your app and a few associated tools". That's very "same as it ever was", because so few repos ever contained "just one tiny piece of a system" to start with.

Unfortunately, the zone of productivity for these organisational patterns - in my opinion - is a trap that folks fall into.

Most software doesn't fit the categories mentioned above.

Software tends to move at medium speed, with SME-shaped teams, and in those situations monorepos are fraught with problems that only show up once you've opted in, wholesale, to that organisational structure.

Alternatives that match those problem spaces

In most of those cases:

  • when the software is really just one app - you should use directories instead of complicated build tools
  • when it's all for some shared libraries - you're going to reach a point where you want to version them distinctly, because the blast radius of each change will start hard coupling your teams together over time

It's trivially easy to end up in the bad place where teams have tightly coupled deployments that get extremely slow and have to be rescued with tools like Nx, which frequently take over your entire development workflow (bad!).

But the biggest red flag with them is obvious - we've been here before and it sucked!

Just an old solution

The first decade of my career before DVCS (distributed version control systems) was all effectively big monorepo source trees and it was absolutely horrible and fraught with the same coupling risks. So we changed!

Git is designed for narrower slices, and doing the monorepo dance with all your software in a medium to large org will inevitably leave you fighting your tools in build, deployment, and source control alike.

The sane approach is this:

Software that versions together, deploys together and changes together should be colocated.

In the case of the thin end of the wedge with web apps, this is often just "the app, a few shared libraries, and a backend admin thing, perhaps a few tools".

Monorepos are fine here! At least until you need to split ownership of those things across team boundaries, which is where things creak.

TL;DR - This is all about Conway's Law and change frequency, which together chart the success of a software organisation - and hard team coupling is more dangerous than software coupling.

Monorepos in massive organisations

Let's briefly talk about the other end of the spectrum - the massive organisations that have a lot of software and a lot of teams, and all claim to use monorepos. There are notable examples - Google, Facebook, Twitter, Microsoft, GitHub.

Firstly, none of those organisations use a monorepo as it is frequently interpreted by smaller orgs and the community. It's easy to verify this, because they all operate public open-source repositories that are distinct from any internal monorepos they may have. What they do tend to have is application-centric repositories, where a single application and its associated tools and libraries are colocated.

This makes absolute sense, and is no different from your existing non-monorepo.

In fact, the majority of the "famous monorepos" - Windows, the Linux kernel (which, of course, isn't the same as "Linux"), and Facebook - all have entire tooling teams dedicated to making collaboration on them work, at scale, with the communities they serve. It's very important that you don't apply logic from organisations of a scale you aren't, with resources you do not have, to your own problem space without strong consideration.

If you don't have the budget for entire teams working on source control and collaboration, nor tens of thousands of developers to fit around your codebase - perhaps don't mimic the patterns of those who do.

Should I use a monorepo?

Application-centric repositories with associated tools and libraries?

Yeah! Knock yourself out, makes lots of sense.

Putting all your applications, spread across multiple teams and ownership boundaries, into a single repository?

Absolutely not, this way leads to madness and coupling hell.