Monday, January 2, 2012

Why Many Small Source Trees Are Better Than a Single Large One

Now this is a nice bike shed, and one where I've changed my mind over time, helped along by the rise of distributed version control systems and of build systems like Maven.

Essentially, I'm talking about:
workspace/               workspace/
 |_src/                   |_liba/
    |_liba/...   vs.      |  |_src/...
    |_libb/...            |_libb/
    |_prog/...            |  |_src/...
                          |_prog/
                             |_src/...
I used to advocate the left side. Over time, though, I've seen a variety of problems with that model:
  • Checkouts become so big that people are reluctant to create new ones or even to recreate an existing one. I've seen places where a mere checkout can take several hours.
  • Branching and merging can become expensive, thereby discouraging those essential operations or forcing people to invent shortcuts and hacks to deal with the expense.
  • It doesn't deal well with third party software: you are faced with either having to check in and carry around code that you very rarely modify, or you need to go to the right side in the diagram after all (or check in binaries, which is horrid).
  • It makes it too easy for developers to bypass API layers, since all the code is right there. In the end you get the big ball of mud. Refactoring that ball of mud later on, already a sizable task on its own, is made harder by the fact that branching and merging have become very expensive.
In other words, you will likely drown in technical debt.

The right side addresses the issues created by the left side:
  • Developers can limit the size of their checkouts to those portions of the code they are modifying, making both the checkouts and branching and merging a lot cheaper.
  • Every software component can be assigned a curator or an owner who vets the changes made to it. You could do this in the large tree as well, but having completely separate entities lets you simplify access controls and configuration, and makes the version control system work for you.
  • Third party software just becomes another repository and is treated essentially the same way as your own code.
Moving to the right side is not as easy as it appears, though. To be successful, you will need a more sophisticated approach to building and tracking artifacts. Your build system will need to know how to merge your locally built artifacts with pre-built ones. Again, the Maven build system provides something of a template for doing this right, but it also has some serious limitations, not the least of which is that it only really works for Java builds and has little support for platform dependencies.
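To make "merging locally built artifacts with pre-built ones" a bit more concrete, here is a minimal sketch in Python. It is not the system this blog will describe: the `Artifact` record, the `resolve_artifacts` function and the shape of the registry (a plain dictionary keyed by name and version) are all hypothetical. The only rule it encodes is that a component checked out and built in the workspace takes precedence, and everything else is picked up pre-built.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    version: str
    source: str    # "local" (built from a checkout in the workspace) or "prebuilt"
    location: str  # where the build picks the artifact up from


def resolve_artifacts(deps, local_builds, registry):
    """Choose one artifact per dependency (hypothetical resolution rule).

    deps:         component name -> version the dependent asks for
    local_builds: component name -> (version, output path) for components
                  checked out and built in this workspace
    registry:     (name, version) -> location of a pre-built artifact
    """
    resolved = {}
    for name, wanted in deps.items():
        if name in local_builds:
            # A local checkout wins, so edits to liba are seen by prog
            # without publishing a new pre-built artifact first.
            version, path = local_builds[name]
            resolved[name] = Artifact(name, version, "local", path)
        elif (name, wanted) in registry:
            # Anything not checked out is taken pre-built, as requested.
            resolved[name] = Artifact(name, wanted, "prebuilt", registry[(name, wanted)])
        else:
            raise LookupError(f"no local build or pre-built artifact for {name} {wanted}")
    return resolved


if __name__ == "__main__":
    # Workspace where only liba is checked out next to prog; libb is pre-built.
    deps = {"liba": "1.2", "libb": "2.0"}
    local_builds = {"liba": ("1.3-dev", "workspace/liba/build")}
    registry = {
        ("liba", "1.2"): "registry/liba-1.2",
        ("libb", "2.0"): "registry/libb-2.0",
    }
    for name, art in sorted(resolve_artifacts(deps, local_builds, registry).items()):
        print(name, art.source, art.version, art.location)
    # prints:
    # liba local 1.3-dev workspace/liba/build
    # libb prebuilt 2.0 registry/libb-2.0
```

Note that in this sketch the local build wins even when its version differs from the one requested; a real system would likely need some compatibility check at that point.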

One of the goals of this blog is to describe how a comprehensive system of many small source code repositories can work:
  • I've already explained in my three-part series why artifacts are important and how they can be built and released.
  • I need to explain how the artifacts will be versioned and tracked. For this, an artifact registry service will be introduced.
  • Finally, I need to explain how the build system must work to support all this.
Happy New Year!
