Tuesday, May 15, 2012

Basic Change Process Using Git

Very early on, I explained the basic change process in a diagram that's agnostic to the version control system. Git does have an interesting particularity that merits mention: the fast forward.

The motivation for this discussion came out of some questions around the promotion model, and what a "promotion" actually looks like in practice. As a reminder, a basic premise of the promotion model is that we use a stabilization branch to deploy from, and use master to aggregate all the new stuff.

One concern was that by merging master into the stabilization branch, we would create merge conflicts, and generally not be guaranteed to get the same code. If you first merge from the stabilization branch into master and resolve any conflicts there, then merge back, you shouldn't run into either problem in any version control system.

With git though, you get a bonus:

We start with a series of checkins on the main branch. In git, a branch is a linked list of revisions, each pointing back to its parent. The "head" of the branch stores the most recent revision, shown here in dark color.
When you create a new branch, you essentially create a new head, and point it to the branch point revision. The following command creates the branch and "places" you at the head of it.
    git checkout -b branch
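
To see that a new branch really is just a second head pointing at the same revision, here is a minimal sketch in a scratch repository. The temporary directory, the file name and the commit identity are made up for illustration.

```shell
# Sketch: right after "git checkout -b", both heads name the same revision.
set -e
repo=$(mktemp -d)          # throwaway repository, assumed for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use "master" whatever the local default is
git config user.name 'Example'
git config user.email 'example@example.com'

echo one > somefile
git add somefile
git commit -q -m 'initial checkin'

# Create the new head and switch to it
git checkout -q -b branch

master_head=$(git rev-parse master)
branch_head=$(git rev-parse branch)
current=$(git rev-parse --abbrev-ref HEAD)
echo "master -> $master_head"
echo "branch -> $branch_head"
echo "you are on: $current"
```

Both lines should show the same revision id: no revision was copied, only a new pointer was added.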

Now if you check in a commit, you create a new revision. This revision will have a back pointer to the parent revision owned by the branch, and head will point to your new revision:
    vi somefile ...
    git commit -a -m 'made change A'

Meanwhile, ongoing work hasn't stopped:
    git checkout master
    vi somefile ...
    git commit -a -m 'made change B'

Now assume you wish to "promote" master into the branch. You start by merging branch into master:
    git merge --no-commit branch
    # resolve conflicts
    git commit -a -m 'Merged branch'

Now comes the git magic:
    git checkout branch
    git merge master
Nothing happens except that the head of branch is fast forwarded to the head of master. So we actually promoted the exact same revision that we created when we merged the branch into master. Most other version control systems will create a new revision with a copy of the same content.
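
The whole round trip can be replayed in a scratch repository to check that claim. This is only a sketch: the temporary directory, the file names and the identity are assumptions, the two changes touch different files so that no conflict needs resolving, and --ff-only is used in place of a plain merge so that git refuses to do anything other than a fast forward.

```shell
# Sketch: merge down, merge back, and confirm both heads are the same revision.
set -e
repo=$(mktemp -d)          # throwaway repository, assumed for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example'
git config user.email 'example@example.com'

echo base > somefile
git add somefile
git commit -q -m 'initial checkin'

# Change A on the stabilization branch
git checkout -q -b branch
echo A > file_a
git add file_a
git commit -q -m 'made change A'

# Change B on master
git checkout -q master
echo B > file_b
git add file_b
git commit -q -m 'made change B'

# Merge down: branch into master (creates a real merge revision)
git merge -q -m 'Merged branch' branch

# Promote: master into branch, fast forward only
git checkout -q branch
git merge -q --ff-only master

master_head=$(git rev-parse master)
branch_head=$(git rev-parse branch)
echo "master -> $master_head"
echo "branch -> $branch_head"
```

If the two ids match, the branch now points at the very merge revision that was created on master.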

If you want, you can emulate that behavior in git:
    git checkout branch
    git merge --no-ff master
This will create the copy and emulate the behavior of lesser version control systems - but why would you want to?
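
For the curious, the difference is easy to observe. The sketch below repeats the same scenario in a scratch repository (directory, file names and identity are again made up) and promotes with --no-ff. It then compares the revision ids and the tree ids, i.e. the content snapshots, of the two heads.

```shell
# Sketch: with --no-ff the branch gets a new revision carrying identical content.
set -e
repo=$(mktemp -d)          # throwaway repository, assumed for the demo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name 'Example'
git config user.email 'example@example.com'

echo base > somefile
git add somefile
git commit -q -m 'initial checkin'

git checkout -q -b branch
echo A > file_a
git add file_a
git commit -q -m 'made change A'

git checkout -q master
echo B > file_b
git add file_b
git commit -q -m 'made change B'
git merge -q -m 'Merged branch' branch

# Promote, but forbid the fast forward
git checkout -q branch
git merge -q --no-ff -m 'Promoted master' master

master_commit=$(git rev-parse master)
branch_commit=$(git rev-parse branch)
master_tree=$(git rev-parse 'master^{tree}')
branch_tree=$(git rev-parse 'branch^{tree}')
echo "revisions: $master_commit vs $branch_commit"
echo "trees:     $master_tree vs $branch_tree"
```

The revision ids should differ while the tree ids are equal: same content, one extra revision.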

The bottom line is that fast forwarding makes the two models I described in my promotion post topologically equivalent. The only thing that changes between the two models is whether you reuse the same branch name or create new branch names. Topologically, a new branch will get extruded every time you do the merge down / merge back combination, no matter what model you choose:

Monday, May 14, 2012

Version Numbers Are Evil

Version numbers are a perfect example of bikeshedding. Everybody gets them, everybody will have something to say about them. Most importantly though: they hardly matter.

Some fairly famous companies like Microsoft and Apple have toyed with ideas to de-emphasize them. Hence we had code names (Longhorn, Lion...) and time stamps (Windows 95), but they are truly hard to kill (IE 9)...

Unfortunately they've been around for quite some time and are ingrained in our software engineering lore.

Back in the days when you released once a year or so, or maybe once a quarter, tracking version numbers was a minor hassle compared to the huge size of the changes and the possible impact of a new release. That impact promptly discouraged anyone from upgrading, which in turn made fixing bugs even harder: not only did you have to fix the current version, but also all the "supported" ones, including perhaps some unsupported ones if the customer was important enough...

Those days are thankfully fading. Instead, we have software as a service. Ever asked what version of Gmail you're using? Or Facebook? Doesn't make sense, does it? Not like you have a choice...

Still, version numbers haunt many of the modern build tools and dependency management tools.

I guess there is some satisfaction in exercising positive control by updating all consumers of your toolkit with the dependency on your latest version, but in practice, it's a nightmare. Not only is there a lot of error-prone labor involved, but you also encourage "mix and match", and general procrastination on bugs: "Oh, the latest version breaks my app, so I'm going to stick to the older version". "Oh, the latest fixes a security hole? Well, I hope I won't get hacked"...

The smart folks who wrote the Advanced Packaging Tool (APT, best known through its front ends apt-get and aptitude) realized quickly that direct dependency management could not possibly function for such a complex beast as a Linux distro. They strongly discourage explicit versions in dependencies, as shown in their many examples.

The Maven build system makes a very slight concession to the idea of version numbers being fluid via its -SNAPSHOT construct. It's unfortunately very inadequate, since there is no good way to tell afterwards exactly which version went into a build.

Ivy fares slightly better: you can specify wildcards that will be resolved at build time. Still, you are stuck with a linear version space, when you really need a true build chain:

In most artifact repository systems, you can emulate this by creating branch or build specific channels or instances of a repository.
"But are you nuts? How can you know what you built against?"
You use a build number instead of a version number. The point is that it's automatically generated. You are using some continuous build system, aren't you? If not, get one. Jenkins is OK, TeamCity is really good, but costs money. Don't even consider testing or deploying manually built stuff.

As a compromise, append the build number to a manually maintained version number if you must, until you realize that you never really want to change the manually maintained portion unless someone prods you...
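
A minimal sketch of that compromise, as it might appear in a build script. Jenkins exports the build number to the job as the BUILD_NUMBER environment variable; the helper name full_version and the base version 1.4 are hypothetical.

```shell
# Sketch: glue a hand-maintained base version to a generated build number.
full_version() {
    # $1: manually maintained portion (hypothetical example: 1.4)
    # $2: build number generated by the build system
    printf '%s.%s\n' "$1" "$2"
}

# Fall back to 0 for local builds where no build system sets BUILD_NUMBER
build_number=${BUILD_NUMBER:-0}
version=$(full_version 1.4 "$build_number")
echo "building version $version"
```

Only the second argument ever changes without human intervention, which is exactly the part you can trust to identify a build.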
"But are you nuts? How is your app going to co-exist with my app if we depend on different base libraries?"
Build assemblies - or if you're in C/C++ land, use static linking, or, if you must, package your application so that it looks up its shared libraries in a private location.
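
For the last option, one common approach on Linux is a small launcher script that puts the application's own library directory first in LD_LIBRARY_PATH. The sketch below assumes a hypothetical layout where /opt/myapp/lib holds the bundled shared libraries and /opt/myapp/libexec/myapp is the real binary. The helper name is made up as well.

```shell
# Sketch: compute a library search path that prefers the app's private copies.
prepend_private_libs() {
    # $1: application root (assumed layout: $1/lib holds the bundled libraries)
    # $2: existing search path, possibly empty
    if [ -n "$2" ]; then
        printf '%s\n' "$1/lib:$2"
    else
        printf '%s\n' "$1/lib"
    fi
}

# In a launcher script this might be used as:
#   LD_LIBRARY_PATH=$(prepend_private_libs /opt/myapp "$LD_LIBRARY_PATH")
#   export LD_LIBRARY_PATH
#   exec /opt/myapp/libexec/myapp "$@"
prepend_private_libs /opt/myapp /usr/local/lib
```

Two applications packaged this way can each carry their own build of a base library without stepping on each other.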

"But are you nuts? Are you really going to force me to make compatible changes in shared libraries?"
Yes, I will. It's shared for a reason. If the interface is so crummy that you cannot derive the right functionality, create a new one that does the job, but don't break the existing one. Modern languages have plenty of ways to extend interfaces without breaking existing code:
  • Optional arguments with default values
  • New methods
  • Subclassing
  • Traits
  •  ...
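
The first item on that list works even in plain shell, which is hardly the most modern language around. In this hypothetical example, greet originally took only a name. The greeting was added later as an optional second argument whose default reproduces the old behavior.

```shell
# Sketch: extend a function with an optional argument without breaking callers.
greet() {
    name=$1
    greeting=${2:-Hello}    # new, optional; the default keeps the old output
    printf '%s, %s!\n' "$greeting" "$name"
}

greet World            # existing callers keep working unchanged
greet World Goodbye    # new callers can use the extension
```

Every existing call site stays valid, so consumers can pick up the new library build without touching their own code.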
Folks should remember that one big purpose of having a shared library is so that you can affect all consumers of the library by a single change, and don't have to go edit every application. The only way to make good on that contract is if every application maintainer stays up to date.