Friday, February 10, 2012

Building an Artifact Registry Service (part 2)

In a previous post, I explained how to map changeset ids to a monotonically increasing build number. The basic motivation for this was to create an entity which was usable as a version number, but mapped to a unique source code configuration.

In this post, I'll build on this foundation and show how we can incrementally add changeset information as metadata attached to the build number.

Let's start with a single repo "A", and attach it to some sort of continuous build system.

It will see checkin Nr. 101, and produce build Nr. 1. We will associate checkin Nr. 101 to the build Nr. 1, saying "Build Nr 1 includes change Nr. 101".

Sometime later, checkin Nr. 102 occurs, and it will trigger build Nr. 2. Now, we associate change Nr. 102 to build Nr. 2, and then look at Nr. 102's ancestor, and notice that it has been built by build Nr 1. Now instead of including change Nr. 101, we will associate the build Nr. 1 to build Nr. 2, saying "Build Nr 2 includes build Nr. 1". The idea is that we can stop scanning for further changes at that point, since the previous build already includes all of them.

See how it works when three quick checkins happen in a row, and only at checkin Nr. 105 does the continuous build system kick in and produce build Nr. 3. Now our scan picks up changes Nr 103, 104 and 105 and includes them in build Nr. 3, but then notices that change Nr 102 is in build Nr 2, so it includes that in build Nr 3, and stops the scan.

The real kicker of this method is that we can re-use the "build includes build" relationship to express dependencies to other builds.

For example here: builds done in repo A use an artifact generated in builds done in Repo B. Say we have configured our continuous build system to kick off a build in repo A whenever a build in repo B finishes.

So while developers furiously check in changes, builds keep coming, and every time a build on repo A happens, it uses the latest build from repo B to build the final artifact from repo A. It behooves us to add the relationship that build Nr. 5 includes not only build Nr. 3 but also build Nr 4 from the other repository.

Now if we want to know what the difference between build Nr. 3 and build Nr. 5 is, we can simply start by following all the arrows from build Nr. 3 and cross off all the places we traverse, which would be builds Nr. 1 and 2 and changes Nr. 101, 102 and 201. Then we start at build Nr. 5 and start following arrows until we hit something we've already seen: This would be build Nr. 4 and changes Nr 103, 104, 105 and 202 and 203.

Now let's assume nothing happens in repo A, but two more changes get put into repo B. This produces build Nr 6, which then kicks off a build on repo A.

This should create a build Nr 7, as shown. It is distinct from the previous build, as it uses a new artifact built on repo B, so a rebuild must occur even though nothing changed in repo A.

This shows that once we use dependent builds like this, we cannot simply map the changeset id (i.e. the number 105) to a build number, but we must use a hash composed of the changeset id of the repo where the build is occurring and all the build numbers the build depends on. In this case we would use "Changeset 105" + "Build 4" to create the hash that maps to build Nr. 5, and subsequently "Changeset 105" + "Build 6" to map to build Nr 7.

Nothing changes in our "Find the delta between two builds" algorithm described above, it will correctly determine that the difference between build Nr. 5 and Nr. 7 are the changesets 204 and 205.

The beauty of this method is that it scales naturally to complex dependency graphs, and will allow mixing and matching of arbitrary builds from arbitrary repositories, as long as a unique identifier can be used to correctly locate the changeset used in the build in every repository.

In part 3, I'll be talking about additional metadata which we may wish to include in our artifact repository service, and how that service can become the cornerstone of your release management organization.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.