Fortified Bikesheds: Introducing Artifacts (part 2)

Wednesday, December 7, 2011

In part 1, I introduced artifacts as being data sets derived from source and other artifacts. In this post, I'll dive into the build cycle.

In its simplest form, the build cycle consists of:

Checking out source code and noting the revision id of the checkout;
Retrieving all the artifacts the build depends on, noting their revision ids;
Performing the build, creating a new artifact;
Computing a revision id for that artifact, usually by hashing the source code revision id and the revision ids of all the included artifacts;
Publishing the artifact back into the artifact repository.
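The five steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation. The function names (`compute_revision_id`, `build_cycle`) and the choice of SHA-1 are assumptions. The checkout, retrieval, build and publish steps are passed in as plain callables, so any version control system or artifact store could sit behind them. Dependencies are hashed in sorted order so that the revision id does not depend on the order in which artifacts were retrieved.

```python
import hashlib
from typing import Callable, Dict


def compute_revision_id(source_rev: str, dep_revs: Dict[str, str]) -> str:
    """Hash the source revision id together with the revision ids of all
    included artifacts (SHA-1 is an assumption; any stable hash would do)."""
    h = hashlib.sha1()
    h.update(f"source:{source_rev}\n".encode("utf-8"))
    # Sort by artifact name so retrieval order does not change the id.
    for name in sorted(dep_revs):
        h.update(f"{name}:{dep_revs[name]}\n".encode("utf-8"))
    return h.hexdigest()


def build_cycle(
    checkout: Callable[[], str],
    fetch_dependencies: Callable[[], Dict[str, str]],
    build: Callable[[], bytes],
    publish: Callable[[str, bytes], None],
) -> str:
    source_rev = checkout()                              # 1. check out, note revision id
    dep_revs = fetch_dependencies()                      # 2. retrieve artifacts, note their ids
    artifact = build()                                   # 3. perform the build
    rev_id = compute_revision_id(source_rev, dep_revs)   # 4. derive the artifact's id
    publish(rev_id, artifact)                            # 5. publish to the repository
    return rev_id
```

For example, with an in-memory dictionary `repo` standing in for the artifact repository, `build_cycle(lambda: "r1234", lambda: {"libfoo": "abc"}, lambda: b"...", repo.__setitem__)` stores the build output under its computed revision id.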

In practice, you will keep multiple artifact repositories, only one of which contains the artifacts actually deployed into production or released to the customer. The other repositories contain artifacts at various stages of completion.

When you build a new artifact, your build configuration specifies which repositories to search for its dependencies.

In some ways, this is similar to branching in a source code repository, and you might even wish to name your artifact repositories after your branches.

There is one important way artifact repositories differ from branches: you can't merge them.

If you do specify multiple repositories and they contain different versions of the same artifact, you have two choices:

You can define an order of preference;
You can fail the build.

You will use an order of preference mainly to let your newly built artifacts override the production ones.

You should fail the build if different versions of the same artifact are present in two non-production repositories. There is no sensible way to resolve such a conflict at build time. If you really are building an artifact that depends on two separate development streams, those streams need to be merged at the source code level into a single stream, which can then use a single artifact repository.
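A minimal sketch of this resolution policy, assuming each repository can be modelled as a mapping from artifact name to revision id. The names `resolve` and `ArtifactConflict` are hypothetical. Artifacts found in the non-production repositories named by the build configuration take precedence over production. If two of those repositories hold different versions of the same artifact, the build fails rather than guessing.

```python
from typing import Dict


class ArtifactConflict(Exception):
    """Raised when non-production repositories disagree about an artifact."""


def resolve(
    name: str,
    dev_repos: Dict[str, Dict[str, str]],
    production: Dict[str, str],
) -> str:
    """Pick the revision id of artifact `name` for this build.

    dev_repos maps a (hypothetical) repository name, e.g. a branch name,
    to that repository's {artifact name: revision id} contents.
    production holds the deployed or released artifacts.
    """
    # Collect every non-production repository that holds the artifact.
    hits = {repo: arts[name] for repo, arts in dev_repos.items() if name in arts}
    versions = set(hits.values())
    if len(versions) > 1:
        # Two development streams disagree: there is no sensible choice,
        # so fail the build.
        raise ArtifactConflict(f"{name}: different versions in {sorted(hits)}")
    if versions:
        # A newly built artifact overrides the production one.
        return versions.pop()
    if name in production:
        return production[name]
    raise KeyError(f"artifact {name!r} not found in any repository")
```

Note that the same revision id appearing in two non-production repositories is not treated as a conflict, since only different versions are ambiguous.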

A slightly different approach is to create a separate repository for every artifact build. Some automated build systems, for example TeamCity, support this approach through so-called artifact dependencies.
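Under the per-build approach, a dependency names the exact build it takes an artifact from, so there is only ever one candidate version and no precedence rules are needed. The sketch below uses hypothetical types `BuildRepository` and `ArtifactDependency`. It illustrates the idea only and does not model how TeamCity implements artifact dependencies.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BuildRepository:
    """One repository per build: it holds only what that build produced."""
    build_id: str
    artifacts: Dict[str, bytes] = field(default_factory=dict)


@dataclass(frozen=True)
class ArtifactDependency:
    """Pins an artifact to the specific build that produced it."""
    build_id: str
    artifact: str


def fetch(dep: ArtifactDependency, builds: Dict[str, BuildRepository]) -> bytes:
    # Exactly one repository can satisfy the dependency, so there is
    # nothing to merge and nothing to prefer.
    return builds[dep.build_id].artifacts[dep.artifact]
```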

Whichever strategy you use, it quickly becomes obvious that you need some sort of artifact registry service to keep track of which artifacts exist and what they contain.

How such a registry can work will be the subject of part 3 of this series.

Story So Far

The goal of the first batch of posts is to describe a software build and release process that assumes builds are expensive to perform and not necessarily reproducible, and that also assumes a large number of independent development teams all working on some grand piece of software.

Good release management starts at the source, so the first few postings deal with source code control and change management, and with how to mine the change data correctly.

Once we can reliably build stuff, we need to manage the build products, the artifacts, so they can be reused, tested and released.