Sunday, December 4, 2011

Introducing Artifacts (part 1)

Every software project will eventually include a build process, producing new files from existing files.

Even in the simplest case, such as a static web site, it makes sense to treat copying new files into the doc root as a build process. This build process can quickly grow to include HTML verification and link validation and, as the site grows, page generation from templates.
The generated files are generally called Artifacts. I don't know where the term originated; I first saw it when examining the Maven build system.

Artifacts can be anything: executable binaries, packages, simple file sets, machine images. The point of using the term Artifact is to focus on the metadata, and disregard the actual content or shape of the Artifact.

Another property of artifacts is that they usually depend on other artifacts. Most sites grow into using artifacts organically by starting to identify and catalogue their dependencies on third-party utilities. These can be libraries, tool chains, operating system dependencies: anything that might affect your build and your product.

The next step for growth is to treat your own artifacts as if they were third party artifacts.

A typical SaaS application might wish to define the system as a tree of artifact dependencies, starting with an artifact representing the whole cluster of hosts, each depending on host image artifacts, which in turn depend on application artifacts, which pull in configuration artifacts and so on...
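As a rough illustration, such a dependency tree could be modelled like this. It is a minimal sketch, not a description of any particular tool: the `Artifact` class, the `build_order` helper, and the artifact names and versions are all made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical artifact record: metadata only, no content."""
    name: str
    version: str
    depends_on: list[Artifact] = field(default_factory=list)


def build_order(root: Artifact) -> list[str]:
    """Return artifact ids so that every dependency comes before its dependents."""
    ordered: list[str] = []
    seen: set[str] = set()

    def visit(artifact: Artifact) -> None:
        key = f"{artifact.name}-{artifact.version}"
        if key in seen:
            return  # shared dependencies are listed only once
        seen.add(key)
        for dep in artifact.depends_on:
            visit(dep)
        ordered.append(key)

    visit(root)
    return ordered


# Illustrative names only: cluster -> host image -> application -> configuration.
config = Artifact("app-config", "1.0")
app = Artifact("web-app", "2.3", [config])
image = Artifact("host-image", "7", [app])
cluster = Artifact("cluster", "1", [image, image])  # two hosts, same image

print(build_order(cluster))
# ['app-config-1.0', 'web-app-2.3', 'host-image-7', 'cluster-1']
```

Walking the tree depth-first gives an order in which each artifact can be built after everything it depends on, and an artifact shared by several parents shows up only once.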

Most shops end up treating artifacts as something incidental. They might store some third-party artifacts in their version control system, or simply install them onto their build machines, and systematically rebuild their own artifacts from scratch whenever they need a new build. This is not only time-consuming; it also misses an opportunity to apply one of the antidotes to the objections against the Basic Change Process.

If you treat every artifact you build as a precious, reusable entity, you have the opportunity to split your codebase into small, independent pieces, each branched separately. This only works, of course, if you have a good system for storing and tracking artifacts.

There are some systems out there built for storing and tracking artifacts:
  • Ivy interacts with the Ant build system and provides some basic functionality
  • Maven, of course, has artifact management built right into it
Even though both tools are adequate, they do suffer from shortcomings, as we will see in a follow-up post.

The diagram on the right illustrates the build cycle for artifacts.

A new artifact is built from both a source tree and the artifacts it depends on. The build system retrieves those dependencies as needed from an artifact repository and constructs the new artifact, which is then published back into the repository to be used by other artifact builds.
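The cycle can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the in-memory `Repository`, the `build` function, and the "compilation" step, which is just a hash over the source and the dependency contents standing in for a real build. It is not how Ivy or Maven work.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    version: str
    content: bytes
    built_from: tuple[str, ...]  # ids of the dependencies used in the build


class Repository:
    """Hypothetical in-memory artifact repository."""

    def __init__(self) -> None:
        self._store: dict[str, Artifact] = {}

    def publish(self, artifact: Artifact) -> str:
        key = f"{artifact.name}-{artifact.version}"
        if key in self._store:
            raise ValueError(f"{key} is already published")
        self._store[key] = artifact
        return key

    def retrieve(self, key: str) -> Artifact:
        return self._store[key]


def build(repo: Repository, name: str, version: str,
          source: bytes, dependency_ids: list[str]) -> str:
    """Retrieve dependencies, build a new artifact, publish it, return its id."""
    deps = [repo.retrieve(key) for key in dependency_ids]
    digest = hashlib.sha256(source)  # placeholder for a real build step
    for dep in deps:
        digest.update(dep.content)
    content = digest.hexdigest().encode()
    return repo.publish(Artifact(name, version, content, tuple(dependency_ids)))


repo = Repository()
lib_id = build(repo, "libfoo", "1.0", b"library source", [])
app_id = build(repo, "app", "2.0", b"app source", [lib_id])
print(repo.retrieve(app_id).built_from)  # ('libfoo-1.0',)
```

The published artifact records the ids of the dependencies it was built from, and the repository refuses to overwrite an id that already exists, so a published artifact can safely be reused by later builds.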

The challenges here are:
  • How to track what went into building an artifact
  • How to select which dependency to pull in
  • How to coordinate source code control over multiple artifacts when releasing
I will go into these challenges in part 2 of this post.





