Fortified Bikesheds: 2015

I'm going to define "positive" build avoidance as avoiding a rebuild of something that is already built. "Negative" build avoidance is avoiding a rebuild of something that has already failed to build.

Positive build avoidance has been around for quite some time and is usually easy to implement: simply check whether the target artifact exists and was created from your current source set. This ranges from make's naive timestamp check (if the target is newer than its sources, it must have been built from the current sources) to more sophisticated checksum or hash signature checks.
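The two ends of that range can be sketched in a few lines of Python. The function names, and the idea of recording the source digest in a separate "stamp" file next to the target, are assumptions for illustration, not a description of any particular tool.

```python
import hashlib
from pathlib import Path


def is_up_to_date_by_timestamp(target: Path, sources: list[Path]) -> bool:
    """make-style check: current if the target exists and no source is newer."""
    if not target.exists():
        return False
    target_mtime = target.stat().st_mtime
    return all(src.stat().st_mtime <= target_mtime for src in sources)


def source_digest(sources: list[Path]) -> str:
    """Hash the names and contents of the sources in a stable order."""
    h = hashlib.sha256()
    for src in sorted(sources):
        h.update(str(src).encode())
        h.update(b"\0")
        h.update(src.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def is_up_to_date_by_hash(target: Path, stamp: Path, sources: list[Path]) -> bool:
    """Hash check: current if the digest recorded at build time (in a
    hypothetical stamp file) still matches the sources as they are now."""
    return target.exists() and stamp.exists() and stamp.read_text() == source_digest(sources)
```

The hash check is immune to clock skew and to sources whose timestamps change without their content changing, which is why it is the more robust of the two.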

I don't know of any system that does negative build avoidance, so I'm building one.

I happen to already have a system which does positive build avoidance by storing build artifacts using a version computed from the git tree hashes of all its source components. I recently added an artifact that gets published on every build, no matter whether it failed or succeeded: the build log. Besides impressing auditors, this actually allows me to implement negative build avoidance:

If all artifacts are present, positive build avoidance applies as usual: no need to rebuild;
If artifacts are missing but the build log artifact is present, negative build avoidance kicks in: I don't even attempt to redo a build that is known to fail;
If artifacts are missing and the build log is missing, then the build either didn't occur or crashed in the middle. In that case, schedule the build.
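The three cases translate directly into a small decision function. This is a minimal sketch: it assumes the artifact store can list which names were published for a given version, and the log name `build.log` is a placeholder.

```python
from enum import Enum


class Decision(Enum):
    SKIP_BUILT = "positive avoidance: all artifacts already published"
    SKIP_FAILED = "negative avoidance: artifacts missing, but a build log exists"
    BUILD = "no log: never built, or crashed mid-build"


def decide(expected: set, published: set, log_name: str = "build.log") -> Decision:
    """Choose what to do for one build version.

    expected  -- names of the artifacts this build should produce
    published -- names actually found in the artifact store for this version
    log_name  -- hypothetical name under which the build log is published
    """
    if expected <= published:      # every artifact is there
        return Decision.SKIP_BUILT
    if log_name in published:      # the build ran to completion and failed
        return Decision.SKIP_FAILED
    return Decision.BUILD          # nothing conclusive: schedule it
```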

The nice thing about implementing negative build avoidance is that it lets me deal with unstable build farms (and any Jenkins build farm of substantial size is bound to be unstable): simply keep recomputing the build schedule after every pass until every build has either succeeded or failed with a build log. It also makes the build team look good, since all build failures are now clearly the developer's fault.
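The rescheduling loop can be sketched the same way. `run_until_settled` and the simulated farm are hypothetical names; the farm stands in for build nodes that sometimes die without publishing anything, which is exactly the case that gets rescheduled. The `max_passes` cap is an added safety net so that a permanently broken node cannot loop forever.

```python
def is_settled(expected: set, published: set) -> bool:
    # Settled means: succeeded (all artifacts present) or failed with a log.
    return expected <= published or "build.log" in published


def run_until_settled(builds: dict, farm, max_passes: int = 10) -> dict:
    """Recompute the schedule after every pass until every build is settled.

    builds -- build name -> set of artifact names it should publish
    farm   -- callable(name) -> set of names it managed to publish
    """
    store = {name: set() for name in builds}
    for _ in range(max_passes):
        pending = [n for n, exp in builds.items() if not is_settled(exp, store[n])]
        if not pending:
            break
        for name in pending:
            store[name] |= farm(name)
    return store


def make_simulated_farm(crash_first_try=(), always_fails=()):
    """Hypothetical stand-in for an unstable build farm."""
    attempts = {}

    def farm(name):
        attempts[name] = attempts.get(name, 0) + 1
        if name in crash_first_try and attempts[name] == 1:
            return set()              # node died: nothing published, not even a log
        if name in always_fails:
            return {"build.log"}      # genuine failure: only the log is published
        return {name, "build.log"}    # success: the artifact plus its log

    return farm, attempts


farm, attempts = make_simulated_farm(crash_first_try={"b"}, always_fails={"c"})
store = run_until_settled({"a": {"a"}, "b": {"b"}, "c": {"c"}}, farm)
print(attempts)  # prints {'a': 1, 'b': 2, 'c': 1}: only the crashed build was retried
```

Note that the genuinely failing build "c" is attempted only once: its published log is what stops it from being rescheduled.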

In part one, I described how we can and should compute the version strings for build artifacts from the hashes of the source files used to build them. Now we need to devise a way to implement this strategy.
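As a reminder of what such a version could look like, here is a minimal sketch, assuming an artifact's sources are whole directories tracked by git. `git rev-parse <rev>:<path>` prints the hash of the tree (or blob) at that path; the way the per-directory hashes are combined here is a hypothetical scheme, not necessarily the one from part one.

```python
import hashlib
import subprocess


def tree_hash(path: str, rev: str = "HEAD", repo: str = ".") -> str:
    """Ask git for the tree hash of `path` at `rev`."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-parse", f"{rev}:{path}"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def artifact_version(tree_hashes: dict) -> str:
    """Combine per-directory tree hashes into one version string.

    Sorting by path makes the result independent of the order in which
    the source directories happen to be listed.
    """
    h = hashlib.sha1()
    for path in sorted(tree_hashes):
        h.update(f"{path} {tree_hashes[path]}\n".encode())
    return h.hexdigest()


# Usage inside a git checkout (hypothetical directory names):
# dirs = ["some/shared/source/directory", "source/directory/for/myartifact1"]
# version = artifact_version({d: tree_hash(d) for d in dirs})
```

Because git tree hashes only change when something beneath the directory changes, the version stays stable across unrelated commits.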

The Maven build system, in spite of its many shortcomings, does have the right idea: the POM (Project Object Model).

It's intended to be declarative. You describe the artifacts and their dependencies.

Sadly, the implementation really does have a lot of shortcomings:

It's in XML, making it very tedious to read and manipulate.
It is too Java-centric, and generally too concerned with Java-specific implementation details.
Although it was meant to be declarative, it has too many procedural details, mainly around managing versions, which is the one thing we wish to avoid here.

In the spirit of taking the best parts and leaving behind the bad parts, I decided to implement a POM-like document: the Bill of Materials.

It's a YAML (YAML Ain't Markup Language) file, which is hopefully easier to read than an XML file.
It simply lists artifacts and their dependencies.
I explicitly separate out any procedural build details by simply referencing the build script as an artifact property. In other words: "I don't care how you produce the artifact, just tell me where it is when the build script is done".

In its simplest form, a bill of materials file looks like this:

- GroupId: com.myself
  ArtifactId: myartifact1
  BuiltBy: mybuildscript
  BuiltFrom: # declare where the sources are
    - some/shared/source/directory
    - source/directory/for/myartifact1
  SourceFile: build/output/file1 # this is where the artifact ends up

- GroupId: com.myself
  ArtifactId: myartifact2
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact2
  SourceFile: build/output/file2

- GroupId: com.myself
  ArtifactId: myartifact3
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact3
  SourceFile: build/output/file3
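To show what a tool consuming this file actually works with, here is a sketch of the data model. It assumes the YAML has already been loaded by a parser (for example PyYAML's `yaml.safe_load`, not shown) into plain lists and dicts; the `Artifact` class and `index_bom` are hypothetical names, and the "group:artifact" key mirrors the form used by `Requires` entries.

```python
from dataclasses import dataclass

REQUIRED_KEYS = {"GroupId", "ArtifactId", "BuiltBy", "BuiltFrom", "SourceFile"}


@dataclass(frozen=True)
class Artifact:
    group_id: str
    artifact_id: str
    built_by: str
    built_from: tuple
    source_file: str

    @property
    def coordinates(self) -> str:
        # "group:artifact", the same form Requires entries use.
        return f"{self.group_id}:{self.artifact_id}"

    @classmethod
    def from_entry(cls, entry: dict) -> "Artifact":
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"entry {entry!r} is missing {sorted(missing)}")
        return cls(entry["GroupId"], entry["ArtifactId"], entry["BuiltBy"],
                   tuple(entry["BuiltFrom"]), entry["SourceFile"])


def index_bom(entries: list) -> dict:
    """Turn the parsed list of entries into a lookup keyed by coordinates."""
    index = {}
    for entry in entries:
        artifact = Artifact.from_entry(entry)
        if artifact.coordinates in index:
            raise ValueError(f"duplicate artifact {artifact.coordinates}")
        index[artifact.coordinates] = artifact
    return index
```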

Since YAML is a hierarchical format, it offers a straightforward way to factor out repetition. Just declare the shared attributes at a higher level:

GroupId: com.myself # Items in this section are valid
BuiltBy: mybuildscript # for all artifacts
Artifacts:

  - ArtifactId: myartifact1 # Items here are only valid here
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact3
    SourceFile: build/output/file3
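Turning the hierarchical form back into one fully specified entry per artifact is a short recursive walk. This sketch assumes merge rules that the examples suggest but do not spell out: an inner scalar overrides an outer one, and list values such as `BuiltFrom` are concatenated, outer entries first. The function name `flatten` is hypothetical.

```python
from typing import Optional


def flatten(node: dict, inherited: Optional[dict] = None) -> list:
    """Push attributes declared at an outer level down into every entry
    listed under Artifacts, returning one dict per artifact."""
    merged = dict(inherited or {})
    for key, value in node.items():
        if key == "Artifacts":
            continue
        outer = merged.get(key)
        if isinstance(outer, list) and isinstance(value, list):
            merged[key] = outer + value   # assumed: lists accumulate
        else:
            merged[key] = value           # assumed: inner scalars win
    if "Artifacts" not in node:
        return [merged]                   # a leaf: one complete artifact
    leaves = []
    for child in node["Artifacts"]:
        leaves.extend(flatten(child, merged))
    return leaves
```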

The values of the attributes can be used to define other values, using the ${Attribute} syntax:

GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
Artifacts:

  - ArtifactId: myartifact1
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file3
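In Python, the `${Attribute}` syntax happens to match the standard library's `string.Template`, so expanding the references for a single artifact can be sketched as below. Using `string.Template`, and allowing only string-valued attributes to be referenced, are choices made for this sketch; nested references (an attribute whose value itself contains `${...}`) are not resolved recursively here.

```python
from string import Template


def expand(value, attrs: dict):
    """Replace ${Name} references in strings, recursing into lists and dicts."""
    if isinstance(value, str):
        return Template(value).substitute(attrs)   # KeyError if Name is undefined
    if isinstance(value, list):
        return [expand(v, attrs) for v in value]
    if isinstance(value, dict):
        return {k: expand(v, attrs) for k, v in value.items()}
    return value


def expand_artifact(artifact: dict) -> dict:
    # Only plain string attributes can be referenced.
    scalars = {k: v for k, v in artifact.items() if isinstance(v, str)}
    return {k: expand(v, scalars) for k, v in artifact.items()}
```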

Using ${Attribute} references opens up more refactoring opportunities. Note that ${Attribute} references are evaluated after the list of artifacts has been constructed, so it is perfectly fine to reference an ${Attribute} even if it is not defined at the same level:

GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
  - source/directory/for/${ArtifactId}
Artifacts:

  - ArtifactId: myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    SourceFile: build/output/file3
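The ordering is what makes this work: at the top level there is no `ArtifactId`, so substituting there would fail, whereas after flattening every artifact carries its own copy of the `BuiltFrom` template. A self-contained sketch of that late evaluation, assuming the same hypothetical merge rules as before (inner scalars override, lists concatenate) and `string.Template` for the `${...}` syntax:

```python
from string import Template
from typing import Optional


def flatten(node: dict, inherited: Optional[dict] = None) -> list:
    # Inner values override outer ones; lists such as BuiltFrom are concatenated.
    merged = dict(inherited or {})
    for key, value in node.items():
        if key == "Artifacts":
            continue
        outer = merged.get(key)
        merged[key] = outer + value if isinstance(outer, list) and isinstance(value, list) else value
    if "Artifacts" not in node:
        return [merged]
    return [leaf for child in node["Artifacts"] for leaf in flatten(child, merged)]


def substitute(value, scalars: dict):
    # Expand ${Name} in strings, recursing into lists; other values pass through.
    if isinstance(value, str):
        return Template(value).substitute(scalars)
    if isinstance(value, list):
        return [substitute(v, scalars) for v in value]
    return value


def evaluate(bom: dict) -> list:
    """Late evaluation: build the per-artifact list first, then substitute."""
    result = []
    for artifact in flatten(bom):
        scalars = {k: v for k, v in artifact.items() if isinstance(v, str)}
        result.append({k: substitute(v, scalars) for k, v in artifact.items()})
    return result
```

Calling `substitute` on the top-level `BuiltFrom` directly, with only the top-level attributes available, raises `KeyError: 'ArtifactId'`; that is the eager alternative this ordering avoids.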

I have found it convenient to have two ways to declare dependencies between artifacts.

Declare upstream dependencies the classic Maven way, with Requires: <artifact>. This is useful for classic shared-code dependencies.
Declare downstream dependencies using a DeployTo: entry. This method is useful for build flow dependencies, for example to aggregate and validate build results and test results, or to bundle a bunch of individual pieces into an installer.
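Both kinds of declaration end up as edges in the same graph, so a build order can be derived with a topological sort. This sketch works at the level of build jobs (`BuiltBy` values) and rests on an interpretation that is my assumption: `Requires` names upstream artifacts as "group:artifact", while each `DeployTo` entry's `Downstream` names a job that must run afterwards. It uses `graphlib`, available in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter


def job_order(artifacts: list) -> list:
    """Order build jobs so each runs after the jobs it depends on.

    `artifacts` is assumed to be flattened and expanded already.
    """
    producer = {f"{a['GroupId']}:{a['ArtifactId']}": a["BuiltBy"] for a in artifacts}
    graph = {a["BuiltBy"]: set() for a in artifacts}   # job -> jobs it waits for
    for a in artifacts:
        job = a["BuiltBy"]
        requires = a.get("Requires", [])
        if isinstance(requires, str):                  # allow a single value
            requires = [requires]
        for coords in requires:                        # upstream: Requires
            upstream = producer[coords]                # KeyError if undefined
            if upstream != job:
                graph[job].add(upstream)
        for target in a.get("DeployTo", []):           # downstream: DeployTo
            graph.setdefault(target["Downstream"], set()).add(job)
    return list(TopologicalSorter(graph).static_order())
```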

Example of the first type: refactor code to create a separate artifact for the shared code portion:

GroupId: com.myself
Groups:
  - BuiltBy: mylibrarybuildscript
    BuiltFrom:
      - some/shared/source/directory
    Artifacts:

      - ArtifactId: mylibraryartifact
        SourceFile: build/output/library

  - BuiltBy: mybuildscript
    Requires: ${GroupId}:mylibraryartifact
    BuiltFrom: # <- this applies to all artifacts listed beneath
      - source/directory/for/${ArtifactId}
    Artifacts:

      - ArtifactId: myartifact1
        SourceFile: build/output/file1

      - ArtifactId: myartifact2
        SourceFile: build/output/file2

      - ArtifactId: myartifact3
        SourceFile: build/output/file3

Example of the second type: feed all the build results to a validation task that aggregates them:

GroupId: com.myself
Groups:
  - ArtifactId: manifest
    BuiltBy: validate
    SourceFile: manifest

  - DeployTo: # <- this applies to all artifacts listed beneath
      - Downstream: validate
    SubGroups:
      - BuiltBy: mylibrarybuildscript
        BuiltFrom:
          - some/shared/source/directory
        Artifacts:

          - ArtifactId: mylibraryartifact
            SourceFile: build/output/library

      - BuiltBy: mybuildscript
        Requires: ${GroupId}:mylibraryartifact
        BuiltFrom:
          - source/directory/for/${ArtifactId}
        Artifacts:

          - ArtifactId: myartifact1
            SourceFile: build/output/file1

          - ArtifactId: myartifact2
            SourceFile: build/output/file2

          - ArtifactId: myartifact3
            SourceFile: build/output/file3
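Because `DeployTo` sits above `SubGroups`, flattening hands it to every artifact beneath, and the inputs of the validation job can then be collected by inverting those entries. A sketch, assuming `Groups`, `SubGroups` and `Artifacts` are the only keys that introduce nested entries and reusing the hypothetical merge rules from earlier (inner scalars override, lists concatenate):

```python
from collections import defaultdict
from typing import Optional

CHILD_KEYS = ("Groups", "SubGroups", "Artifacts")   # keys whose items inherit


def flatten(node: dict, inherited: Optional[dict] = None) -> list:
    merged = dict(inherited or {})
    for key, value in node.items():
        if key in CHILD_KEYS:
            continue
        outer = merged.get(key)
        merged[key] = outer + value if isinstance(outer, list) and isinstance(value, list) else value
    children = [child for key in CHILD_KEYS for child in node.get(key, [])]
    if not children:
        return [merged]                              # a leaf: one artifact
    return [leaf for child in children for leaf in flatten(child, merged)]


def downstream_inputs(bom: dict) -> dict:
    """Map each job named in a DeployTo entry to the artifacts deployed to it."""
    inputs = defaultdict(list)
    for artifact in flatten(bom):
        for target in artifact.get("DeployTo", []):
            inputs[target["Downstream"]].append(f"{artifact['GroupId']}:{artifact['ArtifactId']}")
    return dict(inputs)
```

The manifest entry itself carries no `DeployTo`, so it is not listed as an input to its own job.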

Although YAML is a very simple format, it is powerful enough to represent long lists of artifacts relatively compactly.

In part three, I will describe how I use the information in the bill of materials to generate a build plan and the corresponding Jenkins job definitions to execute the plan.


Friday, June 5, 2015

Positive and Negative Build Avoidance

Sunday, January 4, 2015

Use Git to Manage Build Artifacts (Part 2)