Friday, June 5, 2015

Positive and Negative Build Avoidance

I'm going to define "positive" build avoidance as avoiding a rebuild of something that has already been built. "Negative" build avoidance is avoiding a rebuild of something that has already failed to build.
Positive build avoidance has been around for quite some time, and is usually easy to implement: simply check whether the target artifact exists and was created from your current source set. This can range from make's naive timestamp check (if the target is newer than its sources, it must have been built from them) to more sophisticated checksum or hash signature checks.
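To make the signature variant concrete, here is a minimal sketch in Python (my illustration, not any particular build tool); the .sig sidecar file is a hypothetical place to record which sources an artifact was built from:

import hashlib
import os

def sources_digest(source_files):
    """Hash the contents of all source files into a single signature."""
    digest = hashlib.sha256()
    for path in sorted(source_files):
        with open(path, "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()

def is_up_to_date(artifact, source_files):
    """Positive build avoidance: skip the build if the artifact exists and
    its recorded source signature matches the current sources."""
    signature_file = artifact + ".sig"  # hypothetical sidecar file
    if not (os.path.exists(artifact) and os.path.exists(signature_file)):
        return False
    with open(signature_file) as handle:
        return handle.read().strip() == sources_digest(source_files)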
I don't know of any system that does negative build avoidance, so I'm building one.
I happen to already have a system which does positive build avoidance by storing build artifacts using a version computed from the git tree hashes of all their source components. I recently added an artifact that gets published on every build, no matter whether it failed or succeeded: the build log. Besides impressing auditors, this actually allows me to implement negative build avoidance (the decision logic is sketched in code after the list):
  • If all artifacts are present, positive build avoidance as usual, no need to rebuild;
  • If artifacts are missing, but the build log artifact is present, negative build avoidance occurs, and I do not even bother to attempt to redo a build that is known to fail;
  • If artifacts are missing and the build log is missing, then the build either didn't occur or crashed in the middle. In that case, schedule the build.
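Expressed as code, the decision boils down to something like this sketch (not the actual implementation; the two flags stand for queries against the artifact repository at the computed source version):

def build_decision(artifacts_present, build_log_present):
    """Decide what to do for one component at a given source version."""
    if artifacts_present:
        return "skip"      # positive avoidance: already built successfully
    if build_log_present:
        return "skip"      # negative avoidance: known failure, don't retry
    return "schedule"      # never attempted, or crashed before publishing a log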
The nice thing about implementing negative build avoidance is that it allows me to deal with unstable build farms (and any Jenkins build farm of substantial size is bound to be unstable). Simply keep recomputing the build schedule after every pass until every build has either succeeded or failed with a build log. It also makes the build team look good, since every remaining build failure is now clearly the developer's fault.
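The rescheduling pass can be pictured as a loop along these lines (again a sketch; needs_build and run_build are hypothetical hooks into the artifact repository and the Jenkins farm):

def converge(components, needs_build, run_build):
    """Keep rescheduling until every component either has its artifacts
    or has a build log recording a failure."""
    while True:
        pending = [c for c in components if needs_build(c)]
        if not pending:
            break                    # everything is built or a known failure
        for component in pending:
            run_build(component)     # may crash without a log; the next pass retries it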

Sunday, January 4, 2015

Use Git to Manage Build Artifacts (Part 2)

In part one, I described how we can and should compute the version strings for build artifacts from the hashes of the source files used to build them. Now we need to devise a way to implement this strategy.
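As a quick refresher on part one, such a version can be derived from git tree hashes roughly like this (a sketch under my own naming, not the exact scheme used there); git rev-parse HEAD:<path> returns the tree hash of a directory at the current commit:

import hashlib
import subprocess

def tree_hash(path, revision="HEAD"):
    """Ask git for the tree hash of one source directory at a given revision."""
    return subprocess.check_output(
        ["git", "rev-parse", f"{revision}:{path}"], text=True).strip()

def artifact_version(source_directories):
    """Combine the tree hashes of all source directories into one version string."""
    combined = hashlib.sha1()
    for directory in sorted(source_directories):
        combined.update(tree_hash(directory).encode())
    return combined.hexdigest()[:12]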

The Maven build system, in spite of its many shortcomings, does have the right idea: the POM (Project Object Model).
  • It's intended to be declarative. You describe the artifacts and their dependencies.
Sadly, the implementation really does have a lot of shortcomings:
  • It's in XML, making it very tedious to read and manipulate.
  • It is too Java-centric, and generally too concerned with Java-specific implementation details.
  • In spite of initially being declarative, it has too many procedural details, mainly around managing versions, which is the one thing we wish to avoid here.
In the spirit of taking the best parts and leaving behind the bad parts, I decided to implement a POM-like document: the Bill of Materials.
  • It's a YAML ("YAML Ain't Markup Language") file, which is hopefully easier to read than an XML file.
  • It simply lists artifacts and their dependencies. 
  • I explicitly separate out any procedural build details by referencing the build script as an artifact property. In other words: "I don't care how you produce the artifact, just tell me where it is when the build script is done".
In its simplest form, a bill of materials file looks like this:

- GroupId: com.myself
  ArtifactId: myartifact1
  BuiltBy: mybuildscript
  BuiltFrom: # declare where the sources are
    - some/shared/source/directory
    - source/directory/for/myartifact1
  SourceFile: build/output/file1 # this is where the artifact ends up

- GroupId: com.myself
  ArtifactId: myartifact2
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact2
  SourceFile: build/output/file2

- GroupId: com.myself
  ArtifactId: myartifact3
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact3
  SourceFile: build/output/file3
Since YAML is a hierarchical format, it offers a straightforward way to factor out repetition. Just declare the shared attributes at a higher level:
GroupId: com.myself       # Items in this section are valid
BuiltBy: mybuildscript    # for all artifacts
Artifacts:

  - ArtifactId: myartifact1 # Items here are only valid here
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact3
    SourceFile: build/output/file3 
The values of the attributes can be used to define other values, using the ${Attribute} syntax:
GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
Artifacts:
 
  - ArtifactId: myartifact1
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file1
  - ArtifactId: myartifact2
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file2
  - ArtifactId: myartifact3
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file3
Doing this opens up more refactoring opportunities: ${Attribute} references are evaluated after the itemized list has been constructed, so it is perfectly fine to reference an attribute that is not defined at the same level:
GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
  - source/directory/for/${ArtifactId}
Artifacts:

  - ArtifactId: myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    SourceFile: build/output/file3
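For what it's worth, the inheritance and ${Attribute} evaluation can be implemented in a few lines. The sketch below (Python with PyYAML, handling only the Artifacts nesting shown above, not Groups or SubGroups) first merges each artifact with the attributes of its enclosing levels and only then substitutes the ${Attribute} references:

import re
import yaml  # assumes the PyYAML package is available

def flatten(node, inherited=None):
    """Yield one fully merged dict per artifact; children inherit every
    attribute declared at the levels above them."""
    merged = dict(inherited or {})
    merged.update({key: value for key, value in node.items() if key != "Artifacts"})
    children = node.get("Artifacts")
    if children is None:
        yield merged
    else:
        for child in children:
            yield from flatten(child, merged)

def interpolate(artifact):
    """Resolve ${Attribute} references once the artifact dict is complete."""
    def resolve(value):
        if isinstance(value, str):
            return re.sub(r"\$\{(\w+)\}", lambda m: str(artifact[m.group(1)]), value)
        if isinstance(value, list):
            return [resolve(item) for item in value]
        return value
    return {key: resolve(value) for key, value in artifact.items()}

with open("bom.yaml") as handle:
    bom = yaml.safe_load(handle)
artifacts = [interpolate(entry) for entry in flatten(bom)]

Applied to the example above, this should yield three artifact dicts whose BuiltFrom lists expand to source/directory/for/myartifact1, 2, and 3.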
I have found it convenient to have two ways to declare dependencies between artifacts.
  • Declare upstream dependencies in the classic Maven way, by saying Requires: <artifact>. This method is useful for the classic shared-code dependencies.
  • Declare downstream dependencies using a DeployTo: entry. This method is useful for build flow dependencies, for example to aggregate and validate build results and test results, or to bundle a bunch of individual pieces into an installer.
Example of the first type: refactor code to create a separate artifact for the shared code portion:
GroupId: com.myself
Groups:
  - BuiltBy: mylibrarybuildscript
    BuiltFrom:
      - some/shared/source/directory
    Artifacts:
 
      - ArtifactId: mylibraryartifact
        SourceFile: build/output/library
 
  - BuiltBy: mybuildscript
    Requires: ${GroupId}:mylibraryartifact
    BuiltFrom: # <- this applies to all artifacts listed beneath
      - source/directory/for/${ArtifactId}
    Artifacts:
 
      - ArtifactId: myartifact1
        SourceFile: build/output/file1
 
      - ArtifactId: myartifact2
        SourceFile: build/output/file2
 
      - ArtifactId: myartifact3
        SourceFile: build/output/file3
Example of the second type: feed all the build results to a validation task that aggregates them:
GroupId: com.myself
Groups:
  - ArtifactId: manifest
    BuiltBy: validate
    SourceFile: manifest

  - DeployTo: # <- this applies to all artifacts listed beneath
      - Downstream: validate
    SubGroups:
     - BuiltBy: mylibrarybuildscript
       BuiltFrom:
         - some/shared/source/directory
       Artifacts:

         - ArtifactId: mylibraryartifact
           SourceFile: build/output/library

     - BuiltBy: mybuildscript
       Requires: ${GroupId}:mylibraryartifact
       BuiltFrom:
         - source/directory/for/${ArtifactId}
       Artifacts:

         - ArtifactId: myartifact1
           SourceFile: build/output/file1

         - ArtifactId: myartifact2
           SourceFile: build/output/file2

         - ArtifactId: myartifact3
           SourceFile: build/output/file3

In spite of YAML being a very simple format, it is powerful enough to allow a relatively compact representation of long lists of artifacts.

In part three, I will describe how I use the information in the bill of materials to generate a build plan and the corresponding Jenkins job definitions to execute the plan.