Sunday, January 4, 2015

Use Git to Manage Build Artifacts (Part 2)

In part one, I described how we can and should compute the version strings for build artifacts from the hashes of the source files used to build them. Now we need a way to put this strategy into practice.

The Maven build system, in spite of its many shortcomings, does have the right idea: the POM (Project Object Model).
  • It's intended to be declarative. You describe the artifacts and their dependencies.
Sadly, the implementation falls short in several ways:
  • It's in XML, making it very tedious to read and manipulate.
  • It is too Java-centric, and too concerned with Java-specific implementation details.
  • Although it started out declarative, it has accumulated too many procedural details, mainly around managing versions, which is exactly what we want to avoid here.
In the spirit of taking the best parts and leaving behind the bad parts, I decided to implement a POM-like document: the Bill of Materials.
  • It's a YAML (YAML Ain't Markup Language) file, which is hopefully easier to read than an XML file.
  • It simply lists artifacts and their dependencies.
  • It explicitly separates out all procedural build details: the build script is simply referenced as a property of the artifact. In other words: "I don't care how you produce the artifact, just tell me where it is when the build script is done".
In its simplest form, a bill of materials file looks like this:

- GroupId: com.myself
  ArtifactId: myartifact1
  BuiltBy: mybuildscript
  BuiltFrom: # declare where the sources are
    - some/shared/source/directory
    - source/directory/for/myartifact1
  SourceFile: build/output/file1 # this is where the artifact ends up

- GroupId: com.myself
  ArtifactId: myartifact2
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact2
  SourceFile: build/output/file2

- GroupId: com.myself
  ArtifactId: myartifact3
  BuiltBy: mybuildscript
  BuiltFrom:
    - some/shared/source/directory
    - source/directory/for/myartifact3
  SourceFile: build/output/file3
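To make the structure concrete, here is a minimal Python sketch of what one entry looks like once the YAML has been loaded (a YAML loader such as PyYAML's yaml.safe_load turns it into plain lists and dicts). It also shows one possible way the BuiltFrom list could feed the hash-based version from part one. The Artifact class, the version_from_trees function and the placeholder hashes are hypothetical; the assumption is that each BuiltFrom directory's git tree hash (obtainable with `git rev-parse HEAD:<directory>`) is combined into one short version string.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Artifact:
    # Field names mirror the keys used in the bill of materials.
    GroupId: str
    ArtifactId: str
    BuiltBy: str
    SourceFile: str
    BuiltFrom: List[str] = field(default_factory=list)

    @property
    def coordinate(self) -> str:
        # Maven-style "group:artifact" coordinate.
        return f"{self.GroupId}:{self.ArtifactId}"


def version_from_trees(artifact: Artifact, tree_hashes: Dict[str, str]) -> str:
    """Hypothetical: hash the git tree hash of every BuiltFrom directory
    into one short version string."""
    digest = hashlib.sha1()
    for directory in sorted(artifact.BuiltFrom):  # order of BuiltFrom is irrelevant
        digest.update(f"{directory} {tree_hashes[directory]}\n".encode())
    return digest.hexdigest()[:12]


# The first entry of the YAML above, as a YAML loader would return it.
entry = {
    "GroupId": "com.myself",
    "ArtifactId": "myartifact1",
    "BuiltBy": "mybuildscript",
    "BuiltFrom": ["some/shared/source/directory", "source/directory/for/myartifact1"],
    "SourceFile": "build/output/file1",
}
art = Artifact(**entry)
# Placeholder hashes; in a real checkout use `git rev-parse HEAD:<directory>`.
fake_trees = {d: "0" * 40 for d in art.BuiltFrom}
print(art.coordinate, version_from_trees(art, fake_trees))
```

The point of the sketch is that the version changes only when the content of one of the declared source directories changes, which is why BuiltFrom has to list every directory the artifact depends on.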
Since YAML is a hierarchical format, it offers a straightforward way to factor out repetition. Just declare the shared attributes at a higher level:
GroupId: com.myself       # Items in this section are valid
BuiltBy: mybuildscript    # for all artifacts
Artifacts:

  - ArtifactId: myartifact1 # Items here apply only to this artifact
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    BuiltFrom:
      - some/shared/source/directory
      - source/directory/for/myartifact3
    SourceFile: build/output/file3 
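One way to read the factored form is as simple inheritance: every key outside Artifacts is a default that each artifact entry may override. A minimal sketch of that flattening, assuming artifact-level values simply replace the shared ones (the flatten function is hypothetical, and the real tool may merge differently):

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a document with shared top-level keys into one dict per artifact."""
    shared = {k: v for k, v in doc.items() if k != "Artifacts"}
    result = []
    for entry in doc.get("Artifacts", []):
        merged = dict(shared)  # start from the shared attributes
        merged.update(entry)   # artifact-level keys win on conflict
        result.append(merged)
    return result


doc = {
    "GroupId": "com.myself",
    "BuiltBy": "mybuildscript",
    "Artifacts": [
        {"ArtifactId": "myartifact1",
         "BuiltFrom": ["some/shared/source/directory", "source/directory/for/myartifact1"],
         "SourceFile": "build/output/file1"},
        {"ArtifactId": "myartifact2",
         "BuiltFrom": ["some/shared/source/directory", "source/directory/for/myartifact2"],
         "SourceFile": "build/output/file2"},
    ],
}
for artifact in flatten(doc):
    print(artifact["GroupId"], artifact["ArtifactId"], artifact["BuiltBy"])
```

After flattening, each artifact carries the same attributes it had in the fully spelled-out version, so downstream tooling never needs to know how the file was factored.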
The values of the attributes can be used to define other values, using the ${Attribute} syntax:
GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
Artifacts:
 
  - ArtifactId: myartifact1
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file1
  - ArtifactId: myartifact2
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file2
  - ArtifactId: myartifact3
    BuiltFrom:
      - source/directory/for/${ArtifactId}
    SourceFile: build/output/file3
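The ${Attribute} syntax can be implemented as a plain string substitution against the artifact's own attributes. A minimal sketch, assuming the shared and artifact-level BuiltFrom lists have already been combined into one list (the example above only reproduces the earlier one if they are), and that an unknown name is an error rather than being left in place. The expand function and the REF pattern are hypothetical.

```python
import re
from typing import Any, Dict

REF = re.compile(r"\$\{(\w+)\}")  # matches ${Attribute}


def expand(value: Any, attrs: Dict[str, Any]) -> Any:
    """Replace every ${Name} in value (strings, lists, dicts) with attrs[Name]."""
    if isinstance(value, str):
        # Unknown names raise KeyError instead of passing through silently.
        return REF.sub(lambda m: str(attrs[m.group(1)]), value)
    if isinstance(value, list):
        return [expand(v, attrs) for v in value]
    if isinstance(value, dict):
        return {k: expand(v, attrs) for k, v in value.items()}
    return value


# An artifact whose shared and own BuiltFrom entries are already combined.
artifact = {
    "GroupId": "com.myself",
    "ArtifactId": "myartifact1",
    "BuiltFrom": ["some/shared/source/directory", "source/directory/for/${ArtifactId}"],
    "SourceFile": "build/output/file1",
}
print(expand(artifact, artifact)["BuiltFrom"])
# ['some/shared/source/directory', 'source/directory/for/myartifact1']
```

This is a single pass: a value that itself expands to another ${...} reference is not expanded again.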
This opens up further refactoring opportunities. ${Attribute} references are evaluated after the itemized list of artifacts is constructed, so it is perfectly fine to reference an attribute such as ${ArtifactId} even where it is not defined at that same level:
GroupId: com.myself
BuiltBy: mybuildscript
BuiltFrom:
  - some/shared/source/directory
  - source/directory/for/${ArtifactId}
Artifacts:

  - ArtifactId: myartifact1
    SourceFile: build/output/file1

  - ArtifactId: myartifact2
    SourceFile: build/output/file2

  - ArtifactId: myartifact3
    SourceFile: build/output/file3
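The evaluation order is what makes this work: each item must be fully assembled before any reference is expanded. A minimal sketch of that two-step resolution, assuming scalar values from the artifact override the shared ones while lists such as BuiltFrom are concatenated, shared entries first. The merge, substitute and resolve functions are hypothetical names, not part of any existing tool.

```python
import re
from typing import Any, Dict, List

REF = re.compile(r"\$\{(\w+)\}")


def merge(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Child scalars override the parent; lists are concatenated (parent first)."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


def substitute(value: Any, attrs: Dict[str, Any]) -> Any:
    if isinstance(value, str):
        return REF.sub(lambda m: str(attrs[m.group(1)]), value)
    if isinstance(value, list):
        return [substitute(v, attrs) for v in value]
    return value


def resolve(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    shared = {k: v for k, v in doc.items() if k != "Artifacts"}
    resolved = []
    for entry in doc["Artifacts"]:
        merged = merge(shared, entry)  # 1. build the complete item first
        # 2. only then expand ${...}, so ArtifactId is known at this point
        resolved.append({k: substitute(v, merged) for k, v in merged.items()})
    return resolved


doc = {
    "GroupId": "com.myself",
    "BuiltBy": "mybuildscript",
    "BuiltFrom": ["some/shared/source/directory", "source/directory/for/${ArtifactId}"],
    "Artifacts": [
        {"ArtifactId": "myartifact1", "SourceFile": "build/output/file1"},
        {"ArtifactId": "myartifact2", "SourceFile": "build/output/file2"},
    ],
}
for artifact in resolve(doc):
    print(artifact["ArtifactId"], artifact["BuiltFrom"][1])
# myartifact1 source/directory/for/myartifact1
# myartifact2 source/directory/for/myartifact2
```

Expanding the top-level BuiltFrom on its own would fail with a KeyError, because ArtifactId only exists once an item has been merged.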
I have found it convenient to have two ways to declare dependencies between artifacts.
  • Declare upstream dependencies the classic Maven way, with a Requires: <artifact> entry. This suits ordinary shared-code dependencies.
  • Declare downstream dependencies with a DeployTo: entry. This suits build-flow dependencies, for example aggregating and validating build and test results, or bundling a bunch of individual pieces into an installer.
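Both kinds end up as edges in the same dependency graph; they differ only in which side declares the edge. A minimal sketch over already-flattened artifacts, assuming Requires holds one or more group:artifact coordinates and that a DeployTo entry's Downstream value names the BuiltBy script of the consuming artifact (as the validate example later in this post suggests). The function names are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def as_list(value: Any) -> List[Any]:
    """Requires/DeployTo may hold a single value or a list of values."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def coordinate(artifact: Dict[str, Any]) -> str:
    return f"{artifact['GroupId']}:{artifact['ArtifactId']}"


def dependency_edges(artifacts: List[Dict[str, Any]]) -> Set[Tuple[str, str]]:
    """Return (upstream, downstream) pairs of artifact coordinates."""
    edges = set()
    for art in artifacts:
        # Upstream, Maven style: the required artifact is built first.
        for required in as_list(art.get("Requires")):
            edges.add((required, coordinate(art)))
        # Downstream: this artifact feeds every artifact built by the named script.
        for target in as_list(art.get("DeployTo")):
            for other in artifacts:
                if other.get("BuiltBy") == target["Downstream"]:
                    edges.add((coordinate(art), coordinate(other)))
    return edges


artifacts = [
    {"GroupId": "com.myself", "ArtifactId": "mylibraryartifact",
     "BuiltBy": "mylibrarybuildscript", "DeployTo": [{"Downstream": "validate"}]},
    {"GroupId": "com.myself", "ArtifactId": "myartifact1", "BuiltBy": "mybuildscript",
     "Requires": "com.myself:mylibraryartifact", "DeployTo": [{"Downstream": "validate"}]},
    {"GroupId": "com.myself", "ArtifactId": "manifest", "BuiltBy": "validate"},
]
for upstream, downstream in sorted(dependency_edges(artifacts)):
    print(upstream, "->", downstream)
```

Declaring the edge on the downstream side (Requires) or on the upstream side (DeployTo) is purely a matter of which file is more natural to edit; the resulting graph is the same.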
Example of the first type: factor the shared code out into a separate artifact that the others require:
GroupId: com.myself
Groups:
  - BuiltBy: mylibrarybuildscript
    BuiltFrom:
      - some/shared/source/directory
    Artifacts:
 
      - ArtifactId: mylibraryartifact
        SourceFile: build/output/library
 
  - BuiltBy: mybuildscript
    Requires: ${GroupId}:mylibraryartifact
    BuiltFrom: # <- this applies to all artifacts listed beneath
      - source/directory/for/${ArtifactId}
    Artifacts:
 
      - ArtifactId: myartifact1
        SourceFile: build/output/file1
 
      - ArtifactId: myartifact2
        SourceFile: build/output/file2
 
      - ArtifactId: myartifact3
        SourceFile: build/output/file3
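With Groups, attributes are pushed down one more level: the top-level GroupId applies everywhere, while each group's BuiltBy, BuiltFrom and Requires apply only to the artifacts listed beneath it. A minimal recursive sketch, assuming the same rules as before (scalars override, lists concatenate, ${...} expanded last); walk, resolve and the other names are hypothetical.

```python
import re
from typing import Any, Dict, List

REF = re.compile(r"\$\{(\w+)\}")
CHILD_KEYS = ("Groups", "Artifacts")


def merge(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Child scalars override the parent; lists are concatenated."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


def substitute(value: Any, attrs: Dict[str, Any]) -> Any:
    if isinstance(value, str):
        return REF.sub(lambda m: str(attrs[m.group(1)]), value)
    if isinstance(value, list):
        return [substitute(v, attrs) for v in value]
    return value


def walk(node: Dict[str, Any], inherited: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Recursively push each level's attributes down to the artifacts beneath it."""
    scope = merge(inherited, {k: v for k, v in node.items() if k not in CHILD_KEYS})
    leaves = []
    for group in node.get("Groups", []):
        leaves.extend(walk(group, scope))  # each group sees only its own scope
    for entry in node.get("Artifacts", []):
        leaves.append(merge(scope, entry))
    return leaves


def resolve(doc: Dict[str, Any]) -> List[Dict[str, Any]]:
    return [{k: substitute(v, a) for k, v in a.items()} for a in walk(doc, {})]


doc = {
    "GroupId": "com.myself",
    "Groups": [
        {"BuiltBy": "mylibrarybuildscript",
         "BuiltFrom": ["some/shared/source/directory"],
         "Artifacts": [{"ArtifactId": "mylibraryartifact",
                        "SourceFile": "build/output/library"}]},
        {"BuiltBy": "mybuildscript",
         "Requires": "${GroupId}:mylibraryartifact",
         "BuiltFrom": ["source/directory/for/${ArtifactId}"],
         "Artifacts": [{"ArtifactId": "myartifact1",
                        "SourceFile": "build/output/file1"}]},
    ],
}
for artifact in resolve(doc):
    print(artifact["ArtifactId"], artifact["BuiltBy"], artifact.get("Requires"))
```

Because each recursive call works on its own copy of the scope, attributes declared in one group never leak into a sibling group.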
Example of the second type: feed all the build results to a validation task that aggregates them:
GroupId: com.myself
Groups:
  - ArtifactId: manifest
    BuiltBy: validate
    SourceFile: manifest

  - DeployTo: # <- this applies to all artifacts listed beneath
     - Downstream: validate
    SubGroups:
     - BuiltBy: mylibrarybuildscript
       BuiltFrom:
         - some/shared/source/directory
       Artifacts:

         - ArtifactId: mylibraryartifact
           SourceFile: build/output/library

     - BuiltBy: mybuildscript
       Requires: ${GroupId}:mylibraryartifact
       BuiltFrom:
         - source/directory/for/${ArtifactId}
       Artifacts:

         - ArtifactId: myartifact1
           SourceFile: build/output/file1

         - ArtifactId: myartifact2
           SourceFile: build/output/file2

         - ArtifactId: myartifact3
           SourceFile: build/output/file3
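Here DeployTo is declared once on the outer group and inherited through SubGroups by every artifact beneath it, while manifest, declared directly as a group item, does not get it. A minimal sketch of that inheritance, assuming Groups, SubGroups and Artifacts are the only keys that introduce children, that any node carrying an ArtifactId is itself an artifact, and that lists concatenate while scalars override. ${...} expansion is left out for brevity; walk and feeds are hypothetical names.

```python
from typing import Any, Dict, List

CHILD_KEYS = ("Groups", "SubGroups", "Artifacts")


def merge(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Child scalars override the parent; lists are concatenated."""
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


def walk(node: Dict[str, Any], inherited: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Push attributes down through Groups/SubGroups/Artifacts to each artifact."""
    scope = merge(inherited, {k: v for k, v in node.items() if k not in CHILD_KEYS})
    if "ArtifactId" in node:  # any node carrying an ArtifactId is an artifact
        return [scope]
    leaves: List[Dict[str, Any]] = []
    for key in CHILD_KEYS:
        for child in node.get(key, []):
            leaves.extend(walk(child, scope))
    return leaves


def feeds(artifacts: List[Dict[str, Any]], script: str) -> List[str]:
    """ArtifactIds whose (inherited) DeployTo names the given downstream build."""
    return [a["ArtifactId"] for a in artifacts
            if any(d.get("Downstream") == script for d in a.get("DeployTo", []))]


doc = {
    "GroupId": "com.myself",
    "Groups": [
        {"ArtifactId": "manifest", "BuiltBy": "validate", "SourceFile": "manifest"},
        {"DeployTo": [{"Downstream": "validate"}],
         "SubGroups": [
             {"BuiltBy": "mylibrarybuildscript",
              "BuiltFrom": ["some/shared/source/directory"],
              "Artifacts": [{"ArtifactId": "mylibraryartifact",
                             "SourceFile": "build/output/library"}]},
             {"BuiltBy": "mybuildscript",
              "Requires": "${GroupId}:mylibraryartifact",
              "BuiltFrom": ["source/directory/for/${ArtifactId}"],
              "Artifacts": [{"ArtifactId": f"myartifact{i}",
                             "SourceFile": f"build/output/file{i}"} for i in (1, 2, 3)]},
         ]},
    ],
}
artifacts = walk(doc, {})
print(feeds(artifacts, "validate"))
# ['mylibraryartifact', 'myartifact1', 'myartifact2', 'myartifact3']
```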

Although YAML is a very simple format, it is powerful enough to represent long lists of artifacts relatively compactly.

In part three, I will describe how I use the information in the bill of materials to generate a build plan, and the corresponding Jenkins job definitions to execute it.