Monday, January 2, 2012

What Is It With "maven"?

Maven is a java based build system, mainly geared towards java development. It follows the convention over configuration pattern and therefore features relatively compact project description files for every artifact to be built. The source tree layout and build process is assumed and inferred, although it can be overridden via configuration, if you must.

The outstanding feature of Maven is the support for prebuilt artifacts. In fact, maven bootstraps itself using the same mechanism used during builds and pulls most of itself from the cloud.

Maven artifacts have an associated metadata file: pom.xml. The file originates in the project source tree building that artifact and is copied into various artifact repositories used during a build.

Maven artifacts are defined and referenced by id and version. Dependencies between artifacts are listed in the pom.xml file. Maven will compute the transitive dependencies and ensure they are available for resolution when needed.

Maven is a system I very much want to like. It implements a build system usable in the many small source trees model of development and just generally provides a nice framework for build automation and reporting. Unfortunately, it has a couple of limitations that make it unsuitable as a general purpose build framework:
  • Being java based, it assumes platform independence. It is very hard to adapt it to build C/C++ projects and artifacts.
  • The artifact dependencies are required to have explicit versions. This requires frequent updating of the pom.xml files as multiple artifacts are being rebuilt.
The latter objection is particularly sad. Maven does provide some mechanisms to mitigate the problem:
  • The versions can be specified in a "Master POM" file via the DependencyManagement section. The drawback is that the master POM needs to be available at build time, and you still need to edit it and deal with merges etc.
  • Maven has a special version identifier called SNAPSHOT. This can be used in the suffix of any version specification and means essentially: get me the latest build you can find.
Maven also provides ways to search multiple repositories, so that you can control the propagation of SNAPSHOT builds through your team, similar to the way I described the artifact build process. Unfortunately, maven appears to just implement a simple search list, whereas I would like to be able to flag an error if different versions of the same artifact are offered in two separate repositories.

Now even though SNAPSHOT dependencies resolve the immediate problem of developers having to edit the dependencies every time, it does create a new problem: what to do at release time. The infamous release plugin addresses it by essentially mass updating all the project files to include the release version, rebuild, then edit them again to reflect the next version to be developed (usually a new SNAPSHOT version). This is bad for many reasons:
  • The full source tree needs to be available for this to work. Alternatively you can attempt to release every artifact separately, but then the builds of dependent artifacts need to somehow know the right version, requiring some central place accessible to all artifact builds.
  • Forcing a rebuild just to use the new version numbers assumes that builds are trivially reproducible. This is a very difficult requirement to meet in practice. Alternatively, you can produce release candidate builds using the release plugin, and simply throw them away if they do not pass muster, but these builds are then different from development builds, and again you would end up rebuilding many artifacts just for the pleasure of branding them with a new version.
Now there would be a relatively simple way to fix this:
  • Use SNAPSHOT as a placeholder for a build id, generated at build time, and use the resulting version number in the artifact name. This is in fact how some artifact repository systems work, but unfortunately this isn't exposed. Most people use a timestamp, as it is as good as anything to instill a reasonable ordering of builds.
  • Resolve your SNAPSHOT dependencies by picking the latest build from your list of repositories, and dump out a DependencyManagement section with the resolved versions as part of your build result. This can be used in case someone wishes to attempt a repro of that build.
  • Leave the versions in the pom.xml files alone, unless you wish to express an API incompatibility be requiring a new minimal version number.
This way, any build can in theory be a releasable build, and you save a lot of unnecessary edits of the pom.xml files.

No comments:

Post a Comment