Wanted: a universal application packaging format for .NET

At its simplest, the building blocks of any automated deployment or continuous delivery solution are:

[Diagram: build and deployment processes]

Breaking it down, what I've called "buildable artifacts" usually means source code, which should be housed in a version control system. A build process takes those artifacts, and produces new artifacts: artifacts which are ready to be deployed with a deployment process. These artifacts are typically binaries, usually packaged into some kind of format, and stamped with metadata explaining the contents.

This idea of decoupling the build process from the deployment process is important. In their book Continuous Delivery, Jez Humble and David Farley argue that you should "only build your binaries once":

Many build systems use the source code held in the version control system as the canonical source for many steps. The code will be compiled repeatedly in different contexts: during the commit process, again at acceptance test time, again for capacity testing, and often once for each separate deployment target. Every time you compile the code, you run the risk of introducing some difference. The version of the compiler installed in the later stages may be different from the version that you used for your commit tests. You may pick up a different version of some third-party library that you didn't intend. Even the configuration of the compiler may change the behavior of the application. We have seen bugs from every one of these sources reaching production.

Artifacts

Once we've built our code, we'll have a set of binaries ready to be deployed. If we're going to be re-deploying those binaries multiple times, we need to store them in a way that lets us easily go back and find them. No doubt in your travels you've come across a file share with folders that look something like this:

  • Copy of WebApp
  • Copy of WebApp (2)
  • WebApp
  • WebApp1
  • WebApp-2012-06-19
  • WebApp-Production
  • WebApp-Mike (DO NOT DELETE THIS IS PRODUCTION)

Properly maintaining our deployable artifacts is essential if we hope to come close to repeatable, automated, low-risk deployments. We need to put some thought into how these artifacts are structured.

Perhaps the most common artifact format is to compress the files into an archive, and to stamp the file name with a date or version number. For example, this would be a far better solution to the above:

  • WebApp-1.0.0.zip
  • WebApp-1.0.1.zip
  • WebApp-1.0.2.zip
  • WebApp-1.0.3.zip

An artifact might just contain the files needed for deployment. The version number or timestamp can then be used to find out more information about the contents, for example, the list of changes included in the artifact, who built it and when, the issues fixed, and so on.
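
As a rough sketch of why a consistent naming scheme helps, the Python snippet below parses file names of the form shown above and picks the newest one. The pattern `<name>-<major>.<minor>.<patch>.zip` and the helper names `parse_artifact` and `latest` are assumptions made for illustration, not part of any tool mentioned here.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple

# Assumed convention: <name>-<major>.<minor>.<patch>.zip
ARTIFACT_PATTERN = re.compile(r"^(?P<name>.+)-(?P<version>\d+\.\d+\.\d+)\.zip$")


class Artifact(NamedTuple):
    name: str
    version: Tuple[int, ...]  # integers, so 1.0.10 sorts after 1.0.9
    filename: str


def parse_artifact(filename: str) -> Optional[Artifact]:
    """Return the parsed artifact, or None if the name does not follow the convention."""
    match = ARTIFACT_PATTERN.match(filename)
    if match is None:
        return None
    version = tuple(int(part) for part in match.group("version").split("."))
    return Artifact(match.group("name"), version, filename)


def latest(filenames: Iterable[str], name: str) -> Optional[Artifact]:
    """Pick the highest version of the named application, ignoring unparseable names."""
    candidates = [a for a in map(parse_artifact, filenames) if a and a.name == name]
    return max(candidates, key=lambda a: a.version, default=None)
```

Names such as "Copy of WebApp (2)" simply fail to parse, which is the point: a folder full of ad hoc copies cannot be queried, while versioned archives can.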

In an ideal world, this "metadata" could be stored inside of the artifact, so that the artifact is self-describing. That's what specifications like the NuGet .nupkg file format do - they contain not just the raw binaries/files needed for deployment, but also metadata to describe them. That's why we call it a packaging format instead of "just another zip file".
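
To make "self-describing" concrete, here is a minimal Python sketch that writes a metadata file into the archive alongside the deployable files and reads it back later. The file name `manifest.json` and the metadata fields are hypothetical; this is not the .nupkg layout, only an illustration of the idea.

```python
import io
import json
import zipfile
from typing import Dict

MANIFEST_NAME = "manifest.json"  # hypothetical name, not part of any real package format


def build_package(files: Dict[str, bytes], metadata: dict) -> bytes:
    """Zip the deployable files together with a metadata manifest."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(MANIFEST_NAME, json.dumps(metadata, indent=2))
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def read_metadata(package: bytes) -> dict:
    """Read the manifest back out of the package without unpacking the other files."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        return json.loads(archive.read(MANIFEST_NAME))
```

Because the metadata travels inside the package, a deployment tool can ask the package itself what it is and which version it contains, rather than relying on the file name.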

Examples of common self-describing artifacts on the Windows platform are NuGet packages and Windows installer MSI files.

Artifact repositories

Once we have these self-describing artifacts, the next question is, where do we keep them? In the Continuous Delivery book, this place is called an artifact repository:

It is a key resource which stores the binaries, reports, and metadata for each of your release candidates.

There are a ton of options, from file shares to FTP servers to NuGet servers to dedicated products like Nexus.

The purpose of an artifact repository is to:

  • Provide a single place to keep your artifacts
  • Provide a quick way to find and retrieve artifacts by version

Imagine an enterprise where hundreds of applications are being built every day, producing thousands of artifacts. The ability to find and retrieve the right artifact is going to be critical for deployments to be successful.
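
The two purposes listed above can be sketched as a very small interface. The Python class below is an in-memory stand-in with hypothetical method names (`publish`, `versions`, `get`); real repositories such as a NuGet server or Nexus expose their own APIs, which this does not attempt to reproduce.

```python
from typing import Dict, List, Tuple

Version = Tuple[int, int, int]


class InMemoryRepository:
    """A toy artifact repository: one place to keep packages, looked up by id and version."""

    def __init__(self) -> None:
        self._packages: Dict[Tuple[str, Version], bytes] = {}

    def publish(self, package_id: str, version: Version, content: bytes) -> None:
        key = (package_id, version)
        if key in self._packages:
            # Binaries are built once; a published version is never overwritten.
            raise ValueError(f"{package_id} {version} is already published")
        self._packages[key] = content

    def versions(self, package_id: str) -> List[Version]:
        """List all known versions of a package, oldest first."""
        return sorted(v for (pid, v) in self._packages if pid == package_id)

    def get(self, package_id: str, version: Version) -> bytes:
        """Retrieve exactly the requested version; raises KeyError if it does not exist."""
        return self._packages[(package_id, version)]
```

Refusing to overwrite a published version is a design choice in this sketch: it keeps "WebApp 1.0.2" meaning the same bytes in every environment it is deployed to.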

How this applies in the .NET space

When I designed Octopus Deploy, I really wanted to stick to this goal of separating build and deployment; Octopus is an automated deployment system, not an automated build system. So the way we consume artifacts is very important.

Now imagine an environment with a large number of projects being built on a daily basis:

  • WPF applications
  • ASP.NET web applications that will run on an intranet
  • Azure cloud services
  • Azure websites
  • Windows services
  • Node.js applications

Which artifact format is going to be the best for packaging all of these application types?

ASP.NET applications can be packaged into MSDeploy packages from Visual Studio by right-clicking and choosing Package or Publish. They can also be packaged from the command line by using a series of incantations and goat sacrifice. MSDeploy packages aren't very rich in metadata though (there's no version number in the package format, for example).

Azure cloud services are packaged into .cspkg files, another ZIP-based file format with limited metadata. Windows Services and desktop applications could be packaged as MSIs.

Unfortunately, all of these formats come with problems. MSIs have version numbers, but they aren't ideal for many application types. MSDeploy packages are great for ASP.NET applications, but there's no easy way to create them for desktop applications, and they don't have version numbers. Azure cloud service packages again lack a version number.

NuGet was the breakthrough that allowed Octopus Deploy to happen, because it provided three things:

  • A package format that was self-describing, with very rich metadata
  • Tools that make packages relatively quick and simple to create (unlike - and I'm sure some will disagree - MSIs or MSDeploy packages)
  • A standard repository interface that is easy to query to find artifacts

Conceptually, any application can be packaged as a NuGet package, and stored in a NuGet repository. I saw TeamCity working on becoming a native NuGet repository, and I assumed other build systems would someday too (except TFS, which is always last).

But NuGet packages have their own problems. The conventions are heavily geared towards distributing open source libraries, not applications that need to be run, so packaging applications can break some of these conventions. They also have performance problems when it comes to larger archives.

It's also still hard to create NuGet packages for applications. While you can right-click and package an ASP.NET web application as an MSDeploy package, you can't create a NuGet package that way. Build tools like TFS can publish a web application for deployment, but they make it hard to automatically package it up. We've been working on OctoPack for a long time to try and make this easier, but it's nowhere near ideal.

Azure cloud services can be packaged easily from the right-click menu or via MSBuild, but currently an Azure project can't use NuGet, so using OctoPack to re-package a .cspkg as a .nupkg requires manually editing the project file. These kinds of tasks should be much easier in 2013 in an age of continuous delivery and devops.

My wish

What I'd love to see is a universal application packaging format that:

  • Works for all application types (ASP.NET apps, Windows Services, Azure packages, Node, Java)
  • Has really great tooling support on the Microsoft stack (no goat sacrifices needed to make it work for all application types on TFS, for example) and ideally other stacks
  • Has a well-defined repository interface for querying and retrieving packages by version

I don't think this format should be specific to Microsoft or to Octopus Deploy. I think it should be similar to (and could even build on top of) NuGet, but without the conventions that impede using NuGet for more than just packaging libraries.

What does it take to make this happen?


Tagged with: Ecosystem