- Oleh Svintsitskyy
What is this about?
Continuous Integration setup in Azure DevOps: specifically, the approach and motivation behind the organization of build pipelines.
The following assumes a basic understanding of continuous integration concepts used in Azure DevOps, such as pipeline, stage, job, and step.
Why?
- Modernization/migration of an existing portfolio built on a legacy tech stack and infrastructure to the Azure cloud platform.
- Having set up and maintained CI/CD pipelines and supporting infrastructure during previous assignments using different tools and tech stacks, I decided to take on the challenge of doing it from scratch in Azure DevOps, drawing on that experience.
What do I have?
- A legacy CI setup built on top of the Team Foundation Server Build Service, created a very long time ago and hard to understand
- A set of .NET and Node-based projects
Where do I want to be?
The goal is to create a setup that is:
- Reusable
- Flexible and extendable
- Easy-to-understand
- With fast feedback to developers
- Requiring minimal maintenance effort
- Not stored together with the source code it is used on, yet still under version control
- Robust and troubleshootable
Two final points to mention:
- I would like to start by defining a process and then use a tool to implement it, rather than building a process around a tool
- I would like to stay as vanilla as possible so it’s easy to understand for others
How do I get there?
Classic or YAML
Both:
- YAML - primary way, as it provides more flexibility and preserves history.
- Classic - playground, as it is quicker to set up and can generate YAML, which is handy when trying new things.
I ended up with one classic pipeline containing a set of jobs that target different projects, connecting different repositories when needed.
Reusability
The main principles behind the organization are Single Responsibility and DRY (Don't Repeat Yourself). Aside from having pipelines under version control, I would like to avoid 'blind copy/pasta' across different pipelines in the name of 'reusability', which leaves me with the obvious choice of using templates stored in a dedicated repository (they might even be stored in a dedicated DevOps project). The projects I work with are either .NET or Node-based, resulting in a dedicated group of templates for each stack, placed in separate repositories.
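Connecting a pipeline to such a template repository is a one-time declaration. Below is a minimal sketch of what that could look like; the repository and template names are assumptions, not the actual ones:

```yaml
# Consume job templates from a dedicated, version-controlled repository
resources:
  repositories:
    - repository: templates                     # alias used in template references
      type: git
      name: PipelineTemplates/dotnet-templates  # hypothetical project/repo name

jobs:
  # Reference a job template from the shared repository via its alias
  - template: build-test-pack.yml@templates
```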
Process
The process defined in the templates is one that already exists: a set of sequential actions applied to each project/deliverable. For the .NET-based projects I work with, the list of actions includes:
- Build
- Test
- Analyze (perform static code analysis using SonarQube)
- Pack (for NuGet packages)
- Publish (for web apps/apis)
The list is further extended with the following actions:
- Setup: steps that set up the infrastructure for the rest of the pipeline: which .NET SDK to use, NuGet feed authentication, etc. (always the first action in the sequence)
- List: a light version of software composition analysis using built-in tools
- Mutate: mutation testing, which I'm starting to adopt
There is a list of actions, but each project has unique needs requiring different combinations. Here my experience says it is more efficient to create a template for each combination than to have a single 'monster' template that tries to meet every project's needs: the latter introduces unnecessary noise and is harder to follow and maintain. Plus, unused combinations can be dropped at any time with no extra effort.
Each group of combinations starts with the same set of actions in the same sequence (setup - build - list), reducing the total number of combinations.
Another argument for having a set of ready-to-use templates is that during the modernization/migration phase, continuous integration can be introduced early and start simple, with the ability to extend it by switching to a different template as work progresses (tests are fixed, access is acquired, etc.).
Since each combination is a set of sequential actions, it fits perfectly within a single job.
The definition of each action follows the Single Responsibility and DRY principles, resulting in another set of nested templates that are placed in a subfolder and referenced from the root templates.
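A minimal sketch of what such a root template could look like, assuming the nested action templates live in an `actions` subfolder and take a `projectPath` parameter (both names are assumptions):

```yaml
# build-test-pack.yml - a root template defining one job as a
# sequence of nested action templates
parameters:
  - name: projectPath
    type: string

jobs:
  - job: build_test_pack
    steps:
      - template: actions/setup.yml          # always first in the sequence
      - template: actions/build.yml
        parameters:
          projectPath: ${{ parameters.projectPath }}
      - template: actions/list.yml
        parameters:
          projectPath: ${{ parameters.projectPath }}
      - template: actions/test.yml
        parameters:
          projectPath: ${{ parameters.projectPath }}
      - template: actions/pack.yml
        parameters:
          projectPath: ${{ parameters.projectPath }}
```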
Actions
Actions do not match 1:1 with steps and serve as containers for a group of steps:
- Setup: use .net sdk, authenticate with feeds
- Build: restore dependencies, then build
- List: list vulnerable, list deprecated and list outdated dependencies
- Analyze: we use SonarQube for static code analysis, which involves three different tasks and forces a split into two separate actions (analyze_prepare and analyze), since the SonarQubePrepare task must run before compilation while the analysis runs after the tests (see the sketch after this list)
- Mutate: installing the tool, using it to run mutation testing, and then publishing the report
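For illustration, a sketch of the resulting action order inside a job that includes analysis; the template paths are assumptions:

```yaml
# The analyze split: preparation must precede the build, analysis follows the tests
steps:
  - template: actions/setup.yml
  - template: actions/analyze_prepare.yml  # wraps the SonarQubePrepare task
  - template: actions/build.yml
  - template: actions/test.yml
  - template: actions/analyze.yml          # runs the analysis and publishes the result
```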
An example of what the list action YAML could look like:
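This is a minimal sketch built around the built-in `dotnet list package` tooling; the `projectPath` parameter name is an assumption:

```yaml
# actions/list.yml - light software composition analysis with built-in tools
parameters:
  - name: projectPath
    type: string

steps:
  - script: dotnet list ${{ parameters.projectPath }} package --vulnerable --include-transitive
    displayName: List vulnerable packages
  - script: dotnet list ${{ parameters.projectPath }} package --deprecated
    displayName: List deprecated packages
  - script: dotnet list ${{ parameters.projectPath }} package --outdated
    displayName: List outdated packages
```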
Fast Feedback and Easy to Understand
Each combination of actions is kept within a single job, to avoid additional repo checkouts or sharing anything with the next job/stage via pipeline artifacts. This is why the root templates define jobs, with the action templates as their building blocks. Stages are not used: they serve as containers for multiple jobs, which is exactly what I'm trying to avoid within the build pipeline (or the 'build' part of the pipeline).
Flexibility
I would like the option of adjusting the process to meet specific project needs if/when necessary, with minimum effort and no disturbance to existing pipelines or templates. For example, the mutate action can take a considerable amount of time for large projects. Since, as of today, its feedback is treated as 'for your information, no immediate action required', I would like the flexibility to keep using it, but perhaps within a separate job targeting a low-priority pool, or as a separate pipeline with a scheduled trigger during non-office hours, so that it does not affect other pipeline runs.
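As an illustration, the scheduled-pipeline variant could look roughly like this; the cron value, branch, and template names are assumptions:

```yaml
# Separate pipeline running mutation testing outside office hours
schedules:
  - cron: '0 2 * * 1-5'        # 02:00 UTC on weekdays
    displayName: Nightly mutation testing
    branches:
      include:
        - main
    always: true               # run even when nothing has changed

resources:
  repositories:
    - repository: templates
      type: git
      name: PipelineTemplates/dotnet-templates

jobs:
  - template: mutate.yml@templates   # hypothetical mutate job template
```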
Even if there is a need to onboard a project with multiple deliverables (for example, multiple NuGet packages in one solution), I can still define the job directly in the build pipeline definition and reuse the action templates there.
Global default values
Since the Single Responsibility and Don't Repeat Yourself principles are applied everywhere, any defaults common to multiple job templates (for example the .NET SDK version, the pool to be used, various thresholds, etc.) have to be moved out and stored somewhere else. Variable groups fit that purpose perfectly, as they can be shared across pipelines; a dedicated variable group holds the global defaults for each group of templates.
Default values should be overridable, which is achieved by using a variable group variable's value as the default value of a parameter inside the job template.
Transforming the job template referenced above into something like this:
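A sketch of the mechanism, with hypothetical variable and parameter names; the variable group supplies the default, and a caller can still pass an explicit value:

```yaml
# build-test-pack.yml - the .NET SDK version falls back to the variable group
parameters:
  - name: dotnetSdkVersion
    type: string
    default: $(DefaultDotnetSdkVersion)  # macro resolved at runtime from the group

jobs:
  - job: build_test_pack
    steps:
      - task: UseDotNet@2
        displayName: Setup .NET SDK
        inputs:
          packageType: sdk
          version: ${{ parameters.dotnetSdkVersion }}
```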
Pipeline Definition
Each repo with a project/deliverable gets a YAML-based pipeline definition that connects everything together (a sketch follows the list):
- Defines naming/versioning
- References appropriate variable group with global defaults
- Defines triggers
- Specifies repositories to be used
- References job template/s to be used
- Provides overrides for the global defaults and other parameters
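Put together, such a definition could look roughly like this; every name below (repository, variable group, template, paths) is an assumption:

```yaml
# azure-pipelines.yml - ties naming, defaults, triggers, repos, and templates together
name: 1.0.$(Rev:r)                   # naming/versioning

trigger:
  branches:
    include:
      - main

resources:
  repositories:
    - repository: templates          # shared template repository
      type: git
      name: PipelineTemplates/dotnet-templates

variables:
  - group: dotnet-global-defaults    # variable group with global defaults

jobs:
  - template: build-test-pack.yml@templates
    parameters:
      projectPath: src/MyPackage/MyPackage.csproj  # override of a global default
```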
Robust and Troubleshootable
Inevitably, things will break or will need to be changed (adjusted, extended, upgraded). To minimize disruption to existing working pipelines (other colleagues' work), I would like to:
- have a plan for how to troubleshoot effectively
- have some kind of 'health check' that can catch YAML formatting mistakes and broken integrations with the external tools used in the templates
- know where templates are used, so I can communicate breaking changes effectively or know where to fix them
Troubleshooting is addressed by creating a copy of the template (or of the whole chain) and using the copy for troubleshooting. Once the issue is resolved, the changes are copied into the live template. I find this approach simpler than working through a 'debug' branch.
'Health checks'
Since templates are shared, I would like to proactively catch any YAML formatting mistakes (not everybody edits YAML in an editor) or integration issues with external tools. To address this, one more set of repositories is created, each containing a simple project that can use all actions, together with a 'health check' build pipeline definition that references all available job templates for that group. These specialized pipelines are triggered once per day during non-office hours, reducing the lifetime of a bug to a maximum of one day.
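A sketch of such a 'health check' definition; the cron value and template names are assumptions:

```yaml
# Health check: exercise every job template of the group once per day
schedules:
  - cron: '0 3 * * *'          # daily, outside office hours
    displayName: Daily template health check
    branches:
      include:
        - main
    always: true               # fire even without code changes

resources:
  repositories:
    - repository: templates
      type: git
      name: PipelineTemplates/dotnet-templates

jobs:
  - template: build.yml@templates
  - template: build-test.yml@templates
  - template: build-test-pack.yml@templates
```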
This completes the infrastructure setup: template repositories per stack, a variable group with global defaults per group of templates, a 'health check' repository and pipeline per group, and a pipeline definition in each project repository.
Pipelines - Templates map
To communicate breaking changes, or to fix them myself, I need to know where the templates are used. I first tried maintaining a static map of all connections between pipelines and templates, and it did not work very well. So I decided to use the Azure DevOps Repositories REST API and have a live map instead.
Creating a .NET project just for that would be too much; a simple PowerShell script fits here perfectly. Now, when I need to communicate changes or make them myself, I know exactly where they must be applied.
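A minimal sketch of what such a script could look like, assuming a personal access token with read access to Code in `$env:AZDO_PAT`; the organization, project, and pipeline file path are placeholders:

```powershell
# Build a live pipelines-templates map via the Azure DevOps Repositories REST API
$org     = "https://dev.azure.com/my-org"
$project = "MyProject"
$headers = @{
    Authorization = "Basic " + [Convert]::ToBase64String(
        [Text.Encoding]::ASCII.GetBytes(":$($env:AZDO_PAT)"))
}

# List all repositories in the project
$repos = (Invoke-RestMethod "$org/$project/_apis/git/repositories?api-version=7.0" `
    -Headers $headers).value

foreach ($repo in $repos) {
    try {
        # Fetch the pipeline definition and keep the lines referencing a template
        $uri  = "$org/$project/_apis/git/repositories/$($repo.id)/items" +
                '?path=/azure-pipelines.yml&$format=text&api-version=7.0'
        $yaml = Invoke-RestMethod $uri -Headers $headers
        $yaml -split "`n" | Where-Object { $_ -match 'template:' } |
            ForEach-Object { '{0}: {1}' -f $repo.name, $_.Trim() }
    }
    catch {
        # Repository has no pipeline definition at that path; skip it
    }
}
```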
A couple of final thoughts
- Can this approach meet any project's needs? No. It's a 'low-hanging fruit' approach that works best for projects with a single deliverable (NuGet package, web/Windows app/API/service)
- Is the setup flexible and robust enough? Yes, I do think so. By continually refining and adapting it, it can serve as a foundation for efficient modernization and migration of our projects to the Azure cloud platform, ensuring a robust and streamlined continuous integration process.