A Few Tips for Azure Resource Manager Templates

Azure Resource Manager (ARM) templates are often looked upon as some sort of magic voodoo. People who use them every day often have a limited understanding of them, and this leads to doing things the hard way. The users aren’t always to blame: the documentation is often terse (at best) and features change quickly. In particular, mid-2017 saw a few critical features appear that really improve usability in some cases. I’m not going to try to completely close the knowledge gap or create a comprehensive set of guidelines. Instead, I’ll point you to a few things that have been useful for me or helped me get to that “click” moment where templates stopped looking like a blob of JSON and started looking like a language.

ARM Templates are a Descriptive Language

Especially since the mid-2017 enhancements, you can build a whole lot of programming into an ARM template. Be careful with that. ARM templates are at their most powerful when used as part of a Desired State Configuration (DSC) approach for your platform and infrastructure. That is, you tell ARM “I want this” and it figures out how to get there.

They are at their weakest when you need to dive in and debug because there really aren’t any viable tools for that. I could live with the lack of debugging if the error messages got straight to the point, but often they don’t.

That debugging experience alone makes me avoid building a ton of logic into my templates. Each template should have an opinion on what it does, and if you need to get crazy, do it elsewhere. Better yet, don’t get crazy in the first place and Keep it Simple. Templates should be built to deploy a single set of resources consistently across your environments from dev to prod. That doesn’t mean dev needs to be the same size as prod; feel free to change the number and size of resources via parameters. What it does mean is that you shouldn’t try to build out vastly different deployments in the same template. If you find yourself using a lot of if conditions in your template, you’re probably better off with multiple templates.
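
As a rough sketch of what sizing an environment through parameters can look like, the template skeleton below declares two sizing parameters with dev-friendly defaults. The parameter names (vmSize, numberOfDataDisks) and the two allowed sizes are illustrative assumptions, not something taken from a particular template.

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "vmSize": {
            "type": "string",
            "defaultValue": "Standard_DS1_v2",
            "allowedValues": [ "Standard_DS1_v2", "Standard_DS3_v2" ],
            "metadata": {
                "description": "Hypothetical parameter: small size for dev, larger size for prod."
            }
        },
        "numberOfDataDisks": {
            "type": "int",
            "defaultValue": 1,
            "minValue": 1,
            "metadata": {
                "description": "Hypothetical parameter: how many data disks each VM gets."
            }
        }
    },
    "resources": []
}
```

Dev can then take the defaults while prod passes larger values at deployment time, and the resources section stays identical for both.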

Here are the biggest reasons I prefer templates over any other way of creating Azure resources:

  • A single, reusable “package” to provision a set of resources.
    • It can largely handle the dev->qa->stage->prod workflow with the same template while allowing you to pick the scale you want for each environment.
  • The ARM engine figures out parallelism for you.
    • If you’re deploying half a dozen virtual machines in a template that can be provisioned at the same time, each with a dozen data disks and no described dependency between the machines, then ARM can potentially deploy them as fast as it can deploy a single VM with a single disk. The only limiting factor is just how many other people are deploying at the same time.
  • Simplifies training because it is a cross-cutting language.
    • JSON is probably familiar to your developers already. If it’s not, the basic rules are fairly simple to master.
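
To make the parallelism point concrete, here is a minimal sketch that uses public IP addresses as a small stand-in for the virtual machines described above. The resource names, the loop name pipLoop, and the count of six are assumptions chosen for illustration. Because the copies declare no dependsOn between them, ARM is free to create all six at once.

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "resources": [
        {
            "comments": "Illustrative only: six independent copies with no dependsOn, so ARM can provision them in parallel.",
            "type": "Microsoft.Network/publicIPAddresses",
            "apiVersion": "2017-06-01",
            "name": "[concat('pip', copyIndex())]",
            "location": "[resourceGroup().location]",
            "copy": {
                "name": "pipLoop",
                "count": 6
            },
            "properties": {
                "publicIPAllocationMethod": "Dynamic"
            }
        }
    ]
}
```

If you ever need the opposite behavior, the copy object also accepts "mode": "serial" together with a "batchSize", which makes ARM work through the copies in batches rather than all at once.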

The Mode Parameter

Most folks I talk to either don’t know that the Mode parameter exists or have no idea what it really does. Mode just tells ARM whether it should treat this deployment as a full description or just a partial description of the configuration. Its default value is Incremental and this is what you want most of the time.

Incremental tells ARM that this template deployment should add resources to the resource group it is being deployed to. It will not take any resources away and will try to update existing resources if there are changes. This is the default mode and many teams only ever use this mode, often forgetting or not realizing there is another option. If you add data disks to a VM in the template and deploy, the data disks will be added to the VM. However, if there is a database in the resource group that isn’t described in the template, ARM won’t do anything to it.

Complete tells ARM that this template deployment completely describes the state of resources in the resource group. Any resources that are not described in the template will be removed. This is obviously dangerous as you can lose data. It’s also an easy way to clean up a resource group if you want to remove most, but not all, resources. You build the template for the things you want to keep and deploy it as Complete and it deletes everything else.

Note that I said Complete is “easy”. It doesn’t execute very quickly. If you have a lot of resources that depend on others, it can take a while to run. It will just loop through all the resources that need to be deleted, try to delete each one, and if it can’t because another resource depends on it, move on to the next resource. It will then loop over the resources again to see if it can delete more, and keep going until all the resources it is supposed to remove are gone. If it gets to a point where it can’t remove any more resources due to dependencies, but thinks it should, it will eventually error (at least for the cases I’ve run into).

Using Complete to remove resources is certainly an option, but I recommend that you think a little more about your resource groups and don’t try to use this as a normal practice.

The Name Parameter

There is a Name you must provide when deploying an ARM Template. If you think that is just there for your own reference and it doesn’t matter, my answer is “yes but no”.

Name is an identifier used by ARM, not just you. Remember that ARM is at its best when used as a DSC. Sometimes you will want to use the same template to deploy two sets of resources to the same Resource Group. The easiest example is to add another VM in a given role. Say that your application service is hosted on a VM because you have some custom thing going on that stops you from using Platform as a Service options, Containers, or VM Scale Sets (You have my sympathy if this is truly the case. If it’s not truly the case, what in the world are you doing?).

It’s very easy to parameterize the VM name and then use the concat function to build resource names from it. But what if you want to let Azure generate the VM name itself using the built-in uniqueString function? How would you be able to update the deployment for the original VM if the VM name is generated at run time? Fortunately, that deployment Name gives you a way.

The uniqueString function is a deterministic hash of its inputs, so feeding it the deployment Name ties the generated VM name to that deployment. If you set a variable (not a parameter) for the VM name, redeploying under the same Name resolves to the same VM, while deploying under a new Name gets you a second VM without changing any parameters. "virtualMachineName": "[concat('vm', uniqueString(resourceGroup().id, deployment().name))]" is all you need. This can be useful, but I generally prefer predictable names.
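
If you want to watch the naming behavior without creating anything, a minimal sketch is a template that deploys no resources and simply echoes the generated name as an output. The output name is an assumption added for illustration; the variable is the one shown above.

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "variables": {
        "virtualMachineName": "[concat('vm', uniqueString(resourceGroup().id, deployment().name))]"
    },
    "resources": [],
    "outputs": {
        "virtualMachineName": {
            "type": "string",
            "value": "[variables('virtualMachineName')]"
        }
    }
}
```

Deploy it twice under the same Name and the output should be identical; change the Name and the output changes. Stick to Incremental mode for this experiment, since a template with no resources deployed as Complete would empty the resource group.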

You can find my example template here. This will create a simple VM with two small data disks.

Here’s a snippet to deploy this:

# Sign in and select the subscription
$subscriptionName = "My Subscription Name"
Login-AzureRMAccount -SubscriptionName $subscriptionName

# Values passed to the template parameters
$adminUserName = "myadmin"
$virtualNetworkResourceGroupName = "cnug"
$resourceGroupName = "testtodestroy"
$location = "CentralUS"
$virtualNetworkName = "MyVNET"
$networkSecurityGroupName = "MyNSG"

# Prompt for the admin password without echoing it
$password = Read-Host -AsSecureString

# The deployment Name feeds uniqueString() in the template
New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName `
    -Mode Incremental `
    -location $location `
    -adminUserName $adminUserName `
    -adminPassword $password `
    -virtualNetworkResourceGroupName $virtualNetworkResourceGroupName `
    -virtualNetworkName $virtualNetworkName `
    -networkSecurityGroupName $networkSecurityGroupName `
    -TemplateFile .\simple_vm_arm.json `
    -Name "Vm1"

The first time I ran that (with the various variables set appropriately), I got a VM named “vmddo44dfpuloma”. When I re-ran it, nothing changed in the resource group. I then ran it with -Name "Vm2" and saw that it created another VM named “vmu77phgcynmceq”. Going back and running it with -Name "Vm1" again changed nothing.

If you want to see Mode in action AND this is a resource group you’re just using for the example, change it to Complete and it will go ahead and remove the other VM. Don’t use this on a Resource Group containing things you want to keep; it will destroy them! Fortunately, it does prompt you before going ahead. It will take a little while to figure out the deletions and then execute them. Like I said, it’s easy but not fast.

Nested Copy Loops

Until the middle of 2017, there wasn’t a way to base the number of disks attached to a virtual machine on a variable or parameter. You had to repeat the data disk section for each and every disk. This was annoying, to say the least. It’s quite typical for us to want to provision a smaller VM for a dev or limited test environment than we use for production, and that includes the number and size of disks. Anyway, we now have it. It’s fairly simple to put in place but not well documented. You can lean on the Azure Sample Template to learn more about it. With the slim documentation, I’m not sure where else we can use it, but data disks are the scenario I’ve cared about so far.

What you will do is replace the “dataDisks” section of your VM’s template with a copy section. This is a wrapper around the data disk definition that is similar, but not identical, to other copy loops. I’m really only talking about it because it’s easy to miss with the limited documentation, and at least one official documentation page still says you can’t do this.

"dataDisks": []

becomes

"copy": [
    {
        "name": "dataDisks",
        "count": "[variables('numberOfDataDisks')]",
        "input": {
            "caching": "[variables('diskCaching')]",
            "diskSizeGB": "[variables('sizeOfDataDisksInGb')]",
            "lun": "[copyIndex('dataDisks')]",
            "name": "[concat(variables('vmName'), '-datadisk', copyIndex('dataDisks'))]",                                
            "createOption": "Empty",
            "managedDisk": {
                "storageAccountType": "Standard_LRS"                                  
            }
        }
    }
]

This is using all variables, but in truth you’d probably use parameters for many of the variable choices.
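
To make the loop above concrete, here is roughly what ARM expands it into at deployment time. The values are assumptions picked for illustration: numberOfDataDisks is 2, vmName is 'myvm', diskCaching is 'ReadOnly', and sizeOfDataDisksInGb is the integer 32. The copy entry's name becomes the property name, and copyIndex('dataDisks') starts at zero.

```json
{
    "dataDisks": [
        {
            "caching": "ReadOnly",
            "diskSizeGB": 32,
            "lun": 0,
            "name": "myvm-datadisk0",
            "createOption": "Empty",
            "managedDisk": {
                "storageAccountType": "Standard_LRS"
            }
        },
        {
            "caching": "ReadOnly",
            "diskSizeGB": 32,
            "lun": 1,
            "name": "myvm-datadisk1",
            "createOption": "Empty",
            "managedDisk": {
                "storageAccountType": "Standard_LRS"
            }
        }
    ]
}
```

Seen this way, the copy section is just a generator for the array you used to write by hand, which is why changing the count is all it takes to go from a two-disk dev VM to a larger production one.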

Deciding on Resource Group Scope

I’ve done it, you’ve probably done it too because it seems everyone does it when first starting out: You create one resource group to contain your entire application. The Database Server, the Application Server, the Web Server, the Key Vault, and so on.

This may be right. Most often it is wrong. MS provides some guidance, but it’s a little lost in a sea of text. The most important bit of that from a design perspective:

All the resources in your group should share the same lifecycle. You deploy, update, and delete them together. If one resource, such as a database server, needs to exist on a different deployment cycle it should be in another resource group.

They use the database server as an example. Key Vault is another commonly used resource that should live outside the application resource group in most cases. If you’re using Blue-Green deployments or anything along those lines, the Key Vault may be shared and so shouldn’t live in either a Blue or Green resource group. Alternately, you may have a master Key Vault that you clone into Blue and Green, but that master would still live on its own.
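
One way to keep a shared Key Vault in its own resource group and still feed its secrets to a deployment is a Key Vault reference in the parameter file. The following is a minimal sketch, not a definitive setup: the secret name vmAdminPassword is hypothetical, and the subscription, resource group, and vault names are left as placeholders for you to fill in.

```json
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "adminPassword": {
            "reference": {
                "keyVault": {
                    "id": "/subscriptions/<subscription-id>/resourceGroups/<shared-resource-group>/providers/Microsoft.KeyVault/vaults/<vault-name>"
                },
                "secretName": "vmAdminPassword"
            }
        }
    }
}
```

The vault has to be enabled for template deployment (its enabledForTemplateDeployment property) for this to work, and the reference is meant for securestring parameters such as adminPassword. The application's resource group can then be torn down and rebuilt without touching the vault.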

What does this mean for Microservices? It almost goes without saying that each Microservice should live in its own resource group(s). Different parts of a microservice will likely have their own life cycle, so it will be very common to have more than one resource group for each microservice.

You can and should also use Resource Groups to govern who has access to manage what and just for general organization. If you’re creating a separate set of web servers for your application to serve a specific customer, then putting them in their own resource group can make sense. If you will strictly control who can do anything to the Customer Information SQL Azure database, but can be more relaxed about the Product Catalog database, then it may make sense to put the Customer Information database in a different resource group.

Remember, Resource Groups are first and foremost a tool for organization. Like any tool for organization (example: your clothes closet), they can be a big pile of stuff or highly organized. It’s up to you to create and maintain that organization. No one answer is always right, but Microsoft has defined some patterns and practices here.

I don’t pretend to know everything or to memorize things I only use once every few months. If I can’t get to a website, I’m probably not doing much with Azure at the moment anyway. Here are some links related to ARM templates that I use quite a lot: