Infrastructure as code allows for fully automated infrastructure provisioning, typically by declaratively defining the desired state of your infrastructure in some language. One tool to achieve this is Terraform. The entry barrier is very low - a basic project setup is done in no time. But beyond the basics, some additional aspects should be considered:
Terraform state handling
Let’s say we want to provision a virtual machine and an object storage as part of our deployment. Terraform tracks the state (i.e., all resources that belong to the deployment) in a JSON file. Especially when working in a team, we should store the state remotely rather than on a local machine, so that every team member can work with the most recent state. To do so, use a Terraform remote backend that supports encryption at rest, versioning, and state locking. We need encryption at rest because the Terraform state can contain sensitive information like database passwords. We need versioning because, in case of state corruption, you may have to revert. And we need state locking because otherwise two team members may run Terraform at the same time and corrupt the state.
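On AWS, such a remote backend could look roughly like the following sketch. The bucket and table names are hypothetical; encryption is requested in the backend block, locking is delegated to a DynamoDB table, and versioning must be enabled on the bucket itself:

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"          # hypothetical bucket name
    key            = "my-service/terraform.tfstate"
    region         = "eu-central-1"
    encrypt        = true                          # encryption at rest
    dynamodb_table = "terraform-locks"             # hypothetical table for state locking
  }
}
```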
To save the Terraform state remotely, we first need an object storage (among a few other resources). Instead of creating these resources manually, we create them with Terraform as well and store the resulting state file, which only contains these bootstrap resources, in our repository. This is okay because the state file for these resources should not contain any secrets.
Don’t put all your resources in a single state file. Otherwise, you unnecessarily increase the runtime of your terraform deployments and the blast radius if something goes wrong in your deployment. Instead, use the same state file for resources that belong together - e.g., everything that is required for a certain micro-service. You can use data sources to access resources that are not managed in the current state file, but use this with caution, considering that it creates a dependency on the outside world.
Additionally, all state files should be per stage and region - i.e., one state file per region + stage + resource unit. Use shell scripts or a Makefile to switch between the stages and pass information like the region via the Terraform command line.
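A minimal Makefile sketch for this could look like the following - the variable names, state key layout, and targets are hypothetical:

```make
# Hypothetical Makefile: STAGE and REGION select the state file and are
# passed through to Terraform on the command line
STAGE  ?= test
REGION ?= eu-central-1

plan:
	terraform init \
	  -backend-config="key=$(STAGE)/$(REGION)/my-service/terraform.tfstate"
	terraform plan -var="stage=$(STAGE)" -var="region=$(REGION)"
```

A team member would then run, e.g., `make plan STAGE=production REGION=eu-west-1` to target a different state file.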
Terraform itself and its providers change often and upgrades may not be backwards-compatible (this may be less of a problem nowadays).
Pin the Terraform and provider versions so that Terraform and provider updates are always a conscious decision.
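This pinning happens in the `terraform` block - the exact version numbers below are just examples:

```hcl
terraform {
  required_version = "1.0.1"     # exact Terraform version, no implicit upgrades

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.50.0"         # exact provider version; bump it deliberately
    }
  }
}
```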
I would recommend installing Terraform binaries with their version string attached (e.g., terraform-1.0.1) and defining the name of the Terraform binary in your Makefile.
Whenever possible, write modules that can be re-used on all stages (test, production, etc.). But what happens if you change the infrastructure on test but don’t want to roll it out to your other stages yet? You can use git references to pin the version of your own modules - provided these modules are tracked in git and pulled from a remote.
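Pinning a module to a git tag looks like this - the repository URL, subdirectory, and tag are hypothetical:

```hcl
module "network" {
  # Module pulled from a remote git repository, pinned to the tag v1.2.0
  # via the ref query parameter. Test stages can point at a newer ref
  # while production stays on the old one.
  source = "git::https://example.com/my-org/terraform-modules.git//network?ref=v1.2.0"
}
```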
When you use Terraform, the cloud platform has no notion of which resources belong to a deployment. Especially on AWS, where there are no mandatory resource groups, it is a good idea to tag resources that were created by Terraform. Standard tags can be stored in their own Terraform variable file and passed at run-time via the CLI. This way, you can make sure that the same set of tags is always applied.
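A sketch of this pattern, with hypothetical variable and resource names:

```hcl
# variables.tf
variable "default_tags" {
  description = "Standard tags applied to every resource"
  type        = map(string)
}

# Every resource references the shared tag map
resource "aws_s3_bucket" "example" {
  bucket = "my-example-bucket"   # hypothetical
  tags   = var.default_tags
}
```

The tag values would live in a shared variable file, e.g. `tags.tfvars`, passed via `terraform plan -var-file=tags.tfvars`.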
Terraform allows you to ‘protect’ resources from deletion, as long as the resource is still defined in your Terraform scripts. This can be a helpful safeguard that prevents Terraform from accidentally deleting and re-creating a resource that it cannot modify in place.
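This is done with the `prevent_destroy` lifecycle argument - the resource below is just an illustrative example:

```hcl
resource "aws_db_instance" "main" {
  # ... regular database configuration ...

  lifecycle {
    # Any plan that would destroy this resource now fails with an error
    # instead of silently deleting and re-creating it.
    prevent_destroy = true
  }
}
```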
Don’t save unencrypted secrets like database passwords in your git repository. Use alternative approaches, e.g.:
- Manually store an encrypted file in your repository and let terraform decrypt the file on the fly. On AWS you can do this with AWS KMS - sops can make this experience more pleasant.
- Use a secrets manager like AWS secrets manager or Azure keyvault.
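With the secrets-manager approach, Terraform reads the secret at plan time - the secret name below is hypothetical:

```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "my-service/db-password"   # hypothetical secret name
}

# Reference it where needed, e.g.:
#   password = data.aws_secretsmanager_secret_version.db_password.secret_string
```

Note that the resolved value still ends up in the state file, which is another reason why the remote state needs encryption at rest.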
Use consistent naming
Adding suffixes and prefixes (e.g., depending on the region) is tedious. Create a separate module just for that and use it to create resource labels.
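A label module can be as small as the following sketch (file layout and naming scheme are hypothetical):

```hcl
# modules/labels/main.tf
variable "name" {
  type = string
}

variable "stage" {
  type = string
}

variable "region" {
  type = string
}

output "id" {
  # Single place that defines the naming convention
  value = "${var.stage}-${var.region}-${var.name}"
}
```

Every resource then derives its name from `module.<label>.id` instead of concatenating strings ad hoc.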
Plausibility-check the code
terraform validate can be used for basic configuration validation, and tools like checkov can be used as a linter.
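In practice this can be a small check step, e.g.:

```shell
# Check formatting and validate syntax/internal consistency
terraform fmt -check
terraform validate

# Scan for common misconfigurations (assumes checkov is installed)
checkov -d .
```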
Using terraform in a pipeline
The pipeline should run the plan step and save the resulting plan. Afterwards, it must stop and ask for approval. If an authorized user approves the plan, the apply step must use the saved plan from the previous step.
This is crucial - sometimes terraform cannot modify a resource and instead decides to delete and re-create it.
Bad luck if it happens to be your production database … so always check the plan, whether terraform is run from a pipeline or not.
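The two pipeline stages boil down to these commands, with a manual approval gate in between:

```shell
# Plan stage: write an explicit plan file
terraform plan -out=tfplan

# (pipeline pauses here and waits for an authorized approval)

# Apply stage: apply exactly the reviewed plan - nothing more, nothing less
terraform apply tfplan
```

Applying the saved plan file guarantees that what was approved is what gets executed, even if the configuration changed in the meantime.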
- … tbd …