[WIP] Terragrunt as a preprocessor #2403

Open
wants to merge 26 commits into
base: main

@brikis98 (Member) commented Jan 2, 2023

Description

This is a WIP PR from a hackday project that implements the idea in #759 (comment) to turn Terragrunt into a preprocessor for Terraform (similar to how Sass and Less are preprocessors for CSS).

It is NOT yet ready for review and merge.

Video overview

video1729187535.mp4

Principles

Key idea: this is Terraform the way it should work. You get to write code in a way that works well from a developer perspective (simple, DRY) and after preprocessing that code, you get to deploy it in a way that works well from an operational perspective (secure, isolated, reviewable).

Input: pure, normal, native Terraform code

  • You write code in normal .tf files
  • You create the code the "naive" way: one giant root module for all your infrastructure
  • The root module uses sub-modules for each part of the infra: e.g., one sub-module for the VPC, one sub-module for the DB, one sub-module for each web service, etc.
  • Because it's all normal TF code, you can use Terraform's native mechanisms to make everything DRY, manage the backend config in one place, handle dependencies between sub-modules, and so on.

Here's an example of what the code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/before
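
As a rough illustration of the "one giant root module" input described above, a minimal sketch might look like the following. The file name, module paths, variable names, and output names (`vpc_id`, `url`) are assumptions made for this example and are not taken from the linked fixture.

```hcl
# main.tf (hypothetical): a single root module for all infrastructure.
# Every name and path here is illustrative only.

terraform {
  # Backend config lives in exactly one place.
  backend "local" {
    path = "terraform.tfstate"
  }
}

variable "env" {
  description = "Environment name, e.g. dev or prod"
  type        = string
}

# One sub-module per part of the infrastructure.
module "vpc" {
  source = "./modules/vpc"
  name   = "${var.env}-vpc"
}

module "db" {
  source = "./modules/db"
  # Dependencies between sub-modules are plain Terraform references.
  vpc_id = module.vpc.vpc_id
}

module "web_service" {
  source = "./modules/web-service"
  vpc_id = module.vpc.vpc_id
  db_url = module.db.url
}
```

Everything is wired together with native module references, which is what keeps the code simple and DRY on the authoring side.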

By itself, this code is great from a developer perspective, but it's terrible from an operational perspective: see below for all the problems you'll run into.

Command: terragrunt process

There's really only one command to run: terragrunt process.

Output: pure, normal, native Terraform code

After you run terragrunt process, you get:

  • Normal .tf files again
  • But now they are broken up across multiple environments: one top-level folder per environment
  • And within each environment, they are broken up further by type of infra: a separate root module for each sub-module (e.g., VPC, EKS, web-service).
  • The backend is configured properly for each module
  • Dependencies across modules are automatically configured using terraform_remote_state

Here's an example of what the generated code could look like: https://github.com/gruntwork-io/terragrunt/tree/enhancement/hackday-terragrunt-preprocessor/test/fixture-preprocessor/after
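
To make the shape of that output concrete, here is a minimal sketch of what one generated root module (the DB in a dev environment) might contain. The folder layout, state file names, and the `vpc_id` output are assumptions for illustration; the actual generated files in the linked fixture may differ.

```hcl
# dev/db/main.tf (hypothetical generated output; names are illustrative)

terraform {
  # Each generated root module gets its own backend config and state.
  backend "local" {
    path = "db.tfstate"
  }
}

# The direct reference module.vpc.vpc_id from the input code is replaced
# by a lookup against the VPC module's separately stored state.
data "terraform_remote_state" "vpc" {
  backend = "local"

  config = {
    path = "../vpc/vpc.tfstate"
  }
}

module "db" {
  source = "../../modules/db"
  vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
}
```

For this to work, the generated VPC root module would also need to expose `vpc_id` as an output so it is recorded in that module's state.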

This generated code is optimized to work well from an operational perspective.

Deploy using TF

  • Now you can go into each of the generated sub-folders and use Terraform as usual to deploy: e.g., terraform plan, terraform apply.
  • Nothing new to learn! You write pure Terraform code, just as you'd expect. After preprocessing, you interact with it using standard Terraform commands, just as you'd expect. No weird Terragrunt concepts to grapple with: no terragrunt.hcl, no _envcommon, etc.
  • If you check the generated code into Git, it works natively with TFC and TFE too!
  • No issues with debugging Terragrunt problems, as you can see exactly what the output is!
  • No lock in: it's pure TF code, so if you don't like Terragrunt, you can stop using it any time.

The operational problems this fixes

Although "one giant root module with all your infra" is wonderful from a developer perspective, as it's easy to learn and keeps your code DRY, it has a bunch of drawbacks from an operational perspective:

  • Security: with everything in one module, to deploy anything, you need access to everything (everyone has to be an admin to run plan or apply).
  • Speed: with a giant root module, plan and apply take forever (for a large infra, tens of minutes!).
  • Code review: for a giant root module, the plan output is far too big to read meaningfully, so you end up applying changes blindly and rarely catch the mistakes that slip through.
  • Automated testing: there's no meaningful way to do automated testing for a giant infra.
  • Risk: all your eggs are in one basket. A single typo or mistake anywhere could break everything.
  • Isolation: all your envs end up on the same version of every sub-module. There's no way to follow immutable infrastructure practices and use different versions in different environments.

By using Terragrunt to "pre-process" your code, you get all the developer benefits when writing and maintaining the code, as it's simple and DRY. And because the generated code is broken up into separate environments and modules, all the operational problems above (security, speed, code review, automated testing, risk, and isolation) are mitigated!

TODOs before a full review & merging

  • Add an example of how to support versioned modules (different versions in the source URL in different environments) using override files.
  • Figure out how to migrate existing Terragrunt users to this new pattern.
  • Figure out how to support providers not from HashiCorp (i.e., remove hard-coded registry.terraform.io/hashicorp URL).
  • Add additional automated tests:
    • Multiple files with output variables.
    • Resource and data source handling.
    • Remote backend (e.g., S3) handling.
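
For the first TODO above, one possible direction is Terraform's native override files (files named `override.tf` or ending in `_override.tf`), which Terraform merges over the matching blocks in the same directory. The sketch below is only an assumption about how that could look; the file path, module name, repository URL, and version tag are all hypothetical, and the PR does not confirm this approach.

```hcl
# prod/vpc/main_override.tf (hypothetical)
# Terraform merges this block over the module "vpc" block defined in the
# same directory, so only the arguments listed here are replaced.

module "vpc" {
  # Pin prod to a specific (placeholder) version of the module source.
  source = "git::https://example.com/infra-modules.git//vpc?ref=v1.2.0"
}
```

Under this assumption, another environment such as dev could carry its own override file pointing at a newer ref, which would give each environment an independently versioned module.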

@wraithm commented Jan 5, 2023

This is incredible work. I love the approach. Much of my terraform code is a giant module in a single terraform state... It's so much easier to write. We've been in the process of migrating to terragrunt to get the benefits mentioned here, but it's been slow going. It's so nice to write terraform in this style, but as you said, it's a big headache operationally.

One question: How would this handle nested modules? One of the reasons why you might want to have nested modules is that there may be cross-communication between different regions. Say you're deploying to multiple regions and you want to have roughly the same configuration in each region, but then you need to set up some communication between those regions (eg. a multi-region consul deployment). You could imagine a "region" module that includes a bunch of sub-modules, where each of the sub-modules gets a terraform state. Does this pre-processor handle that or is it limited to one level of modules? If it's limited to one level of modules, how would one implement multiple regions per env with some cross relationship between those regions?

I guess you could have the tfvars split out into <env>_<region>, e.g., dev_us-east-1, prod_us-west-1, etc.? Or maybe you could imagine a folder hierarchy inside the tfvars folder, like dev/us-east-1/vpc, dev/us-west-2/vpc, prod/us-west-2/vpc, etc., that maps down to replication of the modules or something. Just throwing ideas out there.

@armenr commented Jan 15, 2023

This is incredible work. I firmly believe that this is the way terraform should work (or at least allow us to work). In fact, I've been working on something similar, but from a totally external angle. I LOVE the way this is implemented and how it rolls into terragrunt.

Specifically, I'm glad this is through terragrunt because terragrunt provides a number of conveniences and augmentations that terraform does not. It's become an indispensable tool for my work. I can't imagine building anything even sufficiently complex without terragrunt.

I do, however, worry/wonder about the following:

Ever been working on some CSS in LESS or SASS, and everything looks good --> and then you process and generate the stylesheets for your bundle...and you're thinking, "why the heck does my generated CSS not work/look right?"

It's painful but workable to debug that and get to the bottom of the issue when you're able to run your code in a wannabe REPL (hot reloading vite/webpack server). In this case, I wonder what kind of pain or difficulty might emerge if/when the parsing/rendering produces an unexpected result, and you're trying to figure out what the root cause of that might be.

Would there be some way to either step through or debug the process command? If not, I feel like this would be a really valuable consideration. As it is, the development feedback loop for terraform is painful and slow (change code --> plan --> wait --> check --> repeat), especially with remote terragrunt state. Anything that might introduce further delay or complexity into that development loop would hurt, badly.

I hope that made sense...

@brikis98 brikis98 added this to the gjhk hg \'\[] milestone Apr 24, 2023
@brikis98 brikis98 removed this from the gjhk hg \'\[] milestone Aug 10, 2023
@timothyclarke commented

Looks really great.
For next steps (possibly part of the paid-for product):
a terragrunt process -migrate option that outputs a Terragrunt folder structure (dependencies nested and using dependency blocks) and an appropriate mix of aws s3 cp and terraform state rm or terraform import commands.

@tuxillo commented Feb 7, 2024

As I understand, there is still something pending for this to work?

5 participants