[Resolve #1169] Add detect-stack-drift command #1170

alex-harvey-z3q · 2021-11-25T12:12:11Z

Add a detect-stack-drift command and tests.

The detect-stack-drift command calls the detect_stack_drift
Boto3 call, takes the Detector Id returned, then waits for
describe_stack_drift_detection_status(StackDriftDetectionId=detector_id)
to complete, and then finally returns
describe_stack_resource_drifts as a JSON document.

If --debug is passed, sceptre also will provide feedback on
the detection progress.
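
The flow described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `detect_and_fetch_drifts` and `poll_seconds` are hypothetical names, and `client` is assumed to be a Boto3 CloudFormation client (for example, `boto3.client("cloudformation")`). Pagination of `describe_stack_resource_drifts` and all error handling are left out.

```python
import json
import time


def detect_and_fetch_drifts(client, stack_name: str, poll_seconds: float = 1.0) -> str:
    """Happy-path sketch: start detection, wait for it, return drifts as JSON."""
    # Start drift detection and keep the detection ID that CloudFormation returns.
    detector_id = client.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]

    # Poll until CloudFormation reports that detection is no longer in progress.
    while True:
        status = client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detector_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)

    # Fetch the per-resource drift details and serialise them as a JSON document.
    # default=str covers the datetime values that Boto3 includes in responses.
    drifts = client.describe_stack_resource_drifts(StackName=stack_name)
    return json.dumps(drifts, indent=2, default=str)
```

Passing the client in as a parameter, rather than creating it inside the function, keeps the sketch easy to exercise with a stub in unit tests.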

PR Checklist

Wrote a good commit message & description [see guide below].
Commit message starts with [Resolve #issue-number].
Added/Updated unit tests.
Added/Updated integration tests (if applicable).
All unit tests (make test) are passing.
Used the same coding conventions as the rest of the project.
The new code passes pre-commit validations (pre-commit run --all-files).
The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.

Approver/Reviewer Checklist

Before merge squash related commits.

Other Information

Guide to writing a good commit

Add a `detect-stack-drift` command and tests. The `detect-stack-drift` command calls the `detect_stack_drift` Boto3 call, takes the Detector Id returned, then waits for `describe_stack_drift_detection_status(StackDriftDetectionId=detector_id)` to complete, and then finally returns `describe_stack_resource_drifts` as a JSON document. The output is formatted such that it can be passed directly into a tool like `jq`. If `--debug` is passed, sceptre also will provide feedback on the detection progress.

sceptre/cli/template.py

sceptre/plan/actions.py

tests/test_actions.py

sceptre/plan/actions.py

jfalkenstein

This is a pretty cool feature, and I'm looking forward to having it. With that said, it looks like this feature only really covers the "happy path" of everything going as expected. It needs to handle cases like:

When the stack doesn't exist
When drift detection times out
When drift detection fails for some reason
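
One way the three cases above might be handled is sketched below. It is an illustration under stated assumptions, not the PR's code: `start_detection`, `wait_for_drift_detection`, `DriftDetectionError`, and the `timeout_seconds` and `poll_seconds` parameters are hypothetical names, `client` is assumed to be a Boto3 CloudFormation client, and the missing-stack check assumes the service error message ends with "does not exist".

```python
import time


class DriftDetectionError(Exception):
    """Hypothetical error type used only in this sketch."""


def start_detection(client, stack_name: str):
    """Return the detection ID, or None when the stack does not exist."""
    try:
        return client.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    except Exception as err:
        # botocore's ClientError carries the service error in `err.response`.
        # Assumption: a missing stack yields a message ending in "does not exist".
        message = getattr(err, "response", {}).get("Error", {}).get("Message", "")
        if message.endswith("does not exist"):
            return None
        raise


def wait_for_drift_detection(
    client, detector_id: str, timeout_seconds: float = 300.0, poll_seconds: float = 10.0
) -> dict:
    """Poll the detection status until it completes, fails, or times out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.describe_stack_drift_detection_status(
            StackDriftDetectionId=detector_id
        )
        state = status["DetectionStatus"]
        if state == "DETECTION_COMPLETE":
            return status
        if state == "DETECTION_FAILED":
            reason = status.get("DetectionStatusReason", "no reason given")
            raise DriftDetectionError(f"Drift detection failed: {reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Drift detection still running after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

Whether a missing stack should produce `None`, a status string, or an exception is a design choice for the command; the sketch only shows where each case can be detected.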

Also, you should use Python-3 conventions rather than the old-school py2 ways of doing things, such as:

f-strings instead of old-style %s string formatting
type annotations instead of indicating types in docstrings
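
As a small illustration of the two conventions above, here is the same helper written both ways. The function names and message text are made up for this example and do not come from the PR.

```python
def describe_progress_old(stack_name, elapsed):
    """
    Build a progress message (old style).

    :param stack_name: Name of the stack.
    :type stack_name: str
    :param elapsed: Seconds waited so far.
    :type elapsed: int
    :rtype: str
    """
    return "%s: waited %s seconds" % (stack_name, elapsed)


def describe_progress(stack_name: str, elapsed: int) -> str:
    """Build a progress message (f-string and type annotations)."""
    return f"{stack_name}: waited {elapsed} seconds"
```

Both functions return the same string; the second moves the type information into the signature, where editors and type checkers can use it.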

jfalkenstein · 2021-11-27T16:27:46Z

I'd also like to see some sample output here on this PR. I'm not really sure what sort of info this would show.

jfalkenstein · 2022-01-04T15:15:41Z

integration-tests/steps/stacks.py

+        if stack.stack_status == "ROLLBACK_IN_PROGRESS" and not rollback_printed:
+            client = boto3.client("cloudformation")
+            response = client.describe_stack_events(StackName=stack_name)
+            print(response)
+            rollback_printed = True


What's the thinking behind this change?

@jfalkenstein Those are just troubleshooting lines for me to figure out why the build keeps failing in the CI pipeline where I can't reproduce it on my laptop.

sceptre/cli/drift.py

sceptre/cli/helpers.py

sceptre/plan/actions.py

tests/test_cli.py

Co-authored-by: Jon Falkenstein <[email protected]>

jfalkenstein · 2022-01-06T15:43:49Z

I just tested this out and the output looks good! The output is a lot better, both in yaml and in json.

The one remaining thing I'm noticing is this:

For some stupid reason, cloudformation returns json within json in their responses. It makes the command output super wide, hard to read, and ends up with the awkwardness of json within json or (even worse) json within yaml. This is always present in ExpectedProperties and ActualProperties, but it can also be seen in PropertyDifferences as well, when there's a value there.

I think it would be a significant improvement if we recursively deserialized embedded json. Something like this would work:

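import json


def deserialize_json_properties(value):
    # Strings that look like a JSON object or array are decoded.
    if isinstance(value, str):
        is_json = (
            (value.startswith('{') and value.endswith('}'))
            or
            (value.startswith('[') and value.endswith(']'))
        )
        if is_json:
            try:
                return json.loads(value)
            except json.JSONDecodeError:
                # Not valid JSON after all, so leave the string untouched.
                return value
        return value
    # Recurse into containers so JSON strings nested inside them are decoded too.
    if isinstance(value, dict):
        return {
            key: deserialize_json_properties(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [
            deserialize_json_properties(item)
            for item in value
        ]
    # Numbers, booleans, None, and other types pass through unchanged.
    return value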

Basically, if you passed each drift response on the drift show cli through that function before outputting it, that should be sufficient to deserialize all those json properties.
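
To illustrate the effect, here is the suggested function applied to a made-up drift record. The field names follow the CloudFormation fields mentioned above, but every value is invented for this example, and the function body is restated in condensed form only so the snippet runs on its own.

```python
import json


def deserialize_json_properties(value):
    """Condensed restatement of the function suggested above."""
    if isinstance(value, str):
        looks_like_json = (value.startswith("{") and value.endswith("}")) or (
            value.startswith("[") and value.endswith("]")
        )
        return json.loads(value) if looks_like_json else value
    if isinstance(value, dict):
        return {key: deserialize_json_properties(val) for key, val in value.items()}
    if isinstance(value, list):
        return [deserialize_json_properties(item) for item in value]
    return value


# Invented sample record: the two *Properties fields hold JSON encoded as strings.
drift = {
    "LogicalResourceId": "Topic",
    "StackResourceDriftStatus": "MODIFIED",
    "ExpectedProperties": '{"DisplayName": "alpha"}',
    "ActualProperties": '{"DisplayName": "beta"}',
    "PropertyDifferences": [
        {
            "PropertyPath": "/DisplayName",
            "ExpectedValue": "alpha",
            "ActualValue": "beta",
            "DifferenceType": "NOT_EQUAL",
        }
    ],
}

cleaned = deserialize_json_properties(drift)
print(json.dumps(cleaned, indent=2))
```

After the call, `ExpectedProperties` and `ActualProperties` are nested objects rather than escaped strings, so the printed JSON (or an equivalent YAML dump) no longer contains JSON within JSON.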

jfalkenstein · 2022-01-07T14:30:18Z

Running integration tests now

jfalkenstein · 2022-01-07T15:18:27Z

@alexharv074, your integration tests failed again. I looked at the output and found this: "'ResourceStatus': 'CREATE_FAILED', 'ResourceStatusReason': 'Resource handler returned message: "User: arn:aws:sts::582448526747:assumed-role/sceptre-integration-test-ServiceRole-DUSW2V6ES2ZU/botocore-session-1641565891 is not authorized to perform: logs:CreateLogGroup on resource: arn:aws:logs:eu-west-1:582448526747:log-group:sceptre-integration-tests-82194e1a6fc611ecb27a0242c0a82003-drift-group-A-LogGroup-APnLzvaiAFOX:log-stream: because no identity-based policy allows the logs:CreateLogGroup action"

It looks like your plan to cause drift by creating a log group requires permissions the integration test runner doesn't currently have. Looks like you'll either need to use a different resource type, or you'll need to make a PR to update this file to give the integration tests the proper permissions: https://github.com/Sceptre/sceptre-aws/blob/master/config/prod/sceptre-integration-test-service-access.yaml

alex-harvey-z3q · 2022-01-08T16:54:45Z

@jfalkenstein ah .... so that's what it is. Ok, back to the drawing board! I'll see if I can rewrite those tests to not require that permission.

This reverts commit fe28b0a.

This reverts commit 2f27c51.

ykhalyavin · 2022-01-10T19:40:25Z

@alexharv074 would love to see this feature in our pipelines before the actual deployment.

I just wonder if it'd make sense to make timeout configurable in case CloudFormation service takes more than 300 secs to perform the detection (not sure how long it takes today though)?

jfalkenstein

This looks good to me, man. Thanks for your hard work!

@zaro0508, are we good to squash-merge this into master?

alex-harvey-z3q · 2022-01-13T13:40:32Z

@ykhalyavin

I just wonder if it'd make sense to make timeout configurable in case CloudFormation service takes more than 300 secs to perform the detection (not sure how long it takes today though)?

Yeah we discussed this above. My thinking is that the timeout should not be reachable and if in some scenario it turns out that it is, then that should be a bug that gets fixed. In my tests, it normally takes just a few seconds, so something should be really wrong if 300 seconds pass.

zaro0508

Just a few minor grammatical suggestions; otherwise this looks good to me. Thanks for bringing this feature to sceptre, @alexharv074 and @jfalkenstein.

integration-tests/features/drift.feature

integration-tests/steps/stacks.py

sceptre/cli/drift.py

jfalkenstein · 2022-01-27T16:09:30Z

Running integration tests now...

mrowlingfox and others added 5 commits May 19, 2021 08:57

Bump the minor version by 1

e16eabf

Merge branch 'master' of github.com:Sceptre/sceptre

bdb794b

Merge branch 'master' of github.com:Sceptre/sceptre

e8fcbf3

Merge branch 'master' of github.com:Sceptre/sceptre

80e66fe