Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] s3_object: fix removal of max_keys=0 from params for list_keys operation #2294

Closed

Conversation

mandar242
Copy link
Contributor

@mandar242 mandar242 commented Sep 11, 2024

SUMMARY

fix accidental removal of max_keys: 0 from params
The current code removes all falsy values including 0, causing 'max_keys: 0' being removed from pagination_params

builds up on work from #1954
Fixes #1953

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

s3_object

ADDITIONAL INFORMATION
    - name: List keys all options
      amazon.aws.s3_object:
        bucket: my-test-bucket
        mode: list
        max_keys: 0

current code

list_keys: 'before param cleanup ', {'Bucket': 'mandkulk-test-bucket', 'MaxKeys': 0, 'Prefix': '', 'StartAfter': ''}
list_keys: 'after param cleanup ', {'Bucket': 'mandkulk-test-bucket'}

PR code

list_keys: 'before param cleanup ', {'Bucket': 'mandkulk-test-bucket', 'MaxKeys': 0, 'Prefix': '', 'StartAfter': ''}
list_keys: 'after param cleanup ', {'Bucket': 'mandkulk-test-bucket', 'MaxKeys': 0}

Copy link
Contributor

Build succeeded.
https://ansible.softwarefactory-project.io/zuul/buildset/16393599963c48cf8a8ddb9fd5c3706f

✔️ ansible-galaxy-importer SUCCESS in 4m 33s
✔️ build-ansible-collection SUCCESS in 10m 32s
✔️ ansible-test-splitter SUCCESS in 4m 17s
✔️ integration-amazon.aws-1 SUCCESS in 12m 40s
Skipped 43 jobs

Copy link
Contributor

@tremble tremble left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes me nervous about this change is that I'm not sure what the actual effect is when there's a limit of "0" applied. There's also no test that this is actually doing what we expect.

@mandar242
Copy link
Contributor Author

What makes me nervous about this change is that I'm not sure what the actual effect is when there's a limit of "0" applied. There's also no test that this is actually doing what we expect.

From local testing, I've noticed that it just doesn't return anything. Imo don't see any harm with this fix, but would check with @alinabuzachis as well.

MaxKeys (integer) –
Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

@mandar242 mandar242 changed the title s3_object: fix removal of max_keys=0 from params for list_keys operation [DNM] s3_object: fix removal of max_keys=0 from params for list_keys operation Sep 12, 2024
@alinabuzachis
Copy link
Collaborator

What makes me nervous about this change is that I'm not sure what the actual effect is when there's a limit of "0" applied. There's also no test that this is actually doing what we expect.

From local testing, I've noticed that it just doesn't return anything. Imo don't see any harm with this fix, but would check with @alinabuzachis as well.

MaxKeys (integer) –
Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

Can you please add some tests (I guess units will be the best) for this param that will help to understand the behaviour when set to 0 or a different value?

@mandar242
Copy link
Contributor Author

What makes me nervous about this change is that I'm not sure what the actual effect is when there's a limit of "0" applied. There's also no test that this is actually doing what we expect.

From local testing, I've noticed that it just doesn't return anything. Imo don't see any harm with this fix, but would check with @alinabuzachis as well.

MaxKeys (integer) –
Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

Can you please add some tests (I guess units will be the best) for this param that will help to understand the behaviour when set to 0 or a different value?

@alinabuzachis on a deeper look, I think max_keys is not being utilized at all by the API used
the paginator API used in code is which does not take MaxKeys in request
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/paginator/ListObjectsV2.html#S3.Paginator.ListObjectsV2

while the API without paginator (not used in code) does accept MaxKeys in request
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects_v2.html#list-objects-v2

the below assertion fails, as including max_keys in code is not changing anything I guess

    - name: List keys - not specifying max_keys
      amazon.aws.s3_object:
        bucket: mandkulk-test-bucket
        mode: list
      register: result_no_max_keys
    - debug: var=result_no_max_keys

    - name: List keys - specifying max_keys
      amazon.aws.s3_object:
        bucket: mandkulk-test-bucket
        mode: list
        max_keys: 3
      register: result_max_keys_3
    - debug: var=result_max_keys_3

    - name: check if max_keys had any effect on output #FAILS!
      assert:
        that: result_no_max_keys != result_max_keys_3

the same works fine with AWS CLI, which utilizes same backend code as client.list_objects_v2 I think

$ aws s3api list-objects-v2 --bucket mandkulk-test-bucket --max-keys 3
{
    "IsTruncated": true,
    "Contents": [
        {
            "Key": "1.txt",
            "LastModified": "2024-09-11T19:17:35+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "10.txt",
            "LastModified": "2024-09-11T19:17:39+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "100.txt",
            "LastModified": "2024-09-11T19:18:16+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        }
    ],
    "Name": "mandkulk-test-bucket",
    "Prefix": "",
    "MaxKeys": 3,
    "EncodingType": "url",
    "KeyCount": 3,
    "NextContinuationToken": "xxxxxxxx/xxxxxxxx=="
}

$ aws s3api list-objects-v2 --bucket mandkulk-test-bucket --max-keys 0
{
    "IsTruncated": false,
    "Name": "mandkulk-test-bucket",
    "Prefix": "",
    "MaxKeys": 0,
    "EncodingType": "url",
    "KeyCount": 0
}

@alinabuzachis
Copy link
Collaborator

What makes me nervous about this change is that I'm not sure what the actual effect is when there's a limit of "0" applied. There's also no test that this is actually doing what we expect.

From local testing, I've noticed that it just doesn't return anything. Imo don't see any harm with this fix, but would check with @alinabuzachis as well.

MaxKeys (integer) –
Sets the maximum number of keys returned in the response. By default, the action returns up to 1,000 key names. The response might contain fewer keys but will never contain more.

Can you please add some tests (I guess units will be the best) for this param that will help to understand the behaviour when set to 0 or a different value?

@alinabuzachis on a deeper look, I think max_keys is not being utilized at all by the API used the paginator API used in code is which does not take MaxKeys in request https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/paginator/ListObjectsV2.html#S3.Paginator.ListObjectsV2

while the API without paginator (not used in code) does accept MaxKeys in request https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_objects_v2.html#list-objects-v2

the below assertion fails, as including max_keys in code is not changing anything I guess

    - name: List keys - not specifying max_keys
      amazon.aws.s3_object:
        bucket: mandkulk-test-bucket
        mode: list
      register: result_no_max_keys
    - debug: var=result_no_max_keys

    - name: List keys - specifying max_keys
      amazon.aws.s3_object:
        bucket: mandkulk-test-bucket
        mode: list
        max_keys: 3
      register: result_max_keys_3
    - debug: var=result_max_keys_3

    - name: check if max_keys had any effect on output #FAILS!
      assert:
        that: result_no_max_keys != result_max_keys_3

the same works fine with AWS CLI, which utilizes same backend code as client.list_objects_v2 I think

$ aws s3api list-objects-v2 --bucket mandkulk-test-bucket --max-keys 3
{
    "IsTruncated": true,
    "Contents": [
        {
            "Key": "1.txt",
            "LastModified": "2024-09-11T19:17:35+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "10.txt",
            "LastModified": "2024-09-11T19:17:39+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        },
        {
            "Key": "100.txt",
            "LastModified": "2024-09-11T19:18:16+00:00",
            "ETag": "\"xxxxxxxx\"",
            "Size": 1,
            "StorageClass": "STANDARD"
        }
    ],
    "Name": "mandkulk-test-bucket",
    "Prefix": "",
    "MaxKeys": 3,
    "EncodingType": "url",
    "KeyCount": 3,
    "NextContinuationToken": "xxxxxxxx/xxxxxxxx=="
}

$ aws s3api list-objects-v2 --bucket mandkulk-test-bucket --max-keys 0
{
    "IsTruncated": false,
    "Name": "mandkulk-test-bucket",
    "Prefix": "",
    "MaxKeys": 0,
    "EncodingType": "url",
    "KeyCount": 0
}

I have played with this option a bit and can confirm that the paginator does not accept max_keys. I don't know if this option has ever worked, unless the paginator API was changed and this parameter was removed/replaced.

However, similar functionality is achieved by passing MaxItems to PaginationConfig. Therefore, I suggest removing the default value of max_keys and, passing it to PaginationConfig similar to:

def list_keys(s3, bucket, prefix=None, marker=None, max_keys=None):
    pagination_params = {
        "Bucket": bucket,
        "Prefix": prefix,
        "StartAfter": marker,
        "PaginationConfig":  {"MaxItems": max_keys}
    }

This succeeds then:

        - name: List keys - without setting max_keys
          amazon.aws.s3_object:
            bucket: "{{ bucket_name }}"
            mode: list
          register: _result_no_max_keys

        - name: Assert that bucket has 4 files
          ansible.builtin.assert:
            that:
              - _result_no_max_keys.s3_keys | length == 4

        - name: List keys - specifying max_keys=2
          amazon.aws.s3_object:
            bucket: "{{ bucket_name }}"
            mode: list
            max_keys: 2
          register: _result_max_keys_2

        - name: Show only 2 files files (max_keys=2)
          ansible.builtin.assert:
            that:
              - _result_max_keys_2.s3_keys | length == 2

        - name: List keys - specifying max_keys=0
          amazon.aws.s3_object:
            bucket: "{{ bucket_name }}"
            mode: list
            max_keys: 0
          register: _result_max_keys_0

        - name: No keys (max_keys=0)
          ansible.builtin.assert:
            that:
              - _result_max_keys_0.s3_keys | length == 0

Thoughts? @tremble @mandar242

@tremble
Copy link
Contributor

tremble commented Sep 18, 2024

Thoughts? @tremble @mandar242

If someone passes max_items and/or next_token we could skip the pagination entirely.

Copy link
Contributor

Build succeeded.
https://ansible.softwarefactory-project.io/zuul/buildset/d3e9fd7a27ea471d91121f8d8456acde

✔️ ansible-galaxy-importer SUCCESS in 7m 25s
✔️ build-ansible-collection SUCCESS in 10m 53s
✔️ ansible-test-splitter SUCCESS in 4m 22s
✔️ integration-amazon.aws-1 SUCCESS in 12m 52s
Skipped 43 jobs

@mandar242
Copy link
Contributor Author

Thoughts? @tremble @mandar242

If someone passes max_items and/or next_token we could skip the pagination entirely.

@tremble @alinabuzachis , in that case maybe we can drop this PR, and inform reporter to make use of max_items and/or next_token to achieve their desired state?

@tremble
Copy link
Contributor

tremble commented Oct 9, 2024

Closing this in favour of #2328 which reworks how/when pagination is used.

@tremble tremble closed this Oct 9, 2024
@mandar242 mandar242 deleted the aaws-1953-s3_object branch October 29, 2024 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

S3 bucket list: max_keys: 0 is ignored
4 participants