Unable to use single sign-on (SSO) to run Glue script #224

Open
Stefan-Dienst opened this issue Feb 7, 2025 · 0 comments

Hi,

Thanks for providing this package.

I followed the setup guide, which worked well, but I ran into problems when using SSO for authentication. I have logged in to my SSO session with the AWS CLI and set the AWS_PROFILE environment variable, and the profile works from the command line (e.g. aws s3 ls succeeds). But when I submit a Glue job using ./bin/gluesparksubmit simple_glue_script.py, I get the following error:

: java.nio.file.AccessDeniedException: s3://stefan-glue-tests/input/file.csv: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by DefaultAWSCredentialsProviderChain : com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain: [EnvironmentVariableCredentialsProvider: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)), SystemPropertiesCredentialsProvider: Unable to load AWS credentials from Java system properties (aws.accessKeyId and aws.secretKey), WebIdentityTokenCredentialsProvider: You must specify a value for roleArn and roleSessionName, com.amazonaws.auth.profile.ProfileCredentialsProvider@dd2a19a: Unable to load credentials into profile [profile sandbox]: AWS Access Key ID is not specified., com.amazonaws.auth.EC2ContainerCredentialsProviderWrapper@9479be7: Failed to connect to service endpoint: ]

This is the script I am running:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

s3_input_path = "s3://stefan-glue-tests/input/file.csv"
dynamic_frame = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": [s3_input_path]},
    format="csv",
    format_options={"withHeader": True}
)
dynamic_frame.printSchema()

s3_output_path = "s3://stefan-glue-tests/output/"
glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3",
    connection_options={"path": s3_output_path},
    format="parquet"
)
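
The last entry in the provider chain reports "Unable to load credentials into profile [profile sandbox]: AWS Access Key ID is not specified", which suggests the JVM-side profile provider found the sandbox profile but looked only for static keys in it. As a quick check, here is a stdlib-only sketch (the helper names are hypothetical) that reads ~/.aws/config, or the file named by AWS_CONFIG_FILE, and reports whether a profile is SSO-based (it has sso_session or sso_start_url keys) or carries static keys. It only inspects the file and never resolves credentials.

```python
import configparser
import os


def classify_profile(config_text, profile):
    """Classify a profile in AWS config-file text as 'sso', 'static', 'missing' or 'other'.

    Hypothetical diagnostic helper: it only inspects keys and never resolves
    or refreshes credentials.
    """
    # interpolation=None so values containing '%' are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # In ~/.aws/config the default profile is [default]; others are [profile NAME].
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return "missing"
    keys = set(parser[section])
    if "aws_access_key_id" in keys:
        return "static"
    if keys & {"sso_session", "sso_start_url"}:
        return "sso"
    return "other"


def classify_configured_profile():
    """Hypothetical wrapper: classify the AWS_PROFILE profile from the real config file."""
    path = os.path.expanduser(os.environ.get("AWS_CONFIG_FILE", "~/.aws/config"))
    with open(path, encoding="utf-8") as fh:
        return classify_profile(fh.read(), os.environ.get("AWS_PROFILE", "default"))


# Example: print(classify_configured_profile())
```

If this returns "sso" for the sandbox profile, the profile holds no static keys for a key-based profile provider to read, which would be consistent with the error above.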

Following this Stack Overflow answer, I assumed the SSO dependencies were missing and added them to pom.xml:

    <!-- AWS SDK SSO Dependency -->
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>sso</artifactId>
        <version>2.16.76</version>
    </dependency>

    <!-- AWS SDK SSO OIDC Dependency -->
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>ssooidc</artifactId>
        <version>2.16.76</version>
    </dependency>

But the error persists. I suspect the dependencies are simply not being used, but I don't know how to debug this. The only thing that works is setting the environment variables AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN directly.
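
One detail that may be relevant: the classes in the stack trace (com.amazonaws..., DefaultAWSCredentialsProviderChain) belong to the AWS SDK for Java v1, while the dependencies added to pom.xml are SDK v2 modules (software.amazon.awssdk), so the v1 chain used by the S3A connector would likely not pick them up. Since the environment-variable route works, here is a minimal sketch of automating it. It assumes boto3 is installed (it is not part of the original setup) and can resolve the SSO profile from the token cached by aws sso login. The function names are hypothetical.

```python
import os
import subprocess


def credentials_env(base_env, access_key, secret_key, token):
    """Return a copy of base_env with static AWS credential variables set."""
    env = dict(base_env)  # copy so the caller's environment is not mutated
    env["AWS_ACCESS_KEY_ID"] = access_key
    env["AWS_SECRET_ACCESS_KEY"] = secret_key
    if token:
        env["AWS_SESSION_TOKEN"] = token
    else:
        # Drop a stale token so it is not paired with unrelated keys.
        env.pop("AWS_SESSION_TOKEN", None)
    return env


def run_glue_job(script):
    """Hypothetical helper: resolve SSO credentials with boto3, then run gluesparksubmit."""
    # Assumption: boto3 is installed and `aws sso login` has been run for AWS_PROFILE.
    import boto3

    session = boto3.Session(profile_name=os.environ.get("AWS_PROFILE"))
    creds = session.get_credentials()
    if creds is None:
        raise RuntimeError("No credentials resolved; run `aws sso login` first.")
    frozen = creds.get_frozen_credentials()
    env = credentials_env(os.environ, frozen.access_key, frozen.secret_key, frozen.token)
    # Assumes the current directory is the aws-glue-libs checkout, as in the issue.
    return subprocess.run(["./bin/gluesparksubmit", script], env=env).returncode


# Example: run_glue_job("simple_glue_script.py")
```

The exported credentials are temporary, so a long-running session may need a fresh call once the SSO role session expires. Recent AWS CLI v2 releases can also print the same variables with aws configure export-credentials --format env.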

Do you have any idea what could be the issue?
