Cannot find catalog plugin class for catalog 'glue_catalog' #222

Open
ValerianVirmaux opened this issue Jan 21, 2025 · 4 comments

@ValerianVirmaux

Hello! I get an error when setting up Iceberg with the image glue_libs_4.0.0_image_01.

Summary
Cannot use Iceberg in the glue_libs_4.0.0_image_01 image.

Steps to Reproduce

Dockerfile
"""
FROM amazon/aws-glue-libs:glue_libs_4.0.0_image_01
USER root
WORKDIR /app
COPY . /app
RUN yum update -y && yum install -y \
    wget \
    curl \
    python3-pip \
    && rm -rf /var/lib/apt/lists/*
RUN curl "https://d1vvhvl2y92vvt.cloudfront.net/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
    unzip awscliv2.zip && \
    ./aws/install
RUN pip3 install pyspark==3.5.4
ENV DATALAKE_FORMATS=iceberg
ENTRYPOINT ["bash"]
"""

main.py
"""
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)

spark.sql("CREATE TABLE glue_catalog.my_database.my_table (id BIGINT, name STRING) USING iceberg")
"""

After building the image, running the container, and connecting to AWS, I get the following error:
"An error occurred while calling o47.sql.
: org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'glue_catalog': org.apache.iceberg.spark.SparkCatalog"
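
This message generally means Spark could not load the class `org.apache.iceberg.spark.SparkCatalog`, i.e. no Iceberg runtime JAR is visible on the session's classpath. One quick check, sketched below, is to list the Iceberg JARs in Spark's `jars` directory. The helper name `find_iceberg_jars` and the use of `SPARK_HOME` to locate that directory are assumptions made for illustration; the actual layout inside the Glue image may differ.

```python
import os
from pathlib import Path


def find_iceberg_jars(jars_dir):
    """Return the sorted names of JAR files in jars_dir whose name contains 'iceberg'."""
    directory = Path(jars_dir)
    if not directory.is_dir():
        # A missing directory is treated the same as "no JARs found".
        return []
    return sorted(
        p.name for p in directory.glob("*.jar") if "iceberg" in p.name.lower()
    )


if __name__ == "__main__":
    # Assumption: SPARK_HOME points at the Spark installation the session uses.
    spark_home = os.environ.get("SPARK_HOME", "")
    jars = find_iceberg_jars(Path(spark_home) / "jars") if spark_home else []
    print(jars or "No Iceberg JARs found; the catalog class cannot be loaded.")
```

An empty result would be consistent with the error above, although JARs can also be supplied in other ways (for example through `spark.jars.packages`), so this check is only a first hint.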


@bnigmat

bnigmat commented Jan 22, 2025

You may be missing the following configuration parameters:
.config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
.config("spark.sql.catalog.glue_catalog.warehouse", <catalog_warehouse_path>)

@Valerian-TSystems

Hello, thank you bnigmat for looking at it.

The SQL extension is failing:

[Image: screenshot, not preserved]

And the result is the same:

[Image: screenshot, not preserved]

@bnigmat

bnigmat commented Jan 22, 2025

Try removing the config below first:

config("spark.sql.defaultCatalog", "glue_catalog")

This is the configuration I used with no issue:

    # app_name, catalog_warehouse and default_namespace are defined elsewhere.
    spark = SparkSession.\
        builder.\
        appName(app_name).\
        config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions").\
        config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog").\
        config("spark.sql.catalog.local.type", "hadoop").\
        config("spark.sql.catalog.local.warehouse", "data/catalog").\
        config("spark.sql.defaultCatalog", "local").\
        config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog").\
        config("spark.sql.catalog.glue_catalog.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog").\
        config("spark.sql.catalog.glue_catalog.io-impl", "org.apache.iceberg.aws.s3.S3FileIO").\
        config("spark.sql.catalog.glue_catalog.warehouse", catalog_warehouse).\
        config("spark.sql.catalog.glue_catalog.default-namespace", default_namespace).\
        config("spark.sql.catalog.glue_catalog.glue.skip-name-validation","true").\
        getOrCreate()
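
A long chain like this is easy to mistype, so one option is to keep the catalog settings in a dictionary and apply them in a loop. The sketch below uses only configuration keys that appear in this thread; the helper names `glue_catalog_conf` and `apply_conf` and the example bucket are hypothetical, and `apply_conf` only assumes the builder exposes `config(key, value)` as `SparkSession.builder` does.

```python
def glue_catalog_conf(catalog, warehouse):
    """Build the Iceberg/Glue catalog settings discussed in this thread as a dict."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }


def apply_conf(builder, conf):
    """Apply every key/value pair to a builder that exposes config(key, value)."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


# Hypothetical usage inside the container (bucket name is made up):
#   from pyspark.sql import SparkSession
#   conf = glue_catalog_conf("glue_catalog", "s3://example-bucket/warehouse")
#   spark = apply_conf(SparkSession.builder, conf).getOrCreate()
```

Note that these settings only name the classes to load; they do not by themselves put the Iceberg JARs on the classpath.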

@ValerianVirmaux
Author

The DATALAKE_FORMATS environment variable was not taking effect.
Adding the Iceberg JARs explicitly through spark.jars.packages works:

    spark = (
        SparkSession.builder.config(
            "spark.sql.catalog.glue_catalog",
            "org.apache.iceberg.spark.SparkCatalog",
        )
        .config(
            "spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog",
        )
        .config(
            "spark.sql.catalog.glue_catalog.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO",
        )
        .config(
            "spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:1.4.3,org.apache.iceberg:iceberg-aws-bundle:1.4.3",
        )
        .getOrCreate()
    )
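
The `spark.jars.packages` value above encodes three versions: the Spark minor version (3.3) and Scala version (2.12) in the runtime artifact name, and the Iceberg release (1.4.3) for both artifacts. A small helper can keep them consistent; `iceberg_packages` below is a hypothetical name, and the sketch assumes the `iceberg-spark-runtime` and `iceberg-aws-bundle` artifacts exist for the versions passed in.

```python
def iceberg_packages(spark_version, scala_version, iceberg_version):
    """Return a comma-separated Maven coordinate string for spark.jars.packages."""
    artifacts = [
        # The runtime artifact name embeds the Spark minor and Scala versions.
        f"org.apache.iceberg:iceberg-spark-runtime-{spark_version}_{scala_version}:{iceberg_version}",
        # The AWS bundle is versioned by the Iceberg release only.
        f"org.apache.iceberg:iceberg-aws-bundle:{iceberg_version}",
    ]
    return ",".join(artifacts)


# Reproduces the value used in the comment above.
print(iceberg_packages("3.3", "2.12", "1.4.3"))
```

Passing the Spark version that is actually running in the container matters here, because a runtime JAR built for a different Spark minor version may fail to load.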
