Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/27 13:04:59 UTC

[GitHub] [iceberg] arunb2w opened a new issue, #5867: Facing error when creating iceberg table in EMR using Glue catalog

arunb2w opened a new issue, #5867:
URL: https://github.com/apache/iceberg/issues/5867

   ### Apache Iceberg version
   
   0.14.0
   
   ### Query engine
   
   EMR
   
   ### Please describe the bug 🐞
   
   Facing error when creating iceberg table in EMR using Glue catalog.
   spark version : 3.2.1
   iceberg version: 0.14.0
   
   **Sample code:**
   ```
   from pyspark.sql import SparkSession
   
   catalog = "glue_dev"
   warehouse_path = "s3_bucket"
   database = "test"
   table_name = "EPAYMENT"
   
   # Register an Iceberg catalog backed by AWS Glue, with S3 as the file IO
   spark = SparkSession \
               .builder \
               .config(f'spark.sql.catalog.{catalog}', 'org.apache.iceberg.spark.SparkCatalog') \
               .config(f'spark.sql.catalog.{catalog}.warehouse', f'{warehouse_path}') \
               .config(f'spark.sql.catalog.{catalog}.catalog-impl', 'org.apache.iceberg.aws.glue.GlueCatalog') \
               .config(f'spark.sql.catalog.{catalog}.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO') \
               .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
               .config('spark.sql.catalog.spark_catalog', 'org.apache.iceberg.spark.SparkSessionCatalog') \
               .config('spark.sql.catalog.spark_catalog.type', 'hive') \
               .appName("IcebergDatalake") \
               .getOrCreate()
   
   # Small sample DataFrame used to create the table
   df = spark.createDataFrame([
       ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
       ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
       ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
       ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z")
   ], ["id", "creation_date", "last_update_time"])
   
   # This call fails with "Invalid table identifier: test.EPAYMENT"
   df.writeTo(f"{catalog}.{database}." + table_name).using("iceberg").create()
   ```
   
   **Spark command used to run:**
   `spark-submit --deploy-mode cluster --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0,software.amazon.awssdk:bundle:2.17.257,software.amazon.awssdk:url-connection-client:2.17.257 --conf spark.yarn.submit.waitAppCompletion=true --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=\"/opt/spark\"" --conf spark.dynamicAllocation.enabled=true --conf spark.executor.maxMemory=32g --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.shuffle.service.enabled=true --driver-memory 8g --num-executors 1 --executor-memory 32g --executor-cores 5 iceberg_main.py`
   
   **Error stacktrace:**
   ```
   Traceback (most recent call last):
     File "iceberg_main.py", line 899, in <module>
       bootstrap_table(tableName, spark, write_type, is_local_run, hive_sync_enabled, database, catalog)
     File "iceberg_main.py", line 428, in bootstrap_table
       bootstrap_to_iceberg(table_name, write_type, spark_session, is_local_run, hive_sync_enabled, database, catalog, stacks)
     File "iceberg_main.py", line 407, in bootstrap_to_iceberg
       df.writeTo(f"{catalog}.{database}." + table_name).using("iceberg").create()
     File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 1129, in create
     File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1322, in __call__
     File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
   pyspark.sql.utils.IllegalArgumentException: Invalid table identifier: test.EPAYMENT
   
   ```
   
   Please provide insights on what I am missing.
   The same code works fine if I use the Hadoop catalog instead of Glue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] closed issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
URL: https://github.com/apache/iceberg/issues/5867




[GitHub] [iceberg] C-h-e-r-r-y commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by GitBox <gi...@apache.org>.
C-h-e-r-r-y commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1294143670

   > can try setting glue.skip-name-validation via catalog properties if you wanna skip these validations :
   
   It is very hard to figure out how to set these properties. Could you please share a small example? I have tried `spark.glue.skip-name-validation`, `spark.sql.glue.skip-name-validation`, and `spark.sql.catalog.my_catalog.glue.skip-name-validation`, and had no luck :-(




[GitHub] [iceberg] singhpk234 commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1259510116

   This is because GlueCatalog applies additional validation to the table name: it should contain only lowercase letters.
   https://github.com/apache/iceberg/blob/6d2edd6284ebc5301dbe45376a31ca8316852a77/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L499-L506
   
   
   You can try setting `glue.skip-name-validation` via catalog properties:
   https://github.com/apache/iceberg/blob/6d2edd6284ebc5301dbe45376a31ca8316852a77/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java#L106-L114
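
The table-name check described above can be approximated in a few lines of Python. This is only a sketch: the pattern (lowercase letters, digits, and underscores, up to 255 characters) is an assumption based on the linked Java check rather than a copy of it, and the function names `is_valid_glue_table_name` and `to_glue_safe_name` are hypothetical helpers, not part of Iceberg or PySpark.

```python
import re

# Assumed pattern: lowercase letters, digits and underscores, 1-255 characters.
# This is an approximation of the linked Java check, not code taken from Iceberg.
GLUE_TABLE_NAME = re.compile(r"[a-z0-9_]{1,255}")


def is_valid_glue_table_name(name: str) -> bool:
    """Return True if `name` passes the assumed Glue table-name check."""
    return GLUE_TABLE_NAME.fullmatch(name) is not None


def to_glue_safe_name(name: str) -> str:
    """Lowercase the name so it can pass the check without disabling validation."""
    return name.lower()


if __name__ == "__main__":
    # Compare the table name from the issue with its lowercased form.
    for candidate in ("EPAYMENT", to_glue_safe_name("EPAYMENT")):
        print(candidate, is_valid_glue_table_name(candidate))
```

If renaming is acceptable, lowercasing the table name before calling `writeTo` may be simpler than skipping validation, since the table then also satisfies the check for other engines that read from the same Glue catalog.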
   




[GitHub] [iceberg] singhpk234 commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1304081971

   Ideally,
   ```shell
   --conf spark.sql.catalog.{catalog_name}.glue.skip-name-validation=true
   ```
   should have worked. Can you please share the complete Spark confs you are passing, and also the Iceberg version you are trying it with?
   
   Note: this was added in [iceberg 0.14.0](https://github.com/apache/iceberg/commit/5653a0d9c704b22b3b193ae6338af6261833f6e2) release
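
For PySpark users, the same catalog property can also be set on the session builder instead of the command line. The sketch below is an illustration under assumptions, not a verified fix for this issue: `glue_catalog_conf` and `apply_conf` are hypothetical helpers, the warehouse path is a placeholder, and the property value is set to `true` because the intent is to skip the name validation. The other keys are taken from the sample code in the original report.

```python
def glue_catalog_conf(catalog: str, warehouse: str, skip_name_validation: bool = True) -> dict:
    """Build Spark conf entries for an Iceberg catalog backed by Glue."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Catalog properties are passed under the catalog's own prefix.
        f"{prefix}.glue.skip-name-validation": str(skip_name_validation).lower(),
    }


def apply_conf(builder, conf: dict):
    """Apply each entry to a SparkSession builder (any object with .config(key, value))."""
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder


if __name__ == "__main__":
    # Placeholder catalog name and warehouse path; print the equivalent --conf flags.
    for key, value in glue_catalog_conf("glue_dev", "s3://my-bucket/warehouse").items():
        print(f"--conf {key}={value}")
```

With PySpark installed, the dictionary could be applied as `apply_conf(SparkSession.builder.appName("IcebergDatalake"), conf).getOrCreate()`. Keeping the keys in one helper makes it easier to confirm that the property sits under `spark.sql.catalog.<catalog_name>.` rather than directly under `spark.` or `spark.sql.`.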
   




[GitHub] [iceberg] github-actions[bot] commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1535547456

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.




[GitHub] [iceberg] github-actions[bot] commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1606351479

   This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'

