Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/27 13:04:59 UTC
[GitHub] [iceberg] arunb2w opened a new issue, #5867: Facing error when creating iceberg table in EMR using Glue catalog
arunb2w opened a new issue, #5867:
URL: https://github.com/apache/iceberg/issues/5867
### Apache Iceberg version
0.14.0
### Query engine
EMR
### Please describe the bug 🐞
Facing error when creating iceberg table in EMR using Glue catalog.
spark version : 3.2.1
iceberg version: 0.14.0
**Sample code:**
```
from pyspark.sql import SparkSession

catalog = "glue_dev"
warehouse_path = "s3_bucket"
database = "test"
table_name = "EPAYMENT"
# Register an Iceberg catalog backed by Glue, with S3FileIO for data access
spark = SparkSession \
    .builder \
    .config(f'spark.sql.catalog.{catalog}', 'org.apache.iceberg.spark.SparkCatalog') \
    .config(f'spark.sql.catalog.{catalog}.warehouse', f'{warehouse_path}') \
    .config(f'spark.sql.catalog.{catalog}.catalog-impl', 'org.apache.iceberg.aws.glue.GlueCatalog') \
    .config(f'spark.sql.catalog.{catalog}.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO') \
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions') \
    .config('spark.sql.catalog.spark_catalog', 'org.apache.iceberg.spark.SparkSessionCatalog') \
    .config('spark.sql.catalog.spark_catalog.type', 'hive') \
    .appName("IcebergDatalake") \
    .getOrCreate()
df = spark.createDataFrame([
    ("100", "2015-01-01", "2015-01-01T13:51:39.340396Z"),
    ("101", "2015-01-01", "2015-01-01T12:14:58.597216Z"),
    ("102", "2015-01-01", "2015-01-01T13:51:40.417052Z"),
    ("103", "2015-01-01", "2015-01-01T13:51:40.519832Z")
], ["id", "creation_date", "last_update_time"])
# This call fails with "Invalid table identifier: test.EPAYMENT"
df.writeTo(f"{catalog}.{database}." + table_name).using("iceberg").create()
```
**Spark command used to run:**
`spark-submit --deploy-mode cluster --packages org.apache.iceberg:iceberg-spark-runtime-3.2_2.12:0.14.0,software.amazon.awssdk:bundle:2.17.257,software.amazon.awssdk:url-connection-client:2.17.257 --conf spark.yarn.submit.waitAppCompletion=true --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=\"/opt/spark\"" --conf spark.dynamicAllocation.enabled=true --conf spark.executor.maxMemory=32g --conf spark.dynamicAllocation.executorIdleTimeout=300 --conf spark.shuffle.service.enabled=true --driver-memory 8g --num-executors 1 --executor-memory 32g --executor-cores 5 iceberg_main.py`
**Error stacktrace:**
```
Traceback (most recent call last):
File "iceberg_main.py", line 899, in <module>
bootstrap_table(tableName, spark, write_type, is_local_run, hive_sync_enabled, database, catalog)
File "iceberg_main.py", line 428, in bootstrap_table
bootstrap_to_iceberg(table_name, write_type, spark_session, is_local_run, hive_sync_enabled, database, catalog, stacks)
File "iceberg_main.py", line 407, in bootstrap_to_iceberg
df.writeTo(f"{catalog}.{database}." + table_name).using("iceberg").create()
File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 1129, in create
File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/py4j-0.10.9.3-src.zip/py4j/java_gateway.py", line 1322, in __call__
File "/mnt/yarn/usercache/hadoop/appcache/application_1664278990474_0004/container_1664278990474_0004_01_000001/pyspark.zip/pyspark/sql/utils.py", line 117, in deco
pyspark.sql.utils.IllegalArgumentException: Invalid table identifier: test.EPAYMENT
```
Please provide insights on what I am missing.
The same code works fine if I use the Hadoop catalog instead of Glue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org
[GitHub] [iceberg] github-actions[bot] closed issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
URL: https://github.com/apache/iceberg/issues/5867
[GitHub] [iceberg] C-h-e-r-r-y commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by GitBox <gi...@apache.org>.
C-h-e-r-r-y commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1294143670
> can try setting glue.skip-name-validation via catalog properties if you wanna skip these validations :
It is very hard to figure out how to set these properties. Could you please share a small example? I have tried `spar.glue.skip-name-validation`, `spark.sql.glue.skip-name-validation`, and `spark.sql.catalog.my_catalog.glue.skip-name-validation`, all with no luck :-(
[GitHub] [iceberg] singhpk234 commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1259510116
This is because GlueCatalog applies additional validation to the table name: it should contain only lowercase letters, so `EPAYMENT` is rejected.
https://github.com/apache/iceberg/blob/6d2edd6284ebc5301dbe45376a31ca8316852a77/aws/src/main/java/org/apache/iceberg/aws/glue/GlueCatalog.java#L499-L506
You can try setting `glue.skip-name-validation` via catalog properties:
https://github.com/apache/iceberg/blob/6d2edd6284ebc5301dbe45376a31ca8316852a77/aws/src/main/java/org/apache/iceberg/aws/AwsProperties.java#L106-L114
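To make the validation above concrete, here is a minimal Python sketch of a lowercase-only name check. The regular expression is an assumption modeled on the linked GlueCatalog code (lowercase letters, digits, and underscores, up to 255 characters) and should be verified against the Iceberg version in use; `is_valid_glue_table_name` is a hypothetical helper, not part of Iceberg or PySpark.

```python
import re

# Assumed pattern, modeled on the linked Glue name check; verify it against
# the Iceberg version you actually run.
GLUE_TABLE_NAME = re.compile(r"[a-z0-9_]{1,255}")


def is_valid_glue_table_name(name: str) -> bool:
    """Hypothetical helper: True if the name passes a lowercase-only check."""
    return GLUE_TABLE_NAME.fullmatch(name) is not None


# The table name from the report fails; its lowercase form passes.
for candidate in ("EPAYMENT", "epayment"):
    print(candidate, is_valid_glue_table_name(candidate))
```

If renaming is acceptable, lowercasing the table name before calling `writeTo` would likely avoid the error without touching any catalog property.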
[GitHub] [iceberg] singhpk234 commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1304081971
Ideally,
```shell
--conf spark.sql.catalog.{catalog_name}.glue.skip-name-validation=true
```
should have worked. Can you please share the complete Spark confs you are passing and the Iceberg version you are trying it with?
Note: this was added in [iceberg 0.14.0](https://github.com/apache/iceberg/commit/5653a0d9c704b22b3b193ae6338af6261833f6e2) release
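For anyone setting this from PySpark rather than the command line, the sketch below builds the catalog-scoped property keys in one place. It assumes the flag must be `true` for validation to be skipped, reuses the catalog settings from the original report, and uses a placeholder warehouse path; `glue_catalog_confs` is a hypothetical helper, not an Iceberg or Spark API.

```python
def glue_catalog_confs(catalog: str, warehouse: str,
                       skip_name_validation: bool = True) -> dict:
    """Hypothetical helper: Spark conf entries for an Iceberg Glue catalog."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.warehouse": warehouse,
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        # Catalog properties are passed under the catalog's own prefix.
        f"{prefix}.glue.skip-name-validation": str(skip_name_validation).lower(),
    }


# "s3://my-bucket/warehouse" is a placeholder, not a path from the issue.
confs = glue_catalog_confs("glue_dev", "s3://my-bucket/warehouse")
for key, value in confs.items():
    print(f"--conf {key}={value}")

# To apply the same entries in PySpark (requires pyspark to be installed):
# from pyspark.sql import SparkSession
# builder = SparkSession.builder.appName("IcebergDatalake")
# for key, value in confs.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
```

Keeping the prefix in one variable makes it harder to drop the `spark.sql.catalog.{catalog}` part, which is what distinguishes the working key from the other variants tried earlier in the thread.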
[GitHub] [iceberg] github-actions[bot] commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1535547456
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
[GitHub] [iceberg] github-actions[bot] commented on issue #5867: Facing error when creating iceberg table in EMR using Glue catalog
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #5867:
URL: https://github.com/apache/iceberg/issues/5867#issuecomment-1606351479
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'