Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/30 17:04:55 UTC
[GitHub] [iceberg] RussellSpitzer commented on issue #3828: java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V
RussellSpitzer commented on issue #3828:
URL: https://github.com/apache/iceberg/issues/3828#issuecomment-1003110958
The error looks like you are missing dependencies of the Hadoop S3A
filesystem API, specifically some dependency of hadoop-aws. There is a bug
noted here, https://issues.apache.org/jira/browse/HADOOP-16080, which says
the issue is fixed as of 3.2.2, so I would double-check that your runtime
Hadoop versions are correct.
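As an aside, the descriptor in the NoSuchMethodError can be decoded to see exactly what is missing: `(Ljava/util/concurrent/ExecutorService;IZ)V` names a constructor taking (ExecutorService, int, boolean), which typically means hadoop-aws was compiled against a newer hadoop-common than the one loaded at runtime. The sketch below (the helper names are illustrative, not part of any Hadoop or Spark API) decodes such a descriptor and checks that the Hadoop jars on a classpath all carry one version:

```python
import re

# Decode a JVM method descriptor into readable parameter types, so the
# NoSuchMethodError above can be read directly. This sketch handles object
# and primitive parameter types only (no array types).
def decode_descriptor(desc):
    prim = {"I": "int", "Z": "boolean", "J": "long", "B": "byte",
            "C": "char", "S": "short", "F": "float", "D": "double"}
    params, out, i = desc[1:desc.index(")")], [], 0
    while i < len(params):
        if params[i] == "L":                      # object type: L<class>;
            end = params.index(";", i)
            out.append(params[i + 1:end].replace("/", "."))
            i = end + 1
        else:                                     # primitive type
            out.append(prim[params[i]])
            i += 1
    return out

# The failing constructor takes (ExecutorService, int, boolean):
print(decode_descriptor("(Ljava/util/concurrent/ExecutorService;IZ)V"))

# A quick consistency check: every hadoop-* jar on the classpath should
# carry the same version, otherwise hadoop-aws may call constructors that
# the loaded hadoop-common does not have.
def jar_version(jar_name):
    m = re.search(r"-(\d+(?:\.\d+)*)\.jar$", jar_name)
    return m.group(1) if m else None

def hadoop_versions_aligned(jars):
    versions = {jar_version(j) for j in jars if j.startswith("hadoop-")}
    return len(versions) <= 1

print(hadoop_versions_aligned(["hadoop-aws-3.2.2.jar", "hadoop-common-3.2.2.jar"]))  # True
print(hadoop_versions_aligned(["hadoop-aws-3.2.2.jar", "hadoop-common-3.2.0.jar"]))  # False
```

Applying the filename check to the jars listed later in this thread (hadoop-aws-3.2.2.jar against the cluster's Hadoop build) is one quick way to confirm the runtime versions line up.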
On Thu, Dec 30, 2021 at 10:32 AM Prithvi D ***@***.***> wrote:
> Unable to create basic iceberg table(stored on S3) with Spark
>
> spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
>
> ---------------------------------------------------------------------------
> Py4JJavaError Traceback (most recent call last)
> /tmp/ipykernel_15/955019638.py in <module>
> ----> 1 spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
>
> /opt/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
> 721 [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3, f2='row3')]
> 722 """
> --> 723 return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
> 724
> 725 def table(self, tableName):
>
> /usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py in __call__(self, *args)
> 1302
> 1303 answer = self.gateway_client.send_command(command)
> -> 1304 return_value = get_return_value(
> 1305 answer, self.gateway_client, self.target_id, self.name)
> 1306
>
> /opt/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
> 109 def deco(*a, **kw):
> 110 try:
> --> 111 return f(*a, **kw)
> 112 except py4j.protocol.Py4JJavaError as e:
> 113 converted = convert_exception(e.java_exception)
>
> /usr/local/lib/python3.9/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
> 324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
> 325 if answer[1] == REFERENCE_TYPE:
> --> 326 raise Py4JJavaError(
> 327 "An error occurred while calling {0}{1}{2}.\n".
> 328 format(target_id, ".", name), value)
>
> Py4JJavaError: An error occurred while calling o224.sql.
> : java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V
> at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:824)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
> at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
> at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
> at org.apache.iceberg.TableMetadataParser.internalWrite(TableMetadataParser.java:119)
> at org.apache.iceberg.TableMetadataParser.overwrite(TableMetadataParser.java:109)
> at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:154)
> at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:206)
> at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:126)
> at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:216)
> at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:212)
> at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
> at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
> at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
> at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
> at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
> at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
> at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:210)
> at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:139)
> at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:81)
> at org.apache.iceberg.spark.SparkSessionCatalog.createTable(SparkSessionCatalog.java:130)
> at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:41)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
> at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
> at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
> at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
> at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
> at py4j.Gateway.invoke(Gateway.java:282)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:238)
> at java.lang.Thread.run(Thread.java:748)
>
> I have no issues creating a simple Hive table. I can see the student5 folder
> created in my S3 bucket after this command:
>
> spark.sql("CREATE TABLE student5 (id INT, name STRING, age INT)")
>
> WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
> DataFrame[]
>
> Setup details
>
> 1. Spark on kubernetes
> 2. Remote hive metastore service (mysql backend)
> 3. Spark pods connect to metastore service via thrift
> 4. Versions
> - Spark: 3.1.2
> - Hadoop: 3.2.2
> - Hive Standalone Metastore: 3.0.0
> - aws-java-sdk-bundle-1.11.563.jar
> - hadoop-aws-3.2.2.jar
> - guava jar in hive metastore updated to guava-27.0-jre.jar
>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org