Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/12/30 17:04:55 UTC

[GitHub] [iceberg] RussellSpitzer commented on issue #3828: java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V

RussellSpitzer commented on issue #3828:
URL: https://github.com/apache/iceberg/issues/3828#issuecomment-1003110958


   The error looks like a version mismatch in the Hadoop S3A filesystem
   dependencies rather than a missing jar: a NoSuchMethodError means the
   class loaded, but hadoop-aws is calling a SemaphoredDelegatingExecutor
   constructor that the hadoop-common on your classpath does not have.
   There is a bug noted here,
   https://issues.apache.org/jira/browse/HADOOP-16080, which says that the
   issue is fixed as of 3.2.2, so I would double check that the Hadoop
   version your runtime actually loads is correct.
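
   A quick way to confirm which Hadoop version the session actually loaded
   (a sketch from the same PySpark session; VersionInfo is Hadoop's standard
   version API, reached through the py4j gateway that PySpark exposes):

       # Print the Hadoop version on the JVM classpath; for this setup it
       # should read 3.2.2. If it prints an older release, hadoop-aws-3.2.2
       # is newer than hadoop-common and the S3A constructor lookup fails
       # with exactly this NoSuchMethodError.
       print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())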
   
   On Thu, Dec 30, 2021 at 10:32 AM Prithvi D ***@***.***> wrote:
   
   > Unable to create a basic Iceberg table (stored on S3) with Spark
   >
   > spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
   >
   > ---------------------------------------------------------------------------
   > Py4JJavaError                             Traceback (most recent call last)
   > /tmp/ipykernel_15/955019638.py in <module>
   > ----> 1 spark.sql("CREATE TABLE table32 (id bigint, data string) USING iceberg")
   >
   > /opt/spark/python/pyspark/sql/session.py in sql(self, sqlQuery)
   >     721         [Row(f1=1, f2='row1'), Row(f1=2, f2='row2'), Row(f1=3, f2='row3')]
   >     722         """
   > --> 723         return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
   >     724
   >     725     def table(self, tableName):
   >
   > /usr/local/lib/python3.9/dist-packages/py4j/java_gateway.py in __call__(self, *args)
   >    1302
   >    1303         answer = self.gateway_client.send_command(command)
   > -> 1304         return_value = get_return_value(
   >    1305             answer, self.gateway_client, self.target_id, self.name)
   >    1306
   >
   > /opt/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
   >     109     def deco(*a, **kw):
   >     110         try:
   > --> 111             return f(*a, **kw)
   >     112         except py4j.protocol.Py4JJavaError as e:
   >     113             converted = convert_exception(e.java_exception)
   >
   > /usr/local/lib/python3.9/dist-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
   >     324             value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
   >     325             if answer[1] == REFERENCE_TYPE:
   > --> 326                 raise Py4JJavaError(
   >     327                     "An error occurred while calling {0}{1}{2}.\n".
   >     328                     format(target_id, ".", name), value)
   >
   > Py4JJavaError: An error occurred while calling o224.sql.
   > : java.lang.NoSuchMethodError: org.apache.hadoop.util.SemaphoredDelegatingExecutor.<init>(Ljava/util/concurrent/ExecutorService;IZ)V
   > 	at org.apache.hadoop.fs.s3a.S3AFileSystem.create(S3AFileSystem.java:824)
   > 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
   > 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
   > 	at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
   > 	at org.apache.iceberg.hadoop.HadoopOutputFile.createOrOverwrite(HadoopOutputFile.java:85)
   > 	at org.apache.iceberg.TableMetadataParser.internalWrite(TableMetadataParser.java:119)
   > 	at org.apache.iceberg.TableMetadataParser.overwrite(TableMetadataParser.java:109)
   > 	at org.apache.iceberg.BaseMetastoreTableOperations.writeNewMetadata(BaseMetastoreTableOperations.java:154)
   > 	at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:206)
   > 	at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:126)
   > 	at org.apache.iceberg.BaseMetastoreCatalog$BaseMetastoreCatalogTableBuilder.create(BaseMetastoreCatalog.java:216)
   > 	at org.apache.iceberg.CachingCatalog$CachingTableBuilder.lambda$create$0(CachingCatalog.java:212)
   > 	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.lambda$doComputeIfAbsent$14(BoundedLocalCache.java:2344)
   > 	at java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1853)
   > 	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.doComputeIfAbsent(BoundedLocalCache.java:2342)
   > 	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.BoundedLocalCache.computeIfAbsent(BoundedLocalCache.java:2325)
   > 	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalCache.computeIfAbsent(LocalCache.java:108)
   > 	at org.apache.iceberg.shaded.com.github.benmanes.caffeine.cache.LocalManualCache.get(LocalManualCache.java:62)
   > 	at org.apache.iceberg.CachingCatalog$CachingTableBuilder.create(CachingCatalog.java:210)
   > 	at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:139)
   > 	at org.apache.iceberg.spark.SparkCatalog.createTable(SparkCatalog.java:81)
   > 	at org.apache.iceberg.spark.SparkSessionCatalog.createTable(SparkSessionCatalog.java:130)
   > 	at org.apache.spark.sql.execution.datasources.v2.CreateTableExec.run(CreateTableExec.scala:41)
   > 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
   > 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
   > 	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
   > 	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
   > 	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
   > 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
   > 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
   > 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   > 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   > 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
   > 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
   > 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   > 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
   > 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
   > 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
   > 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   > 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   > 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   > 	at java.lang.reflect.Method.invoke(Method.java:498)
   > 	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
   > 	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
   > 	at py4j.Gateway.invoke(Gateway.java:282)
   > 	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
   > 	at py4j.commands.CallCommand.execute(CallCommand.java:79)
   > 	at py4j.GatewayConnection.run(GatewayConnection.java:238)
   > 	at java.lang.Thread.run(Thread.java:748)
   >
   > I have no issues creating a simple Hive table; I can see the student5
   > folder created in my S3 bucket after this command:
   >
   > spark.sql("CREATE TABLE student5 (id INT, name STRING, age INT)")
   >
   > WARN ResolveSessionCatalog: A Hive serde table will be created as there is no table provider specified. You can set spark.sql.legacy.createHiveTableByDefault to false so that native data source table will be created instead.
   > DataFrame[]
   >
   > Setup details
   >
   >    1. Spark on kubernetes
   >    2. Remote hive metastore service (mysql backend)
   >    3. Spark pods connect to metastore service via thrift
   >    4. Versions
   >       - Spark: 3.1.2
   >       - Hadoop: 3.2.2
   >       - Hive Standalone Metastore: 3.0.0
   >       - aws-java-sdk-bundle-1.11.563.jar
   >       - hadoop-aws-3.2.2.jar
   >       - guava jar in hive metastore updated to guava-27.0-jre.jar
   >
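
   For a setup like the one above, one way to keep the S3A pieces aligned is
   to let Spark resolve hadoop-aws itself instead of placing jars by hand (a
   sketch; spark.jars.packages is standard Spark configuration, and 3.2.2 is
   an assumption that must match the hadoop-common bundled with your Spark
   image; aws-java-sdk-bundle then comes in transitively at the matching
   version):

       # Hypothetical session builder: pull hadoop-aws at the exact version
       # of the hadoop-common jar shipped with Spark so the S3A classes and
       # their dependencies agree.
       from pyspark.sql import SparkSession

       spark = (
           SparkSession.builder
           .appName("iceberg-s3")
           .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.2.2")
           .getOrCreate()
       )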
   

