Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/20 18:23:40 UTC

[GitHub] [hudi] zhedoubushishi edited a comment on pull request #1760: [HUDI-1040] Update apis for spark3 compatibility

zhedoubushishi edited a comment on pull request #1760:
URL: https://github.com/apache/hudi/pull/1760#issuecomment-713049606


   > Got hive class error
   > 
   > ```
   > Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(Lorg/apache/hadoop/hive/conf/HiveConf;Lorg/apache/hadoop/hive/metastore/HiveMetaHookLoader;Ljava/util/concurrent/ConcurrentHashMap;Ljava/lang/String;Z)Lorg/apache/hadoop/hive/metastore/IMetaStoreClient;
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3600)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3652)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3632)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3894)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:248)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:231)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:388)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:332)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:312)
   > 	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:288)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.client(HiveClientImpl.scala:260)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:286)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
   > 	at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:389)
   > 	at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:221)
   > 	at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
   > 	at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:99)
   > 	at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:221)
   > 	at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:137)
   > 	at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:127)
   > 	at org.apache.spark.sql.internal.SharedState.globalTempViewManager$lzycompute(SharedState.scala:157)
   > 	at org.apache.spark.sql.internal.SharedState.globalTempViewManager(SharedState.scala:155)
   > 	at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$2(HiveSessionStateBuilder.scala:59)
   > 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager$lzycompute(SessionCatalog.scala:93)
   > 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.globalTempViewManager(SessionCatalog.scala:93)
   > 	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:206)
   > 	at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:81)
   > 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
   > 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
   > 	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
   > 	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
   > 	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
   > 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
   > 	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
   > 	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   > 	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   > 	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
   > 	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:229)
   > 	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   > 	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
   > 	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:606)
   > 	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
   > 	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:601)
   > 	at bb.gtd.de.CanalJsonToOds$.initSchema(CanalJsonToOds.scala:81)
   > 	at bb.gtd.de.CanalJsonToOds$.delayedEndpoint$bb$gtd$de$CanalJsonToOds$1(CanalJsonToOds.scala:42)
   > 	at bb.gtd.de.CanalJsonToOds$delayedInit$body.apply(CanalJsonToOds.scala:11)
   > 	at scala.Function0.apply$mcV$sp(Function0.scala:39)
   > 	at scala.Function0.apply$mcV$sp$(Function0.scala:39)
   > 	at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17)
   > 	at scala.App.$anonfun$main$1$adapted(App.scala:80)
   > 	at scala.collection.immutable.List.foreach(List.scala:431)
   > 	at scala.App.main(App.scala:80)
   > 	at scala.App.main$(App.scala:78)
   > 	at bb.gtd.de.CanalJsonToOds$.main(CanalJsonToOds.scala:11)
   > 	at bb.gtd.de.CanalJsonToOds.main(CanalJsonToOds.scala)
   > 20/10/14 15:18:12 INFO SparkContext: Invoking stop() from shutdown hook
   > 20/10/14 15:18:12 INFO AbstractConnector: Stopped Spark@64b31700{HTTP/1.1,[http/1.1]}{0.0.0.0:4041}
   > 20/10/14 15:18:12 INFO SparkUI: Stopped Spark web UI at http://192.168.200.57:4041
   > 20/10/14 15:18:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
   > 20/10/14 15:18:12 INFO MemoryStore: MemoryStore cleared
   > 20/10/14 15:18:12 INFO BlockManager: BlockManager stopped
   > 20/10/14 15:18:12 INFO BlockManagerMaster: BlockManagerMaster stopped
   > 20/10/14 15:18:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
   > 20/10/14 15:18:12 INFO SparkContext: Successfully stopped SparkContext
   > 20/10/14 15:18:12 INFO ShutdownHookManager: Shutdown hook called
   > 20/10/14 15:18:12 INFO ShutdownHookManager: Deleting directory /private/var/folders/vv/5d3clfpj22q_c12ghwdnpfl80000gn/T/spark-a0c47cb6-1512-4f8c-8d10-38e58796fed6
   > Disconnected from the target VM, address: '127.0.0.1:50019', transport: 'socket'
   > ```
   
   I suspect this is because Spark 3.0.0 bundles Hive [2.3.7](https://github.com/apache/spark/blob/v3.0.0/pom.xml#L130) while Spark 2.x bundles Hive [1.2.1.spark2](https://github.com/apache/spark/blob/v2.4.0/pom.xml#L129), and the two Hive versions have incompatible APIs.
   
   It looks like the signature ```org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(Lorg/apache/hadoop/hive/conf/HiveConf;Lorg/apache/hadoop/hive/metastore/HiveMetaHookLoader;Ljava/util/concurrent/ConcurrentHashMap;Ljava/lang/String;Z)``` exists only in Hive ```1.2.1.spark2```: https://github.com/JoshRosen/hive/blob/release-1.2.1-spark2/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java#L101, and no longer exists in Hive ```2.3.7```: https://github.com/apache/hive/blob/rel/release-2.3.7/metastore/src/java/org/apache/hadoop/hive/metastore/RetryingMetaStoreClient.java.
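   
   For anyone who wants to confirm which side of the mismatch their classpath is on, a reflection probe for that exact overload reproduces the check without running a full job. A minimal sketch (Scala, run on the same classpath as the failing application, e.g. in a ```spark-shell```; nothing here is Hudi-specific):
   
   ```
   import java.util.concurrent.ConcurrentHashMap
   import org.apache.hadoop.hive.conf.HiveConf
   import org.apache.hadoop.hive.metastore.{HiveMetaHookLoader, RetryingMetaStoreClient}
   
   // Probe for the Hive 1.2.x-only getProxy overload from the descriptor above.
   // getMethod throws NoSuchMethodException on a Hive 2.3.x classpath, which is
   // the same mismatch that surfaces as NoSuchMethodError at runtime.
   val hasLegacyGetProxy: Boolean =
     try {
       classOf[RetryingMetaStoreClient].getMethod(
         "getProxy",
         classOf[HiveConf],
         classOf[HiveMetaHookLoader],
         classOf[ConcurrentHashMap[_, _]],
         classOf[String],
         classOf[Boolean]) // the trailing Z in the descriptor: primitive boolean
       true
     } catch {
       case _: NoSuchMethodException => false
     }
   
   println(s"Hive 1.2.x getProxy overload present: $hasLegacyGetProxy")
   ```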
   
   So if we compile against Spark 2 (and therefore Hive ```1.2.1.spark2```) but run with Spark 3 (Hive ```2.3.7```), we will run into exactly this kind of ```NoSuchMethodError```.
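   
   A related quick check on the cluster side is to ask the JVM where it actually loaded the metastore class from; another small sketch (assuming only that the Hive classes are on the classpath):
   
   ```
   import org.apache.hadoop.hive.metastore.RetryingMetaStoreClient
   
   // getCodeSource can be null for bootstrap-loaded classes, hence the Option.
   // On a stock Spark 3.0.0 distribution this should point at a Hive 2.3.7
   // metastore jar; on Spark 2.x, at a 1.2.1.spark2 jar.
   val source = Option(
     classOf[RetryingMetaStoreClient].getProtectionDomain.getCodeSource)
   println(s"RetryingMetaStoreClient loaded from: ${source.map(_.getLocation)}")
   ```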
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org