Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/28 05:58:45 UTC

[GitHub] [hudi] n3nash edited a comment on issue #2688: [SUPPORT] Sync to Hive using Metastore

n3nash edited a comment on issue #2688:
URL: https://github.com/apache/hudi/issues/2688#issuecomment-850157533


   @Limess Let me describe the problem:
   
   1. Hudi has a compile-time dependency on Hive 2.x
   2. Spark internally depends on Hive 1.x
   
   The Hive-Sync mechanism generally works as a standalone piece that registers your tables in HMS (the Hive Metastore) so they can be queried as "hive tables". Because it is standalone, it can control which Hive dependency it brings in. 
   
   Spark, on the other hand, always brings in Hive 1.x, and depending on how the classpath is assembled, it will load only the Hive 1.x jars. 
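To check whether a given deployment really has both Hive lines on its classpath, a small helper like the one below can scan a classpath string (for example, one built from the jars in `$SPARK_HOME/jars` plus anything passed with `--jars`) and group the Hive jars by major version. This is only a sketch: the function names are hypothetical, the jar-name pattern is an assumption (names assumed to look like `hive-exec-1.2.1.spark2.jar` or `hive-metastore-2.3.8.jar`), and Hive classes shaded into a bundle jar will not be detected.

```python
import re
from collections import defaultdict

# Assumed naming pattern: "hive-<module>-<major>.<minor>...jar".
# Shaded, bundled, or renamed jars will NOT match this pattern.
HIVE_JAR = re.compile(r"hive-(exec|metastore|common|jdbc|service)-(\d+)\.(\d+)[^/]*\.jar$")


def hive_versions_on_classpath(classpath, sep=":"):
    """Map each Hive major version to the Hive jar file names that carry it."""
    found = defaultdict(list)
    for entry in classpath.split(sep):
        name = entry.rsplit("/", 1)[-1]  # keep only the file name
        match = HIVE_JAR.search(name)
        if match:
            found[int(match.group(2))].append(name)
    return dict(found)


def has_mixed_hive(classpath, sep=":"):
    """True if jars from more than one Hive major version are present."""
    return len(hive_versions_on_classpath(classpath, sep)) > 1
```

If `has_mixed_hive` returns True, which class wins depends on classpath order, which matches the kind of conflict described above.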
   
   The Hive driver class (`org.apache.hadoop.hive.ql.Driver`) in Hive 1.x -> https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L1742 is different from the same class in Hive 2.x -> https://github.com/apache/hive/blob/release-2.3.8-rc2/ql/src/java/org/apache/hadoop/hive/ql/Driver.java#L2377. My guess is that Hudi calls the Driver as compiled against Hive 2.x, but Spark has loaded the Hive 1.x version of that class at runtime.
   
   Since most of these APIs are compatible between Hive 1.x and 2.x, this has not caused issues so far, and this close() method does not appear to differ either. I need to dig deeper into why the NoSuchMethodError is being thrown, but one thing is certain: it is caused by the mismatched Hive versions. 
   
   As a workaround, could you use the standalone hive-sync-tool to unblock yourself? Documentation is here -> https://hudi.apache.org/docs/docker_demo.html#step-3-sync-with-hive. 
   
   Once you use it, you can disable `hoodie.datasource.hive_sync.enable` and perhaps trigger the hive-sync-tool from your Python job as a subprocess?
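From a Python job, the workaround could look roughly like this: keep the in-process sync disabled in the Hudi write options and run the sync tool as a separate process, so it uses its own Hive 2.x classpath instead of Spark's. This is a sketch under assumptions: the script path and flags mirror the docker demo's `run_sync_tool.sh` invocation and should be checked against the docs for your Hudi version, and `build_sync_command` / `run_hive_sync` are hypothetical helper names.

```python
import subprocess

# Hudi write options with the in-process Hive sync turned off.
# Add the rest of your existing write options to this dict.
hudi_options = {
    "hoodie.datasource.hive_sync.enable": "false",
}


def build_sync_command(script, jdbc_url, user, password, database, table,
                       base_path, partitioned_by=None):
    """Assemble the argument list for the standalone sync script (flags assumed from the docker demo)."""
    cmd = [
        script,
        "--jdbc-url", jdbc_url,
        "--user", user,
        "--pass", password,
        "--database", database,
        "--table", table,
        "--base-path", base_path,
    ]
    if partitioned_by:
        cmd += ["--partitioned-by", partitioned_by]
    return cmd


def run_hive_sync(cmd):
    """Run the sync tool in its own process; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

After the Hudi write finishes, calling `run_hive_sync(build_sync_command(...))` performs the sync. Because the tool runs in its own JVM, a failure surfaces as `subprocess.CalledProcessError` (with the tool's stderr on the exception) rather than as a Spark job failure.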
   

