You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/03 14:34:25 UTC

[GitHub] [iceberg] marton-bod commented on a change in pull request #1781: Package thrift into the hive-runtime jar

marton-bod commented on a change in pull request #1781:
URL: https://github.com/apache/iceberg/pull/1781#discussion_r535277851



##########
File path: build.gradle
##########
@@ -607,6 +607,8 @@ project(':iceberg-hive-runtime') {
     if (jdkVersion == '8') {
       compile project(':iceberg-hive3')
     }
+    // including thrift so that the Tez AM can communicate with the HMS when using Hive catalog
+    compile("org.apache.thrift:libfb303")

Review comment:
       Thanks for your patience again. Here are my answers:
   > Shouldn't this already be provided by Hive at runtime? Should Hive be the one putting this on the AM?
   
   Correct. Hive localizes the hive exec jar onto the classpath of the Tez AM, via `TezSessionState.createJarLocalResource`. More precisely, it checks which jar the class `org.apache.hadoop.hive.ql.exec.tez.DagUtils` comes from and localizes that specific jar for the AM 
   -- and this should be the `hive-exec.jar`.
   
   > Also another interesting consideration is that Iceberg does not use Thrift directly, it uses HMSClient. Now hive-runtime does not include HMSClient code either. So how is Tez AM able to get the HMSClient code but not Thrift?
   
   Correct, it's the `HiveMetaStoreClient` which we use directly in Iceberg. This class (along with other metastore classes) is already part of the `hive-exec.jar`, along with most of the Thrift classes as well so Tez has no problem finding them. The only thing that's missing is the `libfb303` (Facebook) thrift classes - these are not contained in the `hive-exec.jar`. This absence has not caused any issues before because Tez has not needed to communicate with the HMS, but now it does when we use Iceberg tables.
   
   > What is Iceberg doing on the AM that needs to talk with HMS?
   
   Mostly `IcebergInputFormat.getSplits` loading the table from HMS (and previously `HiveIcebergSerDe` loading the table schema from the HMS - but this latter should be taken care of by https://github.com/apache/iceberg/pull/1794)
   
   > I've never run into a problem with Iceberg on Hive related to this as Hive provides libfb303-0.9.3.jar in its lib folder (for Hive 2.3.7). What steps reproduce the problem? Is it specific to Hive 3?
   
   This problem should be specific to Tez, because the AM does not have access to the lib folder used by Hive. Not exactly sure how the classpath is handled when MR launches separate processes.
   
   > If we need to include this, is it okay to not shade this?
   
   I've tried shading it by including `relocate 'com.facebook.fb303', 'org.apache.iceberg.shaded.com.facebook.fb303'` but then Hive unfortunately failed to find the class when it was shaded. 
   
   So to summarize: 
   
   - In the current state, Tez does not work out-of-the-box using only the hive-iceberg-runtime jar. The user also needs to put the `libfb303` jar on Tez's classpath one way or another. This should be documented. I will create a PR to add this to the Hive docs.
   - While the solution in this PR does solve this problem, the `libfb303` classes don't really belong in the runtime jar, especially in an unshaded version (although probably not too risky given there'd be no version conflict since Hive2 and Hive3 both use 0.9.3), so I'll abandon this change
   - A probably better solution would be to create a HIVE change and include the `libfb303` classes in the `hive-exec.jar` (which already contains the other thrift classes). Then we can include that change in an upcoming upstream release and upgrade the version in Iceberg once it's out.
   
   What are your thoughts?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org