Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/17 17:56:19 UTC

[GitHub] [iceberg] edgarRd opened a new pull request #2598: MR: remove Hive dependencies on Iceberg de/serialization utility functions

edgarRd opened a new pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598


   When running in Hive / Tez, I'm hitting the following error with the current `master` branch:
   
   ```
   Vertex failed, vertexName=Map 1, vertexId=vertex_1613777207443_50159_1_00, diagnostics=[Vertex vertex_1613777207443_50159_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: my_table initializer failed, vertex=vertex_1613777207443_50159_1_00 [Map 1], java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/HiveMetaHook
   	at org.apache.iceberg.mr.mapreduce.IcebergInputFormat.getSplits(IcebergInputFormat.java:99)
   	at org.apache.iceberg.mr.mapred.MapredIcebergInputFormat.getSplits(MapredIcebergInputFormat.java:68)
   	at org.apache.iceberg.mr.hive.HiveIcebergInputFormat.getSplits(HiveIcebergInputFormat.java:72)
   	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:442)
   	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:561)
   	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
   	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.metastore.HiveMetaHook
   	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
   	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   	... 18 more
   ]
   ```
   
   After some debugging, I noticed that the missing class `org.apache.hadoop.hive.metastore.HiveMetaHook` is part of the `hive-metastore` package, and that the dependency on it was introduced by https://github.com/apache/iceberg/commit/d1510340eaff68d88a2e8194d58e7e493af02bcc#diff-9f974af5a35965b695ad7b3a1fa0d806d4748e890dabd015c538326def44d289R99. The issue seems to be that `IcebergInputFormat` now imports `HiveIcebergStorageHandler`, so the Tez side has to resolve all the other Hive dependencies referenced by that class.
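
One way to check whether a class such as `HiveMetaHook` is visible to a given class loader is a small probe like the one below. This is a minimal sketch, not part of Iceberg or Hive: `ClasspathProbe` and `isLoadable` are hypothetical names, and the probe only reports what the loader it is given can see, so it would need to run inside the failing process (for example the Tez application master) to be meaningful.

```java
// Hypothetical diagnostic helper; the class and method names are assumptions
// made for illustration and do not exist in Iceberg, Hive, or Tez.
final class ClasspathProbe {
  private ClasspathProbe() {
  }

  /**
   * Returns true if the named class can be loaded by the given loader.
   * The class is loaded but not initialized (second argument is false).
   */
  static boolean isLoadable(String className, ClassLoader loader) {
    try {
      Class.forName(className, false, loader);
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      // LinkageError covers NoClassDefFoundError raised while loading supertypes.
      return false;
    }
  }

  public static void main(String[] args) {
    ClassLoader loader = ClasspathProbe.class.getClassLoader();
    String name = args.length > 0
        ? args[0]
        : "org.apache.hadoop.hive.metastore.HiveMetaHook";
    System.out.println(name + " loadable: " + isLoadable(name, loader));
  }
}
```

Running the probe with the class loader of the failing component, rather than the client's, is the point: the same jar set is not necessarily present on both sides.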
   
   Since `HiveIcebergStorageHandler` is imported there only for its `table` function, this PR avoids that dependency by extracting the deserialization functions that do not depend on Hive out of `HiveIcebergStorageHandler`. This removes the requirement to have the Hive classes on the classpath on the Tez execution side.
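
The shape of such a utility can be sketched with the JDK alone. The example below is an illustration under stated assumptions, not the code in this PR: `ConfigSerDeSketch`, its `put`/`get` methods, and the `sketch.serialized.table` key are hypothetical, a plain `Map<String, String>` stands in for the job configuration, and Java serialization plus Base64 stands in for whatever encoding the real helpers use. What matters is that the class imports nothing from Hive, so loading it does not pull in Hive classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Map;

// Hypothetical utility: every name here is an assumption for illustration.
final class ConfigSerDeSketch {
  // Hypothetical config key, not one used by Iceberg.
  static final String TABLE_KEY = "sketch.serialized.table";

  private ConfigSerDeSketch() {
  }

  /** Serializes the value and stores it under the key as a Base64 string. */
  static void put(Map<String, String> conf, String key, Serializable value) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to serialize value for " + key, e);
    }
    // The stream is closed (and flushed) before the bytes are read back.
    conf.put(key, Base64.getEncoder().encodeToString(bytes.toByteArray()));
  }

  /** Reads the Base64 string stored under the key and deserializes it. */
  static <T> T get(Map<String, String> conf, String key, Class<T> type) {
    String encoded = conf.get(key);
    if (encoded == null) {
      throw new IllegalStateException("No serialized value found for " + key);
    }
    byte[] data = Base64.getDecoder().decode(encoded);
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return type.cast(in.readObject());
    } catch (IOException e) {
      throw new UncheckedIOException("Failed to deserialize value for " + key, e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Class of serialized value not found for " + key, e);
    }
  }
}
```

With a helper of this kind, the input format would call `ConfigSerDeSketch.get(conf, ConfigSerDeSketch.TABLE_KEY, SomeType.class)` directly instead of going through the storage handler. The `put` half is included mainly so the round trip can be exercised.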
   
   PTAL @pvary @massdosage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wangyinsheng commented on pull request #2598: MR: remove Hive dependencies on Iceberg de/serialization utility functions

Posted by GitBox <gi...@apache.org>.
wangyinsheng commented on pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598#issuecomment-977642780


   @edgarRd I got the same exception. Is there any update?




[GitHub] [iceberg] pvary commented on pull request #2598: MR: remove Hive dependencies on Iceberg de/serialization utility functions

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598#issuecomment-843068762


   +1 on keeping the serialization / deserialization in one place.
   
   Otherwise looks good to me




[GitHub] [iceberg] marton-bod commented on pull request #2598: MR: remove Hive dependencies on Iceberg de/serialization utility functions

Posted by GitBox <gi...@apache.org>.
marton-bod commented on pull request #2598:
URL: https://github.com/apache/iceberg/pull/2598#issuecomment-842937212


   Thanks for this patch! I think the refactor makes sense even without the classpath issues. One thing, though: I think we should also add the opposite operations, i.e. serialization, to this new class. For example, we serialize the table into the config in `HiveIcebergStorageHandler#overlayTableProperties`, which would now belong more naturally in this util class as well.

