Posted to dev@gobblin.apache.org by "Vikram Bohra (Jira)" <ji...@apache.org> on 2023/06/17 05:18:00 UTC
[jira] [Created] (GOBBLIN-1845) Java parallel stream usage causes class loader conflict when run with spark
Vikram Bohra created GOBBLIN-1845:
-------------------------------------
Summary: Java parallel stream usage causes class loader conflict when run with spark
Key: GOBBLIN-1845
URL: https://issues.apache.org/jira/browse/GOBBLIN-1845
Project: Apache Gobblin
Issue Type: Task
Reporter: Vikram Bohra
DatasetsFinderFilteringDecorator uses a parallel stream over the datasets to filter them against predicates. When this code runs in Spark, the system class loader is used to pick up the Hive jar instead of the current context class loader, which leads to class-not-found errors.
Stack trace:
{code:java}
Caused by: MetaException(message:org.apache.hadoop.hive.metastore.HiveMetaStoreClient class not found)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1494)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:100)
at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:106) {code}
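One possible mitigation, sketched below, is to capture the caller's context class loader before the parallel stream starts and install it around each predicate call. Parallel streams run their work on shared ForkJoinPool worker threads, whose context class loader may differ from the one Spark sets on the calling thread. The class and method names here (ContextClassLoaderDemo, withContextClassLoader, filterWithCallerLoader) are hypothetical and are not part of Gobblin. The sketch assumes no security manager restricts setContextClassLoader, and it is not necessarily the fix adopted for this issue.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical demo class; none of these names come from Gobblin.
class ContextClassLoaderDemo {

  // Wraps a predicate so that it runs with the given context class loader
  // and restores whatever loader the executing thread had before.
  static <T> Predicate<T> withContextClassLoader(ClassLoader loader, Predicate<T> delegate) {
    return item -> {
      Thread current = Thread.currentThread();
      ClassLoader previous = current.getContextClassLoader();
      current.setContextClassLoader(loader);
      try {
        return delegate.test(item);
      } finally {
        // Worker threads are shared, so always put the old loader back.
        current.setContextClassLoader(previous);
      }
    };
  }

  // Filters in parallel while every predicate call sees the supplied loader.
  static <T> List<T> filterWithCallerLoader(List<T> items, ClassLoader loader, Predicate<T> predicate) {
    return items.parallelStream()
        .filter(withContextClassLoader(loader, predicate))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Stand-in for the class loader that Spark installs on the calling thread.
    ClassLoader callerLoader = new URLClassLoader(new URL[0], ClassLoader.getSystemClassLoader());
    Thread.currentThread().setContextClassLoader(callerLoader);

    // Capture the loader on the calling thread, before the stream starts.
    ClassLoader captured = Thread.currentThread().getContextClassLoader();
    List<String> datasets = Arrays.asList("db1.t1", "db1.t2", "db2.t1");

    // The predicate only accepts an item if it observes the captured loader.
    List<String> kept = filterWithCallerLoader(datasets, captured,
        name -> Thread.currentThread().getContextClassLoader() == captured
            && name.startsWith("db1."));
    System.out.println(kept);
  }
}
```

Other options with the same intent would be to use a sequential stream for the filtering step, or to submit the parallel work to a dedicated ForkJoinPool whose thread factory sets the desired context class loader.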
--
This message was sent by Atlassian Jira
(v8.20.10#820010)