Posted to dev@gobblin.apache.org by "Vikram Bohra (Jira)" <ji...@apache.org> on 2023/06/17 05:18:00 UTC

[jira] [Created] (GOBBLIN-1845) Java parallel stream usage causes class loader conflict when run with spark

Vikram Bohra created GOBBLIN-1845:
-------------------------------------

             Summary: Java parallel stream usage causes class loader conflict when run with spark
                 Key: GOBBLIN-1845
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1845
             Project: Apache Gobblin
          Issue Type: Task
            Reporter: Vikram Bohra


DatasetsFinderFilteringDecorator uses a parallel stream over the datasets to filter them against predicates. When this code runs in Spark, the system class loader is used to pick up the Hive jar instead of the current context class loader, which leads to ClassNotFound issues.

Stack trace:
{code:java}
Caused by: MetaException(message:org.apache.hadoop.hive.metastore.HiveMetaStoreClient class not found)
	at org.apache.hadoop.hive.metastore.MetaStoreUtils.getClass(MetaStoreUtils.java:1494)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:130)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
	at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:98)
	at org.apache.gobblin.hive.HiveMetaStoreClientFactory.createMetaStoreClient(HiveMetaStoreClientFactory.java:100)
	at org.apache.gobblin.hive.HiveMetaStoreClientFactory.create(HiveMetaStoreClientFactory.java:106)
{code}
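
One possible workaround, sketched below, is to capture the caller's context class loader before the parallel stream starts and set it on whichever thread evaluates each predicate. Parallel streams run on the common ForkJoinPool, and its worker threads do not necessarily carry the caller's context class loader (the exact behaviour depends on the JDK version). This is a minimal illustration and not the actual Gobblin fix: the class ContextClassLoaderDemo and the helpers withContextClassLoader and filterParallel are hypothetical names, and plain strings stand in for datasets.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class ContextClassLoaderDemo {

  // Hypothetical helper: runs the delegate predicate with the given context
  // class loader set on the executing thread, then restores the previous one.
  static <T> Predicate<T> withContextClassLoader(ClassLoader loader, Predicate<T> delegate) {
    return item -> {
      Thread current = Thread.currentThread();
      ClassLoader previous = current.getContextClassLoader();
      current.setContextClassLoader(loader);
      try {
        return delegate.test(item);
      } finally {
        current.setContextClassLoader(previous);
      }
    };
  }

  // Hypothetical stand-in for the filtering step: captures the caller's
  // context class loader and propagates it to the threads running the stream.
  static List<String> filterParallel(List<String> datasets, Predicate<String> predicate) {
    ClassLoader callerLoader = Thread.currentThread().getContextClassLoader();
    return datasets.parallelStream()
        .filter(withContextClassLoader(callerLoader, predicate))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Simulate a framework (such as Spark) installing its own context class loader.
    ClassLoader custom = new URLClassLoader(new URL[0], ContextClassLoaderDemo.class.getClassLoader());
    Thread.currentThread().setContextClassLoader(custom);

    // The predicate passes only if it sees the caller's context class loader.
    Predicate<String> seesCallerLoader =
        dataset -> Thread.currentThread().getContextClassLoader() == custom;

    List<String> kept = filterParallel(Arrays.asList("a", "b", "c", "d"), seesCallerLoader);
    System.out.println("Datasets that saw the caller's loader: " + kept);
  }
}
```

Other options would be to use a sequential stream or to submit the work to a dedicated executor whose threads are created with the desired class loader; which of these suits DatasetsFinderFilteringDecorator is left open here.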


