Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2019/09/17 23:05:00 UTC

[jira] [Commented] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

    [ https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16931904#comment-16931904 ] 

Vinoth Chandar commented on HUDI-260:
-------------------------------------

[~uditme] Let me reproduce this on the docker setup and see what's going on. Mind pasting the exception you get when you try to do a + b?

> Hudi Spark Bundle does not work when passed in extraClassPath option
> --------------------------------------------------------------------
>
>                 Key: HUDI-260
>                 URL: https://issues.apache.org/jira/browse/HUDI-260
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Spark datasource, SparkSQL Support
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following cases:
>  * The bundle jar (with databricks-avro shaded) is specified using *--jars* or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home jars folder i.e. */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, which is what EMR uses to configure external dependencies. Although we can drop the jar into the */usr/lib/spark/jars* folder, I am not sure that is recommended, because that folder is supposed to contain only the jars shipped with Spark itself. Extra dependencies on the user's side would be better off specified through the *extraClassPath* options.
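
For reference, the three configurations described above might look roughly like the following. This is a sketch only: the jar path, application class, and application jar are placeholders, not values taken from this issue.

```shell
# Works: bundle jar distributed to driver and executors via --jars
# (equivalently, via the spark.jars property)
spark-submit \
  --jars /path/to/hudi-spark-bundle.jar \
  --class com.example.MyApp \
  my-app.jar

# Works: bundle jar placed in Spark's own jars folder
cp /path/to/hudi-spark-bundle.jar /usr/lib/spark/jars/

# Fails per this issue: jar supplied only on the extra class paths
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --conf spark.executor.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --class com.example.MyApp \
  my-app.jar
```

One relevant difference between these mechanisms: --jars/spark.jars ships the jar to executors and adds it to Spark's mutable classloader, whereas extraClassPath only prepends an already-present local path to the JVM's system classpath, so class-loading behavior can differ between the two.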



--
This message was sent by Atlassian Jira
(v8.3.4#803005)