Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2019/09/17 23:03:00 UTC

[jira] [Updated] (HUDI-260) Hudi Spark Bundle does not work when passed in extraClassPath option

     [ https://issues.apache.org/jira/browse/HUDI-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-260:
--------------------------------
    Description: 
On EMR's side we have the same findings. *a + b + c + d* work in the following cases:
 * The bundle jar (with databricks-avro shaded) is specified using *--jars* or *spark.jars* option
 * The bundle jar (with databricks-avro shaded) is placed in the Spark Home jars folder i.e. */usr/lib/spark/jars* folder

However, it does not work if the jar is specified using the *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, which is what EMR uses to configure external dependencies. Although we could drop the jar into the */usr/lib/spark/jars* folder, I am not sure that is recommended, because that folder is supposed to contain the jars shipped with Spark. Extra dependencies on the user's side are better specified through the *extraClassPath* options.
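
For reference, the configurations discussed above can be sketched as spark-submit invocations. The bundle jar path, application jar, and main class below are illustrative placeholders, not taken from the issue:

```shell
# Works: ship the bundle jar via --jars (equivalently, set spark.jars)
spark-submit \
  --jars /path/to/hudi-spark-bundle.jar \
  --class com.example.MyHudiApp my-app.jar

# Also works: place the bundle jar in the Spark Home jars folder
# (mixes user dependencies into the Spark distribution's own jars)
cp /path/to/hudi-spark-bundle.jar /usr/lib/spark/jars/

# Does NOT work: adding the bundle via extraClassPath,
# which is how EMR configures external dependencies
spark-submit \
  --conf spark.driver.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --conf spark.executor.extraClassPath=/path/to/hudi-spark-bundle.jar \
  --class com.example.MyHudiApp my-app.jar
```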

> Hudi Spark Bundle does not work when passed in extraClassPath option
> --------------------------------------------------------------------
>
>                 Key: HUDI-260
>                 URL: https://issues.apache.org/jira/browse/HUDI-260
>             Project: Apache Hudi (incubating)
>          Issue Type: Improvement
>          Components: Spark datasource, SparkSQL Support
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Major
>
> On EMR's side we have the same findings. *a + b + c + d* work in the following cases:
>  * The bundle jar (with databricks-avro shaded) is specified using *--jars* or *spark.jars* option
>  * The bundle jar (with databricks-avro shaded) is placed in the Spark Home jars folder i.e. */usr/lib/spark/jars* folder
> However, it does not work if the jar is specified using the *spark.driver.extraClassPath* and *spark.executor.extraClassPath* options, which is what EMR uses to configure external dependencies. Although we could drop the jar into the */usr/lib/spark/jars* folder, I am not sure that is recommended, because that folder is supposed to contain the jars shipped with Spark. Extra dependencies on the user's side are better specified through the *extraClassPath* options.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)