Posted to commits@hudi.apache.org by "Udit Mehrotra (Jira)" <ji...@apache.org> on 2020/01/09 23:40:00 UTC

[jira] [Created] (HUDI-516) Avoid need to import spark-avro package when submitting Hudi job in spark

Udit Mehrotra created HUDI-516:
----------------------------------

             Summary: Avoid need to import spark-avro package when submitting Hudi job in spark
                 Key: HUDI-516
                 URL: https://issues.apache.org/jira/browse/HUDI-516
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Usability
            Reporter: Udit Mehrotra


We are in the process of migrating Hudi to *Spark 2.4.4* and switching to *spark-avro* in place of the deprecated *databricks-avro*, here: [https://github.com/apache/incubator-hudi/pull/1005/]

After this change, users are required to explicitly pull in spark-avro when starting spark-shell, using:
{code:java}
--packages org.apache.spark:spark-avro_2.11:2.4.4
{code}
This is because we are not currently shading spark-avro into *hudi-spark-bundle*. One reason for not shading it is that we are unsure of the implications of shading a Spark dependency into a jar that is itself submitted to Spark. [~vinoth] pointed out that a possible concern is that we would always be shading spark-avro 2.4.4, which could affect users running higher versions of Spark.
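For illustration, a full spark-shell invocation under the current behavior might look like the following. The bundle jar path and the serializer setting are illustrative placeholders, not part of this issue:
{code:java}
spark-shell \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --jars /path/to/hudi-spark-bundle.jar \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer'
{code}
The spark-avro artifact must match the Scala version (2.11 here) and ideally the Spark version of the cluster, which is exactly why hard-coding (or shading) a single spark-avro version into the bundle is risky.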

This Jira tracks finding a way to solve this usability issue.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)