Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2014/10/22 20:12:34 UTC

[jira] [Created] (SPARK-4048) Enhance and extend hadoop-provided profile

Marcelo Vanzin created SPARK-4048:
-------------------------------------

             Summary: Enhance and extend hadoop-provided profile
                 Key: SPARK-4048
                 URL: https://issues.apache.org/jira/browse/SPARK-4048
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 1.2.0
            Reporter: Marcelo Vanzin


The hadoop-provided profile is used to avoid packaging Hadoop dependencies inside the Spark assembly. It mostly works, but it could use some enhancements. A quick list (a build sketch follows the list for context):

- It doesn't cover all the dependencies that could be removed from the assembly
- It doesn't work well when you're publishing artifacts based on it (SPARK-3812 fixes this)
- There are other dependencies that could use similar treatment: Hive, HBase (for the examples), Flume, Parquet, maybe others I'm missing at the moment.
- Unit tests (more specifically, those that use local-cluster mode) do not work when the assembly is built with this profile enabled.
- The scripts that launch Spark jobs do not add the needed "provided" jars to the classpath when this profile is enabled, leaving users to figure that out for themselves (see the sketch at the end of this description).

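For context, the profile is enabled at build time. A typical invocation looks something like the following (the flags other than -Phadoop-provided are illustrative of a common build configuration, not requirements):

{code}
# Build an assembly that expects Hadoop classes to come from the runtime
# environment instead of being bundled into the jar. Flags other than
# -Phadoop-provided are examples, and depend on the target environment.
mvn -Phadoop-provided -Pyarn -Dhadoop.version=2.4.0 -DskipTests clean package
{code}
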
Part of this task is selfish: we build internally with this profile, and we'd like to make it easier to merge changes without keeping too many patches on top of upstream. But these feel like good improvements regardless.
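
To illustrate the last bullet above: with a hadoop-provided assembly, users currently have to wire the Hadoop jars into the classpath themselves before launching anything. A rough sketch of the manual workaround (the use of SPARK_CLASSPATH and the example jar path are illustrative, not a recommendation):

{code}
# Manually expose the Hadoop jars that the assembly no longer bundles.
# 'hadoop classpath' prints the classpath of the local Hadoop installation.
export SPARK_CLASSPATH="$(hadoop classpath)"

# Then launch jobs as usual; without the export above, Hadoop classes
# such as org.apache.hadoop.conf.Configuration are missing at runtime.
./bin/spark-submit --class org.apache.spark.examples.SparkPi \
  lib/spark-examples-*.jar 10
{code}

Ideally the launch scripts would take care of this automatically when the assembly was built with the profile enabled.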



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org