Posted to issues@spark.apache.org by "DB Tsai (JIRA)" <ji...@apache.org> on 2019/04/12 23:54:00 UTC

[jira] [Assigned] (SPARK-27455) spark-submit and friends should allow main artifact to be specified as a package

     [ https://issues.apache.org/jira/browse/SPARK-27455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DB Tsai reassigned SPARK-27455:
-------------------------------

    Assignee: Brian Lindblom

> spark-submit and friends should allow main artifact to be specified as a package
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-27455
>                 URL: https://issues.apache.org/jira/browse/SPARK-27455
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.1
>            Reporter: Brian Lindblom
>            Assignee: Brian Lindblom
>            Priority: Minor
>
> Spark already supports spark.jars.packages for including a set of required dependencies with an application.  It transitively resolves the provided packages via Ivy, caches those artifacts, and serves them from the driver to launched executors.  It would be useful to take this one step further and allow a spark.jars.main.package property and a corresponding command-line flag, --main-package, eliminating the need to specify a specific jar file (which is NOT transitively resolved).  This could simplify many use cases.  Additionally, --main-package could trigger inspection of the artifact's META-INF/MANIFEST.MF to determine the main class (see the sketch below), obviating the need for spark-submit invocations to include this information directly.  Currently, I've found that I can do
> {{spark-submit --packages com.example:my-package:1.0.0 --class com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}
> to achieve the same effect.  This additional boilerplate, however, seems unnecessary, especially since one must fetch/orchestrate the jar into some location (local or remote) in addition to specifying any dependencies.  Resorting to fat jars to simplify this creates other issues.
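> For illustration, a rough sketch of the manifest inspection --main-package could perform, shown here as an equivalent shell one-liner (the jar name is illustrative; unzip and grep are standard tools):
> {{unzip -p mypackage-1.0.0.jar META-INF/MANIFEST.MF | grep '^Main-Class:'}}
> which prints, e.g., {{Main-Class: com.example.MyPackage}} if the attribute is present.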
> Ideally
> {{spark-submit --repository <url_to_my_repo> --main-package com.example:my-package:1.0.0 <my_args>}}
> would be all that is necessary to bootstrap an application.  Obviously, care must be taken to avoid DoS'ing <url_to_my_repo> when orchestrating many Spark applications.  In that case, it may also be desirable to implement a --repository-cache-uri <uri_to_repository_cache> flag so that, where HDFS is available, we can bootstrap the application once and cache the resolved artifacts (e.g. a zip/tar of the Ivy cache itself) in HDFS for later consumption, as sketched below.
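> A hand-rolled approximation of what --repository-cache-uri might automate, assuming the default local Ivy cache location (~/.ivy2) and a reachable HDFS; the archive name and HDFS path are illustrative:
> {{tar czf ivy-cache.tgz -C ~/.ivy2 cache}}
> {{hdfs dfs -put ivy-cache.tgz /spark/ivy-cache/ivy-cache.tgz}}
> A later application could then fetch and unpack this archive before invoking spark-submit, avoiding repeated resolution against the remote repository.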



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org