Posted to issues@spark.apache.org by "Brian Lindblom (JIRA)" <ji...@apache.org> on 2019/04/12 23:21:00 UTC

[jira] [Created] (SPARK-27455) spark-submit and friends should allow main artifact to be specified as a package

Brian Lindblom created SPARK-27455:
--------------------------------------

             Summary: spark-submit and friends should allow main artifact to be specified as a package
                 Key: SPARK-27455
                 URL: https://issues.apache.org/jira/browse/SPARK-27455
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.1
            Reporter: Brian Lindblom


Spark already supports {{spark.jars.packages}} for including a set of required dependencies with an application.  It transitively resolves the provided packages via Ivy, caches the artifacts, and serves them from the driver to the launched executors.  It would be useful to take this one step further and allow a {{spark.jars.main.package}} property and a corresponding command-line flag, {{--main-package}}, to eliminate the need to specify a specific jar file (which is NOT transitively resolved).  This could simplify many use cases.  Additionally, {{--main-package}} could trigger inspection of the artifact's META-INF to determine the main class, obviating the need for spark-submit invocations to specify it directly.  Currently, I've found that I can do

{{spark-submit --packages com.example:my-package:1.0.0 --class com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}

to achieve the same effect.  This additional boilerplate, however, seems unnecessary, especially since one must fetch and stage the jar in some location (local or remote) in addition to specifying any dependencies.  Resorting to fat jars to simplify this creates other issues.
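
For completeness, the same coordinates could also be supplied through the existing config property instead of the {{--packages}} flag; this is just the config-property form of the example above, reusing the same example coordinates:

{{spark-submit --conf spark.jars.packages=com.example:my-package:1.0.0 --class com.example.MyPackage /path/to/mypackage-1.0.0.jar <my_args>}}

The META-INF inspection mentioned above amounts to reading the jar's manifest.  As a minimal sketch of what {{--main-package}} would look at, assuming the jar actually declares a Main-Class attribute, one can check locally with:

{{unzip -p mypackage-1.0.0.jar META-INF/MANIFEST.MF | grep Main-Class}}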

Ideally

{{spark-submit --repository <url_to_my_repo> --main-package com.example:my-package:1.0.0 <my_args>}}

would be all that is necessary to bootstrap an application.  Obviously, care must be taken to avoid DoS'ing <url_to_my_repo> when orchestrating many Spark applications.  In that case, it may also be desirable to implement a {{--repository-cache-uri <uri_to_repository_cache>}} option: where an HDFS is available, the application could be bootstrapped once and the resolved artifacts (for example, a zip/tar of the Ivy cache itself) stored in HDFS for later consumption.
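
To sketch how that might look (note that {{--repository}}, {{--main-package}}, and {{--repository-cache-uri}} are all proposed flags that do not exist today, and the HDFS path is only an illustration):

{{spark-submit --repository <url_to_my_repo> --repository-cache-uri hdfs:///apps/spark/ivy-cache --main-package com.example:my-package:1.0.0 <my_args>}}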



