Posted to dev@bahir.apache.org by "Christian Kadner (JIRA)" <ji...@apache.org> on 2016/07/19 22:50:20 UTC

[jira] [Created] (BAHIR-35) Include Python code in the binary jars for use with "--packages ..."

Christian Kadner created BAHIR-35:
-------------------------------------

             Summary: Include Python code in the binary jars for use with "--packages ..."
                 Key: BAHIR-35
                 URL: https://issues.apache.org/jira/browse/BAHIR-35
             Project: Bahir
          Issue Type: Task
          Components: Build
    Affects Versions: 2.0.0
            Reporter: Christian Kadner


Currently, to make use of the PySpark code (i.e. streaming-mqtt/python), a user has to download the jar from Maven Central or clone the code from GitHub, find the individual *.py files, create a zip, and add that to the {{spark-submit}} command with the {{--py-files}} option, or add them to the {{PYTHONPATH}} when running locally.
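
To illustrate the manual step described above, here is a minimal sketch of collecting the *.py files into a zip. The function name {{zip_python_sources}} and any paths passed to it are hypothetical; this is not part of Bahir, only an approximation of what a user has to do by hand today.

```python
import os
import zipfile


def zip_python_sources(source_dir, zip_path):
    """Collect every *.py file under source_dir into zip_path.

    Hypothetical helper for illustration only; paths inside the zip are kept
    relative to source_dir so that packages stay importable from the zip root.
    """
    added = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                if not name.endswith(".py"):
                    continue  # skip anything that is not Python source
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, source_dir)
                archive.write(full_path, arcname)
                added.append(arcname.replace(os.sep, "/"))
    # Return the archived paths so the caller can check what was packaged.
    return sorted(added)
```

The resulting zip would then be passed along the lines of {{spark-submit --py-files mqtt-python.zip my_app.py}}, where both file names are placeholders.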

If we include the Python code in the binary build (that is, in the jar that gets uploaded to Maven Central), then users need not do any acrobatics besides using the {{--packages ...}} option.
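
One reason this can work is that a jar is a zip archive, and Python can import modules from a zip archive that appears on {{sys.path}}, regardless of the file extension. The sketch below builds a stand-in "jar" containing one Python module and imports from it. The module name {{bahir_example_mod}}, the jar name, and the helper {{import_from_jar}} are all made up for illustration; the sketch also assumes, without verifying it, that Spark puts the jar resolved via {{--packages}} on the Python path.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_from_jar(jar_path, module_name):
    """Put jar_path on sys.path and import module_name from it (via zipimport)."""
    if jar_path not in sys.path:
        sys.path.insert(0, jar_path)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a stand-in "jar" with one hypothetical Python module at the archive root.
work_dir = tempfile.mkdtemp()
jar_path = os.path.join(work_dir, "example-package.jar")
with zipfile.ZipFile(jar_path, "w") as jar:
    jar.writestr("bahir_example_mod.py", "GREETING = 'imported from jar'\n")

module = import_from_jar(jar_path, "bahir_example_mod")
print(module.GREETING)
```

For this to work with a real build, the Python sources would need to sit at the root of the jar (or under a package directory at the root), not under a nested resources path.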

An example where the Python code is part of the binary jar is the [GraphFrames|https://spark-packages.org/package/graphframes/graphframes] package.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)