Posted to users@zeppelin.apache.org by Marty B <sp...@mjhb.com> on 2015/05/03 00:45:03 UTC

Loading Python modules

Hi,

With PySpark, how do I configure a Python module to load when starting
Zeppelin?

I'm attempting to load the pyspark-cassandra connector (
https://github.com/TargetHolding/pyspark-cassandra). The jar loads
successfully, but I don't see how to load the associated .egg file for
Python.

I tried -Dspark.py-files=/path/to/module in ZEPPELIN_JAVA_OPTS (with
-Dspark.jars=/...) but that didn't seem to do anything.

Thanks in advance for any pointers.

Re: Loading Python modules

Posted by moon soo Lee <mo...@apache.org>.
Thanks for creating the issue!

On Thu, May 7, 2015 at 6:51 AM Marty B <sp...@mjhb.com> wrote:

> I created ZEPPELIN-71.  Please let me know if there's any more info I can
> provide.  I attempted a proof-of-concept, but couldn't get it to work.

Re: Loading Python modules

Posted by Marty B <sp...@mjhb.com>.
I created ZEPPELIN-71.  Please let me know if there's any more info I can
provide.  I attempted a proof-of-concept, but couldn't get it to work.

On Tue, May 5, 2015 at 9:40 PM moon soo Lee <mo...@apache.org> wrote:

> Looks like Zeppelin needs something to handle .egg modules.
> Could you create an issue for it?
>
> Best,
> moon

Re: Loading Python modules

Posted by moon soo Lee <mo...@apache.org>.
Looks like Zeppelin needs something to handle .egg modules.
Could you create an issue for it?

Best,
moon

On Tue, May 5, 2015 at 6:06 AM Marty B <sp...@mjhb.com> wrote:

> The jar loads successfully - I can see it in the log files, and I can
> successfully access Cassandra from the %spark interpreter (via the embedded
> spark-cassandra-connector).
>
> But I cannot get the .egg to load (python equivalent of a jar).  Tried
> PYTHONPATH, but attempting to 'import pyspark_cassandra' still fails in the
> %pyspark interpreter and I see nothing in the log files. This same
> statement succeeds in pyspark (outside of Zeppelin) with the flag
> '--py-files /path/to/pyspark_cassandra-0.1.3-py2.7.egg'.
>
> Here is the error message from the %pyspark interpreter:
>
> (<type 'exceptions.ImportError'>, ImportError('Java module
> pyspark_cassandra not found',), <traceback object at 0x7f0b185f8b48>)

Re: Loading Python modules

Posted by Marty B <sp...@mjhb.com>.
The jar loads successfully - I can see it in the log files, and I can
successfully access Cassandra from the %spark interpreter (via the embedded
spark-cassandra-connector).

But I cannot get the .egg to load (python equivalent of a jar).  Tried
PYTHONPATH, but attempting to 'import pyspark_cassandra' still fails in the
%pyspark interpreter and I see nothing in the log files. This same
statement succeeds in pyspark (outside of Zeppelin) with the flag
'--py-files /path/to/pyspark_cassandra-0.1.3-py2.7.egg'.

Here is the error message from the %pyspark interpreter:

(<type 'exceptions.ImportError'>, ImportError('Java module
pyspark_cassandra not found',), <traceback object at 0x7f0b185f8b48>)
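
One way to narrow down a failure like this is to confirm that the egg file exists and that its exact path is an entry on PYTHONPATH. The following is only a minimal sketch for a POSIX shell; the function name egg_on_pythonpath is made up for illustration, and it inspects the environment of the shell it runs in, which is not necessarily the environment the Zeppelin interpreter process sees.

```shell
# Hypothetical helper (not part of Zeppelin or Spark): succeed only if the
# given egg file exists and its exact path is one of the PYTHONPATH entries.
egg_on_pythonpath() {
    egg="$1"
    # The egg must exist as a regular file.
    if [ ! -f "$egg" ]; then
        echo "missing file: $egg" >&2
        return 1
    fi
    # Wrap both sides in colons so that only whole entries match.
    case ":${PYTHONPATH}:" in
        *":${egg}:"*) return 0 ;;
        *) echo "not on PYTHONPATH: $egg" >&2; return 1 ;;
    esac
}
```

For example, after sourcing conf/zeppelin-env.sh in a shell, run: egg_on_pythonpath /path/to/pyspark_cassandra-0.1.3-py2.7.egg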


On Mon, May 4, 2015 at 6:23 PM Jonathan Haddad <jo...@jonhaddad.com> wrote:

> I just tried this today - it's more than just the python path.  There also
> needs to be a jar loaded in the driver path.  Here's what the call to
> pyspark looks like:
>
> pyspark \
>     --jars ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4.jar \
>     --driver-class-path ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4.jar \
>     --py-files ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4-py2.7.egg \
>     --conf spark.cassandra.connection.host=127.0.0.1 \
>     --master spark://127.0.0.1:7077
>
> I'm also wondering how this would work as I don't know Zeppelin at all.
>
> Jon

Re: Loading Python modules

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
I just tried this today - it's more than just the python path.  There also
needs to be a jar loaded in the driver path.  Here's what the call to
pyspark looks like:

pyspark \
    --jars ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4.jar \
    --driver-class-path ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4.jar \
    --py-files ${PYSPARK_ROOT}/pyspark_cassandra-0.1.4-py2.7.egg \
    --conf spark.cassandra.connection.host=127.0.0.1 \
    --master spark://127.0.0.1:7077

I'm also wondering how this would work as I don't know Zeppelin at all.

Jon

On Mon, May 4, 2015 at 6:18 PM moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Zeppelin's PySpark interpreter respects the PYTHONPATH environment variable.
> Could you try exporting PYTHONPATH in conf/zeppelin-env.sh?
>
> Thanks,
> moon

Re: Loading Python modules

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Zeppelin's PySpark interpreter respects the PYTHONPATH environment variable.
Could you try exporting PYTHONPATH in conf/zeppelin-env.sh?
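
For illustration, the suggestion above might look roughly like the lines below in conf/zeppelin-env.sh. This is a sketch, not a confirmed fix: EGG_PATH is a placeholder variable introduced here, and its value reuses the example path quoted elsewhere in this thread. The thread also reports that setting PYTHONPATH alone did not make 'import pyspark_cassandra' work.

```shell
# conf/zeppelin-env.sh (sketch)
# EGG_PATH is a placeholder; point it at the real location of the egg.
EGG_PATH="/path/to/pyspark_cassandra-0.1.3-py2.7.egg"

# Prepend the egg to PYTHONPATH, keeping any entries that are already set.
# The ${VAR:+...} form adds the ":" separator only when PYTHONPATH is non-empty.
export PYTHONPATH="${EGG_PATH}${PYTHONPATH:+:${PYTHONPATH}}"
```

Zeppelin has to be restarted for a change in conf/zeppelin-env.sh to take effect.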

Thanks,
moon

On Sat, May 2, 2015 at 11:46 PM Marty B <sp...@mjhb.com> wrote:

> Hi,
>
> With PySpark, how do I configure a Python module to load when starting
> Zeppelin?
>
> I'm attempting to load the pyspark-cassandra connector (
> https://github.com/TargetHolding/pyspark-cassandra). The jar loads
> successfully, but I don't see how to load the associated .egg file for
> Python.
>
> I tried -Dspark.py-files=/path/to/module in ZEPPELIN_JAVA_OPTS (with
> -Dspark.jars=/...) but that didn't seem to do anything.
>
> Thanks in advance for any pointers.
>