Posted to user@spark.apache.org by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com> on 2018/09/25 17:23:24 UTC

Python kubernetes spark 2.4 branch

Hi,
I am trying to run Spark Python test cases on k8s, based on tag spark-2.4-rc1. When dependent files are passed through the --py-files option, they are not resolved by the main Python script. Please let me know: is this a known issue?

Regards
Surya


RE: Python kubernetes spark 2.4 branch

Posted by "Garlapati, Suryanarayana (Nokia - IN/Bangalore)" <su...@nokia.com>.
Hi Ilan/Yinan,
My observation is as follows:
The dependent files specified with “--py-files http://10.75.145.25:80/Spark/getNN.py” are downloaded and available in the container at “/var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/”.
I think we also need to export PYTHONPATH with this path, via the following change in entrypoint.sh:


if [ -n "$PYSPARK_FILES" ]; then
    PYTHONPATH="$PYTHONPATH:$PYSPARK_FILES"
fi

to

if [ -n "$PYSPARK_FILES" ]; then
    # append the directory where the dependent files are downloaded in the
    # container, e.g. /var/data/spark-c163f15e-d59d-4975-b9be-91b6be062da9/spark-61094ca2-125b-48de-a154-214304dbe74/
    PYTHONPATH="$PYTHONPATH:<download directory>"
fi
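Since the spark-&lt;id&gt;/spark-&lt;id&gt; directory name changes on every run, hardcoding one path cannot work in general. A minimal sketch of a more general approach, assuming dependencies always land under a fixed scratch root (such as the /var/data location observed above), is to glob for the per-run directories at container start-up:

```shell
# Hypothetical sketch, not the actual entrypoint.sh code: append every
# per-run dependency directory under a scratch root to PYTHONPATH.
# The spark-*/spark-* layout is an assumption based on the path observed
# above; a temp directory stands in for /var/data so this runs anywhere.
SCRATCH_ROOT=$(mktemp -d)
mkdir -p "$SCRATCH_ROOT/spark-c163f15e/spark-61094ca2"

PYTHONPATH=""
for d in "$SCRATCH_ROOT"/spark-*/spark-*/; do
    # ${d%/} strips the trailing slash the glob leaves on directory matches
    [ -d "$d" ] && PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}${d%/}"
done
export PYTHONPATH
echo "$PYTHONPATH"
```

The `${PYTHONPATH:+…}` expansion avoids a leading `:` when the variable starts out empty.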
Please let me know if this approach is fine, and correct me if my understanding is wrong.

Regards
Surya

From: Garlapati, Suryanarayana (Nokia - IN/Bangalore)
Sent: Wednesday, September 26, 2018 9:14 AM
To: Ilan Filonenko <if...@cornell.edu>; liyinan926@gmail.com
Cc: Spark dev list <de...@spark.apache.org>; user@spark.apache.org
Subject: RE: Python kubernetes spark 2.4 branch

Hi Ilan/Yinan,
Yes, my test case is similar to the one described in https://issues.apache.org/jira/browse/SPARK-24736

My spark-submit is as follows:
./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files http://10.75.145.25:80/Spark/getNN.py http://10.75.145.25:80/Spark/test.py

The following error is observed:

+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=192.168.1.22 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner http://10.75.145.25:80/Spark/test.py
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/phoenix-4.13.1-HBase-1.3-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Traceback (most recent call last):
File "/tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229/test.py", line 13, in <module>
from getNN import *
ImportError: No module named getNN
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Shutdown hook called
2018-09-25 16:19:57 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-4c428c98-e123-4c29-a9f5-ef85f207e229

I am observing the same behaviour as described in https://issues.apache.org/jira/browse/SPARK-24736 (the file is downloaded and available in the pod).
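The underlying failure mode — a module present on disk in a directory that is not on PYTHONPATH — can be reproduced outside Kubernetes with a small sketch; the getNN.py contents here are a made-up stand-in:

```shell
# Reproduce the ImportError above locally: create a stand-in getNN.py in a
# temp directory (playing the role of the /var/data/spark-*/spark-*/
# download directory) and try the import with and without PYTHONPATH set.
deps=$(mktemp -d)
printf 'VALUE = 42\n' > "$deps/getNN.py"

# Directory not on PYTHONPATH: the import fails, as in the driver pod.
python3 -c "import getNN" 2>/dev/null || echo "import failed as expected"

# Directory exported on PYTHONPATH: the import succeeds.
PYTHONPATH="$deps" python3 -c "import getNN; print(getNN.VALUE)"
```

This suggests the fetch step is fine and only the PYTHONPATH export in entrypoint.sh is missing.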

The same happens with local files as well:

./spark-submit --deploy-mode cluster --master k8s://https://10.75.145.23:8443 --conf spark.app.name=spark-py --properties-file /tmp/program_files/spark_py.conf --py-files ./getNN.py http://10.75.145.25:80/Spark/test.py

test.py imports from getNN.py.


However, the same setup works on the Spark 2.2 k8s branch.


Regards
Surya

From: Ilan Filonenko <if...@cornell.edu>
Sent: Wednesday, September 26, 2018 2:06 AM
To: liyinan926@gmail.com
Cc: Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com>; Spark dev list <de...@spark.apache.org>; user@spark.apache.org
Subject: Re: Python kubernetes spark 2.4 branch

Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <li...@gmail.com> wrote:
Can you give more details on how you ran your app? Did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia - IN/Bangalore) <su...@nokia.com> wrote:
Hi,
I am trying to run spark python testcases on k8s based on tag spark-2.4-rc1. When the dependent files are passed through the --py-files option, they are not getting resolved by the main python script. Please let me know, is this a known issue?

Regards
Surya




Re: Python kubernetes spark 2.4 branch

Posted by Ilan Filonenko <if...@cornell.edu>.
Is this in reference to: https://issues.apache.org/jira/browse/SPARK-24736 ?

On Tue, Sep 25, 2018 at 12:38 PM Yinan Li <li...@gmail.com> wrote:

> Can you give more details on how you ran your app, did you build your own
> image, and which image are you using?
>
> On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia -
> IN/Bangalore) <su...@nokia.com> wrote:
>
>> Hi,
>>
>> I am trying to run spark python testcases on k8s based on tag
>> spark-2.4-rc1. When the dependent files are passed through the --py-files
>> option, they are not getting resolved by the main python script. Please let
>> me know, is this a known issue?
>>
>>
>>
>> Regards
>>
>> Surya
>>
>>
>>
>

Re: Python kubernetes spark 2.4 branch

Posted by Yinan Li <li...@gmail.com>.
Can you give more details on how you ran your app? Did you build your own image, and which image are you using?

On Tue, Sep 25, 2018 at 10:23 AM Garlapati, Suryanarayana (Nokia -
IN/Bangalore) <su...@nokia.com> wrote:

> Hi,
>
> I am trying to run spark python testcases on k8s based on tag
> spark-2.4-rc1. When the dependent files are passed through the --py-files
> option, they are not getting resolved by the main python script. Please let
> me know, is this a known issue?
>
>
>
> Regards
>
> Surya
>
>
>
