Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2021/05/14 21:49:34 UTC

Urgent Help - Py Spark submit error

Hi,

I am running into a weird situation where the command below works when the
deploy mode is client but fails when it is cluster.

spark-submit --master yarn --deploy-mode client --files
/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--driver-memory 70g --num-executors 6 --executor-cores 3 --driver-cores 3
--driver-memory 7g --py-files /appl/common/ftp/ftp_event_data.py
 /appl/common/ftp/ftp_event_data.py /appl/common/ftp/conf.json 2021-05-10 7



21/05/14 17:34:39 INFO ApplicationMaster: Waiting for spark context
initialization...
21/05/14 17:34:39 WARN SparkConf: The configuration key
'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3
and may be removed in the future. Please use the new key
'spark.executor.memoryOverhead' instead.
21/05/14 17:34:39 ERROR ApplicationMaster: User application exited with
status 1
21/05/14 17:34:39 INFO ApplicationMaster: Final app status: FAILED,
exitCode: 13, (reason: User application exited with status 1)
21/05/14 17:34:39 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Exception thrown in awaitResult:
        at
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
        at
org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:447)
        at
org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:275)
        at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:799)
        at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:798)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at
org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:798)
        at
org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: org.apache.spark.SparkUserAppException: User application exited
with 1
        at
org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:106)
        at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:667)
21/05/14 17:34:39 INFO ApplicationMaster: Deleting staging directory
hdfs://dev-cbb-datalake/user/nifiuser/.sparkStaging/application_1620318563358_0046
21/05/14 17:34:41 INFO ShutdownHookManager: Shutdown hook called


For more detailed output, check the application tracking page:
https://srvbigddvlsh115.us.dev.corp:8090/cluster/app/application_1620318563358_0046
Then click on links to logs of each attempt.
. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application
application_1620318563358_0046 finished with failed status
        at org.apache.spark.deploy.yarn.Client.run(Client.scala:1155)
        at
org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1603)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:851)
        at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
        at
org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:926)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:935)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
21/05/14 17:34:42 INFO util.ShutdownHookManager: Shutdown hook called
21/05/14 17:34:42 INFO util.ShutdownHookManager: Deleting directory
/tmp/spark-28fa7d64-5a1d-42fb-865f-e9bb24854e7c
21/05/14 17:34:42 INFO util.ShutdownHookManager: Deleting directory
/tmp/spark-db93f731-d48a-4a7b-986f-e0a016bbd7f3

Thanks,
Asmath

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by Mich Talebzadeh <mi...@gmail.com>.
This is an interesting one.

I have never tried to add --files ...

spark-submit --master yarn --deploy-mode client --files
/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Rather, under $SPARK_HOME/conf, I create soft links to the needed XML files
as below

/d4T/hduser/spark-3.1.1-bin-hadoop3.2/conf> ls -lhaF | grep ^l
lrwxrwxrwx  1 hduser hadoop   50 Mar  3 08:08 core-site.xml ->
/home/hduser/hadoop-3.1.0/etc/hadoop/core-site.xml
lrwxrwxrwx  1 hduser hadoop   45 Mar  3 08:07 hbase-site.xml ->
/data6/hduser/hbase-1.2.6/conf/hbase-site.xml
lrwxrwxrwx  1 hduser hadoop   50 Mar  3 08:08 hdfs-site.xml ->
/home/hduser/hadoop-3.1.0/etc/hadoop/hdfs-site.xml
lrwxrwxrwx  1 hduser hadoop   43 Mar  3 08:07 hive-site.xml ->
/data6/hduser/hive-3.0.0/conf/hive-site.xml

This works
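A sketch of creating those links (the paths are examples taken from the listing above; adjust them to your installation):

```shell
# Link the cluster client configs into $SPARK_HOME/conf so spark-submit
# picks them up automatically, instead of passing them via --files.
# Falls back to a temp dir when SPARK_HOME is unset, so the snippet
# stays runnable standalone.
SPARK_CONF_DIR="${SPARK_HOME:-$(mktemp -d)}/conf"
mkdir -p "$SPARK_CONF_DIR"
ln -sfn /etc/hive/conf/hive-site.xml   "$SPARK_CONF_DIR/hive-site.xml"
ln -sfn /etc/hadoop/conf/core-site.xml "$SPARK_CONF_DIR/core-site.xml"
ln -sfn /etc/hadoop/conf/hdfs-site.xml "$SPARK_CONF_DIR/hdfs-site.xml"
ls -lhaF "$SPARK_CONF_DIR" | grep ^l   # show only the soft links, as above
```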

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 15 May 2021 at 18:32, KhajaAsmath Mohammed <md...@gmail.com>
wrote:

> Thanks everyone. I was able to resolve this.
>
> Here is what I did. I just passed the conf file using the --files option.
>
> The mistake I made was reading the JSON conf file before creating the Spark
> session. Reading it after creating the Spark session fixed it. Thanks once
> again for your valuable suggestions.
>
> Thanks,
> Asmath
>
> On May 15, 2021, at 8:12 AM, Sean Owen <sr...@gmail.com> wrote:
>
> 
> If code running on the executors needs some local file, like a config file,
> then it does have to be passed this way. That much is normal.
>
> On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta <go...@gmail.com>
> wrote:
>
>> Hi,
>>
>> once again, let's start with the requirement: why are you trying to pass
>> XML and JSON files to Spark instead of reading them in Spark?
>> Generally, when people pass files, they are Python or JAR files.
>>
>> Regards,
>> Gourav
>>
>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Thanks everyone. I was able to resolve this. 

Here is what I did. I just passed the conf file using the --files option.

The mistake I made was reading the JSON conf file before creating the Spark session. Reading it after creating the Spark session fixed it. Thanks once again for your valuable suggestions.

Thanks,
Asmath

> On May 15, 2021, at 8:12 AM, Sean Owen <sr...@gmail.com> wrote:
> 
> 
> If code running on the executors needs some local file, like a config file, then it does have to be passed this way. That much is normal.
> 
>> On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta <go...@gmail.com> wrote:
>> Hi,
>> 
>> once again, let's start with the requirement: why are you trying to pass XML and JSON files to Spark instead of reading them in Spark?
>> Generally, when people pass files, they are Python or JAR files.
>> 
>> Regards,
>> Gourav

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by Sean Owen <sr...@gmail.com>.
If code running on the executors needs some local file, like a config file,
then it does have to be passed this way. That much is normal.

On Sat, May 15, 2021 at 1:41 AM Gourav Sengupta <go...@gmail.com>
wrote:

> Hi,
>
> once again, let's start with the requirement: why are you trying to pass XML
> and JSON files to Spark instead of reading them in Spark?
> Generally, when people pass files, they are Python or JAR files.
>
> Regards,
> Gourav
>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by Gourav Sengupta <go...@gmail.com>.
Hi,

once again, let's start with the requirement: why are you trying to pass XML
and JSON files to Spark instead of reading them in Spark?
Generally, when people pass files, they are Python or JAR files.

Regards,
Gourav

On Sat, May 15, 2021 at 5:03 AM Amit Joshi <ma...@gmail.com>
wrote:

> Hi KhajaAsmath,
>
> Client vs cluster: in client mode, the driver runs on the machine from which
> you submit your job, whereas in cluster mode the driver runs on one of the
> worker nodes.
>
> I think you need to pass the conf file to your driver, as you are using it
> in the driver code, which in cluster mode runs on one of the worker nodes.
> Use this to pass it to the driver:
> --files /appl/common/ftp/conf.json --conf
> spark.driver.extraJavaOptions="-Dconfig.file=conf.json"
>
> And make sure you are able to access the file location from worker nodes.
>
>
> Regards
> Amit Joshi
>
> On Sat, May 15, 2021 at 5:14 AM KhajaAsmath Mohammed <
> mdkhajaasmath@gmail.com> wrote:
>
>> Here is my updated spark-submit, without any luck:
>>
>> spark-submit --master yarn --deploy-mode cluster --files
>> /appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
>> --num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory 7g
>> --executor-memory 7g /appl/common/ftp/ftp_event_data.py
>> /appl/common/ftp/conf.json 2021-05-10 7
>>
>> On Fri, May 14, 2021 at 6:19 PM KhajaAsmath Mohammed <
>> mdkhajaasmath@gmail.com> wrote:
>>
>>> Sorry my bad, it did not resolve the issue. I still have the same issue.
>>> can anyone please guide me. I was still running as a client instead of a
>>> cluster.
>>>
>>> On Fri, May 14, 2021 at 5:05 PM KhajaAsmath Mohammed <
>>> mdkhajaasmath@gmail.com> wrote:
>>>
>>>> You are right. It worked but I still don't understand why I need to
>>>> pass that to all executors.
>>>>
>>>> On Fri, May 14, 2021 at 5:03 PM KhajaAsmath Mohammed <
>>>> mdkhajaasmath@gmail.com> wrote:
>>>>
>>>>> I am using json only to read properties before calling spark session.
>>>>> I don't know why we need to pass that to all executors.
>>>>>
>>>>>
>>>>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <
>>>>> Longjiang.Yang@target.com> wrote:
>>>>>
>>>>>> Could you check whether this file is accessible in executors? (is it
>>>>>> in HDFS or in the client local FS)
>>>>>> /appl/common/ftp/conf.json
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *KhajaAsmath Mohammed <md...@gmail.com>
>>>>>> *Date: *Friday, May 14, 2021 at 4:50 PM
>>>>>> *To: *"user @spark" <us...@spark.apache.org>
>>>>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>>>>>>
>>>>>>
>>>>>>
>>>>>> /appl/common/ftp/conf.json
>>>>>>
>>>>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by Amit Joshi <ma...@gmail.com>.
Hi KhajaAsmath,

Client vs cluster: in client mode, the driver runs on the machine from which
you submit your job, whereas in cluster mode the driver runs on one of the
worker nodes.

I think you need to pass the conf file to your driver, as you are using it
in the driver code, which in cluster mode runs on one of the worker nodes.
Use this to pass it to the driver:
--files /appl/common/ftp/conf.json --conf
spark.driver.extraJavaOptions="-Dconfig.file=conf.json"

And make sure you are able to access the file location from worker nodes.


Regards
Amit Joshi

On Sat, May 15, 2021 at 5:14 AM KhajaAsmath Mohammed <
mdkhajaasmath@gmail.com> wrote:

> Here is my updated spark-submit, without any luck:
>
> spark-submit --master yarn --deploy-mode cluster --files
> /appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
> --num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory 7g
> --executor-memory 7g /appl/common/ftp/ftp_event_data.py
> /appl/common/ftp/conf.json 2021-05-10 7
>
> On Fri, May 14, 2021 at 6:19 PM KhajaAsmath Mohammed <
> mdkhajaasmath@gmail.com> wrote:
>
>> Sorry my bad, it did not resolve the issue. I still have the same issue.
>> can anyone please guide me. I was still running as a client instead of a
>> cluster.
>>
>> On Fri, May 14, 2021 at 5:05 PM KhajaAsmath Mohammed <
>> mdkhajaasmath@gmail.com> wrote:
>>
>>> You are right. It worked but I still don't understand why I need to pass
>>> that to all executors.
>>>
>>> On Fri, May 14, 2021 at 5:03 PM KhajaAsmath Mohammed <
>>> mdkhajaasmath@gmail.com> wrote:
>>>
>>>> I am using json only to read properties before calling spark session. I
>>>> don't know why we need to pass that to all executors.
>>>>
>>>>
>>>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <
>>>> Longjiang.Yang@target.com> wrote:
>>>>
>>>>> Could you check whether this file is accessible in executors? (is it
>>>>> in HDFS or in the client local FS)
>>>>> /appl/common/ftp/conf.json
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From: *KhajaAsmath Mohammed <md...@gmail.com>
>>>>> *Date: *Friday, May 14, 2021 at 4:50 PM
>>>>> *To: *"user @spark" <us...@spark.apache.org>
>>>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>>>>>
>>>>>
>>>>>
>>>>> /appl/common/ftp/conf.json
>>>>>
>>>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Here is my updated spark-submit, without any luck:

spark-submit --master yarn --deploy-mode cluster --files
/appl/common/ftp/conf.json,/etc/hive/conf/hive-site.xml,/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
--num-executors 6 --executor-cores 3 --driver-cores 3 --driver-memory 7g
--executor-memory 7g /appl/common/ftp/ftp_event_data.py
/appl/common/ftp/conf.json 2021-05-10 7

On Fri, May 14, 2021 at 6:19 PM KhajaAsmath Mohammed <
mdkhajaasmath@gmail.com> wrote:

> Sorry my bad, it did not resolve the issue. I still have the same issue.
> can anyone please guide me. I was still running as a client instead of a
> cluster.
>
> On Fri, May 14, 2021 at 5:05 PM KhajaAsmath Mohammed <
> mdkhajaasmath@gmail.com> wrote:
>
>> You are right. It worked but I still don't understand why I need to pass
>> that to all executors.
>>
>> On Fri, May 14, 2021 at 5:03 PM KhajaAsmath Mohammed <
>> mdkhajaasmath@gmail.com> wrote:
>>
>>> I am using json only to read properties before calling spark session. I
>>> don't know why we need to pass that to all executors.
>>>
>>>
>>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <
>>> Longjiang.Yang@target.com> wrote:
>>>
>>>> Could you check whether this file is accessible in executors? (is it in
>>>> HDFS or in the client local FS)
>>>> /appl/common/ftp/conf.json
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *KhajaAsmath Mohammed <md...@gmail.com>
>>>> *Date: *Friday, May 14, 2021 at 4:50 PM
>>>> *To: *"user @spark" <us...@spark.apache.org>
>>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>>>>
>>>>
>>>>
>>>> /appl/common/ftp/conf.json
>>>>
>>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
Sorry, my bad: it did not resolve the issue. I still have the same problem.
Can anyone please guide me? I was still running in client mode instead of
cluster mode.

On Fri, May 14, 2021 at 5:05 PM KhajaAsmath Mohammed <
mdkhajaasmath@gmail.com> wrote:

> You are right. It worked but I still don't understand why I need to pass
> that to all executors.
>
> On Fri, May 14, 2021 at 5:03 PM KhajaAsmath Mohammed <
> mdkhajaasmath@gmail.com> wrote:
>
>> I am using json only to read properties before calling spark session. I
>> don't know why we need to pass that to all executors.
>>
>>
>> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <Lo...@target.com>
>> wrote:
>>
>>> Could you check whether this file is accessible in executors? (is it in
>>> HDFS or in the client local FS)
>>> /appl/common/ftp/conf.json
>>>
>>>
>>>
>>>
>>>
>>> *From: *KhajaAsmath Mohammed <md...@gmail.com>
>>> *Date: *Friday, May 14, 2021 at 4:50 PM
>>> *To: *"user @spark" <us...@spark.apache.org>
>>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>>>
>>>
>>>
>>> /appl/common/ftp/conf.json
>>>
>>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
You are right. It worked but I still don't understand why I need to pass
that to all executors.

On Fri, May 14, 2021 at 5:03 PM KhajaAsmath Mohammed <
mdkhajaasmath@gmail.com> wrote:

> I am using json only to read properties before calling spark session. I
> don't know why we need to pass that to all executors.
>
>
> On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <Lo...@target.com>
> wrote:
>
>> Could you check whether this file is accessible in executors? (is it in
>> HDFS or in the client local FS)
>> /appl/common/ftp/conf.json
>>
>>
>>
>>
>>
>> *From: *KhajaAsmath Mohammed <md...@gmail.com>
>> *Date: *Friday, May 14, 2021 at 4:50 PM
>> *To: *"user @spark" <us...@spark.apache.org>
>> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>>
>>
>>
>> /appl/common/ftp/conf.json
>>
>

Re: [EXTERNAL] Urgent Help - Py Spark submit error

Posted by KhajaAsmath Mohammed <md...@gmail.com>.
I am using the JSON only to read properties before creating the Spark session.
I don't know why we need to pass it to all executors.


On Fri, May 14, 2021 at 5:01 PM Longjiang.Yang <Lo...@target.com>
wrote:

> Could you check whether this file is accessible in executors? (is it in
> HDFS or in the client local FS)
> /appl/common/ftp/conf.json
>
>
>
>
>
> *From: *KhajaAsmath Mohammed <md...@gmail.com>
> *Date: *Friday, May 14, 2021 at 4:50 PM
> *To: *"user @spark" <us...@spark.apache.org>
> *Subject: *[EXTERNAL] Urgent Help - Py Spark submit error
>
>
>
> /appl/common/ftp/conf.json
>