Posted to user@spark.apache.org by Nan Zhu <zh...@gmail.com> on 2014/01/05 16:06:20 UTC

debug standalone Spark jobs?

Hi, all

I’m trying to run a standalone job on a Spark cluster on EC2.

There is obviously a bug somewhere in my code: after the job runs for several minutes, it fails with the following exception:

Loading /usr/share/sbt/bin/sbt-launch-lib.bash  
[info] Set current project to rec_system (in build file:/home/ubuntu/rec_sys/)
[info] Running general.NetflixRecommender algorithm.SparkALS -b 20 -i 20 -l 0.005 -m spark://172.31.32.76:7077 --moviepath s3n://trainingset/netflix/training_set/* -o s3n://training_set/netflix/training_set/output.txt --rank 20 -r s3n://trainingset/netflix/training_set/mv_*
log4j:WARN No appenders could be found for logger (akka.event.slf4j.Slf4jEventHandler).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
failed to init the engine class
org.apache.spark.SparkException: Job aborted: Task 43.0:9 failed more than 4 times
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:827)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:825)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:825)
at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:440)
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:502)
at org.apache.spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:157)




However, this information does not tell me anything. How can I print the detailed log information to the console?

I’m also not sure what causes those WARNs from log4j. I see the same warning when I run spark-shell, yet there I can still see detailed information, such as which task is currently running.

Best,

--  
Nan Zhu


Re: debug standalone Spark jobs?

Posted by "K. Shankari" <sh...@eecs.berkeley.edu>.
I ran into a similar problem earlier. The issue is that Spark no longer
depends on log4j directly, so you need to add the dependency to your build
yourself. For example, with sbt I added the following to build.sbt:

libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.2"

After that, it produces at least INFO-level logging.
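If the log4j warnings persist after adding the binding, you may also need a log4j.properties file on the driver's classpath (e.g. under src/main/resources). A minimal sketch, modeled on the template Spark ships in conf/; the pattern and levels here are just illustrative defaults:

```properties
# Send everything at INFO and above to the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```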

Thanks to TD for the pointer.

Thanks,
Shankari



On Sun, Jan 5, 2014 at 8:36 PM, Nan Zhu <zh...@gmail.com> wrote:


Re: debug standalone Spark jobs?

Posted by Eugen Cepoi <ce...@gmail.com>.
You can set the log level to INFO; it looks like Spark logs application
errors at INFO. When I have errors that I can reproduce only on live data,
I run a spark-shell with my job on its classpath, then debug and tweak
things to find out what happens.
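With the spark-shell of this era, one way to get the job on the shell's classpath was the ADD_JARS environment variable, together with MASTER to point at the cluster. A sketch only; the jar path below is a hypothetical name for the assembled job jar:

```shell
# Start spark-shell against the cluster with the job jar on the classpath.
# target/rec_system-assembly.jar is an illustrative name, not from the thread.
ADD_JARS=target/rec_system-assembly.jar \
MASTER=spark://172.31.32.76:7077 \
./spark-shell
```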


2014/1/5 Nan Zhu <zh...@gmail.com>


Re: debug standalone Spark jobs?

Posted by Nan Zhu <zh...@gmail.com>.
Yes, but my problem only appears on a large dataset. Anyway, thanks for the reply.

Best,

--  
Nan Zhu


On Sunday, January 5, 2014 at 11:09 AM, Archit Thakur wrote:

> You can run your spark application locally by setting SPARK_MASTER="local" and then debug the launched jvm in your IDE.


Re: debug standalone Spark jobs?

Posted by Archit Thakur <ar...@gmail.com>.
You can run your Spark application locally by setting SPARK_MASTER="local"
and then debug the launched JVM in your IDE.
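As a sketch, with the Spark API of this era that amounts to constructing the SparkContext with a local master instead of the cluster URL, assuming the Spark jars are on the classpath. The object and app names below are hypothetical:

```scala
import org.apache.spark.SparkContext

object LocalDebug {
  def main(args: Array[String]): Unit = {
    // "local[4]" runs the job in-process with 4 worker threads,
    // so breakpoints set in the IDE are hit directly.
    val sc = new SparkContext("local[4]", "NetflixRecommender-debug")
    try {
      // ... run the job logic on a small sample of the data ...
    } finally {
      sc.stop()
    }
  }
}
```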


On Sun, Jan 5, 2014 at 9:04 PM, Nan Zhu <zh...@gmail.com> wrote:


Re: debug standalone Spark jobs?

Posted by Nan Zhu <zh...@gmail.com>.
Ah, yes, I think application logs really help  

Thank you  

--  
Nan Zhu


On Sunday, January 5, 2014 at 10:13 AM, Sriram Ramachandrasekaran wrote:

> Did you get to look at the spark worker logs? They would be at SPARK_HOME/logs/
> Also, you should look at the application logs itself. They would be under SPARK_HOME/work/APP_ID


Re: debug standalone Spark jobs?

Posted by Sriram Ramachandrasekaran <sr...@gmail.com>.
Did you get to look at the Spark worker logs? They would be at
SPARK_HOME/logs/.
Also, you should look at the application logs themselves. They would be
under SPARK_HOME/work/APP_ID.
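Concretely, on a worker node that looks roughly like the following; the app-... directory name is just an illustrative example of the application IDs the standalone master assigns:

```shell
# Daemon logs for the worker process itself
ls $SPARK_HOME/logs/

# Per-application executor logs (stdout/stderr for each executor);
# the task failure's real stack trace usually ends up in stderr here.
tail -n 100 $SPARK_HOME/work/app-20140105160620-0000/0/stderr
```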



On Sun, Jan 5, 2014 at 8:36 PM, Nan Zhu <zh...@gmail.com> wrote:



-- 
It's just about how deep your longing is!