Posted to user@spark.apache.org by Andrew Lee <al...@hotmail.com> on 2014/07/03 02:01:46 UTC

Enable Parsing Failed or Incomplete jobs on HistoryServer (YARN mode)

Hi All,
I have HistoryServer up and running, and it is great.
Is it possible to also enable the HistoryServer to parse failed jobs' events by default as well?
I get "No Completed Applications Found" if a job fails:
=========
Event Log Location: hdfs:///user/test01/spark/logs/
No Completed Applications Found
=========
The reason is that while it is useful to run the HistoryServer to keep track of performance and resource usage for each completed job, I find it even more useful when a job fails: I can identify which stage it failed at, etc., instead of sifting through the logs from the Resource Manager. That view is only available while the Application Master is still active; once the job fails, the Application Master is killed and I lose GUI access, and even though I have the event log in JSON format, I can't open it with the HistoryServer.
This would be especially helpful for long-running jobs that last 2-18 hours and generate gigabytes of logs.
So I have 2 questions:
1. Is there any reason why we only render completed jobs? Why can't we bring in all jobs and choose from the GUI, like a time machine to restore the status from the Application Master? The relevant code is in ./core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala:

val logInfos = logDirs
          .sortBy { dir => getModificationTime(dir) }
          .map { dir => (dir, EventLoggingListener.parseLoggingInfo(dir.getPath, fileSystem)) }
          .filter { case (dir, info) => info.applicationComplete }
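
That last .filter line appears to be what hides failed/incomplete runs. What I have in mind is roughly the sketch below (the showIncomplete flag is just a made-up name to illustrate the idea, not an existing Spark setting):

val showIncomplete = true  // hypothetical switch, not a real Spark 1.0 option

val logInfos = logDirs
  .sortBy { dir => getModificationTime(dir) }
  .map { dir => (dir, EventLoggingListener.parseLoggingInfo(dir.getPath, fileSystem)) }
  // keep incomplete/failed applications too when the switch is on
  .filter { case (dir, info) => showIncomplete || info.applicationComplete }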

2. If I manually touch an "APPLICATION_COMPLETE" file in the failed job's event log folder, will this cause any problem?
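
In other words, something along the lines of this rough sketch using the Hadoop FileSystem API (the application directory below is only an example path; this should be equivalent to an "hdfs dfs -touchz" on that file):

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Rough sketch: create an empty APPLICATION_COMPLETE marker in a failed
// application's event log directory so it passes the applicationComplete
// filter above. The directory name is only an example.
val hadoopConf = new Configuration()
val appLogDir = new Path("hdfs:///user/test01/spark/logs/app-xxxx")
val fs = FileSystem.get(appLogDir.toUri, hadoopConf)
fs.create(new Path(appLogDir, "APPLICATION_COMPLETE")).close()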











 		 	   		  

RE: Enable Parsing Failed or Incomplete jobs on HistoryServer (YARN mode)

Posted by Andrew Lee <al...@hotmail.com>.
Hi Suren,
It showed up after a while when I touched the APPLICATION_COMPLETE file in the event log folders.
I checked the source code and it looks like it re-scans (polls) the folders every 10 seconds (configurable).
I'm not sure what exactly triggers that 'refresh'; I may need to do more digging.
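Roughly, my mental model of that loop is the sketch below (an illustration of the behaviour only, not the actual Spark source; I believe the interval corresponds to spark.history.updateInterval with a 10-second default, but please double-check the property name for your release):

import java.util.concurrent.{Executors, TimeUnit}

// Illustration only, not the real HistoryServer code: a background task
// re-lists the event log directory on a fixed interval, so a marker file
// touched between scans only shows up after the next scan completes.
object HistoryPollingSketch {
  val updateIntervalSeconds = 10L  // assumed default; configurable in Spark

  def checkForLogs(): Unit = {
    // In the real server this re-parses the event logs and rebuilds the
    // list of applications shown in the web UI.
  }

  def main(args: Array[String]): Unit = {
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    scheduler.scheduleWithFixedDelay(
      new Runnable { def run(): Unit = checkForLogs() },
      0L, updateIntervalSeconds, TimeUnit.SECONDS)
  }
}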
Thanks.


Date: Thu, 3 Jul 2014 06:56:46 -0400
Subject: Re: Enable Parsing Failed or Incomplete jobs on HistoryServer (YARN mode)
From: suren.hiraman@velos.io
To: user@spark.apache.org

I've had some odd behavior with jobs showing up in the history server in 1.0.0. Failed jobs do show up but it seems they can show up minutes or hours later. I see in the history server logs messages about bad task ids. But then eventually the jobs show up.

This might be your situation.
Anecdotally, if you click on the job in the Spark Master GUI after it is done, this may help it show up in the history server faster. Haven't reliably tested this though. May just be a coincidence of timing.

-Suren



Re: Enable Parsing Failed or Incomplete jobs on HistoryServer (YARN mode)

Posted by Surendranauth Hiraman <su...@velos.io>.
I've had some odd behavior with jobs showing up in the history server in
1.0.0. Failed jobs do show up but it seems they can show up minutes or
hours later. I see in the history server logs messages about bad task ids.
But then eventually the jobs show up.

This might be your situation.

Anecdotally, if you click on the job in the Spark Master GUI after it is
done, this may help it show up in the history server faster. Haven't
reliably tested this though. May just be a coincidence of timing.

-Suren





-- 

SUREN HIRAMAN, VP TECHNOLOGY
Velos
Accelerating Machine Learning

440 NINTH AVENUE, 11TH FLOOR
NEW YORK, NY 10001
O: (917) 525-2466 ext. 105
F: 646.349.4063
E: suren.hiraman@velos.io
W: www.velos.io