Posted to hdfs-user@hadoop.apache.org by Gopi Krishna M <mg...@gmail.com> on 2013/08/27 07:04:37 UTC

hadoop debugging tools

Hi

We are seeing our MapReduce jobs crash once in a while and have to go
through the logs on all the nodes to figure out what went wrong.  Sometimes
it is low resources and sometimes it is a programming error that is
triggered by specific inputs.  The same is true for some of our Hive queries.

Are there any tools (free or paid) that help us do this debugging quickly?
I am planning to write a debugging tool for sifting through the
distributed logs of Hadoop, but wanted to check whether there are already
any useful tools for this.

Thx
Gopi | www.wignite.com

Re: hadoop debugging tools

Posted by Shekhar Sharma <sh...@gmail.com>.
You can get the stats for a job using Rumen:
http://ksssblogs.blogspot.in/2013/06/getting-job-statistics-using-rumen.html
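
In case it helps, here is a rough Java sketch of turning job history files
into a JSON trace by calling Rumen's TraceBuilder entry point directly (the
same one the java org.apache.hadoop.tools.rumen.TraceBuilder command uses).
The three paths are placeholders: point the last one at your cluster's job
history directory and keep the hadoop-rumen classes on the classpath.

  import org.apache.hadoop.tools.rumen.TraceBuilder;

  public class BuildJobTrace {
    public static void main(String[] args) {
      // TraceBuilder takes <job-trace output> <topology output> <inputs>
      // and writes the trace and topology as JSON.  Placeholder paths
      // below; adjust them for your cluster.  Note that TraceBuilder's
      // own main() may call System.exit() when it finishes.
      TraceBuilder.main(new String[] {
          "file:///tmp/job-trace.json",
          "file:///tmp/job-topology.json",
          "hdfs:///mapred/history/done"
      });
    }
  }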

Regards,
Som Shekhar Sharma
+91-8197243810


On Tue, Aug 27, 2013 at 10:54 AM, Gopi Krishna M <mg...@gmail.com> wrote:

> Harsh: thanks for the quick response.
>
> we often see an error response such as "Failed(Query returned non-zero
> code: 2, cause: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.MapRedTask)" and then go through all the
> logs to figure out what happened.  I use the jobtracker UI to go to the
> error logs and see what happened.
>
> I was thinking a log parsing tool with a good UI to go through the
> distributed-logs and help you find errors,  get stats on similar effors in
> prev runs etc will be useful.  HADOOP-9861 might help in getting good
> info, but might be still not very easy for quick debugging.
>
> Has anybody faced similar issues as part of their development?  Are there
> any better ways to pin point the cause of error?
>
> Thx
> Gopi | www.wignite.com
>
>
> On Tue, Aug 27, 2013 at 10:42 AM, Harsh J <ha...@cloudera.com> wrote:
>
>> We set a part of the failure reason as the diagnostic message for a
>> failed task that a JobClient API retrieves/can retrieve:
>>
>> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getTaskDiagnostics(org.apache.hadoop.mapred.TaskAttemptID)
>> .
>> Often this is
>> 'useless' given the stack trace's top part isn't always carrying the
>> most relevant information, so perhaps HADOOP-9861 may help here once
>> it is checked in.
>>
>> On Tue, Aug 27, 2013 at 10:34 AM, Gopi Krishna M <mg...@gmail.com>
>> wrote:
>> > Hi
>> >
>> > We are seeing our map-reduce jobs crashing once in a while and have to
>> go
>> > through the logs on all the nodes to figure out what went wrong.
>>  Sometimes
>> > it is low resources and sometimes it is a programming error which is
>> > triggered on specific inputs..  Same is true for some of our hive
>> queries.
>> >
>> > Are there any tools (free/paid) which help us to do this debugging
>> quickly?
>> > I am planning to write a debugging tool for sifting through the
>> distributed
>> > logs of hadoop but wanted to check if there are already any useful
>> tools for
>> > this.
>> >
>> > Thx
>> > Gopi | www.wignite.com
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: hadoop debugging tools

Posted by Gopi Krishna M <mg...@gmail.com>.
Harsh: thanks for the quick response.

We often see an error response such as "Failed(Query returned non-zero
code: 2, cause: FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask)" and then go through all the
logs to figure out what happened.  I use the JobTracker UI to get to the
error logs and see what went wrong.

I was thinking a log-parsing tool with a good UI to go through the
distributed logs, help you find errors, get stats on similar errors in
previous runs, etc. would be useful.  HADOOP-9861 might help in getting
good info, but it still might not make quick debugging very easy.

Has anybody faced similar issues as part of their development?  Are there
any better ways to pinpoint the cause of an error?

Thx
Gopi | www.wignite.com


On Tue, Aug 27, 2013 at 10:42 AM, Harsh J <ha...@cloudera.com> wrote:

> We set a part of the failure reason as the diagnostic message for a
> failed task that a JobClient API retrieves/can retrieve:
>
> http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getTaskDiagnostics(org.apache.hadoop.mapred.TaskAttemptID)
> .
> Often this is
> 'useless' given the stack trace's top part isn't always carrying the
> most relevant information, so perhaps HADOOP-9861 may help here once
> it is checked in.
>
> On Tue, Aug 27, 2013 at 10:34 AM, Gopi Krishna M <mg...@gmail.com> wrote:
> > Hi
> >
> > We are seeing our map-reduce jobs crashing once in a while and have to go
> > through the logs on all the nodes to figure out what went wrong.
>  Sometimes
> > it is low resources and sometimes it is a programming error which is
> > triggered on specific inputs..  Same is true for some of our hive
> queries.
> >
> > Are there any tools (free/paid) which help us to do this debugging
> quickly?
> > I am planning to write a debugging tool for sifting through the
> distributed
> > logs of hadoop but wanted to check if there are already any useful tools
> for
> > this.
> >
> > Thx
> > Gopi | www.wignite.com
>
>
>
> --
> Harsh J
>

Re: hadoop debugging tools

Posted by Harsh J <ha...@cloudera.com>.
We set part of the failure reason as the diagnostic message for a
failed task, which a JobClient API can retrieve:
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#getTaskDiagnostics(org.apache.hadoop.mapred.TaskAttemptID).
Often this is 'useless', since the top of the stack trace doesn't
always carry the most relevant information, so perhaps HADOOP-9861
may help here once it is checked in.
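
For example, a minimal sketch of pulling those diagnostics with the old
mapred JobClient API (the job id below is a placeholder, and only the
first batch of task completion events is inspected):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.JobID;
  import org.apache.hadoop.mapred.RunningJob;
  import org.apache.hadoop.mapred.TaskCompletionEvent;

  public class PrintTaskDiagnostics {
    public static void main(String[] args) throws Exception {
      JobClient client = new JobClient(new JobConf(new Configuration()));

      // Placeholder id: substitute the id of the failed job.
      RunningJob job = client.getJob(JobID.forName("job_201308270001_0042"));

      // Print the diagnostic message(s) for every failed task attempt
      // in the first batch of completion events.
      for (TaskCompletionEvent event : job.getTaskCompletionEvents(0)) {
        if (event.getTaskStatus() == TaskCompletionEvent.Status.FAILED) {
          for (String diag : job.getTaskDiagnostics(event.getTaskAttemptId())) {
            System.out.println(event.getTaskAttemptId() + ": " + diag);
          }
        }
      }
    }
  }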

On Tue, Aug 27, 2013 at 10:34 AM, Gopi Krishna M <mg...@gmail.com> wrote:
> Hi
>
> We are seeing our map-reduce jobs crashing once in a while and have to go
> through the logs on all the nodes to figure out what went wrong.  Sometimes
> it is low resources and sometimes it is a programming error which is
> triggered on specific inputs..  Same is true for some of our hive queries.
>
> Are there any tools (free/paid) which help us to do this debugging quickly?
> I am planning to write a debugging tool for sifting through the distributed
> logs of hadoop but wanted to check if there are already any useful tools for
> this.
>
> Thx
> Gopi | www.wignite.com



-- 
Harsh J
