Posted to common-dev@hadoop.apache.org by "Mahadev konar (JIRA)" <ji...@apache.org> on 2006/03/17 05:49:12 UTC

[jira] Created: (HADOOP-92) Error Reporting/logging in MapReduce

Error Reporting/logging in MapReduce
------------------------------------

         Key: HADOOP-92
         URL: http://issues.apache.org/jira/browse/HADOOP-92
     Project: Hadoop
        Type: Bug
  Components: mapred  
    Reporter: Mahadev konar
    Priority: Minor


Currently MapReduce does not tell you which machine failed to execute a task. It would also be useful to have a log report for each job that states the number of tasks it ran (which ones failed, on which machine, and any available error information), along with the start, end, and execution time of each task.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Resolved: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-92?page=all ]
     
Doug Cutting resolved HADOOP-92:
--------------------------------

    Fix Version: 0.2
     Resolution: Fixed
      Assign To: Mahadev konar

This mostly looks good.  I committed it with the following changes:

1. TaskStatus is not a public class, so a public method in another public class should not return it.  For now, I just made the method package-private, since the jsp pages are compiled in the same package.  Longer-term we should probably copy this information into the TaskReport, the public version of a TaskStatus, or something.  This information should be available to other applications through a public API, and through an RPC in the JobSubmissionProtocol.

2. I renamed JobTracker.getallTaskStatus to getTaskStatuses, matching the JobInProgress method and using correct camel case.

3. Rather than add a new TaskStatus constructor, leaving an old one that is no longer called, I removed the old one.  We don't need dead code around.  This is package-private, so we can be sure that no code outside of this package uses the old constructor.

4. The patch removed some inter-method whitespace.  I restored it.
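
The visibility concern in point 1 can be sketched as follows. This is only an illustration: TaskStatusSketch, TaskReportSketch and TrackerSketch are hypothetical names, not the actual Hadoop classes, and the fields are assumed. The idea is that a package-private method may return the internal type, while anything public returns a copy in a separate report type.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an internal, package-private status record.
final class TaskStatusSketch {
    final String taskId;
    final String host;
    final String error; // empty when the attempt succeeded

    TaskStatusSketch(String taskId, String host, String error) {
        this.taskId = taskId;
        this.host = host;
        this.error = error;
    }
}

// Hypothetical public view of a status; in a real code base this class
// would be declared public in its own source file.
final class TaskReportSketch {
    private final String taskId;
    private final String host;
    private final String error;

    TaskReportSketch(TaskStatusSketch status) {
        // Copy the fields so callers never hold the internal object.
        this.taskId = status.taskId;
        this.host = status.host;
        this.error = status.error;
    }

    public String getTaskId() { return taskId; }
    public String getHost() { return host; }
    public String getError() { return error; }
}

// Hypothetical tracker holding the internal records.
final class TrackerSketch {
    private final List<TaskStatusSketch> statuses = new ArrayList<>();

    void record(TaskStatusSketch status) {
        statuses.add(status);
    }

    // Package-private: only code in the same package sees the internal type.
    List<TaskStatusSketch> getTaskStatuses() {
        return Collections.unmodifiableList(statuses);
    }

    // Public-facing: exposes only the report type.
    public List<TaskReportSketch> getTaskReports() {
        List<TaskReportSketch> reports = new ArrayList<>();
        for (TaskStatusSketch status : statuses) {
            reports.add(new TaskReportSketch(status));
        }
        return reports;
    }
}
```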



[jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "stack@archive.org (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-92?page=comments#action_12374251 ] 

stack@archive.org commented on HADOOP-92:
-----------------------------------------

I took this patch for a spin.  +1 on commit.


[jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-92?page=comments#action_12371457 ] 

Doug Cutting commented on HADOOP-92:
------------------------------------

Another approach to this would be to expose this through the JobClient API.  Job-related events can be reported to the job client.  Events can be queued in the job tracker and the JobClient can retrieve them as it polls for job status.  Then the JobClient can decide where to log them.  By default they can be logged to standard error.

The events I think one might care about are:
  - task start (task_id, type & host)
  - task completion (task_id)
  - task failure (task_id, error message)

The jobtracker already tracks most of this, so I don't think this places a huge new burden on the jobtracker.
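
A minimal sketch of that polling scheme, under stated assumptions: the class names (TaskEvent, JobEventQueue, EventPollingClient) are hypothetical and are not part of the JobClient API, and the task type is folded into a free-form detail string for brevity. Events are appended on the tracker side, and the client remembers how many it has already seen and fetches only the newer ones on each poll, logging them to standard error.

```java
import java.util.ArrayList;
import java.util.List;

// The three event kinds listed above.
enum TaskEventKind { START, COMPLETION, FAILURE }

// Hypothetical event record.
final class TaskEvent {
    final TaskEventKind kind;
    final String taskId;
    final String detail; // host for START, error message for FAILURE, empty otherwise

    TaskEvent(TaskEventKind kind, String taskId, String detail) {
        this.kind = kind;
        this.taskId = taskId;
        this.detail = detail;
    }

    @Override
    public String toString() {
        return kind + " " + taskId + (detail.isEmpty() ? "" : " " + detail);
    }
}

// Tracker side: events for one job are queued in arrival order.
final class JobEventQueue {
    private final List<TaskEvent> events = new ArrayList<>();

    synchronized void publish(TaskEvent event) {
        events.add(event);
    }

    // Returns a copy of all events at position fromIndex or later.
    synchronized List<TaskEvent> poll(int fromIndex) {
        return new ArrayList<>(events.subList(fromIndex, events.size()));
    }
}

// Client side: fetches unseen events while polling for job status.
final class EventPollingClient {
    private int seen = 0;

    // Logs new events to standard error and returns how many were fetched.
    int pollOnce(JobEventQueue queue) {
        List<TaskEvent> fresh = queue.poll(seen);
        for (TaskEvent event : fresh) {
            System.err.println(event);
        }
        seen += fresh.size();
        return fresh.size();
    }
}
```

Keeping the cursor on the client means the tracker does not need per-client state; a real implementation would also have to decide when old events can be discarded.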

I don't like polluting the job's output directory with log data since it would require changes to the InputFormat implementations and other code to make them skip this specially named sub-directory (unless the name begins with a dot, which the fs code already ignores).

In any case, we could add an option to JobClient to log to the job's output fs.
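
The dot-prefix convention mentioned above could look roughly like this. HiddenNameFilter is a hypothetical helper, not the actual fs code, and it assumes slash-separated paths without a trailing slash.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: treats entries whose last path component starts
// with a dot as hidden, so they are skipped when listing job input.
final class HiddenNameFilter {
    static boolean isHidden(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        return name.startsWith(".");
    }

    static List<String> visibleInputs(List<String> entries) {
        List<String> visible = new ArrayList<>();
        for (String entry : entries) {
            if (!isHidden(entry)) {
                visible.add(entry);
            }
        }
        return visible;
    }
}
```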



[jira] Updated: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-92?page=all ]

Mahadev konar updated HADOOP-92:
--------------------------------

    Attachment: interface.patch

Here is a patch that reports the machine on which a specific task failed. Clicking on the job ID takes you to a page where the information is displayed as before (with a little more parsing), but each task is now clickable and shows all the attempts that were made to execute it. Clicking on a task shows the machine on which each task attempt was executed and what the error was.


[jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "eric baldeschwieler (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-92?page=comments#action_12370786 ] 

eric baldeschwieler commented on HADOOP-92:
-------------------------------------------

Specifically, I'm hoping that we can create a well-known file in the output directory with a log of all interesting job output.

Perhaps we should put it in a subdirectory /INFO/ of the output directory, so that when the directory is used for future input, it is not confused with data?  This will also allow us to add other things later.


Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.

This might be a reasonable short-term solution, although it adds a
lot of complexity.

I was assuming master aggregation.  This clearly could be a burden  
(good point), although if we simply log launch, completion and  
failure events, that should be ok.  Maybe we should stick to that?   
Or only record failure logs by default?

Another approach would be to launch a job to reap entries whenever  
there get to be a large number, and just concatenate them together.   
Say concatenate the smallest 100 together whenever we get to 200?
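
A rough sketch of that reaping policy, assuming log files are represented only by their sizes in bytes. LogCompactionSketch and its constants are hypothetical names; a real reaper would concatenate file contents rather than add up sizes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the policy: once 200 files exist, merge the
// 100 smallest into a single file.
final class LogCompactionSketch {
    static final int TRIGGER = 200; // compact when at least this many files exist
    static final int BATCH = 100;   // number of smallest files merged into one

    // Takes the file sizes and returns the sizes after at most one pass.
    static List<Long> compactOnce(List<Long> sizes) {
        if (sizes.size() < TRIGGER) {
            return new ArrayList<>(sizes); // below the threshold: nothing to do
        }
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        long merged = 0;
        for (int i = 0; i < BATCH; i++) {
            merged += sorted.get(i); // the merged file holds the smallest BATCH files
        }
        List<Long> result = new ArrayList<>(sorted.subList(BATCH, sorted.size()));
        result.add(merged);
        return result;
    }
}
```

With these numbers each pass replaces 100 files with one, so the file count drops from 200 to 101.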

On Mar 22, 2006, at 9:21 AM, Stefan Groschupf wrote:

> Hi,
>
> In case we would be able to query a host that runs a specific  
> maprunnable from the jobtracker,
> we would be able to run one logging server as map task and  
> tasktrackers can send log messages to this logging server.
> From my point of view this would be easier to implement than  
> multiple writers to one dfs file.
>
> Just my 2 cents.
> Greetings,
> Stefan
>
>
> On 22.03.2006 at 18:10, Yoram Arnon wrote:
>
>> DFS files can only be written once, and by a single writer.
>> Until that changes our hands are tied, as long as we require the  
>> output to
>> reside in the output directory.
>>
>> Unless... we create a protocol whereby the task masters report up  
>> to the job
>> master, and it's only the job master that does the logging.
>> That might introduce unwanted overhead and some load on the job  
>> master.
>>
>>
>>> -----Original Message-----
>>> From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com]
>>> Sent: Tuesday, March 21, 2006 8:54 PM
>>> To: hadoop-dev@lucene.apache.org
>>> Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/ 
>>> logging in
>>> MapReduce
>>>
>>> Will it really make sense to have 300,000 subdirectories with  
>>> several
>>> log files?  Seems like a real losing proposition.  I'd just go
>>> for a
>>> single log file with reasonable per line prefixes (time, job, ...).
>>>
>>> Then you can grep out what you want.
>>
>>
>>
>
> ---------------------------------------------------------------
> company:        http://www.media-style.com
> forum:        http://www.text-mining.org
> blog:            http://www.find23.net
>
>

Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by Stefan Groschupf <sg...@media-style.com>.

Hi,

In case we would be able to query a host that runs a specific  
maprunnable from the jobtracker,
we would be able to run one logging server as map task and  
tasktrackers can send log messages to this logging server.
From my point of view this would be easier to implement than
multiple writers to one dfs file.

Just my 2 cents.
Greetings,
Stefan


On 22.03.2006 at 18:10, Yoram Arnon wrote:

> DFS files can only be written once, and by a single writer.
> Until that changes our hands are tied, as long as we require the  
> output to
> reside in the output directory.
>
> Unless... we create a protocol whereby the task masters report up  
> to the job
> master, and it's only the job master that does the logging.
> That might introduce unwanted overhead and some load on the job  
> master.
>
>
>> -----Original Message-----
>> From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com]
>> Sent: Tuesday, March 21, 2006 8:54 PM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in
>> MapReduce
>>
>> Will it really make sense to have 300,000 subdirectories with several
>> log files?  Seems like a real losing proposition.  I'd just go for a
>> single log file with reasonable per line prefixes (time, job, ...).
>>
>> Then you can grep out what you want.
>
>
>

---------------------------------------------------------------
company:        http://www.media-style.com
forum:        http://www.text-mining.org
blog:            http://www.find23.net

RE: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by Yoram Arnon <ya...@yahoo-inc.com>.

DFS files can only be written once, and by a single writer.
Until that changes our hands are tied, as long as we require the output to
reside in the output directory.

Unless... we create a protocol whereby the task masters report up to the job
master, and it's only the job master that does the logging.
That might introduce unwanted overhead and some load on the job master.
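
The reporting protocol could be sketched along these lines. SingleWriterLog is a hypothetical class, and a java.io.Writer stands in for the single DFS output stream; this is an assumption for illustration, not how the job master is actually implemented. Any number of reporters enqueue lines, and only one caller drains the queue and writes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-writer log: reporters never touch the output.
final class SingleWriterLog {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Writer out;

    SingleWriterLog(Writer out) {
        this.out = out;
    }

    // Safe to call from many reporting threads concurrently.
    void report(String line) {
        pending.add(line);
    }

    // Must be called by the one logging thread only.
    // Writes everything queued so far and returns the number of lines written.
    int flush() {
        int written = 0;
        try {
            String line;
            while ((line = pending.poll()) != null) {
                out.write(line);
                out.write('\n');
                written++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

The queue is where the extra load on the master shows up: every report from every task passes through it.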


> -----Original Message-----
> From: Eric Baldeschwieler [mailto:eric14@yahoo-inc.com]
> Sent: Tuesday, March 21, 2006 8:54 PM
> To: hadoop-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in
> MapReduce
> 
> Will it really make sense to have 300,000 subdirectories with several
> log files?  Seems like a real losing proposition.  I'd just go for a
> single log file with reasonable per line prefixes (time, job, ...).
> 
> Then you can grep out what you want.

Re: [jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by Eric Baldeschwieler <er...@yahoo-inc.com>.

Will it really make sense to have 300,000 subdirectories with several  
log files?  Seems like a real losing proposition.  I'd just go for a  
single log file with reasonable per line prefixes (time, job, ...).

Then you can grep out what you want.
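
As an illustration of such prefixes, here is a small sketch. The line layout (timestamp, job ID, task ID, message, separated by spaces) and the name PrefixedLogSketch are assumptions, since the message only says "time, job, ...". The linesForJob method plays the role of the grep.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-file log format with a fixed per-line prefix.
final class PrefixedLogSketch {
    // Assumed layout: "<epochMillis> <jobId> <taskId> <message>".
    static String format(long epochMillis, String jobId, String taskId, String message) {
        return epochMillis + " " + jobId + " " + taskId + " " + message;
    }

    // Keeps only the lines whose second field equals the given job ID.
    static List<String> linesForJob(List<String> lines, String jobId) {
        List<String> matches = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(" ", 4);
            if (fields.length >= 2 && fields[1].equals(jobId)) {
                matches.add(line);
            }
        }
        return matches;
    }
}
```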

[jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-92?page=comments#action_12371296 ] 

Mahadev konar commented on HADOOP-92:
-------------------------------------

It could be done with a logs directory in output_dir. The directory output_dir/logs/tasks/ would contain all the per-task information (one file per task): machine name, start time, end time, result status, and error messages. There should also be one file per job, output_dir/logs/job_id.log, containing job-specific data: the number of MapReduce tasks that were running at a given time, the number of machines the job was running on, and the start and end time of the job. This log could also record which tasks failed, so that the user can look at the corresponding task log to find out why.
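
The proposed layout can be written down as a small sketch. TaskLogLayoutSketch is a hypothetical helper, paths are plain slash-separated strings, and the tab-separated record format is an assumption; only the directory structure and the field list come from the proposal.

```java
// Hypothetical helper for the proposed log layout.
final class TaskLogLayoutSketch {
    // Per-task log file: <outputDir>/logs/tasks/<taskId>.
    static String taskLogPath(String outputDir, String taskId) {
        return outputDir + "/logs/tasks/" + taskId;
    }

    // Per-job log file: <outputDir>/logs/<jobId>.log.
    static String jobLogPath(String outputDir, String jobId) {
        return outputDir + "/logs/" + jobId + ".log";
    }

    // One task-log record with the proposed fields, tab-separated (assumed format).
    static String taskRecord(String machine, long startMillis, long endMillis,
                             String status, String error) {
        return String.join("\t", machine, Long.toString(startMillis),
                Long.toString(endMillis), status, error);
    }
}
```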


[jira] Updated: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.

     [ http://issues.apache.org/jira/browse/HADOOP-92?page=all ]

Mahadev konar updated HADOOP-92:
--------------------------------

    Attachment: patch.txt

I am including a patch that improves the job tracker interface. The tipid is made clickable so that it shows all the attempts for a particular task, along with the machine name. I found it useful, so I am uploading it again.


[jira] Commented: (HADOOP-92) Error Reporting/logging in MapReduce

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-92?page=comments#action_12374256 ] 

Owen O'Malley commented on HADOOP-92:
-------------------------------------

I've been using it today too. +1 on commit.
