Posted to common-dev@hadoop.apache.org by "Michael Bieniosek (JIRA)" <ji...@apache.org> on 2007/06/23 22:13:25 UTC

[jira] Created: (HADOOP-1524) Task Logs userlogs don't show up for a while

Task Logs userlogs don't show up for a while 
---------------------------------------------

                 Key: HADOOP-1524
                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
             Project: Hadoop
          Issue Type: Bug
    Affects Versions: 0.13.0
            Reporter: Michael Bieniosek


When I start a task and go to the task logs, nothing shows up for a while.  An examination of TaskLog.Writer and TaskLog.Reader reveals:

1. The TaskLog.Reader relies on the presence of a split.idx to identify the parts of the logs to display.
2. The TaskLog.Writer only updates the split.idx file when it moves on to the next log.

As a result, new log output only becomes visible once an entire part file is done.

Why is there a split.idx file?  It seems that since files are called part-00000, part-00001, etc., the TaskLog.Reader could just list all the files and sort them alphabetically.  The split.idx file also contains each file's length, but the filesystem already stores this.
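A minimal Java sketch of the listing idea, assuming the parts live in one directory and follow the five-digit part-NNNNN naming mentioned above. The class and method names (PartOrdering, orderedParts) are hypothetical and are not part of TaskLog. Because the counter is zero-padded to a fixed width, a plain string sort gives numeric order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: derive the order of log parts from file names alone.
final class PartOrdering {
    private PartOrdering() {}

    // Keeps only names of the form "part-NNNNN" (ignoring e.g. split.idx) and sorts them.
    // Fixed-width zero padding makes lexicographic order match numeric order,
    // e.g. "part-00002" < "part-00010"; this would not hold for "part-2" vs "part-10".
    static List<String> orderedParts(List<String> names) {
        List<String> parts = new ArrayList<>();
        for (String name : names) {
            if (name.matches("part-\\d{5}")) {
                parts.add(name);
            }
        }
        Collections.sort(parts);
        return parts;
    }
}
```

In practice the input would be the log directory listing (java.io.File.list()), and each part's length would come from File.length() instead of split.idx.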

If nobody has objections, I'd like to write a patch to eliminate the split.idx file.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511181 ] 

Arun C Murthy commented on HADOOP-1524:
---------------------------------------

{quote}
I'd propose an approach where the index file looks like:

file|offset
{quote}

+1



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508959 ] 

Hadoop QA commented on HADOOP-1524:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12360767/eliminate-split-idx.patch applied and successfully tested against trunk revision r551725.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/344/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/344/console



[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sameer Paranjpye updated HADOOP-1524:
-------------------------------------

    Component/s: mapred



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511028 ] 

Michael Bieniosek commented on HADOOP-1524:
-------------------------------------------

> One option is to just dump the contents of part-* files when there is no index file to read or when there is no information present in the index file... thoughts?

I assume you mean that we should compare the index with the file system, and do the right thing if a file is missing from the index.

That seems like a reasonable solution.
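A Java sketch of that reconciliation, under stated assumptions: the index maps part names to start offsets, parts missing from the index are newer than every indexed part, and a missing part starts where the previous surviving part ends (its offset plus its on-disk length). With no index at all, the first surviving part is assumed to start at 0. IndexReconcile and reconcile are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: fill in start offsets for parts on disk but absent from the index.
final class IndexReconcile {
    private IndexReconcile() {}

    /**
     * @param partsOnDisk surviving part names in ascending order
     * @param lengths     on-disk length of each surviving part, same order
     * @param indexed     start offsets already recorded in the index, keyed by part name
     * @return start offset for every surviving part, in order
     */
    static Map<String, Long> reconcile(List<String> partsOnDisk, long[] lengths,
                                       Map<String, Long> indexed) {
        Map<String, Long> result = new LinkedHashMap<>();
        long next = 0; // assumption: with no index, the oldest surviving part starts at 0
        for (int i = 0; i < partsOnDisk.size(); i++) {
            String name = partsOnDisk.get(i);
            Long known = indexed.get(name);
            long start = (known != null) ? known : next;
            result.put(name, start);
            next = start + lengths[i]; // an unindexed successor begins where this part ends
        }
        return result;
    }
}
```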




[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1524:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.14.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Michael!



[jira] Issue Comment Edited: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512576 ] 

Michael Bieniosek edited comment on HADOOP-1524 at 7/13/07 11:47 AM:
---------------------------------------------------------------------

My patch shouldn't affect the fs code at all.  I have no idea what is failing or why.  I have personally tried my patch on a two-node cluster without issue.


 was:
My patch shouldn't affect the fs code at all.  I have no idea what is failing or why.



[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Attachment: accelerate-task-log.patch

This patch writes to the split.idx file when a new log file is created, and no longer stores file lengths in split.idx.
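A simplified in-memory model of the writer behaviour described here, assuming a fixed split size and that a part rolls over once it has reached that size. It tracks byte counts only, not real files, and SplitIndexModel with its methods are hypothetical names. What it illustrates is that a part's start offset is known the moment the part is opened, so its index line can be written then instead of after the part is complete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a rolling task-log writer that appends "name|offset"
// to the index when each part is created (no length is stored).
final class SplitIndexModel {
    private final long splitSize;
    private final List<String> indexLines = new ArrayList<>();
    private int partNumber = -1;  // no part open yet
    private long bytesInPart = 0;
    private long totalBytes = 0;  // logical offset across all parts

    SplitIndexModel(long splitSize) {
        this.splitSize = splitSize;
    }

    private void startNewPart() {
        partNumber++;
        bytesInPart = 0;
        // The start offset is already known here, so the entry is written immediately.
        indexLines.add(String.format("part-%05d|%d", partNumber, totalBytes));
    }

    // Writes are not split across parts, so a part may slightly exceed splitSize.
    void write(int length) {
        if (partNumber < 0 || bytesInPart >= splitSize) {
            startNewPart();
        }
        bytesInPart += length;
        totalBytes += length;
    }

    List<String> indexLines() {
        return indexLines;
    }
}
```

With a split size of 100 and three 60-byte writes, the index reads part-00000|0 after the first write and gains part-00001|120 on the third, so a reader sees each part as soon as it exists.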



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509512 ] 

Arun C Murthy commented on HADOOP-1524:
---------------------------------------

bq. Why is there a split.idx file?

The index file is maintained to aid in tailing the log files (4KB / 8KB tail window), and hence it cannot be eliminated.

bq. This bug clearly does need to be fixed however...

No question.
One option is to just dump the contents of part-* files when there is no index file to read or when there is no information present in the index file... thoughts?



[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Status: Patch Available  (was: Open)

Hudson failure looks unrelated to this patch.  Toggling status to see if I can make it work...



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511193 ] 

Michael Bieniosek commented on HADOOP-1524:
-------------------------------------------

Ah, I see.  In that case, your solution sounds good.



[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Attachment: eliminate-split-idx.patch

This patch eliminates the use of split.idx.  Instead, it gets the information directly from the file system.



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511189 ] 

Owen O'Malley commented on HADOOP-1524:
---------------------------------------

That is because tasklog.jsp supports reading from an offset, with the intention of tailing the logs. If you don't keep the lengths of the deleted logs, you can't tail, because the offsets would change with every deletion.
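To make the offset argument concrete, here is a small Java sketch (all names hypothetical) that maps a logical offset, counted from the very start of the task's output, to a part and a position inside it. It assumes the caller supplies each surviving part's start offset. If those starts come from an index, they stay fixed; if they are recomputed from the surviving files' lengths, they restart at 0 once the oldest part is deleted, so the same logical offset lands on different bytes.

```java
import java.util.List;

// Hypothetical sketch: resolve a logical log offset using per-part start offsets.
final class OffsetLookup {
    private OffsetLookup() {}

    // starts: start offset of each surviving part, ascending; names: matching part names.
    // Returns "name@position", or null if the offset falls in an already-deleted part.
    // Offsets past the end resolve into the last part; a real reader would clamp them.
    static String locate(List<String> names, long[] starts, long logicalOffset) {
        if (names.isEmpty() || logicalOffset < starts[0]) {
            return null;
        }
        int i = starts.length - 1;
        while (starts[i] > logicalOffset) {
            i--;
        }
        return names.get(i) + "@" + (logicalOffset - starts[i]);
    }

    // Start offsets derived only from surviving file lengths (no index).
    static long[] startsFromLengths(long[] lengths) {
        long[] starts = new long[lengths.length];
        long sum = 0;
        for (int i = 0; i < lengths.length; i++) {
            starts[i] = sum;
            sum += lengths[i];
        }
        return starts;
    }
}
```

With four 4096-byte parts and part-00000 deleted, logical offset 10000 resolves to part-00002@1808 when the starts come from an index (4096, 8192, 12288), but to part-00003@1808 when they are recomputed from the three surviving lengths.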



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511173 ] 

Owen O'Malley commented on HADOOP-1524:
---------------------------------------

I believe you are missing the point. The older splits are deleted to limit the size of the task logs. This means that you can't use their lengths to compute offsets because they aren't there any more. 

I'd propose an approach where the index file looks like:

file|offset

dropping the length, so that the index can be written when the new split is started. This will preserve the current functionality and fix the problem, I believe.
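A Java sketch of reading and writing entries in the proposed file|offset format. IndexEntry and its methods are hypothetical; the only thing taken from the proposal is one name|offset pair per line with no length field.

```java
// Hypothetical representation of one split.idx line in the proposed "file|offset" format.
final class IndexEntry {
    final String file;
    final long offset;

    IndexEntry(String file, long offset) {
        this.file = file;
        this.offset = offset;
    }

    // Serializes as e.g. "part-00002|8192".
    String toLine() {
        return file + "|" + offset;
    }

    // Parses a line, splitting on the last '|'; rejects lines with no name or no offset.
    static IndexEntry parse(String line) {
        int bar = line.lastIndexOf('|');
        if (bar <= 0 || bar == line.length() - 1) {
            throw new IllegalArgumentException("malformed index line: " + line);
        }
        return new IndexEntry(line.substring(0, bar), Long.parseLong(line.substring(bar + 1).trim()));
    }
}
```

Since no length field is needed, the writer can emit an entry as soon as it opens a new part.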



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512576 ] 

Michael Bieniosek commented on HADOOP-1524:
-------------------------------------------

My patch shouldn't affect the fs code at all.  I have no idea what is failing or why.



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512219 ] 

Hadoop QA commented on HADOOP-1524:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12361700/accelerate-task-log.patch against trunk revision r555697.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/399/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/399/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512984 ] 

Owen O'Malley commented on HADOOP-1524:
---------------------------------------

This looks good.



[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511182 ] 

Michael Bieniosek commented on HADOOP-1524:
-------------------------------------------

So, why do we need offsets from the beginning of the logs?  It seems to me that we only need offsets from the end of the log, which doesn't require an index file at all.  

The only additional information the index file gives is the length of the deleted logs.
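To illustrate the claim that offsets from the end of the log need no index, here is a minimal sketch. It assumes the reader can list the surviving part-NNNNN splits and get each one's length from the file system. That listing is modelled here as an in-memory TreeMap rather than a real directory scan. TailLocator, locateTail and Position are hypothetical names, not code from eliminate-split-idx.patch.

```java
import java.util.Map;
import java.util.TreeMap;

public class TailLocator {

    /** A read position: which split file to open and how far into it to seek. */
    static final class Position {
        final String part;
        final long offset;

        Position(String part, long offset) {
            this.part = part;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return part + "@" + offset;
        }
    }

    /**
     * Returns where to start reading so that the last tailBytes bytes of the
     * concatenated log are shown, or null if no splits exist. partLengths maps
     * split names to lengths reported by the file system; TreeMap iterates in
     * lexicographic order, which matches zero-padded names like part-00000.
     */
    static Position locateTail(TreeMap<String, Long> partLengths, long tailBytes) {
        if (partLengths.isEmpty()) {
            return null;
        }
        long remaining = tailBytes;
        // Walk from the newest split backwards, consuming each split's length.
        for (Map.Entry<String, Long> e : partLengths.descendingMap().entrySet()) {
            long len = e.getValue();
            if (remaining <= len) {
                return new Position(e.getKey(), len - remaining);
            }
            remaining -= len;
        }
        // More was requested than survives: start at the oldest remaining split.
        return new Position(partLengths.firstKey(), 0);
    }

    public static void main(String[] args) {
        // Hypothetical listing after part-00000..part-00002 have been deleted.
        TreeMap<String, Long> parts = new TreeMap<>();
        parts.put("part-00003", 100L);
        parts.put("part-00004", 100L);
        parts.put("part-00005", 40L);
        System.out.println(locateTail(parts, 150)); // part-00003@90
        System.out.println(locateTail(parts, 30));  // part-00005@10
    }
}
```

Because the walk starts from the newest split, deleting old splits does not change the answer for any tail request that fits in the surviving data. Only an offset counted from the start of the log would need the length of what was deleted.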


> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: eliminate-split-idx.patch


[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Status: Patch Available  (was: Open)

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: accelerate-task-log.patch, eliminate-split-idx.patch


[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1524:
----------------------------------

    Status: Open  (was: Patch Available)

This doesn't work, because it doesn't preserve the offsets of the task logs as the older parts are deleted. One of the requirements for the task logs is that you can read them as they are generated. With this patch, the offsets will constantly be changing as old splits are deleted and thus prevent users from tailing the log.

This bug clearly does need to be fixed, however...
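The objection concerns offsets counted from the beginning of the log. A minimal sketch of why those shift when old splits are deleted follows. It assumes the only extra state a reader needs is the combined length of the deleted splits, called deletedBytes here. AbsoluteOffsets, resolve and Position are hypothetical names and are not taken from either attached patch.

```java
import java.util.Map;
import java.util.TreeMap;

public class AbsoluteOffsets {

    /** A read position: which split file to open and how far into it to seek. */
    static final class Position {
        final String part;
        final long offset;

        Position(String part, long offset) {
            this.part = part;
            this.offset = offset;
        }

        @Override
        public String toString() {
            return part + "@" + offset;
        }
    }

    /**
     * Maps an offset measured from the very start of the task's log onto a
     * surviving split. deletedBytes is the combined length of splits already
     * removed, the one number the file system can no longer report. Returns
     * null if the offset lies in deleted data or past the current end.
     */
    static Position resolve(TreeMap<String, Long> surviving, long deletedBytes, long absoluteOffset) {
        long rel = absoluteOffset - deletedBytes;
        if (rel < 0) {
            return null; // that part of the log has already been deleted
        }
        for (Map.Entry<String, Long> e : surviving.entrySet()) {
            if (rel < e.getValue()) {
                return new Position(e.getKey(), rel);
            }
            rel -= e.getValue();
        }
        return null; // beyond the current end of the log
    }

    public static void main(String[] args) {
        TreeMap<String, Long> splits = new TreeMap<>();
        splits.put("part-00000", 100L);
        splits.put("part-00001", 100L);
        splits.put("part-00002", 50L);
        System.out.println(resolve(splits, 0, 120));   // part-00001@20

        // part-00000 is deleted; a reader that remembered offset 120 comes back.
        splits.remove("part-00000");
        System.out.println(resolve(splits, 100, 120)); // part-00001@20 (same bytes)
        System.out.println(resolve(splits, 0, 120));   // part-00002@20 (wrong bytes)
    }
}
```

Under this assumption, what must be persisted is a running total of deleted bytes. That matches Michael's remark that the length of the deleted logs is the only additional information the index file provides.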

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: eliminate-split-idx.patch


[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511150 ] 

Michael Bieniosek commented on HADOOP-1524:
-------------------------------------------

> With this patch, the offsets will constantly be changing as old splits are deleted and thus prevent users from tailing the log.

Actually, the patch doesn't affect tailing the log, as the offsets from the end of the log will be preserved (since the filesystem knows their lengths).


> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: eliminate-split-idx.patch


[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Status: Patch Available  (was: Open)

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek


[jira] Commented: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512563 ] 

Hadoop QA commented on HADOOP-1524:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12361700/accelerate-task-log.patch against trunk revision r555813.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/405/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/405/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: accelerate-task-log.patch, eliminate-split-idx.patch


[jira] Updated: (HADOOP-1524) Task Logs userlogs don't show up for a while

Posted by "Michael Bieniosek (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bieniosek updated HADOOP-1524:
--------------------------------------

    Status: Open  (was: Patch Available)

> Task Logs userlogs don't show up for a while 
> ---------------------------------------------
>
>                 Key: HADOOP-1524
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1524
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Michael Bieniosek
>         Attachments: accelerate-task-log.patch, eliminate-split-idx.patch