You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2008/09/19 01:52:44 UTC

[jira] Created: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

The TaskAttemptID should not have the JobTracker start time
-----------------------------------------------------------

                 Key: HADOOP-4209
                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Owen O'Malley


The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4209:
----------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Amar!

I don't think this should be an incompatible change, because relative to 0.18, all that changed was that the task attempt ids are formatted to always have 4 digits.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.2.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632519#action_12632519 ] 

Owen O'Malley commented on HADOOP-4209:
---------------------------------------

After looking through the code some more and seeing the attempt ids like:

attempt_200707121733_0003_m_000005_0_1234567890123

There are problems:
  1. The format of the task ids change depending on the context.
  2. The final number is way longer than it needs to be.
  3. The numbers are out of order for sorting.
  4. The change of the format of the task ids needs to be called out much more explicitly.

I think it would be much better to expand the retry out to 4 digits and increment by one each time the job is running through a restart:

attempt_200707121733_0003_m_000005_0000
attempt_200707121733_0003_m_000005_0001   // fails once
attempt_200707121733_0003_m_000005_1000   // after a restart
attempt_200707121733_0003_m_000005_1001   // fails after restart
attempt_200707121733_0003_m_000005_2000   // after a second restart

That way, we keep the format consistent and compatible. It is a single variable to track in the JobInProgress and is easy to explain. The only problem would come if you had 1000 failures on an attempt and then had a JT reset.




> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632545#action_12632545 ] 

Owen O'Malley commented on HADOOP-4209:
---------------------------------------

+1 to keeping a count in the JobInProgress that is originally 0 and incremented by one every time the history is reread/restored. 

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636608#action_12636608 ] 

Hadoop QA commented on HADOOP-4209:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12391402/HADOOP-4209-v1.2.patch
  against trunk revision 701307.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3428/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3428/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3428/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3428/console

This message is automatically generated.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.2.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633658#action_12633658 ] 

Enis Soztutar commented on HADOOP-4209:
---------------------------------------

Both of the proposals change the attemptid format, although the first one is implicit. But I am not sure if we want to put 2 different integers in one field, instead of keeping them separate. 
The format is very similar to the current one, except timestamp is an incrementing int, and attempt id and timestamp places switched. 
The initial reason we have introduced job, task, attempt ids is that folks should not in any case parse the strings, and we can make changes to the format w/o breaking compatibility. However if you are concerned about compatibility then we should go with Owen's suggestion. 

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Attachment: HADOOP-4209-v1.patch

Attaching a very basic patch. Changes are as follows 
1) Reverts the changes made to {{TaskAttememptID}} and changes related to it.
2) Adds the restart count to the history
3) Uses restart count while assigning ids to task attempts. This patch allows 1000 attempts per task per restart.

Ant tests pass on my box. Testing on a cluster.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Status: Open  (was: Patch Available)

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637474#action_12637474 ] 

Hudson commented on HADOOP-4209:
--------------------------------

Integrated in Hadoop-trunk #626 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/626/])
    

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.2.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635970#action_12635970 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

Tested the patch on 50 nodes and it works fine. Tried with multiple restarts and everytime the attempt-id space shifts by 1000.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-4209:
----------------------------------

         Priority: Blocker  (was: Major)
    Fix Version/s: 0.19.0

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635681#action_12635681 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

bq. An example of TaskAttemptID with JT restart can also be added in javadoc for TaskAttemptID.
I purposefully didnt add any comment to {{TaskAttemptD}} as {{TaskAttemptD}} is totally independent of the restart/recovery logic. Adding a line related to restart/recovery might create some confusion. Let me know if this is not the case. We can add it to TIP documentation and also to recovery documentation. Thoughts?

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Attachment: HADOOP-4209-v1.2.patch

Attaching an updated patch for trunk.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.2.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635665#action_12635665 ] 

Amareshwari Sriramadasu commented on HADOOP-4209:
-------------------------------------------------

Changes look good. 
An example of TaskAttemptID with JT restart can also be added in javadoc for TaskAttemptID.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12636267#action_12636267 ] 

Hadoop QA commented on HADOOP-4209:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12391128/HADOOP-4209-v1.1.patch
  against trunk revision 700923.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3418/console

This message is automatically generated.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat reassigned HADOOP-4209:
----------------------------------

    Assignee: Amar Kamat

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632508#action_12632508 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

Owen,
The timestamp is introduced for a different purpose. Consider a case where the JT runs for the first time and schedules an attempt from a tip tip_0. Let the attempt be attempt_0_0. Let assume that the jobtracker crashes while attempt_0_0 is still running. Upon restart, the JT might not have seen the attempt as its not logged to the history and so will go ahead and schedule another attempt from tip_0. Now this attempt will also have the same id as the earlier attempt (before restart) and hence the side-effect files will now clash. Consider a case where the attempt_0_0 is done but not logged. In such a case the same tracker can now ask for an attempt from tip_0 and might get attempt_0_0. Now not only the side effect files but also the local directories will clash. To avoid such issues we have kept the attempt id unique across restarts. This timestamp is redundant only if the jobtracker never restarts.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12633619#action_12633619 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

Enis,
Changing the format of attempt id might be a big change. I feel we should go with what Owen has suggested and if we hit the case where the ids clash then we should rethink on the format.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12635955#action_12635955 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

_test-patch_ passed on my box. Running through hudson.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Status: Patch Available  (was: Open)

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632504#action_12632504 ] 

Owen O'Malley commented on HADOOP-4209:
---------------------------------------

I would propose that we should extend the jobid by HHMM, which will make all job ids where the JobTracker stays up more than a minute unique. I hope that is enough and that we don't really expect our JobTracker to only last 30 seconds and to launch jobs in that time. *grin*

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Status: Patch Available  (was: Open)

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.2.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632692#action_12632692 ] 

Enis Soztutar commented on HADOOP-4209:
---------------------------------------

How about 
{code}
attempt_200707121733_0003_m_000005_0_0
attempt_200707121733_0003_m_000005_0_1 // fails once
attempt_200707121733_0003_m_000005_1_0 // after a restart
attempt_200707121733_0003_m_000005_1_1 // fails after restart
attempt_200707121733_0003_m_000005_2_0 // after a second restart
{code}
The attempt count and history count need not be merged into one like "1001". Keeping them separate would simplify the code. 

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634776#action_12634776 ] 

Owen O'Malley commented on HADOOP-4209:
---------------------------------------

Let's go with the 1000*reboots + attempt. I think it will create less user confusion, since most jobs will have it set to 0. If we add the new number and it is always 0, users will be confused.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632534#action_12632534 ] 

Amar Kamat commented on HADOOP-4209:
------------------------------------

+1. Currently there is no way to know how many times the job got restarted. One way I can think of is by using the history. Every time the history is (re)written, bump up the restart count. Thoughts?

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-4209:
-------------------------------

    Attachment: HADOOP-4209-v1.1.patch

Attaching the patch updated to 19.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.1.patch, HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632524#action_12632524 ] 

Devaraj Das commented on HADOOP-4209:
-------------------------------------

I think Owen's suggestion makes sense.

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>            Reporter: Owen O'Malley
>            Priority: Blocker
>             Fix For: 0.19.0
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4209) The TaskAttemptID should not have the JobTracker start time

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-4209:
--------------------------------

    Component/s: mapred

> The TaskAttemptID should not have the JobTracker start time
> -----------------------------------------------------------
>
>                 Key: HADOOP-4209
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4209
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amar Kamat
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: HADOOP-4209-v1.patch
>
>
> The TaskAttemptID now includes the redundant copy of the JobTracker's start time as milliseconds. We should instead change the JobID to have the longer unique string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.