Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2007/06/07 23:53:26 UTC

[jira] Created: (HADOOP-1473) Make jobids unique across jobtracker restarts

Make jobids unique across jobtracker restarts
---------------------------------------------

                 Key: HADOOP-1473
                 URL: https://issues.apache.org/jira/browse/HADOOP-1473
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.12.3
            Reporter: Owen O'Malley
            Assignee: Owen O'Malley
             Fix For: 0.14.0


I'll make the job ids unique across JobTracker restarts by adding the startup time of the JobTracker, so if the JobTracker started at 8 Jun 2007 14:50, the first job would be called:

job_200706081450_00001

the second job would be:

job_200706081450_00002

and so on...
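
As a rough illustration of the proposed format (a sketch only, not the code in the patch; the helper name `make_job_id` is hypothetical), the id can be built from the JobTracker start time, formatted to the minute, plus a zero-padded five-digit job counter:

```python
from datetime import datetime


def make_job_id(tracker_start: datetime, job_number: int) -> str:
    """Build an id such as job_200706081450_00001 (hypothetical helper)."""
    # Start time at minute resolution: year, month, day, hour, minute.
    stamp = tracker_start.strftime("%Y%m%d%H%M")
    # Five-digit, zero-padded counter that restarts with each JobTracker.
    return f"job_{stamp}_{job_number:05d}"


start = datetime(2007, 6, 8, 14, 50)
first_id = make_job_id(start, 1)
second_id = make_job_id(start, 2)
```

With minute resolution, two JobTracker starts within the same minute would still share a prefix, so the scheme assumes restarts are less frequent than that.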

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511222 ] 

Hadoop QA commented on HADOOP-1473:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12361427/new-job-id.patch against trunk revision r554144.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/377/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/377/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.


[jira] Updated: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1473:
----------------------------------

    Attachment: new-job-id.patch

This patch adds the "job tracker start time" to all of the job and task ids. This should make all of the job ids unique on a given job tracker.


[jira] Updated: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1473:
----------------------------------

    Status: Patch Available  (was: Open)


Re: [jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by Alejandro Abdelnur <tu...@gmail.com>.
BTW, the patch for 1121 requires this to be able to handle job recovery on
JT failure.

It is using the same date format.

The only difference is that it was doing it only for jobs that are
autorecoverable.

I'll have to remove this from my patch.

Cheers.

A


On 7/10/07, Nigel Daley <nd...@yahoo-inc.com> wrote:
> +1 for dates in Owen's suggested format, so that the job ids will be
> easily sortable.

Re: [jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by Nigel Daley <nd...@yahoo-inc.com>.
+1 for dates in Owen's suggested format, so that the job ids will be
easily sortable.
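
To illustrate the sorting point (a sketch with made-up ids, not output from a real cluster): because both the timestamp and the counter are fixed-width and zero-padded, plain string sorting orders the ids by JobTracker start time and then by job number.

```python
# Made-up ids from two hypothetical JobTracker runs, deliberately shuffled.
ids = [
    "job_200707091520_00002",
    "job_200706081450_00075",
    "job_200707091520_00001",
    "job_200706081450_00009",
]

# Lexicographic order matches (start time, job number) order because
# every field has a fixed width.
sorted_ids = sorted(ids)
```

The same property would not hold for unpadded counters, where "10" sorts before "9".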

On Jul 10, 2007, at 1:11 AM, Enis Soztutar (JIRA) wrote:

>
>     [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511351 ]
>
> Enis Soztutar commented on HADOOP-1473:
> ---------------------------------------
>
>> It looks much cleaner and also makes it easy to grep the logs for
>> jobs that ran on a given day or month.
> The date in the job's id is not intended to be the date the job is
> run, but the date the JT is started.
>
>> +1 for date/times, they're generally easier to remember than
>> random strings.
> You do not have to remember the dates unless you're dealing with
> jobs which ran on a (now) stopped JT.
>
> IMO, as far as "look at job 75" is concerned, I think either method
> would make no difference.
>        look at job 75 => find {{job_200706081450_00075}}
> or     look at job 75 => find {{job_jkx3y7_00075}}
>
> My vote is for a 4-6 digit hash of the JT start time:
>        look at job 75 => find {{job_4390_00075}}
>
> but then it is harder to explain what 4390 is to newcomers.


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Enis Soztutar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511351 ] 

Enis Soztutar commented on HADOOP-1473:
---------------------------------------

> It looks much cleaner and also makes it easy to grep the logs for jobs that ran on a given day or month.
The date in the job's id is not intended to be the date the job is run, but the date the JT is started.

> +1 for date/times, they're generally easier to remember than random strings.
You do not have to remember the dates unless you're dealing with jobs which ran on a (now) stopped JT.

IMO, as far as "look at job 75" is concerned, I think either method would make no difference.
       look at job 75 => find {{job_200706081450_00075}}
or     look at job 75 => find {{job_jkx3y7_00075}}

My vote is for a 4-6 digit hash of the JT start time:
       look at job 75 => find {{job_4390_00075}}

but then it is harder to explain what 4390 is to newcomers.
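
The comment does not say which hash it has in mind. As one possible reading (an assumption, with a hypothetical `hashed_job_id` helper and SHA-1 chosen arbitrarily), the start-time string could be hashed and reduced to four decimal digits:

```python
import hashlib


def hashed_job_id(tracker_start: str, job_number: int, digits: int = 4) -> str:
    """Hypothetical helper: short numeric hash of the JT start time."""
    digest = hashlib.sha1(tracker_start.encode("ascii")).hexdigest()
    # Reduce the 160-bit digest to `digits` decimal digits.
    short = int(digest, 16) % (10 ** digits)
    return f"job_{short:0{digits}d}_{job_number:05d}"


example = hashed_job_id("200706081450", 75)
```

With only 10,000 possible four-digit prefixes, two different start times can map to the same value, so a scheme like this trades guaranteed uniqueness for brevity.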


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512057 ] 

Hudson commented on HADOOP-1473:
--------------------------------

Integrated in Hadoop-Nightly #152 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/152/])


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511249 ] 

Doug Cutting commented on HADOOP-1473:
--------------------------------------

Having job ids contain an easily-readable jobtracker start time does not add much utility, and it costs a lot of screen space.  I'd vote to use base 36 instead, for both the jobtracker start time and the job number.  This would render a job id as something like "jkxio1-001", half the size for the same range of values.
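
A minimal sketch of the base-36 idea, assuming the start time is held as a plain non-negative integer (the comment does not say which unit or epoch is meant) and using hypothetical helper names. The example start value is obtained by decoding the string from the comment, so the round trip reproduces "jkxio1-001":

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # 0-9 then a-z: 36 symbols


def to_base36(n: int) -> str:
    """Render a non-negative integer in base 36 (hypothetical helper)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(DIGITS[rem])
    return "".join(reversed(out))


def compact_job_id(start_value: int, job_number: int) -> str:
    # Job number padded to three base-36 digits (covers 0 through 46655).
    return f"{to_base36(start_value)}-{to_base36(job_number).zfill(3)}"


start_value = int("jkxio1", 36)  # decode the prefix used in the comment
example = compact_job_id(start_value, 1)
```

The six-character prefix here stands in for the twelve digits of "200706081450", which is the "half the size" comparison made above.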


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511278 ] 

Sameer Paranjpye commented on HADOOP-1473:
------------------------------------------

+1 for date/times, they're generally easier to remember than random strings.

Also, do we want the date/time of the jobtracker restart to be in the job name? Would it make sense to just recycle job ids every day?
When scanning for a job it would be easier to scan for 'job 4 from Wednesday the 11th' than for 'job 4 after the restart on Wednesday the 11th'.
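
One way to picture the daily-recycling alternative (a sketch under assumptions: the class name `DailyJobIds` and the `job_<yyyymmdd>_<n>` layout are hypothetical, not something proposed verbatim in this thread) is a counter that resets whenever the calendar date changes:

```python
from datetime import date


class DailyJobIds:
    """Hypothetical generator: job numbers restart at 1 each day."""

    def __init__(self) -> None:
        self._day = None
        self._next = 1

    def next_id(self, today: date) -> str:
        # Reset the counter the first time a new date is seen.
        if today != self._day:
            self._day = today
            self._next = 1
        job_id = f"job_{today.strftime('%Y%m%d')}_{self._next:05d}"
        self._next += 1
        return job_id


gen = DailyJobIds()
a = gen.next_id(date(2007, 7, 11))
b = gen.next_id(date(2007, 7, 11))
c = gen.next_id(date(2007, 7, 12))
```

This sketch keeps the counter in memory only. For ids to stay unique across a restart on the same day, the counter would have to be persisted or recovered, which the start-time prefix avoids.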



[jira] Updated: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1473:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Owen!


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502547 ] 

Doug Cutting commented on HADOOP-1473:
--------------------------------------

This sounds reasonable to me.  +1


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511264 ] 

Mahadev konar commented on HADOOP-1473:
---------------------------------------

+1 on Owen's suggestion. It looks much cleaner and also makes it easy to grep the logs for jobs that ran on a given day or month.


[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511253 ] 

Owen O'Malley commented on HADOOP-1473:
---------------------------------------

The Hadoop-QA failure seems to be streaming TestSymLinks, which is generally flaky and not the fault of the patch. (Although I did have to change other tests to work with the new job ids.)

Base-36 numbers are ugly and hard for users to remember and distinguish. I can figure out yesterday's date in the readable form, but not in base-36. I think it is a big usability hit to make the numbers unreadable. I often have people tell me to look at job 75 on node1000. That is much harder if it is some strange combination of digits and letters.



[jira] Commented: (HADOOP-1473) Make jobids unique across jobtracker restarts

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511257 ] 

Doug Cutting commented on HADOOP-1473:
--------------------------------------

> Base-36 numbers are ugly and hard for users to remember and distinguish.

Longer numbers are harder to remember and distinguish.  Personally, it is easier for me to remember "jkx3y7" than "200706081453".  In both cases, if I'm scanning visually I'm likely to check only the last few digits.

> I often have people tell me to look at job 75 on node1000.

That would change to "look at job 23 on node1000".  Is that onerous?  Even if it were "look at job 2v" would that be a problem?
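
The job numbers in this reply are base-36 renderings. Python's built-in `int(text, 36)` can be used to check them (a quick sketch, not part of the patch; `from_base36` is a hypothetical wrapper):

```python
def from_base36(text: str) -> int:
    """Decode a base-36 string such as "23" or "2v"."""
    return int(text, 36)


job_23 = from_base36("23")  # 2 * 36 + 3
job_2v = from_base36("2v")  # 2 * 36 + 31, since "v" is digit 31
```

Here "23" decodes to 75, the job number from the quoted example, and "2v" decodes to 103.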
