Posted to mapreduce-issues@hadoop.apache.org by "Scott Chen (JIRA)" <ji...@apache.org> on 2010/05/07 02:07:48 UTC

[jira] Created: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

FairScheduler locality delay may put heavy pressure on Jobtracker
-----------------------------------------------------------------

                 Key: MAPREDUCE-1764
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1764
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Scott Chen
            Assignee: Scott Chen


The FairScheduler locality delay feature holds back the scheduling of a job until it can get good locality.
This greatly improves the locality of the tasks and reduces the cost of network traffic.

We have observed the following problem with FairScheduler locality delay:
Some of our machines hold only older data, and some newly added machines do not hold any important data.
When these machines send a heartbeat, the JT scans tasks to find a job that has the right locality.
Often, such a heartbeat causes a scan of all the tasks of all the jobs, and the machine still does not get any task.
Scanning all the tasks on the JT is very costly and makes the JT very slow.
These machines also often do not get scheduled, which hurts cluster utilization.

Any ideas?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866336#action_12866336 ] 

Scott Chen commented on MAPREDUCE-1764:
---------------------------------------

One option is to cache the search result for each TT, so that next time we can directly skip a TT that has no tasks at the allowed locality level.
What do you think?
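
A minimal sketch of the caching idea, written in Python for brevity rather than as Hadoop code. Every name here (NegativeScanCache, assign_on_heartbeat, the scan callback, the locality constants) is hypothetical and not part of the JobTracker or FairScheduler API. It assumes that a failed scan at one locality level also rules out every tighter level, and that the cache is invalidated whenever the set of pending tasks changes.

```python
from dataclasses import dataclass, field

# Hypothetical locality levels, ordered from tightest to loosest.
NODE_LOCAL, RACK_LOCAL, ANY = 0, 1, 2


@dataclass
class NegativeScanCache:
    """Remembers, per TaskTracker, the level at which the last scan found nothing."""
    generation: int = 0                          # bumped when pending tasks change
    _misses: dict = field(default_factory=dict)  # tracker -> (generation, level)

    def invalidate(self):
        # Call when jobs or tasks are added, so stale misses are ignored.
        self.generation += 1

    def record_miss(self, tracker, level):
        self._misses[tracker] = (self.generation, level)

    def can_skip(self, tracker, allowed_level):
        entry = self._misses.get(tracker)
        if entry is None:
            return False
        gen, level = entry
        # A miss is reusable only if nothing changed since it was recorded
        # and the currently allowed level is no looser than the missed one.
        return gen == self.generation and allowed_level <= level


def assign_on_heartbeat(tracker, allowed_level, cache, scan):
    """Return a task for the tracker, or None; skip the scan on a cached miss."""
    if cache.can_skip(tracker, allowed_level):
        return None
    task = scan(tracker, allowed_level)  # the expensive scan over all tasks
    if task is None:
        cache.record_miss(tracker, allowed_level)
    return task
```

With this shape, a TT that repeatedly heartbeats without any matching task costs one full scan per change of the pending task set instead of one per heartbeat. The cache does nothing for the idle-slot problem; it only reduces the scanning cost.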

> FairScheduler locality delay may put heavy pressure on Jobtracker
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-1764
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1764
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Dmytro Molkov
>             Fix For: 0.22.0

[jira] Updated: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1764:
----------------------------------

    Assignee: Dmytro Molkov  (was: Scott Chen)

[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866389#action_12866389 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1764:
----------------------------------------------

How expensive (memory-wise) would it be to add indices from node->[tasks] and rack->[tasks] based on split information?

Leaving slots idle seems like a real bummer. It would seem better to be greedy and always grab something (especially if the fraction of non-local tasks is within tolerable limits).
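
A rough sketch of the two ideas in this comment, in Python rather than Hadoop code: indices from node and rack to pending tasks built from split locations, plus an optional greedy fallback to a non-local task. All names (LocalityIndex, add_task, pick, rack_of, allow_non_local) are hypothetical, and the check on the fraction of non-local tasks is left to the caller.

```python
from collections import defaultdict


class LocalityIndex:
    """Hypothetical node->[tasks] and rack->[tasks] index over pending tasks."""

    def __init__(self, rack_of):
        self.rack_of = rack_of            # host name -> rack name
        self.by_node = defaultdict(set)   # host -> task ids with a split there
        self.by_rack = defaultdict(set)   # rack -> task ids with a split there
        self.pending = set()              # task ids not yet handed out

    def add_task(self, task_id, split_hosts):
        self.pending.add(task_id)
        for host in split_hosts:
            self.by_node[host].add(task_id)
            self.by_rack[self.rack_of[host]].add(task_id)

    def _pop_pending(self, candidates):
        # Lazy deletion: ids of tasks already handed out are dropped on sight.
        while candidates:
            task_id = candidates.pop()
            if task_id in self.pending:
                self.pending.discard(task_id)
                return task_id
        return None

    def pick(self, host, allow_non_local=False):
        """Prefer node-local, then rack-local; optionally grab any pending task."""
        rack = self.rack_of.get(host)
        for candidates in (self.by_node.get(host), self.by_rack.get(rack)):
            task_id = self._pop_pending(candidates) if candidates else None
            if task_id is not None:
                return task_id
        if allow_non_local and self.pending:
            return self.pending.pop()     # greedy: do not leave the slot idle
        return None
```

In this sketch the memory cost is one set entry per (task, replica host) pair plus one per (task, rack) pair, and a heartbeat from a machine that holds no relevant data returns after two dictionary lookups instead of a scan over all tasks.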

[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867300#action_12867300 ] 

Scott Chen commented on MAPREDUCE-1764:
---------------------------------------

Joydeep:

Matei and I had some discussion, and we have also looked at the code.
In JobInProgress, such a HashMap of node->[tasks] and rack->[tasks] already exists.
It is not clear to me why this is so slow.

I agree with your point that we should not leave slots idle, especially when the cluster is full of jobs.

[jira] Commented: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867424#action_12867424 ] 

Joydeep Sen Sarma commented on MAPREDUCE-1764:
----------------------------------------------

It seems better to find out why the index is not helping (assuming it's actually being used) rather than adding another cache on top.

[jira] Updated: (MAPREDUCE-1764) FairScheduler locality delay may put heavy pressure on Jobtracker

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen updated MAPREDUCE-1764:
----------------------------------

        Fix Version/s: 0.22.0
    Affects Version/s: 0.22.0
