Posted to user@hadoop.apache.org by Sayan Kole <sa...@gmail.com> on 2013/03/06 21:47:05 UTC

preferential distribution of tasks to tasktracker by jobtracker based on specific criteria (cpu frequency)

Hi,
   I want the JobTracker to prioritize the assignment of tasks to certain
TaskTrackers.
E.g.: if a TaskTracker meets certain criteria better than the others, I want
to assign tasks to that TaskTracker first. Ideally, I want the JobTracker to
sort the TaskTrackers by certain criteria (e.g. CPU frequency) and then
assign tasks to the TaskTrackers in that sorted order.

Could you point me to the files (e.g. JobTracker.java) and routines to
modify, or to a guideline on the proper way to implement this?

Thanks,
Sayan Kole.

Re: preferential distribution of tasks to tasktracker by jobtracker based on specific criteria (cpu frequency)

Posted by Harsh J <ha...@cloudera.com>.
MRv1: Both the CapacityScheduler and the FairScheduler support some form
of resource-aware scheduling (Capacity supports memory provisioning,
while Fair provides an interface you can plug into to influence its
decisions). You can also implement a custom scheduler, since
TaskTrackers do send their resource details (which include CPU). The
class you need to extend for this is
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/TaskScheduler.java.
The TaskTracker object passed to assignTasks(…) (the JobTracker's
representation of a TaskTracker) lets you get the TaskTrackerStatus
object, which has the information you are looking for:
https://github.com/apache/hadoop-common/blob/branch-1/src/mapred/org/apache/hadoop/mapred/TaskTrackerStatus.java
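As a rough illustration of the idea (not Hadoop code), here is a minimal
sketch in plain Java. TrackerInfo is a hypothetical stand-in for whatever
you read out of TaskTrackerStatus; its field names, the class
CpuAwareOrdering, and both methods are assumptions made up for this
example. Because assignTasks(…) receives one TaskTracker at a time, the
sketch also shows one possible way to turn "prefer faster nodes" into a
per-tracker decision: only hand work to a tracker when the pending tasks
exceed the free slots on strictly faster trackers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the per-tracker data a scheduler can see.
// This is NOT Hadoop's TaskTrackerStatus; all names here are assumptions.
final class TrackerInfo {
    final String name;
    final long cpuFrequencyKHz;
    final int freeSlots;

    TrackerInfo(String name, long cpuFrequencyKHz, int freeSlots) {
        this.name = name;
        this.cpuFrequencyKHz = cpuFrequencyKHz;
        this.freeSlots = freeSlots;
    }
}

final class CpuAwareOrdering {
    // Returns a new list ordered fastest CPU first; ties are broken by name
    // so that the ordering is deterministic.
    static List<TrackerInfo> fastestFirst(List<TrackerInfo> trackers) {
        List<TrackerInfo> sorted = new ArrayList<>(trackers);
        sorted.sort(Comparator
                .comparingLong((TrackerInfo t) -> t.cpuFrequencyKHz)
                .reversed()
                .thenComparing((TrackerInfo t) -> t.name));
        return sorted;
    }

    // Per-tracker decision: give the candidate a task only if it has a free
    // slot and the strictly faster trackers cannot absorb all pending tasks.
    static boolean shouldAssign(TrackerInfo candidate,
                                List<TrackerInfo> all,
                                int pendingTasks) {
        int fasterFreeSlots = 0;
        for (TrackerInfo t : all) {
            if (t.cpuFrequencyKHz > candidate.cpuFrequencyKHz) {
                fasterFreeSlots += t.freeSlots;
            }
        }
        return candidate.freeSlots > 0 && pendingTasks > fasterFreeSlots;
    }

    public static void main(String[] args) {
        List<TrackerInfo> all = Arrays.asList(
                new TrackerInfo("a", 2_000_000L, 2),
                new TrackerInfo("b", 3_000_000L, 1),
                new TrackerInfo("c", 2_500_000L, 0));
        for (TrackerInfo t : fastestFirst(all)) {
            System.out.println(t.name + " " + t.cpuFrequencyKHz + " kHz");
        }
        System.out.println("assign to a with 1 pending? "
                + shouldAssign(all.get(0), all, 1));
    }
}
```

In a real custom scheduler the same comparison would sit inside your
TaskScheduler subclass and read its values from TaskTrackerStatus. The
policy above can leave slower nodes idle while faster ones are busy, so
you may want a timeout or threshold on top of it.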

Note, though, that in MRv2, which runs on YARN and is present in the
current 2.x releases, the JobTracker and the TaskTrackers are both
gone, and the scheduler is now a generic component of the
ResourceManager under YARN. Keep this in mind if you are continuing
to base your work (research or otherwise) on an older maintenance
release: your MRv1 implementation may become irrelevant or stop
working after a future upgrade.

YARN: The JIRA https://issues.apache.org/jira/browse/YARN-2 added
native CPU-core-based scheduling, which may also meet your need
out of the box.

On Thu, Mar 7, 2013 at 2:17 AM, Sayan Kole <sa...@gmail.com> wrote:
> Hi,
>    I want the JobTracker to prioritize the assignment of tasks to certain
> TaskTrackers.
> E.g.: if a TaskTracker meets certain criteria better than the others, I want
> to assign tasks to that TaskTracker first. Ideally, I want the JobTracker to
> sort the TaskTrackers by certain criteria (e.g. CPU frequency) and then
> assign tasks to the TaskTrackers in that sorted order.
>
> Could you point me to the files (e.g. JobTracker.java) and routines to
> modify, or to a guideline on the proper way to implement this?
>
> Thanks,
> Sayan Kole.
>



--
Harsh J
