Posted to dev@hama.apache.org by Micle Bu <mi...@gmail.com> on 2013/04/17 17:15:31 UTC

Data locality in Hama

Hi all,

I'm learning about data locality in Hama and found that there is a
BestEffortDataLocalTaskAllocator class for this purpose. It's a good idea
to assign a task to the groom that holds its split, and getGroomToSchedule()
plays this role.

In getGroomToSchedule(), the code looks like this:

GroomServerStatus groom = grooms.get(location);
...
if (taskInGroom < groom.getMaxTasks() &&
location.equals(groom.getGroomHostName())) {
        return groom.getGroomHostName();
}

It seems that location.equals(groom.getGroomHostName()) is always true, so
it just selects the first groom that contains the split. Am I right?
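
To illustrate the question, here is a minimal, self-contained sketch of the first-fit behaviour described above. It is not the actual Hama source: the Groom class is a hypothetical stand-in for GroomServerStatus, the tasksInGroom map and the method signature are invented for illustration, and it assumes the grooms map is keyed by groom host name, which is what would make the equals() check redundant.

```java
import java.util.HashMap;
import java.util.Map;

class FirstFitLocalitySketch {
    // Hypothetical stand-in for GroomServerStatus; only the two accessors
    // quoted in the message are modelled.
    static final class Groom {
        private final String hostName;
        private final int maxTasks;

        Groom(String hostName, int maxTasks) {
            this.hostName = hostName;
            this.maxTasks = maxTasks;
        }

        String getGroomHostName() { return hostName; }
        int getMaxTasks() { return maxTasks; }
    }

    // Returns the host of the first split location whose groom still has a
    // free slot, or null if none qualifies.
    static String getGroomToSchedule(Map<String, Groom> grooms,
                                     Map<String, Integer> tasksInGroom,
                                     String[] splitLocations) {
        for (String location : splitLocations) {
            Groom groom = grooms.get(location);
            if (groom == null) {
                continue; // no groom is running on that host
            }
            int taskInGroom = tasksInGroom.getOrDefault(location, 0);
            // Assumption: grooms is keyed by host name, so
            // location.equals(groom.getGroomHostName()) always holds here
            // and only the slot check decides the outcome.
            if (taskInGroom < groom.getMaxTasks()) {
                return groom.getGroomHostName();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Groom> grooms = new HashMap<>();
        grooms.put("host1", new Groom("host1", 1));
        grooms.put("host2", new Groom("host2", 2));

        Map<String, Integer> tasksInGroom = new HashMap<>();
        tasksInGroom.put("host1", 1); // host1 is already full

        String[] splitLocations = {"host1", "host2"};
        System.out.println(getGroomToSchedule(grooms, tasksInGroom, splitLocations));
    }
}
```

In this sketch the order of splitLocations alone decides which groom wins among those with free slots.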

Thanks in advance!

Micle Bu

Re: Data locality in Hama

Posted by Micle Bu <mi...@gmail.com>.
Thanks Suraj!

Good. Another option is to select the groom server that holds the most
blocks/parts of the split (when splitSize > HDFS block size, a split may
span several HDFS blocks) to obtain maximum locality.
Taking into account that the hostname may be in a different rack is another
good hint.
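
A rough sketch of the "most blocks" idea, under stated assumptions: blockHosts (the replica hosts of each HDFS block in the split) and freeSlots (free task slots per groom host) are hypothetical inputs, not Hama APIs, and each host is assumed to appear at most once per block. Ties go to the lexicographically smallest host name only to keep the result deterministic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MaxLocalitySketch {
    // Picks the host that stores the most blocks of the split among the
    // hosts that still have a free slot; returns null if none qualifies.
    static String pickGroom(List<List<String>> blockHosts,
                            Map<String, Integer> freeSlots) {
        // Count how many blocks of the split each host stores locally.
        // A TreeMap gives sorted iteration, so ties are broken by host name.
        Map<String, Integer> localBlocks = new TreeMap<>();
        for (List<String> hosts : blockHosts) {
            for (String host : hosts) {
                localBlocks.merge(host, 1, Integer::sum);
            }
        }

        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : localBlocks.entrySet()) {
            boolean hasSlot = freeSlots.getOrDefault(entry.getKey(), 0) > 0;
            if (hasSlot && entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // A split made of three blocks, each with two replicas.
        List<List<String>> blockHosts = Arrays.asList(
                Arrays.asList("host1", "host2"),
                Arrays.asList("host2", "host3"),
                Arrays.asList("host2", "host1"));

        Map<String, Integer> freeSlots = new HashMap<>();
        freeSlots.put("host1", 1);
        freeSlots.put("host2", 1);
        freeSlots.put("host3", 2);

        System.out.println(pickGroom(blockHosts, freeSlots));
    }
}
```

Rack awareness is left out of the sketch; it would need a host-to-rack mapping as an additional input.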

Micle Bu


On Wed, Apr 17, 2013 at 11:29 PM, Suraj Menon <su...@apache.org> wrote:

> Good catch! Yes, the logic is to find the first groom server that has the
> split and has available slots for execution.
> You might note that depending on the HDFS allocation, this hostname might
> not be in the same rack. You are welcome to fix this.

Re: Data locality in Hama

Posted by Suraj Menon <su...@apache.org>.
Good catch! Yes, the logic is to find the first groom server that has the
split and has available slots for execution.
You might note that depending on the HDFS allocation, this hostname might
not be in the same rack. You are welcome to fix this.

