
Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/12/08 04:32:50 UTC

Heterogeneous cluster

Hi,

Here is the situation.

I have a heterogeneous cluster with 2-core, 4-core and 8-core
servers. Because their performance differs, those servers can handle
different amounts of load. So far, I have built a LoadBalancer that
balances the regions over those servers based on their performance,
and it’s working quite well: RowCounter went down from 11 minutes to
6 minutes. However, I can still see tasks running on some servers
while accessing data on other servers, which overwhelms the bandwidth
and slows down the process, since some 2-core servers are assigned to
count rows hosted on 8-core servers.

I’m looking for a way to “force” the tasks to run on the servers where
the regions are assigned.

I first tried to reject the tasks in the Mapper setup method when the
data was not local, to see if the tracker would assign them to another
server. It did not: the task just fails and is mostly not reassigned.
I tried IOException, RuntimeException and InterruptedException, with
no success.

So now I have 3 possible options.

The first one is to move from MapReduce to a coprocessor Endpoint.
Running locally on the RegionServer, it accesses only the local data,
and I can manually reject everything that is not local. Therefore it
meets my needs, but it’s not my preferred option since I would like to
keep the MR features.

The second option is to tell Hadoop where the tasks should be
assigned. Should that be done by HBase? By Hadoop? I don’t know.
Where? I don’t know either. I have started to look at the JobTracker
and JobInProgress code, but it seems it will be a big task. Also, doing
that means I will have to re-patch the distributed code each time I
upgrade, and I will have to redo everything when I move from 1.0.x to
2.x…

The third option is to not process the task if the data is not local.
I mean, in the map method, simply have an if (!local) return; right at
the beginning and just do nothing. This will not work for things like
RowCounter, since all the entries are required, but it might work for
some of my use cases where I don’t necessarily need all the data to be
processed. It will not be efficient, though, since the task will still
scan the entire region.
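A minimal sketch of the locality check behind the third option, assuming the region's host name is available as a string. In a TableMapper it could presumably be read from ((TableSplit) context.getInputSplit()).getRegionLocation(); that call is not made below so the example compiles with the JDK alone. The LocalityGuard class and its methods are hypothetical names. Since TableInputFormat by default creates one split per region, the check only needs to run once per task, for example in setup().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Locale;

/**
 * Hypothetical helper for the "if (!local) return;" guard of option 3.
 * Assumes both sides are host names (not IP addresses) and that short host
 * names are unique in the cluster, so "node1.lan" and "NODE1" match.
 */
final class LocalityGuard {
    private LocalityGuard() {}

    /** Lower-cases a host name and drops any domain suffix. */
    static String shortName(String host) {
        String h = host.trim().toLowerCase(Locale.ROOT);
        int dot = h.indexOf('.');
        return dot < 0 ? h : h.substring(0, dot);
    }

    /** True when the region's server and the task's host are the same machine. */
    static boolean isLocal(String regionLocation, String taskHost) {
        if (regionLocation == null || taskHost == null) {
            return false; // unknown location: treat the data as remote
        }
        return shortName(regionLocation).equals(shortName(taskHost));
    }

    /** Host name of the machine this JVM (the map task) is running on. */
    static String currentHost() throws UnknownHostException {
        return InetAddress.getLocalHost().getHostName();
    }
}
```

Inside map(), the guard then becomes `if (!LocalityGuard.isLocal(regionLocation, taskHost)) return;`. Returning early only skips the processing; the record reader still scans the whole region.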

My preferred option is definitely the 2nd one, but it also seems to
be the most difficult one. The third one is very easy to implement:
it needs 2 lines to check whether the data is local. But it does not
work for all scenarios and is more of a dirty fix. The coprocessor
option might be doable too, since I already have all the code for my
MapReduce jobs, so it might be an acceptable option.

I’m wondering whether anyone has already faced this situation and
worked on something. If not, do you have any other ideas/options to
propose, or can someone point me to the right classes to look at to
implement solution 2?

Thanks,

JM

Re: Heterogeneous cluster

Posted by Asaf Mesika <as...@gmail.com>.

So just to get this right: the class you have built is a custom load
balancer that replaces the default HBase load balancer implementation?

Re: Heterogeneous cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi Mike,

I totally agree with you. The balancer I wrote is more of a hack than
a production version. I built my cluster from all the computers I
found around me, from a 1-core P4 to 8-core CPUs. I have 8 nodes + 3 ZK.
They are all SO different that "normal" load balancing was not very
efficient. They are all in the same rack (hub), so the "machine, rack,
second rack" distribution does not really work for me.

That's why I built this hack.

When I have enough nodes for 2 or 3 racks, I will most probably go
back to the DefaultLoadBalancer.

Just to give you an example, here is how one of my tables is now balanced:

Regions by Region Server
Region Server           Region Count
http://node4:60030/     11
http://phenom:60030/    37
http://node5:60030/     3
http://node2:60030/     11
http://node3:60030/     55
http://node1:60030/     4
http://node6:60030/     8
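The skew in the table above (55 regions on node3, 3 on node5) is what a performance-based balancer aims for. As a minimal sketch of one way to compute such targets, assume each server is given a positive performance weight and receives a share of the table's regions proportional to it. The weights, the WeightedRegionTargets class and the largest-remainder rounding are illustrative assumptions, not necessarily how JM's PerformanceBalancer works. Wiring targets like these into HBase would further require a custom LoadBalancer implementation, selected with the hbase.master.loadbalancer.class property.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: target region counts proportional to per-server
 * performance weights, using largest-remainder rounding so the counts
 * always add up to the number of regions.
 */
final class WeightedRegionTargets {
    private WeightedRegionTargets() {}

    static Map<String, Integer> targets(Map<String, Double> weights, int totalRegions) {
        if (weights.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        double sum = 0;
        for (double w : weights.values()) {
            if (w <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            sum += w;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        Map<String, Double> remainder = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            double exact = totalRegions * e.getValue() / sum; // ideal fractional share
            int floor = (int) Math.floor(exact);
            result.put(e.getKey(), floor);
            remainder.put(e.getKey(), exact - floor);
            assigned += floor;
        }
        // Give the regions lost to rounding down to the largest fractional parts
        // (List.sort is stable, so ties keep the input order).
        List<String> order = new ArrayList<>(remainder.keySet());
        order.sort((a, b) -> Double.compare(remainder.get(b), remainder.get(a)));
        for (int i = 0; i < totalRegions - assigned; i++) {
            String host = order.get(i);
            result.put(host, result.get(host) + 1);
        }
        return result;
    }
}
```

For example, with weights 1, 1 and 2 for three servers and 8 regions, the targets are 2, 2 and 4. Largest-remainder rounding keeps the total equal to the number of regions, which plain rounding would not guarantee.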

Also, can someone confirm the recommended number of tasks per server?
I think I saw something like CPU * 0.7. Is that correct?

JM

2012/12/8, Robert Dyer <rd...@iastate.edu>:
> I of course cannot speak for Jean-Marc; however, my use case is not very
> corporate.  It is a small cluster (9 nodes) and only 1 of those nodes is
> different (drastically different).
>
> And yes, I configured it so that node has a lot more map slots.  However,
> the problem is that HBase balances without regard to that, so even though
> more map tasks run on that node, they are not data-local!  If I have a
> balancer that is able to keep more regions on that particular node, then
> the data locality of my map tasks is improved.
>
>
> On Sat, Dec 8, 2012 at 5:45 PM, Michael Segel
> <mi...@hotmail.com> wrote:
>
>> Take what I say with a grain of kosher salt. (It's what they put on your
>> drink glasses because the grains are bigger. ;-)
>>
>> I think what you are doing is a cool hack; however, in the bigger picture,
>> you shouldn't have to do this with your load balancer. Also, it doesn't
>> matter if you think about it.
>>
>> With a heterogeneous cluster, you will not share the same configuration
>> across all machines in the cluster. You will change the number of slots
>> per node based on its capacity.
>> That will limit the amount of work that can be done on each machine.
>>
>> You could also consider playing with the rack-aware aspects of your
>> cluster.
>> You could put all of your 2-CPU machines in the same rack.
>>
>> In theory... machine, rack, second rack is how the data is distributed.
>> In theory, if the 2-CPU machines are neighbors, then the 2nd and/or 3rd
>> copy goes to another machine.
>>
>> Writing a custom balancer may be a good hack, but it is not good in
>> terms of corporate life.
>>
>> Just saying!
>>
>> -Mike
>>
>> On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari <je...@spaggiari.org>
>> wrote:
>>
>> > Hi,
>> >
>> > It's not yet available anywhere. I will post it today or tomorrow,
>> > once I have removed some hardcoding I put into it ;) It's a quick
>> > and dirty PerformanceBalancer. It's not a CPULoadBalancer.
>> >
>> > Anyway, I will give more details over the weekend, but there is
>> > absolutely nothing extraordinary about it.
>> >
>> > JM
>> >
>> > 2012/12/8, Robert Dyer <rd...@iastate.edu>:
>> >> I too am interested in this custom load balancer, as I was actually
>> >> just starting to look into writing one that does the same thing for
>> >> my heterogeneous cluster!
>> >>
>> >> Is this available somewhere?
>> >>
>> >> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <ja...@gmail.com>
>> >> wrote:
>> >>
>> >>>     By the way, I saw you mentioned that you
>> >>> have built a "LoadBalancer", could you kindly
>> >>> share some detailed info about it?
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Robert Dyer
>> >> rdyer@iastate.edu
>> >>
>> >
>>
>>
>
>
> --
>
> Robert Dyer
> rdyer@iastate.edu
>
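Mike's suggestion of putting the 2-CPU machines in the same rack relies on Hadoop's rack awareness, where a rack is a logical label returned for each host (by a topology script configured with topology.script.file.name in Hadoop 1.x, or by a DNSToSwitchMapping implementation). A minimal sketch of such a mapping, assuming racks are simply derived from each host's core count: the CapacityRackMapping class, the rack names and the core counts in the example are hypothetical, and resolve() only mirrors the shape of DNSToSwitchMapping.resolve(List<String>) without implementing the interface, so it compiles without Hadoop. This steers where HDFS places block replicas; it does not by itself change which RegionServer serves a region.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: group machines of similar capacity into the same
 * logical rack. Hosts with the same core count share a rack label.
 */
final class CapacityRackMapping {
    /** Same label Hadoop uses for hosts without a known rack. */
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, String> rackByHost = new HashMap<>();

    /** Registers a host with its core count, e.g. node5 -> "/rack-2core". */
    CapacityRackMapping addHost(String host, int cores) {
        rackByHost.put(host, "/rack-" + cores + "core");
        return this;
    }

    /** Maps each host name to its logical rack, or DEFAULT_RACK if unknown. */
    List<String> resolve(List<String> names) {
        List<String> racks = new ArrayList<>(names.size());
        for (String name : names) {
            racks.add(rackByHost.getOrDefault(name, DEFAULT_RACK));
        }
        return racks;
    }
}
```

For example, registering node5 as a 2-core host and node3 as an 8-core host (made-up core counts) puts them in /rack-2core and /rack-8core, while any unregistered host falls back to /default-rack.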

RE: Heterogeneous cluster

Posted by Anoop Sam John <an...@huawei.com>.

Hi Jean,
     I hope Harsh's reply made it clear. Thanks, Harsh.
Please always keep in mind the 2 layers: HBase, and under it the HDFS layer where the data actually lives. When you read HBase tables via MR, the read goes through the regions, not directly to the stored HFiles. So yes, if the task for region1 is running on N2 and region1 is on N1, there will be an RPC to N1, and the DFS client on N1 in turn may read the data from N1. So even if the data is replicated on N2, data locality does not help you here.

HDFS-2246 introduced short-circuit reads. You can find a detailed explanation of how and when they are useful at the link in Harsh's message below.
It may also be better to enable the HBase-handled checksum option for better performance if you are using a 0.94.x version. [This works only when the read is a short-circuited local read.]
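Short-circuit reads are a matter of configuration, not code. A minimal sketch that lists the relevant settings as *-site.xml property entries, assuming Hadoop 1.x with the HDFS-2246 implementation and HBase 0.94. The property names are believed to match that combination, but the ShortCircuitConfig class is hypothetical, and names and values should be checked against the HBase book section Harsh cites for your exact versions (later Hadoop releases added a different short-circuit mechanism).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper that renders the short-circuit read settings
 * (Hadoop 1.x / HDFS-2246, HBase 0.94) as <property> entries.
 */
final class ShortCircuitConfig {
    private ShortCircuitConfig() {}

    /** hdfs-site.xml on the DataNodes: the user allowed to read block files directly. */
    static Map<String, String> hdfsSite(String hbaseUser) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("dfs.block.local-path-access.user", hbaseUser);
        return p;
    }

    /** hbase-site.xml on the RegionServers: short-circuit reads plus HBase checksums. */
    static Map<String, String> hbaseSite() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("dfs.client.read.shortcircuit", "true");
        // Per the thread, HBase-level checksums help only for short-circuited local reads.
        p.put("hbase.regionserver.checksum.verify", "true");
        return p;
    }

    /** Formats properties the way Hadoop *-site.xml files list them. */
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }
}
```

Printing toXml(hbaseSite()) and toXml(hdfsSite("hbase")) gives the entries to paste inside the configuration element of each file, with "hbase" standing in for whatever user runs the RegionServers.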

-Anoop-
________________________________________
From: Harsh J [harsh@cloudera.com]
Sent: Wednesday, December 12, 2012 1:50 AM
To: user@hbase.apache.org
Subject: Re: Heterogeneous cluster

Hi,

On Wed, Dec 12, 2012 at 12:18 AM, Jean-Marc Spaggiari
<je...@spaggiari.org> wrote:
> Hi Anoop,
>
> Thanks for the clarification.
>
> So let's take one example.
>
> Let's say I have 4 nodes and a replication factor set to 3.
>
> I have a region hosted on N1, replicated on N2 and N3. Nothing about
> this region on N4.

The important bit is, pending further enhancements along this line,
"regions" are not replicated. A region's data is replicated on HDFS, but
a region itself is not replicated. It is served from a single point
(where it is currently assigned). Region data read requests are done
via the RegionServer layer, not directly from DataNodes (from a client
POV).

> It's time to run an MR job, and someone needs to work on the given region.
> N1 is too busy, so the region will be given to another node. Does that mean
> it will be given randomly to N2, N3 or N4?

HBase jobs are submitted with each region's split location set to its
current assignee (at the time of submission). This gives the "locality".

> If it's given to N4, it's missing an opportunity to get the data almost locally.

If your task gets assigned to any other node or if the region moves
after the job's begun, the data locality of the reads the regionserver
does may easily be affected, yes.

> Also, if the job is given to N2 or N3, are they going to remotely query
> the data over the network from N1? Or are they able to read it from
> the replica? Based on what you are saying, it seems that they will
> retrieve it from N1. Is there not another opportunity to improve the
> process by reading from the replicated data and not from the master
> one?

As explained above, all reads go through the assigned regionserver. So
the concept of HDFS block replicas can't be applied here yet (I do
know enhancements around this are planned).

> When you are talking about "the short circuit read option", is this
> something we need to enable as a property, or is it more like a piece
> of code?

It's configuration, and the speed-boost details are at
http://hbase.apache.org/book.html#perf.hdfs section "11.10.2.
Leveraging local data".

--
Harsh J
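Harsh's point about split locations can be illustrated with a simplified model of locality-aware map scheduling. This is a sketch under stated assumptions, not the JobTracker/JobInProgress code: a free slot on a host first takes a pending split whose location hints include that host, otherwise any pending split. The SplitScheduler class and its methods are hypothetical, and the real scheduler also considers rack-local placement and other policies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified model of locality-aware map scheduling. */
final class SplitScheduler {
    /** A split and its location hints; for an HBase table split, the region's current server. */
    static final class Split {
        final String name;
        final List<String> locations;

        Split(String name, String... locations) {
            this.name = name;
            this.locations = Arrays.asList(locations);
        }
    }

    private final List<Split> pending = new ArrayList<>();

    void add(Split s) {
        pending.add(s);
    }

    /** Returns the split given to a free slot on {@code host}, or null if none remain. */
    Split assign(String host) {
        for (Iterator<Split> it = pending.iterator(); it.hasNext(); ) {
            Split s = it.next();
            if (s.locations.contains(host)) {
                it.remove();
                return s; // data-local: the region is served on this host
            }
        }
        // Non-local fallback: the task will scan the region over the network.
        return pending.isEmpty() ? null : pending.remove(0);
    }
}
```

With region1 hinted to N1 and region2 to N2, a free slot on N2 gets region2 (data-local), but a free slot on N4 still gets region1. That task then scans region1 through N1's RegionServer over the network; nothing routes it to N2 or N3 just because they hold replicas of region1's blocks.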

Re: Heterogeneous cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi Anoop,

Thanks for the clarification.

So let's take one example.

Let's say I have 4 nodes and a replication factor set to 3.

I have a region hosted on N1, replicated on N2 and N3. Nothing about
this region on N4.

It's time to run an MR job, and someone needs to work on the given region.
N1 is too busy, so the region will be given to another node. Does that mean
it will be given randomly to N2, N3 or N4?

If it's given to N4, it's missing an opportunity to get the data almost locally.

Also, if the job is given to N2 or N3, are they going to remotely query
the data over the network from N1? Or are they able to read it from
the replica? Based on what you are saying, it seems that they will
retrieve it from N1. Is there not another opportunity to improve the
process by reading from the replicated data and not from the master
one?

When you are talking about "the short circuit read option", is this
something we need to enable as a property, or is it more like a piece
of code?

JM

2012/12/10, Anoop Sam John <an...@huawei.com>:
>>But if the job is running there, it can also be
> considered as running locally, right? Or will it always be retrieved
> from the datanode linked to the RS hosting the region we are dealing
> with? Not sure I'm clear :(
>
> Hi Jean,
>     Sorry, I have not seen the history of this mailing thread.
> Going by this question from you, I guess the MR job is scanning HTable
> data; even if the job is running on a node holding a replica, I don't
> think it will be local. The MR job needs to fetch the data via HBase only,
> which means it needs to contact the RS hosting the region. Then, in turn,
> HBase will contact any of the DNs where the data is available. So it will
> take multiple steps. There is nothing like one RS being somehow linked to
> one DN. Which DN the data is fetched from depends on the decision taken by
> the DFS client. Maybe it will not contact any DN but will do a local read,
> if the short circuit read option is enabled and the data is on the same
> server where the region is hosted. I hope I made it clear here.  :)
>
> -Anoop-
>
> ________________________________________
> From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
> Sent: Monday, December 10, 2012 7:33 PM
> To: user@hbase.apache.org
> Subject: Re: Heterogeneous cluster
>
> @Asaf & Robert: I have posted the code here. But be careful with it.
> Read Mike's comment above.
> http://www.spaggiari.org/index.php/hbase/changing-the-hbase-default-loadbalancer
> I'm a newbie on HBase, so you'd better rely on feedback from someone
> more experienced.
>
> @Mike:
>
> Hi Mike,
>
> I totally agree with your opinion. My balancer is totally a hack on a
> 'Frankencluster' (BTW, I LOVE this description! Perfect fit!) and a
> way for me to take a deeper look at HBase's code.
>
> One question about data locality. When you run an HBase MR job, even with
> a replication factor of 3, data is considered local only if the task is
> running on the RS where the region is served. But does HBase have a way to
> see if it can be run on any of the replicas? A replica might be
> on a different rack. But if the job is running there, it can also be
> considered as running locally, right? Or will it always be retrieved
> from the datanode linked to the RS hosting the region we are dealing
> with? Not sure I'm clear :(
>
> JM
>
> 2012/12/9, Michael Segel <mi...@hotmail.com>:
>> Ok...
>>
>> From a production/commercial grade answer...
>>
>> With respect to HBase, you will have 1 live copy and 2 replicas.
>> (Assuming you didn't change this.) So when you run against HBase, data
>> locality becomes less of an issue.
>> And again, you have to temper that with the fact that it depends on the
>> number of regions within the table...
>>
>> A lot of people, including committers, tend to get hung up on some of the
>> details and lose focus on the larger picture.
>>
>> If you were running a production cluster and your one node was radically
>> different... then you would be better off taking it out of the cluster
>> and making it an edge node. (Edge nodes are very important...)
>>
>> If we're talking about a very large cluster which has evolved... then you
>> would want to work out your rack-aware placements.  Note that a rack in
>> rack awareness is a logical and not a physical location. So you can
>> modify it to let the distro's placement take the hint and move the data.
>> This is more of a cheat, and even here... I think that at scale, the
>> potential improvement gains are going to be minimal.
>>
>> This works for everything but HBase.
>>
>> On that note, it doesn't matter. Again, assume that you have your data
>> equally distributed around the cluster and that your access pattern is to
>> all nodes in the cluster.  The parallelization in the cluster will average
>> out the slow ones.
>>
>> In terms of your small research clusters...
>>
>> You're not looking at performance when you build a 'Frankencluster'.
>>
>> Specifically to your case... move all the data to that node and you end up
>> with both networking and disk I/O bottlenecks.
>>
>> You're worried about the noise.
>>
>> Having said that...
>>
>> If you want to improve the balancer code, sure; however, you're going to
>> need to do some work where you capture your cluster's statistics so that
>> the balancer has more intelligence.
>>
>> You may start off wanting to allow HBase to take hints about the cluster,
>> but in truth, I don't think it's a good idea. Note, I realize that you and
>> Jean-Marc are not saying that you intend to add something like this, but
>> someone may create a JIRA and then someone else may act upon it....
>>
>> IMHO, adding intelligence to the HBase scheduler is a lot of work, and I
>> don't think it will really make a difference in terms of overall
>> performance.
>>
>>
>> Just saying...
>>
>> -Mike
>>

RE: Heterogeneous cluster

Posted by Anoop Sam John <an...@huawei.com>.

>But if the job is running there, it can also be
considered as running locally, right? Or will it always be retrieved
from the datanode linked to the RS hosting the region we are dealing
with? Not sure I'm clear :(

Hi Jean,
     Sorry, I have not seen the history of this mailing thread. Going by this question from you, I guess the MR job is scanning HTable data; even if the job is running on a node holding a replica, I don't think it will be local. The MR job needs to fetch the data via HBase only, which means it needs to contact the RS hosting the region. Then, in turn, HBase will contact any of the DNs where the data is available. So it will take multiple steps. There is nothing like one RS being somehow linked to one DN. Which DN the data is fetched from depends on the decision taken by the DFS client. Maybe it will not contact any DN but will do a local read, if the short circuit read option is enabled and the data is on the same server where the region is hosted. I hope I made it clear here.  :)

-Anoop-
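Anoop's description of the read path can be condensed into a small model with two hops: map task to RegionServer, then RegionServer to an HDFS replica. The sketch below is a simplification, not HDFS's actual algorithm: it ignores the block cache and memstore, and it approximates the DFS client's replica choice as "same host, then same rack, then any other rack" (the real client orders replicas by network distance and other factors). `ReadPathSketch`, `Source` and the host/rack names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Simplified model of the two hops in an HBase-backed map task:
// (1) task -> RegionServer, (2) RegionServer's DFS client -> a block replica.
final class ReadPathSketch {
    private ReadPathSketch() {}

    enum Source { SHORT_CIRCUIT_LOCAL, LOCAL_DATANODE, SAME_RACK_DATANODE, OFF_RACK_DATANODE }

    // Hop 1: the scan always goes through the RS hosting the region; it is only
    // a local call when the task happens to run on that same host.
    static boolean taskToRegionServerIsLocal(String taskHost, String regionServerHost) {
        return taskHost.equals(regionServerHost);
    }

    // Hop 2: where the RS reads a block from, given the hosts holding its replicas.
    // Assumption: prefer same host, then same rack, then any other rack.
    static Source chooseSource(String regionServerHost, List<String> replicaHosts,
                               Map<String, String> rackOf, boolean shortCircuitEnabled) {
        if (replicaHosts.contains(regionServerHost)) {
            return shortCircuitEnabled ? Source.SHORT_CIRCUIT_LOCAL : Source.LOCAL_DATANODE;
        }
        String rsRack = rackOf.get(regionServerHost);
        for (String host : replicaHosts) {
            if (rsRack != null && rsRack.equals(rackOf.get(host))) {
                return Source.SAME_RACK_DATANODE;
            }
        }
        return Source.OFF_RACK_DATANODE;
    }
}
```

A task running on a host that merely holds a replica (but does not serve the region) still gets `false` from `taskToRegionServerIsLocal`, which is the point Anoop makes: for HBase scans, locality is decided by where the region is served.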

________________________________________
From: Jean-Marc Spaggiari [jean-marc@spaggiari.org]
Sent: Monday, December 10, 2012 7:33 PM
To: user@hbase.apache.org
Subject: Re: Heterogeneous cluster


Re: Heterogeneous cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

@Asaf & Robert: I have posted the code here. But be careful with it.
Read Mike's comment above.
http://www.spaggiari.org/index.php/hbase/changing-the-hbase-default-loadbalancer
I'm a newbie with HBase, so you'd better rely on feedback from someone
more experienced.

@Mike:

Hi Mike,

I totally agree with your opinion. My balancer is totally a hack on a
'Frankencluster' (BTW, I LOVE this description! Perfect fit!) and a
way for me to take a deeper look at HBase's code.

One question about data locality. When you run an HBase MR job, even with
a replication factor of 3, data is considered local only if the task runs
on the RS where the region is hosted. But does HBase have a way to
see if the task can run on any of the replicas? A replica might be
on a different rack. But if the job is running there, it could also be
considered as running locally, right? Or will the data always be retrieved
from the datanode linked to the RS hosting the region we are dealing
with? Not sure I'm clear :(

JM
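Jean-Marc's question can be framed per host: what fraction of a region's HDFS bytes has a replica on a given machine? Any replica holder scores well on that measure, but an HBase scan still goes through the RegionServer serving the region, so for MR over HBase the number that matters is the one for the RS host. The sketch below computes the fraction with the standard library only. HBase tracks a comparable per-region figure (see `HDFSBlocksDistribution` in the HBase source), but `BlockLocality`, `Block` and `bestHost` here are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model: fraction of a region's store-file bytes that have a replica on a host.
final class BlockLocality {
    private BlockLocality() {}

    // One HDFS block of a region's store files: its size and the hosts holding a replica.
    static final class Block {
        final long bytes;
        final Set<String> replicaHosts;

        Block(long bytes, Set<String> replicaHosts) {
            this.bytes = bytes;
            this.replicaHosts = replicaHosts;
        }
    }

    // Convenience factory used in the examples.
    static Block block(long bytes, String... hosts) {
        return new Block(bytes, new HashSet<>(Arrays.asList(hosts)));
    }

    // Bytes with a replica on `host`, divided by total bytes (0.0 for an empty region).
    static double localityIndex(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.bytes;
            if (b.replicaHosts.contains(host)) {
                local += b.bytes;
            }
        }
        return total == 0 ? 0.0 : (double) local / total;
    }

    // Host with the highest index among all replica holders; ties go to the
    // alphabetically first host so the result is deterministic.
    static String bestHost(List<Block> blocks) {
        Set<String> candidates = new TreeSet<>();
        for (Block b : blocks) {
            candidates.addAll(b.replicaHosts);
        }
        String best = null;
        double bestIndex = -1.0;
        for (String host : candidates) {
            double idx = localityIndex(blocks, host);
            if (idx > bestIndex) {
                best = host;
                bestIndex = idx;
            }
        }
        return best;
    }
}
```

With blocks of 100, 100 and 200 bytes, a host holding replicas of the first and third block scores 300/400 = 0.75.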

2012/12/9, Michael Segel <mi...@hotmail.com>:
> Ok...
>
> From a production/commercial grade answer...
>
> With respect to HBase, you will have 1 live copy and 2 replicas.
> (Assuming you didn't change this.) So when you run against HBase, data
> locality becomes less of an issue.
> And again, you have to temper that with the fact that it depends on the
> number of regions within the table...
>
> A lot of people, including committers, tend to get hung up on some of the
> details and lose focus on the larger picture.
>
> If you were running a production cluster and your one node was radically
> different... then you would be better off taking it out of the cluster and
> making it an edge node. (Edge nodes are very important...)
>
> If we're talking about a very large cluster which has evolved... then you
> would want to work out your rack-aware placements. Note that the rack in
> rack awareness is a logical, not a physical, location. So you can modify it to let the
> distro's placement take the hint and move the data.  This is more of a cheat
> and even here... I think that at scale, the potential improvement gains are
> going to be minimal.
>
> This works for everything but HBase.
>
> On that note, it doesn't matter. Again, assume that you have your data
> equally distributed around the cluster and that your access pattern is to
> all nodes in the cluster.  The parallelization in the cluster will average
> out the slow ones.
>
> In terms of your small research clusters...
>
> You're not looking at performance when you build a 'Frankencluster'
>
> Specifically in your case... move all the data to that node and you end up
> with both network and disk I/O bottlenecks.
>
> You're worried about the noise.
>
> Having said that...
>
> If you want to improve the balancer code, sure, however, you're going to
> need to do some work where you capture your cluster's statistics so that the
> balancer has more intelligence.
>
> You may start off wanting to allow HBase to take hints about the cluster,
> but in truth, I don't think it's a good idea. Note, I realize that you and
> Jean-Marc are not suggesting that it is your intent to add something like
> this, but that someone will create a JIRA and then someone else may act upon
> it....
>
> IMHO, that's a lot of work, adding intelligence to the HBase Scheduler and I
> don't think it will really make a difference in terms of overall
> performance.
>
>
> Just saying...
>
> -Mike
>
> On Dec 8, 2012, at 5:50 PM, Robert Dyer <rd...@iastate.edu> wrote:
>
>> Of course I cannot speak for Jean-Marc; however, my use case is not very
>> corporate.  It is a small cluster (9 nodes) and only 1 of those nodes is
>> different (drastically different).
>>
>> And yes, I configured it so that node has a lot more map slots.  However,
>> the problem is HBase balances without regard to that and thus even though
>> more map tasks run on those nodes they are not data-local!  If I have a
>> balancer that is able to keep more regions on that particular node, then
>> the data locality of my map tasks is improved.
>>
>>
>> On Sat, Dec 8, 2012 at 5:45 PM, Michael Segel
>> <mi...@hotmail.com>wrote:
>>
>>> Take what I say with a grain of kosher salt. (It's what they put on your
>>> drink glasses because the grains are bigger. ;-)
>>>
>>> I think what you are doing is a cool hack, however in the bigger picture,
>>> you shouldn't have to do this with your load balancer. Also, it doesn't
>>> matter if you think about it.
>>>
>>> With a heterogeneous cluster, you will not share the same configuration
>>> across all machines in the cluster. You will change the number of slots
>>> per node based on its capacity.
>>> That will limit how much work can be done on each node.
>>>
>>> You could also consider playing with the rack-aware aspects of your
>>> cluster.
>>> You could put all of your 2-CPU machines in the same rack.
>>>
>>> In theory... machine, rack, second rack is how the data is distributed.
>>> In theory, if the 2-CPU machines are neighbors, then the 2nd and/or 3rd
>>> copy goes to another machine.
>>>
>>> Writing a custom balancer may be a good hack, but not good in
>>> terms of corporate life.
>>>
>>> Just saying!
>>>
>>> -Mike
>>>
>>> On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari
>>> <je...@spaggiari.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> It's not available anywhere yet. I will post it today or tomorrow,
>>>> once I have removed some hardcoded values from it ;) It's a quick
>>>> and dirty PerformanceBalancer, not a CPULoadBalancer.
>>>>
>>>> Anyway, I will give more details over the weekend, but there is
>>>> absolutely nothing extraordinary about it.
>>>>
>>>> JM
>>>>
>>>> 2012/12/8, Robert Dyer <rd...@iastate.edu>:
>>>>> I too am interested in this custom load balancer, as I was actually
>>>>> just
>>>>> starting to look into writing one that does the same thing for
>>>>> my heterogeneous cluster!
>>>>>
>>>>> Is this available somewhere?
>>>>>
>>>>> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <ja...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>    By the way, I saw you mentioned that you
>>>>>> have built a "LoadBalancer", could you kindly
>>>>>> share some detailed info about it?
>>>>>>
>>>>>> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Here is the situation.
>>>>>>>
>>>>>>> I have a heterogeneous cluster with servers that have 2-core, 4-core
>>>>>>> and 8-core CPUs. The performance of those different servers allows
>>>>>>> them to handle different amounts of load. So far, I have built a
>>>>>>> LoadBalancer which balances the regions over those servers based on
>>>>>>> their performance. And it’s working quite well: the RowCounter went
>>>>>>> down from 11 minutes to 6 minutes. However, I can still see that some
>>>>>>> tasks run on servers that access data hosted on other servers, which
>>>>>>> overwhelms the bandwidth and slows down the process, since some 2-core
>>>>>>> servers are assigned to count rows hosted on 8-core servers.
>>>>>>>
>>>>>>> I’m looking for a way to “force” the tasks to run on the servers where
>>>>>>> the regions are assigned.
>>>>>>>
>>>>>>> I first tried to reject the tasks in the Mapper setup method when the
>>>>>>> data was not local, to see if the tracker would assign them to another
>>>>>>> server. No. They just fail and are mostly not re-assigned. I tried
>>>>>>> IOExceptions, RuntimeExceptions and InterruptedExceptions with no
>>>>>>> success.
>>>>>>>
>>>>>>> So now I have 3 possible options.
>>>>>>>
>>>>>>> The first one is to move from MapReduce to a Coprocessor Endpoint.
>>>>>>> Running locally on the RegionServer, it accesses only the local data,
>>>>>>> and I can manually reject everything that is not local. Therefore it
>>>>>>> meets my needs, but it’s not my preferred option since I would like to
>>>>>>> keep the MR features.
>>>>>>>
>>>>>>> The second option is to tell Hadoop where the tasks should be
>>>>>>> assigned. Should that be done by HBase? By Hadoop? I don’t know.
>>>>>>> Where? I don’t know either. I have started to look at the JobTracker
>>>>>>> and JobInProgress code, but it seems it will be a big task. Also, doing
>>>>>>> that will mean I will have to re-patch the distributed code each time
>>>>>>> I upgrade, and I will have to redo everything when I move from 1.0.x
>>>>>>> to 2.x…
>>>>>>>
>>>>>>> The third option is to not process the task if the data is not local.
>>>>>>> I mean, in the map method, simply have an if (!local) return; right at
>>>>>>> the beginning and do nothing. This will not work for things like
>>>>>>> RowCounter since all the entries are required, but it might work for
>>>>>>> some of my use cases where I don’t necessarily need all the data to be
>>>>>>> processed. It will not be efficient, though, since the task will still
>>>>>>> scan the entire region.
>>>>>>>
>>>>>>> My preferred option is definitely the 2nd one, but it also seems to be
>>>>>>> the most difficult one. The third one is very easy to implement: it
>>>>>>> needs only 2 lines to check whether the data is local. But it doesn’t
>>>>>>> work for all scenarios and is more of a dirty fix. The coprocessor
>>>>>>> option might be doable too, since I already have all the code for my
>>>>>>> MapReduce jobs. So it might be an acceptable option.
>>>>>>>
>>>>>>> I’m wondering if anyone has already faced this situation and worked on
>>>>>>> something. If not, do you have any other ideas/options to propose, or
>>>>>>> can someone point me to the right classes to look at to implement
>>>>>>> solution 2?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> JM
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Robert Dyer
>>>>> rdyer@iastate.edu
>>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Robert Dyer
>> rdyer@iastate.edu
>
>

Re: Heterogeneous cluster

Posted by Michael Segel <mi...@hotmail.com>.

Ok... 

From a production/commercial grade answer... 

With respect to HBase, you will have 1 live copy and 2 replicas. (Assuming you didn't change this.) So when you run against HBase, data locality becomes less of an issue. 
And again, you have to temper that with the fact that it depends on the number of regions within the table...

A lot of people, including committers, tend to get hung up on some of the details and lose focus on the larger picture.

If you were running a production cluster and your one node was radically different... then you would be better off taking it out of the cluster and making it an edge node. (Edge nodes are very important...)

If we're talking about a very large cluster which has evolved... then you would want to work out your rack-aware placements. Note that the rack in rack awareness is a logical, not a physical, location. So you can modify it to let the distro's placement take the hint and move the data. This is more of a cheat, and even here... I think that at scale, the potential gains are going to be minimal.
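Because racks in rack awareness are logical, the grouping is whatever the cluster's topology resolver returns for each host. In Hadoop that is usually a topology script named in the configuration, or a class implementing `org.apache.hadoop.net.DNSToSwitchMapping`, whose `resolve` method maps a list of host names to a list of rack paths. The stdlib-only sketch below mimics that shape to put machines with the same core count into the same logical rack; the host table, the `/rack-<n>core` naming and `LogicalRackMapper` are hypothetical, and it does not implement the real Hadoop interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical resolver: hosts with the same core count share a logical rack.
final class LogicalRackMapper {
    // Same rack path Hadoop falls back to when a host has no explicit mapping.
    static final String DEFAULT_RACK = "/default-rack";

    private final Map<String, Integer> coresByHost;

    LogicalRackMapper(Map<String, Integer> coresByHost) {
        this.coresByHost = new HashMap<>(coresByHost);
    }

    // Same shape as a topology resolver: one rack path per input host, in order.
    List<String> resolve(List<String> hosts) {
        List<String> racks = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            Integer cores = coresByHost.get(host);
            racks.add(cores == null ? DEFAULT_RACK : "/rack-" + cores + "core");
        }
        return racks;
    }
}
```

For example, `resolve(Arrays.asList("n1", "n2", "n3"))` with `n1` at 2 cores, `n2` at 8 cores and `n3` unlisted returns `[/rack-2core, /rack-8core, /default-rack]`. Keep in mind that this only changes what HDFS and the scheduler believe about racks; it does not tell them anything about CPU capacity.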

This works for everything but HBase. 

On that note, it doesn't matter. Again, assume that you have your data equally distributed around the cluster and that your access pattern is to all nodes in the cluster.  The parallelization in the cluster will average out the slow ones. 

In terms of your small research clusters... 

You're not looking at performance when you build a 'Frankencluster' 

Specifically in your case... move all the data to that node and you end up with both network and disk I/O bottlenecks.

You're worried about the noise. 

Having said that... 

If you want to improve the balancer code, sure, however, you're going to need to do some work where you capture your cluster's statistics so that the balancer has more intelligence. 
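One possible shape for "capturing your cluster's statistics" is to keep an exponentially weighted moving average (EWMA) of each server's measured throughput, for example rows per second observed in past jobs, and turn it into relative shares that a balancer could use as weights. The metric, the smoothing factor and the `ThroughputStats` class are assumptions for illustration, not an HBase API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-server statistics store: EWMA of observed throughput.
final class ThroughputStats {
    private final double alpha;                       // weight of the newest sample, in (0, 1]
    private final Map<String, Double> ewma = new HashMap<>();

    ThroughputStats(double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // The first sample initializes the average; later samples are blended in.
    void record(String server, double rowsPerSecond) {
        ewma.merge(server, rowsPerSecond, (old, sample) -> alpha * sample + (1.0 - alpha) * old);
    }

    // Current estimate for a server, or 0.0 if nothing was recorded for it.
    double estimate(String server) {
        return ewma.getOrDefault(server, 0.0);
    }

    // Each server's fraction of the total estimated throughput
    // (the shares sum to 1 whenever the total is non-zero).
    Map<String, Double> shares() {
        double sum = 0.0;
        for (double v : ewma.values()) {
            sum += v;
        }
        Map<String, Double> out = new HashMap<>();
        for (Map.Entry<String, Double> e : ewma.entrySet()) {
            out.put(e.getKey(), sum == 0.0 ? 0.0 : e.getValue() / sum);
        }
        return out;
    }
}
```

With `alpha = 0.5`, samples of 100 and then 200 rows/s for one server give an estimate of 150; a smaller `alpha` reacts more slowly to a single noisy job, which matters given Mike's point about noise.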

You may start off wanting to allow HBase to take hints about the cluster, but in truth, I don't think it's a good idea. Note, I realize that you and Jean-Marc are not suggesting that it is your intent to add something like this, but my concern is that someone will create a JIRA and then someone else may act upon it....

IMHO, that's a lot of work, adding intelligence to the HBase Scheduler and I don't think it will really make a difference in terms of overall performance. 


Just saying... 

-Mike

On Dec 8, 2012, at 5:50 PM, Robert Dyer <rd...@iastate.edu> wrote:

> Of course I cannot speak for Jean-Marc; however, my use case is not very
> corporate.  It is a small cluster (9 nodes) and only 1 of those nodes is
> different (drastically different).
> 
> And yes, I configured it so that node has a lot more map slots.  However,
> the problem is HBase balances without regard to that and thus even though
> more map tasks run on those nodes they are not data-local!  If I have a
> balancer that is able to keep more regions on that particular node, then
> the data locality of my map tasks is improved.
> 
> 
> On Sat, Dec 8, 2012 at 5:45 PM, Michael Segel <mi...@hotmail.com>wrote:
> 
>> Take what I say with a grain of kosher salt. (It's what they put on your
>> drink glasses because the grains are bigger. ;-)
>>
>> I think what you are doing is a cool hack, however in the bigger picture,
>> you shouldn't have to do this with your load balancer. Also, it doesn't
>> matter if you think about it.
>>
>> With a heterogeneous cluster, you will not share the same configuration
>> across all machines in the cluster. You will change the number of slots
>> per node based on its capacity.
>> That will limit how much work can be done on each node.
>>
>> You could also consider playing with the rack-aware aspects of your
>> cluster.
>> You could put all of your 2-CPU machines in the same rack.
>>
>> In theory... machine, rack, second rack is how the data is distributed.
>> In theory, if the 2-CPU machines are neighbors, then the 2nd and/or 3rd
>> copy goes to another machine.
>>
>> Writing a custom balancer may be a good hack, but not good in
>> terms of corporate life.
>> 
>> Just saying!
>> 
>> -Mike
>> 
>> On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari <je...@spaggiari.org>
>> wrote:
>> 
>>> Hi,
>>> 
>>> It's not available anywhere yet. I will post it today or tomorrow,
>>> once I have removed some hardcoded values from it ;) It's a quick
>>> and dirty PerformanceBalancer, not a CPULoadBalancer.
>>>
>>> Anyway, I will give more details over the weekend, but there is
>>> absolutely nothing extraordinary about it.
>>> 
>>> JM
>>> 
>>> 2012/12/8, Robert Dyer <rd...@iastate.edu>:
>>>> I too am interested in this custom load balancer, as I was actually just
>>>> starting to look into writing one that does the same thing for
>>>> my heterogeneous cluster!
>>>> 
>>>> Is this available somewhere?
>>>> 
>>>> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <ja...@gmail.com>
>>>> wrote:
>>>> 
>>>>>    By the way, I saw you mentioned that you
>>>>> have built a "LoadBalancer", could you kindly
>>>>> share some detailed info about it?
>>>>> 
>>>>> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Here is the situation.
>>>>>> 
>>>>>> I have a heterogeneous cluster with servers that have 2-core, 4-core
>>>>>> and 8-core CPUs. The performance of those different servers allows
>>>>>> them to handle different amounts of load. So far, I have built a
>>>>>> LoadBalancer which balances the regions over those servers based on
>>>>>> their performance. And it’s working quite well: the RowCounter went
>>>>>> down from 11 minutes to 6 minutes. However, I can still see that some
>>>>>> tasks run on servers that access data hosted on other servers, which
>>>>>> overwhelms the bandwidth and slows down the process, since some 2-core
>>>>>> servers are assigned to count rows hosted on 8-core servers.
>>>>>>
>>>>>> I’m looking for a way to “force” the tasks to run on the servers where
>>>>>> the regions are assigned.
>>>>>>
>>>>>> I first tried to reject the tasks in the Mapper setup method when the
>>>>>> data was not local, to see if the tracker would assign them to another
>>>>>> server. No. They just fail and are mostly not re-assigned. I tried
>>>>>> IOExceptions, RuntimeExceptions and InterruptedExceptions with no
>>>>>> success.
>>>>>>
>>>>>> So now I have 3 possible options.
>>>>>>
>>>>>> The first one is to move from MapReduce to a Coprocessor Endpoint.
>>>>>> Running locally on the RegionServer, it accesses only the local data,
>>>>>> and I can manually reject everything that is not local. Therefore it
>>>>>> meets my needs, but it’s not my preferred option since I would like to
>>>>>> keep the MR features.
>>>>>>
>>>>>> The second option is to tell Hadoop where the tasks should be
>>>>>> assigned. Should that be done by HBase? By Hadoop? I don’t know.
>>>>>> Where? I don’t know either. I have started to look at the JobTracker
>>>>>> and JobInProgress code, but it seems it will be a big task. Also, doing
>>>>>> that will mean I will have to re-patch the distributed code each time
>>>>>> I upgrade, and I will have to redo everything when I move from 1.0.x
>>>>>> to 2.x…
>>>>>>
>>>>>> The third option is to not process the task if the data is not local.
>>>>>> I mean, in the map method, simply have an if (!local) return; right at
>>>>>> the beginning and do nothing. This will not work for things like
>>>>>> RowCounter since all the entries are required, but it might work for
>>>>>> some of my use cases where I don’t necessarily need all the data to be
>>>>>> processed. It will not be efficient, though, since the task will still
>>>>>> scan the entire region.
>>>>>>
>>>>>> My preferred option is definitely the 2nd one, but it also seems to be
>>>>>> the most difficult one. The third one is very easy to implement: it
>>>>>> needs only 2 lines to check whether the data is local. But it doesn’t
>>>>>> work for all scenarios and is more of a dirty fix. The coprocessor
>>>>>> option might be doable too, since I already have all the code for my
>>>>>> MapReduce jobs. So it might be an acceptable option.
>>>>>>
>>>>>> I’m wondering if anyone has already faced this situation and worked on
>>>>>> something. If not, do you have any other ideas/options to propose, or
>>>>>> can someone point me to the right classes to look at to implement
>>>>>> solution 2?
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>> JM
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> 
>>>> Robert Dyer
>>>> rdyer@iastate.edu
>>>> 
>>> 
>> 
>> 
> 
> 
> -- 
> 
> Robert Dyer
> rdyer@iastate.edu
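Mike's "machine, rack, second rack" refers to HDFS's default replica placement. As the HDFS architecture guide describes it for a replication factor of 3, the first replica goes on the writer's node (if the writer runs a DataNode), the second on a node in a different rack, and the third on a different node in the same rack as the second. The sketch below is a deterministic simplification (the real policy chooses nodes randomly and also checks space and load); `PlacementSketch` and the sample topology are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Deterministic simplification of HDFS's default 3-replica placement.
final class PlacementSketch {
    private PlacementSketch() {}

    // rackOf maps every DataNode host to its logical rack path.
    static List<String> chooseTargets(String writer, Map<String, String> rackOf) {
        List<String> hosts = new ArrayList<>(new TreeSet<>(rackOf.keySet()));
        List<String> chosen = new ArrayList<>();
        if (hosts.isEmpty()) {
            return chosen;
        }
        // 1st replica: the writer's own node if it is a DataNode, else the first host.
        chosen.add(rackOf.containsKey(writer) ? writer : hosts.get(0));
        String firstRack = rackOf.get(chosen.get(0));
        // 2nd replica: a node on a different rack.
        for (String h : hosts) {
            if (!rackOf.get(h).equals(firstRack)) {
                chosen.add(h);
                break;
            }
        }
        // 3rd replica: another node on the same rack as the 2nd.
        if (chosen.size() == 2) {
            String secondRack = rackOf.get(chosen.get(1));
            for (String h : hosts) {
                if (!chosen.contains(h) && rackOf.get(h).equals(secondRack)) {
                    chosen.add(h);
                    break;
                }
            }
        }
        // Fallback (e.g. a single-rack cluster): fill up with any unused node.
        for (String h : hosts) {
            if (chosen.size() >= 3) {
                break;
            }
            if (!chosen.contains(h)) {
                chosen.add(h);
            }
        }
        return chosen;
    }
}
```

If every 2-CPU machine is mapped to one logical rack, a RegionServer on one of them keeps the first copy of what it writes and the other two copies land on a different rack, as in `chooseTargets("n1", ...)` returning `[n1, n3, n4]` for racks `/r1 = {n1, n2}` and `/r2 = {n3, n4}`.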

Re: Heterogeneous cluster

Posted by Robert Dyer <rd...@iastate.edu>.

Of course I cannot speak for Jean-Marc; however, my use case is not very
corporate.  It is a small cluster (9 nodes) and only 1 of those nodes is
different (drastically different).

And yes, I configured it so that node has a lot more map slots.  However,
the problem is HBase balances without regard to that and thus even though
more map tasks run on those nodes they are not data-local!  If I have a
balancer that is able to keep more regions on that particular node, then
the data locality of my map tasks is improved.
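Keeping more regions on the node with more map slots amounts to a proportional split of the regions. A minimal sketch, assuming each server's weight is its map-slot count (core count or measured throughput would work the same way): the largest-remainder method turns the proportions into whole region counts that add up exactly to the total. `WeightedRegionPlan` is hypothetical and is not HBase's `LoadBalancer` API; a real balancer would still have to turn these targets into region moves.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical planner: target region count per server, proportional to its weight.
final class WeightedRegionPlan {
    private WeightedRegionPlan() {}

    static Map<String, Integer> targets(Map<String, Integer> weights, int totalRegions) {
        long weightSum = 0;
        for (int w : weights.values()) {
            if (w < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            weightSum += w;
        }
        if (weightSum == 0 || totalRegions < 0) {
            throw new IllegalArgumentException("need a positive total weight and totalRegions >= 0");
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        Map<String, Long> remainder = new LinkedHashMap<>();
        int assigned = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            long exact = (long) totalRegions * e.getValue();   // exact share, scaled by weightSum
            int base = (int) (exact / weightSum);             // floor of the exact share
            out.put(e.getKey(), base);
            remainder.put(e.getKey(), exact % weightSum);
            assigned += base;
        }
        // Hand the few leftover regions to the servers with the largest remainders.
        // List.sort is stable, so ties keep the map's iteration order.
        List<String> order = new ArrayList<>(weights.keySet());
        order.sort((a, b) -> Long.compare(remainder.get(b), remainder.get(a)));
        for (int i = 0; i < totalRegions - assigned; i++) {
            out.merge(order.get(i), 1, Integer::sum);
        }
        return out;
    }
}
```

With hypothetical numbers for a 9-node cluster (eight nodes with 4 slots and one with 16), the large node's target is 16/48, i.e. one third of the regions.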


On Sat, Dec 8, 2012 at 5:45 PM, Michael Segel <mi...@hotmail.com>wrote:

> Take what I say with a grain of kosher salt. (It's what they put on your
> drink glasses because the grains are bigger. ;-)
>
> I think what you are doing is a cool hack, however in the bigger picture,
> you shouldn't have to do this with your load balancer. Also, it doesn't
> matter if you think about it.
>
> With a heterogeneous cluster, you will not share the same configuration
> across all machines in the cluster. You will change the number of slots
> per node based on its capacity.
> That will limit how much work can be done on each node.
>
> You could also consider playing with the rack-aware aspects of your
> cluster.
> You could put all of your 2-CPU machines in the same rack.
>
> In theory... machine, rack, second rack is how the data is distributed.
> In theory, if the 2-CPU machines are neighbors, then the 2nd and/or 3rd
> copy goes to another machine.
>
> Writing a custom balancer may be a good hack, but not good in
> terms of corporate life.
>
> Just saying!
>
> -Mike
>
> On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari <je...@spaggiari.org>
> wrote:
>
> > Hi,
> >
> > It's not available anywhere yet. I will post it today or tomorrow,
> > once I have removed some hardcoded values from it ;) It's a quick
> > and dirty PerformanceBalancer, not a CPULoadBalancer.
> >
> > Anyway, I will give more details over the weekend, but there is
> > absolutely nothing extraordinary about it.
> >
> > JM
> >
> > 2012/12/8, Robert Dyer <rd...@iastate.edu>:
> >> I too am interested in this custom load balancer, as I was actually just
> >> starting to look into writing one that does the same thing for
> >> my heterogeneous cluster!
> >>
> >> Is this available somewhere?
> >>
> >> On Sat, Dec 8, 2012 at 9:17 AM, James Chang <ja...@gmail.com>
> >> wrote:
> >>
> >>>     By the way, I saw you mentioned that you
> >>> have built a "LoadBalancer", could you kindly
> >>> share some detailed info about it?
> >>>
> >>> Jean-Marc Spaggiari wrote on Saturday, December 8, 2012:
> >>>
> >>>> Hi,
> >>>>
> >>>> Here is the situation.
> >>>>
> >>>> I have a heterogeneous cluster with servers that have 2-core, 4-core
> >>>> and 8-core CPUs. The performance of those different servers allows
> >>>> them to handle different amounts of load. So far, I have built a
> >>>> LoadBalancer which balances the regions over those servers based on
> >>>> their performance. And it’s working quite well: the RowCounter went
> >>>> down from 11 minutes to 6 minutes. However, I can still see that some
> >>>> tasks run on servers that access data hosted on other servers, which
> >>>> overwhelms the bandwidth and slows down the process, since some 2-core
> >>>> servers are assigned to count rows hosted on 8-core servers.
> >>>>
> >>>> I’m looking for a way to “force” the tasks to run on the servers where
> >>>> the regions are assigned.
> >>>>
> >>>> I first tried to reject the tasks in the Mapper setup method when the
> >>>> data was not local, to see if the tracker would assign them to another
> >>>> server. No. They just fail and are mostly not re-assigned. I tried
> >>>> IOExceptions, RuntimeExceptions and InterruptedExceptions with no
> >>>> success.
> >>>>
> >>>> So now I have 3 possible options.
> >>>>
> >>>> The first one is to move from MapReduce to a Coprocessor Endpoint.
> >>>> Running locally on the RegionServer, it accesses only the local data,
> >>>> and I can manually reject everything that is not local. Therefore it
> >>>> meets my needs, but it’s not my preferred option since I would like to
> >>>> keep the MR features.
> >>>>
> >>>> The second option is to tell Hadoop where the tasks should be
> >>>> assigned. Should that be done by HBase? By Hadoop? I don’t know.
> >>>> Where? I don’t know either. I have started to look at the JobTracker
> >>>> and JobInProgress code, but it seems it will be a big task. Also, doing
> >>>> that will mean I will have to re-patch the distributed code each time
> >>>> I upgrade, and I will have to redo everything when I move from 1.0.x
> >>>> to 2.x…
> >>>>
> >>>> The third option is to not process the task if the data is not local.
> >>>> I mean, in the map method, simply have an if (!local) return; right at
> >>>> the beginning and do nothing. This will not work for things like
> >>>> RowCounter since all the entries are required, but it might work for
> >>>> some of my use cases where I don’t necessarily need all the data to be
> >>>> processed. It will not be efficient, though, since the task will still
> >>>> scan the entire region.
> >>>>
> >>>> My preferred option is definitely the 2nd one, but it also seems to be
> >>>> the most difficult one. The third one is very easy to implement: it
> >>>> needs only 2 lines to check whether the data is local. But it’s not working for all
> >>>> the scenarios, and is more like a dirty fix. The coprocessor option
> >>>> might be doable too since I already have all the code for my MapReduce
> >>>> jobs. So it might be an acceptable option.
> >>>>
> >>>> I’m wondering if anyone already faced this situation and worked on
> >>>> something, and if not, do you have any other ideas/options to propose,
> >>>> or can someone point me to the right classes to look at to implement
> >>>> the solution 2?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> JM
> >>>>
> >>>
> >>
> >>
> >>
> >> --
> >>
> >> Robert Dyer
> >> rdyer@iastate.edu
> >>
> >
>
>


-- 

Robert Dyer
rdyer@iastate.edu

Re: Heterogeneous cluster

Posted by Michael Segel <mi...@hotmail.com>.

Take what I say with a grain of kosher salt. (It's what they put on your drink glasses because the grains are bigger. ;-)

I think what you are doing is a cool hack; however, in the bigger picture, you shouldn't have to do this with your load balancer. Also, if you think about it, it doesn't really matter.

With a heterogeneous cluster, you will not share the same configuration across all machines in the cluster. You will change the number of slots per node based on its capacity. 
That limits the amount of work each machine in the cluster can take on. 
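To make the per-node slot idea concrete: in Hadoop 1.x (MRv1), each TaskTracker reads its slot limits from its own `mapred-site.xml`, via `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`. The sketch below derives those two values from a node's core count and prints the properties for each node class in this cluster. The sizing rule (one map slot per core, half as many reduce slots, at least one of each) and the `SlotPlanner` class are illustrative assumptions, not tuning advice from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: sizes MRv1 TaskTracker slots from a node's core count.
// The sizing rule below is an illustrative assumption, not a recommendation.
class SlotPlanner {
    // Assumed rule: one map slot per core (at least one).
    static int mapSlots(int cores) {
        return Math.max(1, cores);
    }

    // Assumed rule: half as many reduce slots as cores, but at least one.
    static int reduceSlots(int cores) {
        return Math.max(1, cores / 2);
    }

    // Renders the two Hadoop 1.x slot properties for one node's mapred-site.xml.
    static String mapredSiteSnippet(int cores) {
        return property("mapred.tasktracker.map.tasks.maximum", mapSlots(cores))
             + property("mapred.tasktracker.reduce.tasks.maximum", reduceSlots(cores));
    }

    private static String property(String name, int value) {
        return "<property>\n"
             + "  <name>" + name + "</name>\n"
             + "  <value>" + value + "</value>\n"
             + "</property>\n";
    }

    public static void main(String[] args) {
        // Core counts taken from the cluster described in this thread.
        Map<String, Integer> nodeClasses = new LinkedHashMap<>();
        nodeClasses.put("2-core nodes", 2);
        nodeClasses.put("4-core nodes", 4);
        nodeClasses.put("8-core nodes", 8);
        for (Map.Entry<String, Integer> e : nodeClasses.entrySet()) {
            System.out.println("<!-- " + e.getKey() + " -->");
            System.out.print(mapredSiteSnippet(e.getValue()));
        }
    }
}
```

Each node class would get its own copy of the printed properties; TaskTrackers typically read these at startup, so a restart is needed for a change to take effect.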

You could also consider playing with the rack-awareness aspects of your cluster.
For example, you could put all of your 2-core machines in the same rack. 
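Rack assignment in Hadoop is driven by a host-to-rack mapping, usually a script referenced by `topology.script.file.name` in Hadoop 1.x, or a Java `org.apache.hadoop.net.DNSToSwitchMapping` implementation. The standalone sketch below shows the kind of mapping described here, grouping hosts into logical racks by core count. It only mirrors the shape of `DNSToSwitchMapping.resolve(List<String>)` so that it compiles without Hadoop on the classpath; the class name, host names and rack names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Standalone sketch of a host-to-rack mapping that groups machines by core count.
// It mirrors the shape of Hadoop's DNSToSwitchMapping.resolve(List<String>) but does
// not implement that interface. Host and rack names are hypothetical.
class CoreCountRackMapping {
    private final Map<String, Integer> coresByHost;

    CoreCountRackMapping(Map<String, Integer> coresByHost) {
        this.coresByHost = coresByHost;
    }

    // Returns one rack path per input host, in the same order as the input.
    List<String> resolve(List<String> hosts) {
        List<String> racks = new ArrayList<>(hosts.size());
        for (String host : hosts) {
            Integer cores = coresByHost.get(host);
            // Unknown hosts fall back to Hadoop's conventional default rack name.
            racks.add(cores == null ? "/default-rack" : "/rack-" + cores + "core");
        }
        return racks;
    }
}
```

This places logical racks on top of the physical network, so HDFS would treat the 2-core group as a single failure domain even if those machines do not actually share a switch.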

In theory... machine, rack, second rack is how the data is distributed. So, in theory, if the 2-core machines are neighbors in the same rack, the 2nd and/or 3rd copy goes to a machine on another rack. 
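The replica argument can be checked with a simplified model of the default HDFS placement for a replication factor of 3, as described in the HDFS architecture documentation: the first copy on the writer's node, the second on a node in a different rack, the third on another node in that same remote rack. With the 2-core machines grouped in one rack, a block written from a 2-core node keeps one local copy and puts the other two on the larger machines' rack. This is a sketch under those assumptions with hypothetical host and rack names; real HDFS picks nodes randomly and also weighs load and free space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of default HDFS replica placement for replication factor 3:
// 1st replica on the writer's node, 2nd on a node in a different rack,
// 3rd on a different node in the same rack as the 2nd.
// Real HDFS chooses randomly among eligible nodes and checks load and space;
// this sketch takes the first eligible node in name order to stay deterministic.
class ReplicaPlacementSketch {
    static List<String> place(String writer, Map<String, String> rackByNode) {
        String writerRack = rackByNode.get(writer);
        if (writerRack == null) {
            throw new IllegalArgumentException("writer not in topology: " + writer);
        }
        List<String> replicas = new ArrayList<>();
        replicas.add(writer);
        TreeMap<String, String> sorted = new TreeMap<>(rackByNode);

        // 2nd replica: first node whose rack differs from the writer's rack.
        String second = null;
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (!e.getValue().equals(writerRack)) {
                second = e.getKey();
                break;
            }
        }
        if (second == null) {
            // Single-rack clusters are not modeled here (real HDFS falls back
            // to other nodes in the same rack).
            return replicas;
        }
        replicas.add(second);

        // 3rd replica: another node on the same rack as the 2nd replica.
        String secondRack = rackByNode.get(second);
        for (Map.Entry<String, String> e : sorted.entrySet()) {
            if (e.getValue().equals(secondRack) && !e.getKey().equals(second)) {
                replicas.add(e.getKey());
                break;
            }
        }
        return replicas;
    }
}
```

For example, with `small1`/`small2` on `/rack-2core` and `big1`/`big2` on `/rack-8core`, a block written from `small1` is placed on `small1`, `big1` and `big2`.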

Trying to write a custom balancer may be a good hack, but it's not good in terms of corporate life. 

Just saying!

-Mike

On Dec 8, 2012, at 1:34 PM, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:

> Hi,
> 
> It's not available anywhere yet. I will post it today or tomorrow,
> as soon as I remove some hardcoded values from it ;) It's a quick
> and dirty PerformanceBalancer, not a CPULoadBalancer.
> 
> Anyway, I will give more details over the weekend, but there is
> absolutely nothing extraordinary about it.
> 
> JM

Re: Heterogeneous cluster

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

Hi,

It's not available anywhere yet. I will post it today or tomorrow,
as soon as I remove some hardcoded values from it ;) It's a quick
and dirty PerformanceBalancer, not a CPULoadBalancer.

Anyway, I will give more details over the weekend, but there is
absolutely nothing extraordinary about it.
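The balancer code itself is not included in this thread. As one possible reading of "balancing regions by performance", the core step is giving each region server a target region count proportional to a weight. The sketch below uses core count as that weight and largest-remainder rounding so the targets add up exactly; the weighting, the `WeightedRegionTargets` class and the server names are assumptions, and it does not implement HBase's `LoadBalancer` interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PerformanceBalancer from this thread: splits a region
// count across region servers in proportion to a positive per-server weight (core
// count is assumed here), using largest-remainder rounding so targets sum exactly.
class WeightedRegionTargets {
    static Map<String, Integer> targets(Map<String, Integer> weightByServer, int totalRegions) {
        long totalWeight = 0;
        for (int w : weightByServer.values()) {
            totalWeight += w;
        }
        if (totalWeight <= 0 || totalRegions < 0) {
            throw new IllegalArgumentException("need positive total weight and non-negative region count");
        }

        List<String> servers = new ArrayList<>(weightByServer.keySet());
        long[] remainders = new long[servers.size()];
        Map<String, Integer> result = new LinkedHashMap<>();
        int assigned = 0;

        // Floor of each server's proportional share; remember the remainder.
        for (int i = 0; i < servers.size(); i++) {
            long share = (long) totalRegions * weightByServer.get(servers.get(i));
            int base = (int) (share / totalWeight);
            remainders[i] = share % totalWeight;
            result.put(servers.get(i), base);
            assigned += base;
        }

        // Leftover regions (always fewer than the server count) go to the largest remainders.
        for (int left = totalRegions - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < remainders.length; i++) {
                if (remainders[i] > remainders[best]) {
                    best = i;
                }
            }
            result.merge(servers.get(best), 1, Integer::sum);
            remainders[best] = -1; // each server receives at most one extra region
        }
        return result;
    }
}
```

For servers weighted 2, 4 and 8 and 100 regions, `targets` returns 14, 29 and 57. A balancer built on this would then plan region moves from servers above their target to servers below it.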

JM

2012/12/8, Robert Dyer <rd...@iastate.edu>:
> I too am interested in this custom load balancer, as I was actually just
> starting to look into writing one that does the same thing for
> my heterogeneous cluster!
>
> Is this available somewhere?

Re: Heterogeneous cluster

Posted by Robert Dyer <rd...@iastate.edu>.

I too am interested in this custom load balancer, as I was actually just
starting to look into writing one that does the same thing for
my heterogeneous cluster!

Is this available somewhere?

On Sat, Dec 8, 2012 at 9:17 AM, James Chang <ja...@gmail.com> wrote:

> By the way, I saw you mentioned that you
> have built a "LoadBalancer". Could you kindly
> share some details about it?



-- 

Robert Dyer
rdyer@iastate.edu

Re: Heterogeneous cluster

Posted by James Chang <ja...@gmail.com>.

Hi JM,

I have thought about the same issue; in my opinion,
option 2 is preferable.

By the way, I saw you mentioned that you
have built a "LoadBalancer". Could you kindly
share some details about it?


Best Regards.
James Chang


