Posted to user@hbase.apache.org by Demai Ni <ni...@gmail.com> on 2015/04/08 02:05:23 UTC

HBase region assignment by range?

hi, folks,

I have a question about region assignment and would like to clarify a few thoughts.

Let's say I have a table with rowkeys "row00000 ~ row30000" on a 4-node
HBase cluster. Is there a way to keep the data partitioned by range on each
node? For example:

node1:  <=row10000
node2:  row10001~row20000
node3:  row20001~row30000
node4:  >row30000
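As a rough sketch of the mapping being asked for (illustrative Python only; the split keys and node names are made up, and real HBase compares rowkeys as byte arrays, for which lexicographic string comparison stands in here):

```python
import bisect

# Hypothetical split points and node names, chosen to match the ranges above.
SPLIT_KEYS = ["row10001", "row20001", "row30001"]
NODES = ["node1", "node2", "node3", "node4"]

def node_for(rowkey):
    """Map a rowkey to the node owning its range (node1: <= row10000, etc.)."""
    return NODES[bisect.bisect_right(SPLIT_KEYS, rowkey)]
```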

And even when one of the nodes becomes a hotspot, the boundaries won't be
crossed unless load balancing is done manually?

I looked at presplit: { SPLITS => ['row100','row200','row300'] }, but I
don't think it serves this purpose.

BTW, a bit of background: I am thinking of doing a local join between two
tables that share the same rowkey and are partitioned by range (or by the
same hash algorithm). If I can keep the join keys on the same node (aka
region server), the join can be handled locally instead of being broadcast
to all the other nodes.

Thanks for your input. A couple of pointers to blogs/presentations would be
appreciated.

Demai

Re: HBase region assignment by range?

Posted by Anoop John <an...@gmail.com>.
You can pre-split the table per the key ranges and use a custom load
balancer to keep the regions on the required nodes. It seems you also have
to collocate the regions of the two tables on those nodes (to do the join),
so you will be working with the load balancer either way.
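A minimal sketch of the decision such a custom balancer would make (plain Python, not the actual HBase LoadBalancer API; the favored-server table and all names are hypothetical):

```python
# A toy "favored node" plan keyed by region start key, so that regions of
# different tables covering the same range land on the same server.
FAVORED = {"": "node1", "row10001": "node2",
           "row20001": "node3", "row30001": "node4"}

def assign(regions):
    """regions: iterable of (table, start_key) -> {server: [region, ...]}."""
    plan = {}
    for table, start_key in regions:
        # Same start key -> same server, regardless of which table it is.
        server = FAVORED[start_key]
        plan.setdefault(server, []).append((table, start_key))
    return plan

# Two tables' regions for the range starting at row10001 are collocated:
plan = assign([("orders", "row10001"), ("lineitem", "row10001")])
```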

-Anoop-


Re: HBase region assignment by range?

Posted by Alok Singh <al...@gmail.com>.
>I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
>don't think it serves this purpose.

Why doesn't this work for you? Is it because regions are not evenly
distributed across the cluster after the split? You can move regions
manually and spread them out evenly.

Alok


Re: HBase region assignment by range?

Posted by Michael Segel <mi...@hotmail.com>.
Hi, 

Schemas are relatively straightforward if you ignore relational modeling.
Think of a hierarchical database where everything is stored and accessed off the primary key, and you retrieve the record as a whole.
That makes it easier, but you end up having to forget everything you learned about data warehouses.

Think about walking into a doctor’s office. He opens a file cabinet and pulls out a folder. (Sometimes a very thick folder or two… ;-)
That’s it. Your HBase row is really just a fancy container: all of the attributes of your medical history are stored in different column families, and then in cells, where each cell could contain the record(s) from a specific event.

So the main use case is that he needs one medical file (yours) when you visit, so it’s a fast fetch or get().

When he wants to see who had blood drawn last Tuesday and may have to be checked out for Hep C because a nurse was sick… then he needs to scan all patients to see if they were in the office, and then whether they had blood work done. That is a table scan. Please note that this isn’t a good Big Data example unless it’s patient data for an entire hospital or chain over the past 10 years. (The single doctor is just easier to visualize.)

Compaction is a side effect of sitting on a WORM file system.
MapR-DB doesn’t have this issue. And compaction could be better managed so that it’s a shorter blocking window. ;-) But that’s another topic.

From what you want to do… if you’re pulling out blobs of data that are used whole during one unit of work, you may want to store them in sequence files, because they are static and the I/O is faster, and then use HBase to index where a specific blob is located (the URL of the file, the starting offset, and the length). Then you hit HBase to find the record you want and pull it from HDFS: a simple get() then fetch, or a scan that fetches the record for each element in the result set.
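A rough sketch of that index-then-fetch pattern, using a local file and a plain dict in place of HDFS and the HBase index table (all names and data below are made up for illustration, not real HBase/HDFS API):

```python
import os
import tempfile

def write_blobs(path, blobs):
    """Append each blob to a static data file, recording where it landed.

    The returned index maps key -> (path, offset, length): the triple an
    HBase index row would hold.
    """
    index = {}
    with open(path, "wb") as f:
        for key, data in blobs.items():
            index[key] = (path, f.tell(), len(data))
            f.write(data)
    return index

def fetch(index, key):
    """One get() against the index, then a seek+read of the blob itself."""
    path, offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

path = os.path.join(tempfile.mkdtemp(), "blobs.dat")
index = write_blobs(path, {"patient1": b"history-a", "patient2": b"history-b"})
```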





The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: HBase region assignment by range?

Posted by Demai Ni <ni...@gmail.com>.
Michael,

thanks for the input.

> Not sure if this was a typo, but you don’t have OLTP in HBase.

I understand your point: HBase is in no way comparable to the well-matured
RDBMSs, which addressed the OLTP requirement decades ago and keep
improving. Big Data (to be specific, the Hadoop ecosystem) is not for that
kind of usage; for example, a banking transaction system should stay on the
'old' Oracle/DB2/etc., and not get fooled by the 'big data' hype.
What I mean is that HBase has the capacity to manipulate data at the
row/cell level with acceptable performance (for example, phone logs, health
data), and has a built-in compaction mechanism so that HDFS won't be
flooded with small files.


> I would strongly suggest that you rethink your schema…
I agree that schema design is very important for system usage and
performance. However, for my current POC, the goal is to test whether HBase
can be used as an OLAP data store. If the conclusion depends heavily on the
user's schema design, it will be quite weak, unless such a schema design
can be easily understood and implemented by end users without in-depth
knowledge of HBase.

Demai



Re: HBase region assignment by range?

Posted by Michael Segel <mi...@hotmail.com>.
Hi… 

Not sure if this was a typo, but you don’t have OLTP in HBase.
In fact Splice Machine has gone to a lot of trouble to add OLTP, and there is still work to be done when it comes to isolation levels and RLL.
(Note that RLL in HBase is not the same as RLL in an OLTP scenario.)

Thinking of HBase in terms of an RDBMS is wrong. You don’t want to do it; it won’t work and the design will be very sub-optimal. It’s a common mistake.

You will need to use a lot of inverted tables for indexing.
The reason to use an inverted table is that it’s the easiest index to build: when you’re inserting rows into your table you can build your index, or you can drop your index table and run an M/R job to rebuild it. (You could also rebuild multiple indexes in the same M/R job.)

Now, when you want to filter your data, you pull data from the index tables and perform an intersection against the result sets. Easy peasy. That gives you your final result set, which you can fetch; then apply any filters that are not on indexed columns, and you’re done.
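The intersection step can be sketched like this (toy Python; the inverted-index contents and column names are invented for illustration):

```python
# Toy inverted indexes (value -> set of row keys), standing in for HBase
# index tables maintained alongside the base table.
city_idx = {"chicago": {"r1", "r3", "r7"}, "boston": {"r2"}}
state_idx = {"il": {"r1", "r3", "r9"}}

def filtered_keys(*result_sets):
    """Intersect the per-index result sets to get the candidate row keys."""
    # After this, fetch these rows and apply any non-indexed filters.
    return sorted(set.intersection(*result_sets))
```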

If you want something faster… build a Lucene index where the in-memory index documents contain only the indexed columns…

I would strongly suggest that you rethink your schema… 

Also, with HBase, while you can have fact tables, you store the facts in the base table for the record. The fact table exists so that your application has a record of the domain of allowable attributes. 

HTH

-Mike


> On Apr 8, 2015, at 1:39 PM, Demai Ni <ni...@gmail.com> wrote:
> 
> hi, Guys,
> 
> many thanks for your quick response.
> 
> First, Let me share what I am looking at, which may help to clarify the
> intention and answer a few of questions. I am working on a POC to bring in
> MPP style of OLAP on Hadoop, and looking for whether it is feasible to have
> HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
> capability ; 2) many filters ; 3) in-cluster replica and between-clusters
> replication. I am currently using TPCH schema for this POC, and also
> consider star-schema. Since it is a POC, I can pretty much define my rules
> and set limitations as it fits. :-)
> 
> Why doesn't this(presplit) work for you?
> 
> The reason is that presplit won't guarantee the regions stay at the
> pre-assigned regionServer. Let's say I have a very large table and a very
> small table with different data distribution, even with the same presplit
> value. HBase won't ensure the same range of data located on the same
> physical node. Unless we have a custom LB mentioned by @Anoop and @esteban.
> Is my understanding correct? BTW, I will look into HBASE-10576 to see
> whether it fits my needs.
> 
> Is your table static?
>> 
> while I can make it static for POC purpose, but I will use this limitation,
> as I'd like the HBase for its OLTP feature. So besides the 'static' HFile,
> need HLOGs on the same local node too. But again, I would worry about the
> 'static' HFile for now
> 
> However as you add data to the table, those regions will eventually split.
> 
> While the region can surely split when more data is added, can HBase
> keep the new regions on the same regionServer according to the
> predefined boundary? I will worry about the hotspot issue later; that is
> the beauty of doing a POC instead of production. :-)
> 
> What you’re suggesting is that as you do a region scan, you’re going to the
>> other table and then try to fetch a row if it exists.
>> 
> Yes, something like that. I am currently using the client API: scan() with
> start and end key.  Since I know my start and end keys, and with the
> local-read feature, the scan should be local-READ. With some
> statistics(such as which one is larger table) and  a hash join
> operation(which I need to implement), the join will work with not-too-bad
> performance. Again, it is POC, so I won't worry about the situation that a
> regionServer hosts too much data(hotspot). But surely, a LB should be used
> before putting into production if it ever occurs.
> 
> either the second table should be part of the first table in the same CF or
>> as a separate CF
>> 
> I am not sure whether that will work for a large table vs. a small
> table. The data of the small table would have to be duplicated in many
> places, and an update of the small table can be costly.
> 
> Demai
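The local hash join described above (scan the co-located key range, build a hash table from the smaller side, probe with the larger) can be sketched as (illustrative Python with made-up rows):

```python
# Build-and-probe hash join over two co-located key ranges; purely a
# sketch of the local-join idea the thread discusses.
def local_hash_join(small, large):
    """small, large: lists of (rowkey, value) pairs; join on rowkey."""
    built = {k: v for k, v in small}   # build side: the smaller table
    # Probe side: one hash lookup per scanned row instead of a per-row get().
    return [(k, built[k], v) for k, v in large if k in built]

out = local_hash_join([("row10005", "a")],
                      [("row10005", "x"), ("row10007", "y")])
```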
> 
> 
> On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <es...@cloudera.com>
> wrote:
> 
>> +1 Anoop.
>> 
>> That's pretty much the only way right now if you need custom balancing.
>> This balancer doesn't have to live in the HMaster and can be invoked
>> externally (there are caveats to doing that when an RS dies, but it works
>> ok so far). A long-term solution for the problem you are trying to solve
>> is HBASE-10576, with a little tweaking.
>> 
>> cheers,
>> esteban.
>> 
>> 
>> 
>> 
>> 
>> --
>> Cloudera, Inc.
>> 
>> 
>> On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <mi...@hotmail.com>
>> wrote:
>> 
>>> Is your table static?
>>> 
>>> If you know your data and your ranges, you can do it. However as you add
>>> data to the table, those regions will eventually split.
>>> 
>>> The other issue that you brought up is that you want to do ‘local’ joins.
>>> 
>>> Simple single word response… don’t.
>>> 
>>> Longer response..
>>> 
>>> You’re suggesting that the tables in question share the row key in
>>> common.  Ok… why? Are they part of the same record?
>>> How is the data normally being used?
>>> 
>>> Have you looked at column families?
>>> 
>>> The issue is that joins are expensive. What you’re suggesting is that as
>>> you do a region scan, you’re going to the other table and then try to
>> fetch
>>> a row if it exists.
>>> So its essentially for each row in the scan, try a get() which will
>> almost
>>> double the cost of your fetch. Then you have to decide how to do it
>>> locally. Are you really going to write a coprocessor for this?  (Hint: If
>>> this is a common thing. Then either the second table should be part of
>> the
>>> first table in the same CF or as a separate CF. You need to rethink your
>>> schema.)
>>> 
>>> Does this make sense?
>>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: HBase region assignment by range?

Posted by Demai Ni <ni...@gmail.com>.
Nick, thanks. I will look into Phoenix. I've heard about Phoenix for quite a
while, but haven't seriously played with it yet. Considering Phoenix's
target user base, it is no surprise that the Phoenix team is looking at
features like secondary indexes, complex JOINs, etc.

Anoop, thanks for your input too. A custom LB will be the way to go...
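The custom-LB idea boils down to a fixed key-range → server map that the balancer (or an external script calling region moves) enforces. A minimal sketch of just the lookup, using the boundaries from the original question (node names and boundaries are the thread's example, not real cluster config):

```python
import bisect

# Inclusive upper-bound row keys from the example in the thread: node1
# owns keys <= "row10000", node2 owns "row10001".."row20000", and so on;
# every key above the last boundary falls through to node4. Keys are
# fixed-width, so lexicographic order matches numeric order.
BOUNDARIES = ["row10000", "row20000", "row30000"]
NODES = ["node1", "node2", "node3", "node4"]

def node_for(rowkey):
    # bisect_left returns the index of the first boundary >= rowkey,
    # which is exactly the node index for an inclusive upper bound.
    return NODES[bisect.bisect_left(BOUNDARIES, rowkey)]
```

A custom balancer would apply the same mapping after a split, so a daughter region's start key still resolves to the parent's node and the boundaries are never crossed without a manual rebalance.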

On Wed, Apr 8, 2015 at 4:11 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> Your needs (and use case?) looks a lot like the local secondary index work
> happening around Phoenix.
>
> On Wed, Apr 8, 2015 at 11:50 AM, Anoop John <an...@gmail.com> wrote:
>
> > bq.while the region can surely split when more data added-on, but can
> HBase
> > keep the new regions still on the same regionServer according to the
> > predefined bounary?
> >
> > You need custom LB for that.. If there, it is possible to restrict
> >
> > -Anoop-
> >
> >
> > On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni <ni...@gmail.com> wrote:
> >
> > > hi, Guys,
> > >
> > > many thanks for your quick response.
> > >
> > > First, Let me share what I am looking at, which may help to clarify the
> > > intention and answer a few of questions. I am working on a POC to bring
> > in
> > > MPP style of OLAP on Hadoop, and looking for whether it is feasible to
> > have
> > > HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
> > > capability ; 2) many filters ; 3) in-cluster replica and
> between-clusters
> > > replication. I am currently using TPCH schema for this POC, and also
> > > consider star-schema. Since it is a POC, I can pretty much define my
> > rules
> > > and set limitations as it fits. :-)
> > >
> > > Why doesn't this(presplit) work for you?
> > >
> > >  The reason is that presplit won't guarantee the regions stay at the
> > > pre-assigned regionServer. Let's say I have a very large table and a
> very
> > > small table with different data distribution, even with the same
> presplit
> > > value. HBase won't ensure the same range of data located on the same
> > > physical node. Unless we have a custom LB mentioned by @Anoop and
> > @esteban.
> > > Is my understanding correct? BTW, I will look into HBASE-10576 to see
> > > whether it fits my needs.
> > >
> > > Is your table staic?
> > > >
> > > while I can make it static for POC purpose, but I will use this
> > limitation,
> > > as I'd like the HBase for its OLTP feature. So besides the 'static'
> > HFile,
> > > need HLOGs on the same local node too. But again, I would worry about
> the
> > > 'static' HFile for now
> > >
> > > However as you add data to the table, those regions will eventually
> > split.
> > >
> > >  while the region can surely split when more data added-on, but can
> HBase
> > > keep the new regions still on the same regionServer according to the
> > > predefined bounary? I will worry about hotspot-issue late. that is the
> > > beauty of doing POC instead of production. :-)
> > >
> > > What you’re suggesting is that as you do a region scan, you’re going to
> > the
> > > > other table and then try to fetch a row if it exists.
> > > >
> > > Yes, something like that. I am currently using the client API: scan()
> > with
> > > start and end key.  Since I know my start and end keys, and with the
> > > local-read feature, the scan should be local-READ. With some
> > > statistics(such as which one is larger table) and  a hash join
> > > operation(which I need to implement), the join will work with
> not-too-bad
> > > performance. Again, it is POC, so I won't worry about the situation
> that
> > a
> > > regionServer hosts too much data(hotspot). But surely, a LB should be
> > used
> > > before putting into production if it ever occurs.
> > >
> > > either the second table should be part of the first table in the same
> CF
> > or
> > > > as a separate CF
> > > >
> > > I am not sure whether it will work for a situation of a large table vs
> a
> > > small table. The data of the small table has to be duplicated in many
> > > places, and a update of the small table can be costly.
> > >
> > > Demai
> > >
> > >
> > > On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <
> esteban@cloudera.com
> > >
> > > wrote:
> > >
> > > > +1 Anoop.
> > > >
> > > > Thats pretty much the only way right now if you need a custom
> > balancing.
> > > > This balancer doesn't have to live in the HMaster and can be invoked
> > > > externally (there are caveats of doing that, when a RS die but works
> ok
> > > so
> > > > far). A long term solution for your the problem you are trying to
> solve
> > > is
> > > > HBASE-10576 by tweaking it a little.
> > > >
> > > > cheers,
> > > > esteban.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > >
> > > > On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <
> > michael_segel@hotmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Is your table staic?
> > > > >
> > > > > If you know your data and your ranges, you can do it. However as
> you
> > > add
> > > > > data to the table, those regions will eventually split.
> > > > >
> > > > > The other issue that you brought up is that you want to do ‘local’
> > > joins.
> > > > >
> > > > > Simple single word response… don’t.
> > > > >
> > > > > Longer response..
> > > > >
> > > > > You’re suggesting that the tables in question share the row key in
> > > > > common.  Ok… why? Are they part of the same record?
> > > > > How is the data normally being used?
> > > > >
> > > > > Have you looked at column families?
> > > > >
> > > > > The issue is that joins are expensive. What you’re suggesting is
> that
> > > as
> > > > > you do a region scan, you’re going to the other table and then try
> to
> > > > fetch
> > > > > a row if it exists.
> > > > > So its essentially for each row in the scan, try a get() which will
> > > > almost
> > > > > double the cost of your fetch. Then you have to decide how to do it
> > > > > locally. Are you really going to write a coprocessor for this?
> > (Hint:
> > > If
> > > > > this is a common thing. Then either the second table should be part
> > of
> > > > the
> > > > > first table in the same CF or as a separate CF. You need to rethink
> > > your
> > > > > schema.)
> > > > >
> > > > > Does this make sense?
> > > > >
> > > > > > On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> > > > > >
> > > > > > hi, folks,
> > > > > >
> > > > > > I have a question about region assignment and like to clarify
> some
> > > > > through.
> > > > > >
> > > > > > Let's say I have a table with rowkey as "row00000 ~ row30000" on
> a
> > 4
> > > > node
> > > > > > hbase cluster, is there a way to keep data partitioned by range
> on
> > > each
> > > > > > node? for example:
> > > > > >
> > > > > > node1:  <=row10000
> > > > > > node2:  row10001~row20000
> > > > > > node3:  row20001~row30000
> > > > > > node4:  >row30000
> > > > > >
> > > > > > And even when one of the node become hotspot, the boundary won't
> be
> > > > > crossed
> > > > > > unless manually doing a load balancing?
> > > > > >
> > > > > > I looked at presplit: { SPLITS => ['row100','row200','row300'] }
> ,
> > > but
> > > > > > don't think it serves this purpose.
> > > > > >
> > > > > > BTW, a bit background. I am thinking to do a local join between
> two
> > > > > tables
> > > > > > if both have same rowkey, and partitioned by range (or same hash
> > > > > > algorithm). If I can keep the join-key on the same node(aka
> > > > > regionServer),
> > > > > > the join can be handled locally instead of broadcast to all other
> > > > nodes.
> > > > > >
> > > > > > Thanks for your input. A couple pointers to blog/presentation
> would
> > > be
> > > > > > appreciated.
> > > > > >
> > > > > > Demai
> > > > >
> > > > > The opinions expressed here are mine, while they may reflect a
> > > cognitive
> > > > > thought, that is purely accidental.
> > > > > Use at your own risk.
> > > > > Michael Segel
> > > > > michael_segel (AT) hotmail.com
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: HBase region assignment by range?

Posted by Michael Segel <mi...@hotmail.com>.
Sorry, but ‘local’ secondary indexes are a joke. 

If anyone is going to the Big Data TechCon in Boston, find me and I’ll give you a nice lecture on why this is not a good idea. 
But for those who can’t make it… 

1) You’re fighting the system.  
2) Unless you have at least 10K rows in the region, you’re probably going to be faster doing a filtered scan.
3) Think about table joins against indexed columns as your use case.  The minute you do this… you’ll understand why it’s not a good idea.
4) Sparse indexes. (Uniqueness of values and their distribution throughout the data set…) 
5) What are you actually using as your index… ;-) [hint:  Lucene or a modified Lucene could work for a secondary index… 
	you are only thinking in terms of inverted tables. ]
6) Look at your index. 
	A) You’re combining all indexes for the region into a single table. 
	B) You have to now synchronize splits and management of the two tables so that splits and locality are maintained. (Fighting the underlying HDFS system too.) 
	C) You’re adding to the overhead on managing the region which means longer splits and compactions.
	D) Look at your key-to-value size ratio. In fact, if you’re doing all of the data in a single table, you will end up with a ratio of one row in the index to one row in the table
	     and will not even need to see the value stored, because everything would be in the index.  (column name, attribute, value) 
	E) Increased complexity during splits.  (You will end up having to rebuild your index during every split.) 



The minute you actually spend a NY minute thinking about these things, you’ll realize that it’s a dumb idea. 

HTH

-Mike
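Stripped down, Demai's local-join plan from earlier in the thread is: when both tables' regions for the same key range are co-located, that range can be joined with two local scans plus an in-memory hash table on the smaller side, instead of issuing a get() per scanned row (which, as noted above, roughly doubles the fetch cost). A toy Python sketch, with in-memory lists standing in for the two region scans (all data and names are made up):

```python
def hash_join(large_scan, small_scan):
    """Join two iterables of (rowkey, value) pairs on rowkey.

    Build a hash table on the smaller input once, then stream the larger
    input past it -- one pass over each side, versus one get() per
    scanned row in the nested-lookup approach.
    """
    build = dict(small_scan)          # build side: the smaller table's range
    for key, left_val in large_scan:  # probe side: streamed region scan
        right_val = build.get(key)
        if right_val is not None:
            yield key, left_val, right_val

# Two co-partitioned key ranges, as if both scans were local to one RS.
orders = [("row00001", "o1"), ("row00002", "o2"), ("row00003", "o3")]
customers = [("row00002", "c2"), ("row00003", "c3")]
# list(hash_join(orders, customers))
#   -> [("row00002", "o2", "c2"), ("row00003", "o3", "c3")]
```

This only pays off if the co-location actually holds, which is exactly why the custom balancer discussion matters: without it, the "small" side is a remote scan and the locality argument collapses.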


> On Apr 8, 2015, at 6:11 PM, Nick Dimiduk <nd...@gmail.com> wrote:
> 
> Your needs (and use case?) looks a lot like the local secondary index work
> happening around Phoenix.
> 
> On Wed, Apr 8, 2015 at 11:50 AM, Anoop John <an...@gmail.com> wrote:
> 
>> bq.while the region can surely split when more data added-on, but can HBase
>> keep the new regions still on the same regionServer according to the
>> predefined bounary?
>> 
>> You need custom LB for that.. If there, it is possible to restrict
>> 
>> -Anoop-
>> 
>> 
>> On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni <ni...@gmail.com> wrote:
>> 
>>> hi, Guys,
>>> 
>>> many thanks for your quick response.
>>> 
>>> First, Let me share what I am looking at, which may help to clarify the
>>> intention and answer a few of questions. I am working on a POC to bring
>> in
>>> MPP style of OLAP on Hadoop, and looking for whether it is feasible to
>> have
>>> HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
>>> capability ; 2) many filters ; 3) in-cluster replica and between-clusters
>>> replication. I am currently using TPCH schema for this POC, and also
>>> consider star-schema. Since it is a POC, I can pretty much define my
>> rules
>>> and set limitations as it fits. :-)
>>> 
>>> Why doesn't this(presplit) work for you?
>>> 
>>> The reason is that presplit won't guarantee the regions stay at the
>>> pre-assigned regionServer. Let's say I have a very large table and a very
>>> small table with different data distribution, even with the same presplit
>>> value. HBase won't ensure the same range of data located on the same
>>> physical node. Unless we have a custom LB mentioned by @Anoop and
>> @esteban.
>>> Is my understanding correct? BTW, I will look into HBASE-10576 to see
>>> whether it fits my needs.
>>> 
>>> Is your table staic?
>>>> 
>>> while I can make it static for POC purpose, but I will use this
>> limitation,
>>> as I'd like the HBase for its OLTP feature. So besides the 'static'
>> HFile,
>>> need HLOGs on the same local node too. But again, I would worry about the
>>> 'static' HFile for now
>>> 
>>> However as you add data to the table, those regions will eventually
>> split.
>>> 
>>> while the region can surely split when more data added-on, but can HBase
>>> keep the new regions still on the same regionServer according to the
>>> predefined bounary? I will worry about hotspot-issue late. that is the
>>> beauty of doing POC instead of production. :-)
>>> 
>>> What you’re suggesting is that as you do a region scan, you’re going to
>> the
>>>> other table and then try to fetch a row if it exists.
>>>> 
>>> Yes, something like that. I am currently using the client API: scan()
>> with
>>> start and end key.  Since I know my start and end keys, and with the
>>> local-read feature, the scan should be local-READ. With some
>>> statistics(such as which one is larger table) and  a hash join
>>> operation(which I need to implement), the join will work with not-too-bad
>>> performance. Again, it is POC, so I won't worry about the situation that
>> a
>>> regionServer hosts too much data(hotspot). But surely, a LB should be
>> used
>>> before putting into production if it ever occurs.
>>> 
>>> either the second table should be part of the first table in the same CF
>> or
>>>> as a separate CF
>>>> 
>>> I am not sure whether it will work for a situation of a large table vs a
>>> small table. The data of the small table has to be duplicated in many
>>> places, and a update of the small table can be costly.
>>> 
>>> Demai
>>> 
>>> 
>>> On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <esteban@cloudera.com
>>> 
>>> wrote:
>>> 
>>>> +1 Anoop.
>>>> 
>>>> Thats pretty much the only way right now if you need a custom
>> balancing.
>>>> This balancer doesn't have to live in the HMaster and can be invoked
>>>> externally (there are caveats of doing that, when a RS die but works ok
>>> so
>>>> far). A long term solution for your the problem you are trying to solve
>>> is
>>>> HBASE-10576 by tweaking it a little.
>>>> 
>>>> cheers,
>>>> esteban.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Cloudera, Inc.
>>>> 
>>>> 
>>>> On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <
>> michael_segel@hotmail.com
>>>> 
>>>> wrote:
>>>> 
>>>>> Is your table staic?
>>>>> 
>>>>> If you know your data and your ranges, you can do it. However as you
>>> add
>>>>> data to the table, those regions will eventually split.
>>>>> 
>>>>> The other issue that you brought up is that you want to do ‘local’
>>> joins.
>>>>> 
>>>>> Simple single word response… don’t.
>>>>> 
>>>>> Longer response..
>>>>> 
>>>>> You’re suggesting that the tables in question share the row key in
>>>>> common.  Ok… why? Are they part of the same record?
>>>>> How is the data normally being used?
>>>>> 
>>>>> Have you looked at column families?
>>>>> 
>>>>> The issue is that joins are expensive. What you’re suggesting is that
>>> as
>>>>> you do a region scan, you’re going to the other table and then try to
>>>> fetch
>>>>> a row if it exists.
>>>>> So its essentially for each row in the scan, try a get() which will
>>>> almost
>>>>> double the cost of your fetch. Then you have to decide how to do it
>>>>> locally. Are you really going to write a coprocessor for this?
>> (Hint:
>>> If
>>>>> this is a common thing. Then either the second table should be part
>> of
>>>> the
>>>>> first table in the same CF or as a separate CF. You need to rethink
>>> your
>>>>> schema.)
>>>>> 
>>>>> Does this make sense?
>>>>> 
>>>>>> On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
>>>>>> 
>>>>>> hi, folks,
>>>>>> 
>>>>>> I have a question about region assignment and like to clarify some
>>>>> through.
>>>>>> 
>>>>>> Let's say I have a table with rowkey as "row00000 ~ row30000" on a
>> 4
>>>> node
>>>>>> hbase cluster, is there a way to keep data partitioned by range on
>>> each
>>>>>> node? for example:
>>>>>> 
>>>>>> node1:  <=row10000
>>>>>> node2:  row10001~row20000
>>>>>> node3:  row20001~row30000
>>>>>> node4:  >row30000
>>>>>> 
>>>>>> And even when one of the node become hotspot, the boundary won't be
>>>>> crossed
>>>>>> unless manually doing a load balancing?
>>>>>> 
>>>>>> I looked at presplit: { SPLITS => ['row100','row200','row300'] } ,
>>> but
>>>>>> don't think it serves this purpose.
>>>>>> 
>>>>>> BTW, a bit background. I am thinking to do a local join between two
>>>>> tables
>>>>>> if both have same rowkey, and partitioned by range (or same hash
>>>>>> algorithm). If I can keep the join-key on the same node(aka
>>>>> regionServer),
>>>>>> the join can be handled locally instead of broadcast to all other
>>>> nodes.
>>>>>> 
>>>>>> Thanks for your input. A couple pointers to blog/presentation would
>>> be
>>>>>> appreciated.
>>>>>> 
>>>>>> Demai
>>>>> 
>>>>> The opinions expressed here are mine, while they may reflect a
>>> cognitive
>>>>> thought, that is purely accidental.
>>>>> Use at your own risk.
>>>>> Michael Segel
>>>>> michael_segel (AT) hotmail.com
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: HBase region assignment by range?

Posted by Nick Dimiduk <nd...@gmail.com>.
Your needs (and use case?) look a lot like the local secondary index work
happening around Phoenix.

On Wed, Apr 8, 2015 at 11:50 AM, Anoop John <an...@gmail.com> wrote:

> bq.while the region can surely split when more data added-on, but can HBase
> keep the new regions still on the same regionServer according to the
> predefined bounary?
>
> You need custom LB for that.. If there, it is possible to restrict
>
> -Anoop-
>
>
> On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni <ni...@gmail.com> wrote:
>
> > hi, Guys,
> >
> > many thanks for your quick response.
> >
> > First, Let me share what I am looking at, which may help to clarify the
> > intention and answer a few of questions. I am working on a POC to bring
> in
> > MPP style of OLAP on Hadoop, and looking for whether it is feasible to
> have
> > HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
> > capability ; 2) many filters ; 3) in-cluster replica and between-clusters
> > replication. I am currently using TPCH schema for this POC, and also
> > consider star-schema. Since it is a POC, I can pretty much define my
> rules
> > and set limitations as it fits. :-)
> >
> > Why doesn't this(presplit) work for you?
> >
> >  The reason is that presplit won't guarantee the regions stay at the
> > pre-assigned regionServer. Let's say I have a very large table and a very
> > small table with different data distribution, even with the same presplit
> > value. HBase won't ensure the same range of data located on the same
> > physical node. Unless we have a custom LB mentioned by @Anoop and
> @esteban.
> > Is my understanding correct? BTW, I will look into HBASE-10576 to see
> > whether it fits my needs.
> >
> > Is your table staic?
> > >
> > while I can make it static for POC purpose, but I will use this
> limitation,
> > as I'd like the HBase for its OLTP feature. So besides the 'static'
> HFile,
> > need HLOGs on the same local node too. But again, I would worry about the
> > 'static' HFile for now
> >
> > However as you add data to the table, those regions will eventually
> split.
> >
> >  while the region can surely split when more data added-on, but can HBase
> > keep the new regions still on the same regionServer according to the
> > predefined bounary? I will worry about hotspot-issue late. that is the
> > beauty of doing POC instead of production. :-)
> >
> > What you’re suggesting is that as you do a region scan, you’re going to
> the
> > > other table and then try to fetch a row if it exists.
> > >
> > Yes, something like that. I am currently using the client API: scan()
> with
> > start and end key.  Since I know my start and end keys, and with the
> > local-read feature, the scan should be local-READ. With some
> > statistics(such as which one is larger table) and  a hash join
> > operation(which I need to implement), the join will work with not-too-bad
> > performance. Again, it is POC, so I won't worry about the situation that
> a
> > regionServer hosts too much data(hotspot). But surely, a LB should be
> used
> > before putting into production if it ever occurs.
> >
> > either the second table should be part of the first table in the same CF
> or
> > > as a separate CF
> > >
> > I am not sure whether it will work for a situation of a large table vs a
> > small table. The data of the small table has to be duplicated in many
> > places, and a update of the small table can be costly.
> >
> > Demai
> >
> >
> > On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <esteban@cloudera.com
> >
> > wrote:
> >
> > > +1 Anoop.
> > >
> > > Thats pretty much the only way right now if you need a custom
> balancing.
> > > This balancer doesn't have to live in the HMaster and can be invoked
> > > externally (there are caveats of doing that, when a RS die but works ok
> > so
> > > far). A long term solution for your the problem you are trying to solve
> > is
> > > HBASE-10576 by tweaking it a little.
> > >
> > > cheers,
> > > esteban.
> > >
> > >
> > >
> > >
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > > On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <
> michael_segel@hotmail.com
> > >
> > > wrote:
> > >
> > > > Is your table staic?
> > > >
> > > > If you know your data and your ranges, you can do it. However as you
> > add
> > > > data to the table, those regions will eventually split.
> > > >
> > > > The other issue that you brought up is that you want to do ‘local’
> > joins.
> > > >
> > > > Simple single word response… don’t.
> > > >
> > > > Longer response..
> > > >
> > > > You’re suggesting that the tables in question share the row key in
> > > > common.  Ok… why? Are they part of the same record?
> > > > How is the data normally being used?
> > > >
> > > > Have you looked at column families?
> > > >
> > > > The issue is that joins are expensive. What you’re suggesting is that
> > as
> > > > you do a region scan, you’re going to the other table and then try to
> > > fetch
> > > > a row if it exists.
> > > > So its essentially for each row in the scan, try a get() which will
> > > almost
> > > > double the cost of your fetch. Then you have to decide how to do it
> > > > locally. Are you really going to write a coprocessor for this?
> (Hint:
> > If
> > > > this is a common thing. Then either the second table should be part
> of
> > > the
> > > > first table in the same CF or as a separate CF. You need to rethink
> > your
> > > > schema.)
> > > >
> > > > Does this make sense?
> > > >
> > > > > On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> > > > >
> > > > > hi, folks,
> > > > >
> > > > > I have a question about region assignment and like to clarify some
> > > > through.
> > > > >
> > > > > Let's say I have a table with rowkey as "row00000 ~ row30000" on a
> 4
> > > node
> > > > > hbase cluster, is there a way to keep data partitioned by range on
> > each
> > > > > node? for example:
> > > > >
> > > > > node1:  <=row10000
> > > > > node2:  row10001~row20000
> > > > > node3:  row20001~row30000
> > > > > node4:  >row30000
> > > > >
> > > > > And even when one of the node become hotspot, the boundary won't be
> > > > crossed
> > > > > unless manually doing a load balancing?
> > > > >
> > > > > I looked at presplit: { SPLITS => ['row100','row200','row300'] } ,
> > but
> > > > > don't think it serves this purpose.
> > > > >
> > > > > BTW, a bit background. I am thinking to do a local join between two
> > > > tables
> > > > > if both have same rowkey, and partitioned by range (or same hash
> > > > > algorithm). If I can keep the join-key on the same node(aka
> > > > regionServer),
> > > > > the join can be handled locally instead of broadcast to all other
> > > nodes.
> > > > >
> > > > > Thanks for your input. A couple pointers to blog/presentation would
> > be
> > > > > appreciated.
> > > > >
> > > > > Demai
> > > >
> > > > The opinions expressed here are mine, while they may reflect a
> > cognitive
> > > > thought, that is purely accidental.
> > > > Use at your own risk.
> > > > Michael Segel
> > > > michael_segel (AT) hotmail.com
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
>

Re: HBase region assignment by range?

Posted by Anoop John <an...@gmail.com>.
bq. While the region can surely split when more data is added, can HBase
keep the new regions on the same regionServer according to the predefined
boundary?

You need a custom LB for that. With one in place, it is possible to restrict where the regions are assigned.

-Anoop-
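The range-pinning logic such a custom balancer would implement can be sketched outside HBase. This is a minimal sketch in plain Python, not a real LoadBalancer implementation: the region start keys, server names, and the move-plan shape are hypothetical stand-ins for what a balanceCluster() override would emit as RegionPlans.

```python
import bisect

def target_server(start_key, boundaries, servers):
    """Pick the server whose range contains start_key.
    boundaries are the sorted upper bounds of each server's range;
    keys above the last boundary go to the last server."""
    idx = bisect.bisect_left(boundaries, start_key)
    return servers[min(idx, len(servers) - 1)]

def plan_moves(assignment, boundaries, servers):
    """assignment: {region_start_key: current_server}.
    Returns (region, from, to) moves restoring the range layout."""
    moves = []
    for start_key, current in assignment.items():
        want = target_server(start_key, boundaries, servers)
        if want != current:
            moves.append((start_key, current, want))
    return moves

servers = ["node1", "node2", "node3", "node4"]
boundaries = ["row10000", "row20000", "row30000"]  # node4 takes the rest
assignment = {"row00000": "node1", "row10001": "node3",  # drifted region
              "row20001": "node3", "row30001": "node4"}
print(plan_moves(assignment, boundaries, servers))
# [('row10001', 'node3', 'node2')]
```

A daughter region produced by a split keeps a start key inside its parent's range, so re-running this plan after splits keeps new regions on the predefined node.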


On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni <ni...@gmail.com> wrote:

> hi, Guys,
>
> many thanks for your quick response.
>
> First, Let me share what I am looking at, which may help to clarify the
> intention and answer a few of questions. I am working on a POC to bring in
> MPP style of OLAP on Hadoop, and looking for whether it is feasible to have
> HBase as Datastore. With HBase, I'd like to take advantage of 1) OLTP
> capability ; 2) many filters ; 3) in-cluster replica and between-clusters
> replication. I am currently using TPCH schema for this POC, and also
> consider star-schema. Since it is a POC, I can pretty much define my rules
> and set limitations as it fits. :-)
>
> Why doesn't this(presplit) work for you?
>
>  The reason is that presplit won't guarantee the regions stay at the
> pre-assigned regionServer. Let's say I have a very large table and a very
> small table with different data distribution, even with the same presplit
> value. HBase won't ensure the same range of data located on the same
> physical node. Unless we have a custom LB mentioned by @Anoop and @esteban.
> Is my understanding correct? BTW, I will look into HBASE-10576 to see
> whether it fits my needs.
>
> Is your table staic?
> >
> while I can make it static for POC purpose, but I will use this limitation,
> as I'd like the HBase for its OLTP feature. So besides the 'static' HFile,
> need HLOGs on the same local node too. But again, I would worry about the
> 'static' HFile for now
>
> However as you add data to the table, those regions will eventually split.
>
>  while the region can surely split when more data added-on, but can HBase
> keep the new regions still on the same regionServer according to the
> predefined bounary? I will worry about hotspot-issue late. that is the
> beauty of doing POC instead of production. :-)
>
> What you’re suggesting is that as you do a region scan, you’re going to the
> > other table and then try to fetch a row if it exists.
> >
> Yes, something like that. I am currently using the client API: scan() with
> start and end key.  Since I know my start and end keys, and with the
> local-read feature, the scan should be local-READ. With some
> statistics(such as which one is larger table) and  a hash join
> operation(which I need to implement), the join will work with not-too-bad
> performance. Again, it is POC, so I won't worry about the situation that a
> regionServer hosts too much data(hotspot). But surely, a LB should be used
> before putting into production if it ever occurs.
>
> either the second table should be part of the first table in the same CF or
> > as a separate CF
> >
> I am not sure whether it will work for a situation of a large table vs a
> small table. The data of the small table has to be duplicated in many
> places, and a update of the small table can be costly.
>
> Demai
>
>
> On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <es...@cloudera.com>
> wrote:
>
> > +1 Anoop.
> >
> > Thats pretty much the only way right now if you need a custom balancing.
> > This balancer doesn't have to live in the HMaster and can be invoked
> > externally (there are caveats of doing that, when a RS die but works ok
> so
> > far). A long term solution for your the problem you are trying to solve
> is
> > HBASE-10576 by tweaking it a little.
> >
> > cheers,
> > esteban.
> >
> >
> >
> >
> >
> > --
> > Cloudera, Inc.
> >
> >
> > On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <michael_segel@hotmail.com
> >
> > wrote:
> >
> > > Is your table staic?
> > >
> > > If you know your data and your ranges, you can do it. However as you
> add
> > > data to the table, those regions will eventually split.
> > >
> > > The other issue that you brought up is that you want to do ‘local’
> joins.
> > >
> > > Simple single word response… don’t.
> > >
> > > Longer response..
> > >
> > > You’re suggesting that the tables in question share the row key in
> > > common.  Ok… why? Are they part of the same record?
> > > How is the data normally being used?
> > >
> > > Have you looked at column families?
> > >
> > > The issue is that joins are expensive. What you’re suggesting is that
> as
> > > you do a region scan, you’re going to the other table and then try to
> > fetch
> > > a row if it exists.
> > > So its essentially for each row in the scan, try a get() which will
> > almost
> > > double the cost of your fetch. Then you have to decide how to do it
> > > locally. Are you really going to write a coprocessor for this?  (Hint:
> If
> > > this is a common thing. Then either the second table should be part of
> > the
> > > first table in the same CF or as a separate CF. You need to rethink
> your
> > > schema.)
> > >
> > > Does this make sense?
> > >
> > > > On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> > > >
> > > > hi, folks,
> > > >
> > > > I have a question about region assignment and like to clarify some
> > > through.
> > > >
> > > > Let's say I have a table with rowkey as "row00000 ~ row30000" on a 4
> > node
> > > > hbase cluster, is there a way to keep data partitioned by range on
> each
> > > > node? for example:
> > > >
> > > > node1:  <=row10000
> > > > node2:  row10001~row20000
> > > > node3:  row20001~row30000
> > > > node4:  >row30000
> > > >
> > > > And even when one of the node become hotspot, the boundary won't be
> > > crossed
> > > > unless manually doing a load balancing?
> > > >
> > > > I looked at presplit: { SPLITS => ['row100','row200','row300'] } ,
> but
> > > > don't think it serves this purpose.
> > > >
> > > > BTW, a bit background. I am thinking to do a local join between two
> > > tables
> > > > if both have same rowkey, and partitioned by range (or same hash
> > > > algorithm). If I can keep the join-key on the same node(aka
> > > regionServer),
> > > > the join can be handled locally instead of broadcast to all other
> > nodes.
> > > >
> > > > Thanks for your input. A couple pointers to blog/presentation would
> be
> > > > appreciated.
> > > >
> > > > Demai
> > >
> > > The opinions expressed here are mine, while they may reflect a
> cognitive
> > > thought, that is purely accidental.
> > > Use at your own risk.
> > > Michael Segel
> > > michael_segel (AT) hotmail.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Re: HBase region assignment by range?

Posted by Demai Ni <ni...@gmail.com>.
hi, Guys,

many thanks for your quick response.

First, let me share what I am looking at, which may help to clarify the
intention and answer a few of the questions. I am working on a POC to bring
MPP-style OLAP to Hadoop, and am exploring whether it is feasible to use
HBase as the datastore. With HBase, I'd like to take advantage of 1) OLTP
capability; 2) the many filters; 3) in-cluster replicas and cross-cluster
replication. I am currently using the TPC-H schema for this POC, and am also
considering a star schema. Since it is a POC, I can pretty much define my
own rules and set limitations as I see fit. :-)

Why doesn't this (presplit) work for you?

 The reason is that presplit won't guarantee that the regions stay on the
pre-assigned regionServer. Let's say I have a very large table and a very
small table with different data distributions, even with the same presplit
values: HBase won't ensure that the same range of data is located on the
same physical node, unless we have a custom LB as mentioned by @Anoop and
@esteban. Is my understanding correct? BTW, I will look into HBASE-10576 to
see whether it fits my needs.

Is your table static?
>
While I could make it static for POC purposes, I won't rely on that
limitation, as I'd like HBase for its OLTP feature. So besides the 'static'
HFiles, I need the HLOGs on the same local node too. But again, I will only
worry about the 'static' HFiles for now.

However as you add data to the table, those regions will eventually split.

 While the region can surely split as more data is added, can HBase keep
the new regions on the same regionServer according to the predefined
boundary? I will worry about the hotspot issue later; that is the beauty of
doing a POC instead of production. :-)

What you’re suggesting is that as you do a region scan, you’re going to the
> other table and then try to fetch a row if it exists.
>
Yes, something like that. I am currently using the client API: scan() with
a start and end key. Since I know my start and end keys, and with the
local-read feature, the scan should be a local read. With some statistics
(such as which table is larger) and a hash join operation (which I need to
implement), the join should work with not-too-bad performance. Again, it is
a POC, so I won't worry about the situation where a regionServer hosts too
much data (hotspot). But surely, an LB should be used before putting this
into production if that ever occurs.

either the second table should be part of the first table in the same CF or
> as a separate CF
>
I am not sure whether that will work for the case of a large table joined
with a small table. The data of the small table would have to be duplicated
in many places, and an update to the small table could be costly.

Demai
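The hash join mentioned above can be sketched as follows. This is a minimal, hypothetical sketch: the (rowkey, value) pairs stand in for the rows two co-located scans would return over the same key range; it is not HBase client code.

```python
def hash_join(build_side, probe_side):
    """Classic hash join: build a hash table on the smaller input
    (here, the small table), then probe it with the larger one."""
    table = {}
    for key, val in build_side:
        table.setdefault(key, []).append(val)
    out = []
    for key, val in probe_side:
        for build_val in table.get(key, []):
            out.append((key, build_val, val))
    return out

# Hypothetical scan results for the same key range on one node
small = [("row10001", "s1"), ("row10007", "s2")]
large = [("row10001", "l1"), ("row10002", "l2"), ("row10007", "l3")]
print(hash_join(small, large))
# [('row10001', 's1', 'l1'), ('row10007', 's2', 'l3')]
```

Building on the smaller side keeps the hash table's memory footprint bounded by the small table, which matches the large-vs-small statistics point above.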


On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez <es...@cloudera.com>
wrote:

> +1 Anoop.
>
> Thats pretty much the only way right now if you need a custom balancing.
> This balancer doesn't have to live in the HMaster and can be invoked
> externally (there are caveats of doing that, when a RS die but works ok so
> far). A long term solution for your the problem you are trying to solve is
> HBASE-10576 by tweaking it a little.
>
> cheers,
> esteban.
>
>
>
>
>
> --
> Cloudera, Inc.
>
>
> On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <mi...@hotmail.com>
> wrote:
>
> > Is your table staic?
> >
> > If you know your data and your ranges, you can do it. However as you add
> > data to the table, those regions will eventually split.
> >
> > The other issue that you brought up is that you want to do ‘local’ joins.
> >
> > Simple single word response… don’t.
> >
> > Longer response..
> >
> > You’re suggesting that the tables in question share the row key in
> > common.  Ok… why? Are they part of the same record?
> > How is the data normally being used?
> >
> > Have you looked at column families?
> >
> > The issue is that joins are expensive. What you’re suggesting is that as
> > you do a region scan, you’re going to the other table and then try to
> fetch
> > a row if it exists.
> > So its essentially for each row in the scan, try a get() which will
> almost
> > double the cost of your fetch. Then you have to decide how to do it
> > locally. Are you really going to write a coprocessor for this?  (Hint: If
> > this is a common thing. Then either the second table should be part of
> the
> > first table in the same CF or as a separate CF. You need to rethink your
> > schema.)
> >
> > Does this make sense?
> >
> > > On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> > >
> > > hi, folks,
> > >
> > > I have a question about region assignment and like to clarify some
> > through.
> > >
> > > Let's say I have a table with rowkey as "row00000 ~ row30000" on a 4
> node
> > > hbase cluster, is there a way to keep data partitioned by range on each
> > > node? for example:
> > >
> > > node1:  <=row10000
> > > node2:  row10001~row20000
> > > node3:  row20001~row30000
> > > node4:  >row30000
> > >
> > > And even when one of the node become hotspot, the boundary won't be
> > crossed
> > > unless manually doing a load balancing?
> > >
> > > I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
> > > don't think it serves this purpose.
> > >
> > > BTW, a bit background. I am thinking to do a local join between two
> > tables
> > > if both have same rowkey, and partitioned by range (or same hash
> > > algorithm). If I can keep the join-key on the same node(aka
> > regionServer),
> > > the join can be handled locally instead of broadcast to all other
> nodes.
> > >
> > > Thanks for your input. A couple pointers to blog/presentation would be
> > > appreciated.
> > >
> > > Demai
> >
> > The opinions expressed here are mine, while they may reflect a cognitive
> > thought, that is purely accidental.
> > Use at your own risk.
> > Michael Segel
> > michael_segel (AT) hotmail.com
> >
> >
> >
> >
> >
> >
>

Re: HBase region assignment by range?

Posted by Esteban Gutierrez <es...@cloudera.com>.
+1 Anoop.

That's pretty much the only way right now if you need custom balancing.
This balancer doesn't have to live in the HMaster and can be invoked
externally (there are caveats to doing that, e.g. when an RS dies, but it
works OK so far). A long-term solution for the problem you are trying to
solve is HBASE-10576, with a little tweaking.

cheers,
esteban.





--
Cloudera, Inc.


On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <mi...@hotmail.com>
wrote:

> Is your table staic?
>
> If you know your data and your ranges, you can do it. However as you add
> data to the table, those regions will eventually split.
>
> The other issue that you brought up is that you want to do ‘local’ joins.
>
> Simple single word response… don’t.
>
> Longer response..
>
> You’re suggesting that the tables in question share the row key in
> common.  Ok… why? Are they part of the same record?
> How is the data normally being used?
>
> Have you looked at column families?
>
> The issue is that joins are expensive. What you’re suggesting is that as
> you do a region scan, you’re going to the other table and then try to fetch
> a row if it exists.
> So its essentially for each row in the scan, try a get() which will almost
> double the cost of your fetch. Then you have to decide how to do it
> locally. Are you really going to write a coprocessor for this?  (Hint: If
> this is a common thing. Then either the second table should be part of the
> first table in the same CF or as a separate CF. You need to rethink your
> schema.)
>
> Does this make sense?
>
> > On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> >
> > hi, folks,
> >
> > I have a question about region assignment and like to clarify some
> through.
> >
> > Let's say I have a table with rowkey as "row00000 ~ row30000" on a 4 node
> > hbase cluster, is there a way to keep data partitioned by range on each
> > node? for example:
> >
> > node1:  <=row10000
> > node2:  row10001~row20000
> > node3:  row20001~row30000
> > node4:  >row30000
> >
> > And even when one of the node become hotspot, the boundary won't be
> crossed
> > unless manually doing a load balancing?
> >
> > I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
> > don't think it serves this purpose.
> >
> > BTW, a bit background. I am thinking to do a local join between two
> tables
> > if both have same rowkey, and partitioned by range (or same hash
> > algorithm). If I can keep the join-key on the same node(aka
> regionServer),
> > the join can be handled locally instead of broadcast to all other nodes.
> >
> > Thanks for your input. A couple pointers to blog/presentation would be
> > appreciated.
> >
> > Demai
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>

Re: HBase region assignment by range?

Posted by Michael Segel <mi...@hotmail.com>.
Is your table static?

If you know your data and your ranges, you can do it. However, as you add data to the table, those regions will eventually split.

The other issue that you brought up is that you want to do ‘local’ joins.

Simple single word response… don’t. 

Longer response.. 

You’re suggesting that the tables in question share the row key in common.  Ok… why? Are they part of the same record? 
How is the data normally being used?  

Have you looked at column families?

The issue is that joins are expensive. What you’re suggesting is that as you do a region scan, you’re going to the other table and then try to fetch a row if it exists. 
So it's essentially, for each row in the scan, trying a get(), which will almost double the cost of your fetch. Then you have to decide how to do it locally. Are you really going to write a coprocessor for this?  (Hint: if this is a common thing, then either the second table should be part of the first table in the same CF or be a separate CF. You need to rethink your schema.)

Does this make sense? 
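One hedge on the per-row get() cost: since HBase scans return rows in rowkey order, two scans over the same key range can also be combined in a single sorted-merge pass, avoiding a per-row get() entirely. A minimal sketch, assuming unique rowkeys per table and plain lists standing in for the ordered scan results:

```python
def merge_join(scan_a, scan_b):
    """Single-pass join of two rowkey-ordered scans (no per-row get).
    Assumes each scan yields (rowkey, value) with unique, sorted keys."""
    it_a, it_b = iter(scan_a), iter(scan_b)
    a, b = next(it_a, None), next(it_b, None)
    out = []
    while a is not None and b is not None:
        if a[0] == b[0]:
            out.append((a[0], a[1], b[1]))
            a, b = next(it_a, None), next(it_b, None)
        elif a[0] < b[0]:
            a = next(it_a, None)   # left key has no match; advance left
        else:
            b = next(it_b, None)   # right key has no match; advance right
    return out

left = [("row1", "a1"), ("row2", "a2"), ("row4", "a4")]
right = [("row2", "b2"), ("row3", "b3"), ("row4", "b4")]
print(merge_join(left, right))
# [('row2', 'a2', 'b2'), ('row4', 'a4', 'b4')]
```

Each input is read exactly once, so the cost stays that of two scans rather than a scan plus N gets.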

> On Apr 7, 2015, at 7:05 PM, Demai Ni <ni...@gmail.com> wrote:
> 
> hi, folks,
> 
> I have a question about region assignment and like to clarify some through.
> 
> Let's say I have a table with rowkey as "row00000 ~ row30000" on a 4 node
> hbase cluster, is there a way to keep data partitioned by range on each
> node? for example:
> 
> node1:  <=row10000
> node2:  row10001~row20000
> node3:  row20001~row30000
> node4:  >row30000
> 
> And even when one of the node become hotspot, the boundary won't be crossed
> unless manually doing a load balancing?
> 
> I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
> don't think it serves this purpose.
> 
> BTW, a bit background. I am thinking to do a local join between two tables
> if both have same rowkey, and partitioned by range (or same hash
> algorithm). If I can keep the join-key on the same node(aka regionServer),
> the join can be handled locally instead of broadcast to all other nodes.
> 
> Thanks for your input. A couple pointers to blog/presentation would be
> appreciated.
> 
> Demai

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com