Posted to user@hbase.apache.org by Ramasubramanian Narayanan <ra...@gmail.com> on 2012/11/26 08:04:55 UTC

Expert suggestion needed to create table in Hbase - Banking

Hi,

  I need to physicalise a logical model. It is a client model which has
600+ entities.

  I need suggestions on how to go about physicalising it.

  I have a few other doubts:
  1) Is it good to create a single table for all the 600+ columns?
  2) Should different groups of columns go in different column families, or
can they all be under a single column family? For example, can customer
address be a separate column family?

  Please help with this.


regards,
Rams

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by matan <ma...@cloudaloe.org>.
Hi Anil,

How up to date and useful would you say 'HBase: The Definitive Guide' still
is, given the time and the new releases since it was written?
HBase is still evolving.

Thanks,
Matan




Re: Expert suggestion needed to create table in Hbase - Banking

Posted by anil gupta <an...@gmail.com>.
Hi Rams,

IMHO, you need to go through http://hbase.apache.org/book.html and the book
"HBase: The Definitive Guide" to get a deeper understanding of HBase. It
will help you in designing your system.

There is no magic trick for designing the most efficient RowKey without
knowing the detailed requirements and constraints and carrying out a couple
of experiments.

HTH,
Anil


On Tue, Nov 27, 2012 at 8:44 PM, Ramasubramanian <
ramasubramanian.narayanan@gmail.com> wrote:

> Hi,
>
> Thanks!!
>
> Can someone help in suggesting what is the best rowkey that we can use in
> this scenario.
>
> Regards,
> Rams
>
> On 27-Nov-2012, at 10:37 PM, Suraj Varma <sv...@gmail.com> wrote:
>
> > Ian Varley's excellent HBaseCon presentation is another great resource.
> > http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf
> >
> > On Mon, Nov 26, 2012 at 5:43 AM, Doug Meil
> > <do...@explorysmedical.com> wrote:
> >>
> >> Hi there, somebody already wisely mentioned the link to the # of CF's
> >> entry, but here are a few other entries that can save you some heartburn
> >> if you read them ahead of time.
> >>
> >> http://hbase.apache.org/book.html#datamodel
> >>
> >> http://hbase.apache.org/book.html#schema
> >>
> >> http://hbase.apache.org/book.html#architecture
> >>
> >>
> >>
> >>
> >>
> >> On 11/26/12 5:28 AM, "Mohammad Tariq" <do...@gmail.com> wrote:
> >>
> >>> Hello sir,
> >>>
> >>>   You might become a victim of RS hotspotting, since the cutomerIDs will
> >>> be sequential(I assume). To keep things simple Hbase puts all the rows with
> >>> similar keys to the same RS. But, it becomes a bottleneck in the long run
> >>> as all the data keeps on going to the same region.
> >>>
> >>> HTH
> >>>
> >>> Regards,
> >>>   Mohammad Tariq
> >>>
> >>>
> >>>
> >>> On Mon, Nov 26, 2012 at 3:53 PM, Ramasubramanian Narayanan <
> >>> ramasubramanian.narayanan@gmail.com> wrote:
> >>>
> >>>> Hi,
> >>>> Thanks! Can we have the customer number as the RowKey for the customer
> >>>> (client) master table? Please help in educating me on the advantage and
> >>>> disadvantage of having customer number as the Row key...
> >>>>
> >>>> Also SCD2 we may need to implement in that table.. will it work if I have
> >>>> like that?
> >>>>
> >>>> Or
> >>>>
> >>>> SCD2 is not needed instead we can achieve the same by increasing the
> >>>> version number that it will hold?
> >>>>
> >>>> pls suggest...
> >>>>
> >>>> regards,
> >>>> Rams
> >>>>
> >>>> On Mon, Nov 26, 2012 at 1:10 PM, Li, Min <mi...@microstrategy.com> wrote:
> >>>>
> >>>>> When 1 cf need to do split, other 599 cfs will split at the same time. So
> >>>>> many fragments will be produced when you use so many column families.
> >>>>> Actually, many cfs can be merge to only one cf with specific tags in
> >>>>> rowkey. For example, rowkey of customer address can be uid+'AD', and
> >>>>> customer profile can be uid+'PR'.
> >>>>>
> >>>>> Min
> >>>>> -----Original Message-----
> >>>>> From: Ramasubramanian Narayanan [mailto:
> >>>>> ramasubramanian.narayanan@gmail.com]
> >>>>> Sent: Monday, November 26, 2012 3:05 PM
> >>>>> To: user@hbase.apache.org
> >>>>> Subject: Expert suggestion needed to create table in Hbase - Banking
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>>  I have a requirement of physicalising the logical model... I have a
> >>>>> client model which has 600+ entities...
> >>>>>
> >>>>>  Need suggestion how to go about physicalising it...
> >>>>>
> >>>>>  I have few other doubts :
> >>>>>  1) Whether is it good to create a single table for all the 600+ columns?
> >>>>>  2) To have different column families for different groups or can it be
> >>>>> under a single column family? For example, customer address can we have as
> >>>>> a different column family?
> >>>>>
> >>>>>  Please help on this..
> >>>>>
> >>>>>
> >>>>> regards,
> >>>>> Rams
> >>
> >>
>



-- 
Thanks & Regards,
Anil Gupta

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Ramasubramanian Narayanan <ra...@gmail.com>.
Nick,

    In most scenarios we will fetch records based on the customer number.
The project is currently at the design stage, so the requirements of many
other systems (to which this system will send the feed) are not yet known.
As per the current analysis, only the customer number will be used for
lookups most of the time.


regards,
Rams

On Thu, Nov 29, 2012 at 12:04 AM, Nick Dimiduk <nd...@gmail.com> wrote:

> Hi Rams,
>
> Can you explain in more detail how you will be accessing this data?
>
> Thanks,
> Nick

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Nick Dimiduk <nd...@gmail.com>.
Hi Rams,

Can you explain in more detail how you will be accessing this data?

Thanks,
Nick


Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Ramasubramanian <ra...@gmail.com>.
Hi,

Thanks!!

Can someone help by suggesting the best rowkey that we can use in this scenario?

Regards,
Rams


Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Suraj Varma <sv...@gmail.com>.
Ian Varley's excellent HBaseCon presentation is another great resource.
http://ianvarley.com/coding/HBaseSchema_HBaseCon2012.pdf


Re: Expert suggestion needed to create table in Hbase - Banking

Posted by syed kather <in...@gmail.com>.
Hello Sir,

 To address RS hotspotting you can also try the approach described here:
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
It works fine.
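
A rough sketch of the general idea behind that link, as I understand it:
prefix each sequential key with a small bucket number so that writes spread
over several regions. This is plain Python, not the HBaseWD API (which is a
Java library). The names salted_key, strip_salt and scan_prefixes, the
CRC32-based bucket choice and the bucket count of 16 are all assumptions
made for illustration.

```python
import zlib
from typing import List

NUM_BUCKETS = 16  # assumed bucket count; pick one that suits your cluster


def salted_key(original_key: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the key with one bucket byte derived from the key itself.

    Deriving the bucket from the key (rather than picking it at random)
    lets a reader recompute the full stored key for a point lookup.
    """
    if not 1 <= buckets <= 256:
        raise ValueError("buckets must fit in one byte")
    bucket = zlib.crc32(original_key) % buckets
    return bytes([bucket]) + original_key


def strip_salt(stored_key: bytes) -> bytes:
    """Drop the one-byte bucket prefix to recover the original key."""
    return stored_key[1:]


def scan_prefixes(buckets: int = NUM_BUCKETS) -> List[bytes]:
    """A range read over original keys needs one scan per bucket prefix."""
    return [bytes([b]) for b in range(buckets)]


stored = salted_key(b"0000900001")
print(stored[0], strip_salt(stored))
```

The price of this layout is on the read side: a scan over a range of
original keys has to be run once per bucket and the results merged.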

Regarding column families, you can also try grouping similar columns
into one family, based on the process you decide on.

thanks and regards,
Syed Abdul Kather



Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Michael Segel <mi...@hotmail.com>.
If the row key is just the customer ID, then a simple MD5 or SHA-1 hash of it would suffice.
That would clear up any risk of hot spotting, once you do your initial load of data.

And that's probably a key point... hot spotting when you're first loading a very large table is really a moot point. It may be painful, but the pain lasts for less than an hour.
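
A minimal sketch of the hashing suggestion above, in Python for brevity. The function name hashed_row_key and the sample customer IDs are made up for illustration; only hashlib from the standard library is used, and nothing here is an HBase API.

```python
import hashlib


def hashed_row_key(customer_id: str) -> bytes:
    """Return the 16-byte MD5 digest of the customer ID, for use as a row key."""
    return hashlib.md5(customer_id.encode("utf-8")).digest()


# Sequential customer IDs give digests that are unrelated to each other,
# so consecutive inserts no longer sort next to one another.
for cid in ("1000001", "1000002", "1000003"):
    print(cid, hashed_row_key(cid).hex())
```

Two trade-offs to keep in mind: the customer ID cannot be recovered from the digest, so it has to be stored in a column as well, and a scan in customer-number order is no longer possible. A point lookup still works, because the client can recompute the hash from the customer ID.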



Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there, somebody already wisely mentioned the link to the # of CF's
entry, but here are a few other entries that can save you some heartburn
if you read them ahead of time.

http://hbase.apache.org/book.html#datamodel

http://hbase.apache.org/book.html#schema

http://hbase.apache.org/book.html#architecture








Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Mohammad Tariq <do...@gmail.com>.
Hello sir,

    You might become a victim of RS hotspotting, since the customer IDs will
be sequential (I assume). HBase keeps rows sorted by key, so rows with
adjacent keys end up in the same region, served by one RS. In the long run
that becomes a bottleneck, as all new data keeps going to the same region.
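
A toy illustration of the point above, not HBase code. It assumes a table
already split into four regions at made-up start keys and 10-digit
zero-padded customer numbers as row keys; region_index is a hypothetical
helper that mimics how a key is matched to the region whose range contains
it.

```python
import bisect

# Assumed split points. Each region holds keys from its start key (inclusive)
# up to the next region's start key (exclusive). The values are made up.
REGION_START_KEYS = [b"", b"0000250000", b"0000500000", b"0000750000"]


def region_index(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(REGION_START_KEYS, row_key) - 1


# New customers get the next sequential number, so every new key is larger
# than all existing ones.
new_keys = [f"{n:010d}".encode("ascii") for n in range(900000, 900005)]

# Every new key maps to the same (last) region.
print({region_index(k) for k in new_keys})
```

With sequential keys all five new rows fall into the last region, which is
the write hotspot described above.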

HTH

Regards,
    Mohammad Tariq




Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Ramasubramanian Narayanan <ra...@gmail.com>.
Hi,
Thanks! Can we have the customer number as the RowKey for the customer
(client) master table? Please help educate me on the advantages and
disadvantages of having the customer number as the row key.

Also, we may need to implement SCD2 in that table. Will it work with that
row key?

Or

is SCD2 not needed, and can we achieve the same by increasing the number of
versions that the table will hold?

Please suggest.

regards,
Rams


RE: Expert suggestion needed to create table in Hbase - Banking

Posted by "Li, Min" <mi...@microstrategy.com>.
When one CF needs to split, the other 599 CFs split at the same time, so using that many column families produces many fragments. In practice, many CFs can be merged into a single CF by adding a type-specific tag to the rowkey. For example, the rowkey for a customer's address can be uid+'AD', and the rowkey for the customer's profile can be uid+'PR'.
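
The tagged-rowkey idea above can be sketched in a few lines of Python. This is
a toy model using a plain dict rather than HBase client code; the helper names
make_rowkey and rows_for_customer, the fixed 8-digit uid format, and the sample
data are assumptions made for illustration. Only the 'AD' and 'PR' tags come
from the message.

```python
ADDRESS_TAG = "AD"   # tag for customer address rows
PROFILE_TAG = "PR"   # tag for customer profile rows


def make_rowkey(uid: str, tag: str) -> bytes:
    """Build a row key as the uid followed by a two-letter entity tag."""
    return (uid + tag).encode("utf-8")


def rows_for_customer(table: dict, uid: str) -> list:
    """Emulate a prefix scan: all rows whose key starts with uid, in key order.

    Assumes fixed-width uids, so that one uid is never a prefix of another.
    """
    prefix = uid.encode("utf-8")
    return [(k, v) for k, v in sorted(table.items()) if k.startswith(prefix)]


# One logical table with a single column family; the entity type lives in the key.
table = {
    make_rowkey("00001002", PROFILE_TAG): {"name": "Customer B"},
    make_rowkey("00001001", PROFILE_TAG): {"name": "Customer A"},
    make_rowkey("00001001", ADDRESS_TAG): {"city": "Chennai"},
}

customer_rows = rows_for_customer(table, "00001001")
customer_keys = [key for key, _ in customer_rows]
```

Because row keys sort byte-wise, all rows for one uid end up next to each
other, so the address and profile of a customer can be read together with a
single prefix scan or separately by exact key.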

Min 
-----Original Message-----
From: Ramasubramanian Narayanan [mailto:ramasubramanian.narayanan@gmail.com] 
Sent: Monday, November 26, 2012 3:05 PM
To: user@hbase.apache.org
Subject: Expert suggestion needed to create table in Hbase - Banking

Hi,

  I have a requirement of physicalising the logical model... I have a
client model which has 600+ entities...

  Need suggestion how to go about physicalising it...

  I have few other doubts :
  1) Whether is it good to create a single table for all the 600+ columns?
  2) To have different column families for different groups or can it be
under a single column family? For example, customer address can we have as
a different column family?

  Please help on this..


regards,
Rams

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by anil gupta <an...@gmail.com>.
More on number of column families:
http://hbase.apache.org/book/number.of.cfs.html


On Sun, Nov 25, 2012 at 11:35 PM, anil gupta <an...@gmail.com> wrote:

> Hi Rams,
>
> The description of your use case is quite abstract, so I will answer your
> questions as best I can.
>
>
> 1) Whether is it good to create a single table for all the 600+ columns?
> Anil: Yes, it is absolutely OK to have 600+ columns in a row in HBase (you
> can go up to a few million).
>
>
> 2) To have different column families for different groups or can it be
> under a single column family? For example, customer address can we have as
> a different column family?
> Anil: Usually HBase recommends not having many column families (no more
> than 3 or 4). Having a single column family is very standard practice.
> However, in some cases creating more than one CF is justified. For example,
> if around 95% of your lookups don't need the "Customer Address" data, then
> it would make sense to put that data into a separate column family.
>
> HTH,
> Anil Gupta
>
>
>
>
>
>
>
> On Sun, Nov 25, 2012 at 11:04 PM, Ramasubramanian Narayanan <
> ramasubramanian.narayanan@gmail.com> wrote:
>
>> Hi,
>>
>>   I have a requirement of physicalising the logical model... I have a
>> client model which has 600+ entities...
>>
>>   Need suggestion how to go about physicalising it...
>>
>>   I have few other doubts :
>>   1) Whether is it good to create a single table for all the 600+ columns?
>>   2) To have different column families for different groups or can it be
>> under a single column family? For example, customer address can we have as
>> a different column family?
>>
>>   Please help on this..
>>
>>
>> regards,
>> Rams
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Thanks & Regards,
Anil Gupta

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by anil gupta <an...@gmail.com>.
Hi Rams,

The description of your use case is quite abstract, so I will answer your
questions as best I can.

1) Whether is it good to create a single table for all the 600+ columns?
Anil: Yes, it is absolutely OK to have 600+ columns in a row in HBase (you
can go up to a few million).

2) To have different column families for different groups or can it be
under a single column family? For example, customer address can we have as
a different column family?
Anil: Usually HBase recommends not having many column families (no more
than 3 or 4). Having a single column family is very standard practice.
However, in some cases creating more than one CF is justified. For example,
if around 95% of your lookups don't need the "Customer Address" data, then
it would make sense to put that data into a separate column family.
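
The reasoning behind a separate, rarely read column family can be sketched
with a toy model in Python. This is not HBase client code: the ToyTable class,
the family names "d" and "addr", and the sample data are assumptions made for
illustration. The sketch relies only on the fact that each column family is
stored separately, so a read that names one family does not need to touch the
others.

```python
from collections import defaultdict


class ToyTable:
    """Very rough model of a table with one separate store per column family."""

    def __init__(self, families):
        # family name -> {row key -> {qualifier -> value}}
        self.stores = {f: defaultdict(dict) for f in families}
        self.stores_read = []  # records which family stores each get() touched

    def put(self, row, family, qualifier, value):
        self.stores[family][row][qualifier] = value

    def get(self, row, families=None):
        """Read one row, touching only the requested families (all if None)."""
        wanted = list(self.stores) if families is None else list(families)
        result = {}
        for f in wanted:
            self.stores_read.append(f)
            cells = self.stores[f].get(row)
            if cells:
                result[f] = dict(cells)
        return result


# "d" holds the frequently read data, "addr" the rarely read address data.
customers = ToyTable(["d", "addr"])
customers.put("00001001", "d", "name", "Customer A")
customers.put("00001001", "addr", "city", "Chennai")

# The common lookup asks only for family "d" and never reads the address store.
profile = customers.get("00001001", families=["d"])
```

If most lookups look like the last line, the address data stays out of the
read path; if most lookups need both families, a single family is the simpler
choice.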

HTH,
Anil Gupta







On Sun, Nov 25, 2012 at 11:04 PM, Ramasubramanian Narayanan <
ramasubramanian.narayanan@gmail.com> wrote:

> Hi,
>
>   I have a requirement of physicalising the logical model... I have a
> client model which has 600+ entities...
>
>   Need suggestion how to go about physicalising it...
>
>   I have few other doubts :
>   1) Whether is it good to create a single table for all the 600+ columns?
>   2) To have different column families for different groups or can it be
> under a single column family? For example, customer address can we have as
> a different column family?
>
>   Please help on this..
>
>
> regards,
> Rams
>



-- 
Thanks & Regards,
Anil Gupta

Re: Expert suggestion needed to create table in Hbase - Banking

Posted by Michael Segel <mi...@hotmail.com>.
Rams, 

I think you need to go back and think about why you want to use Hadoop and HBase in the first place. 
Second, you need to think about your data and how you are planning to use it. 

Beyond that, we can only give you fairly generic answers.

1) You can create a table with 600 columns; however, it depends on what you are trying to do. There are some limitations that you have to consider in your design, but for the specific use case you stated they are not applicable.

2) You can have models with different column families. However, again, it depends on what you are trying to do.
Your example, customer address, is not a good example of when to use a column family.
I was going to do a schema design course at a Hadoop conference next year, but it got turned down because it was considered too 'basic'. Maybe I'll propose it for the Hadoop conference in Amsterdam... sorry, I digress.

Have you thought about using a schema on top of HBase? At a minimum Avro, or possibly Wibidata's Kiji? (Not that I'm plugging Aaron's project. ;-)

I am also curious... this isn't the first time this question has come up on the lists... class project? 

HTH

-Mike



On Nov 26, 2012, at 1:04 AM, Ramasubramanian Narayanan <ra...@gmail.com> wrote:

> Hi,
> 
>  I have a requirement of physicalising the logical model... I have a
> client model which has 600+ entities...
> 
>  Need suggestion how to go about physicalising it...
> 
>  I have few other doubts :
>  1) Whether is it good to create a single table for all the 600+ columns?
>  2) To have different column families for different groups or can it be
> under a single column family? For example, customer address can we have as
> a different column family?
> 
>  Please help on this..
> 
> 
> regards,
> Rams