Posted to user@hbase.apache.org by Eric Czech <er...@nextbigsound.com> on 2012/08/05 23:48:37 UTC

more tables or more rows

I need to support data that comes from 30+ sources and the structure
of that data is consistent across all the sources, but what I'm not
clear on is whether or not I should use 30+ tables with roughly the
same format or 1 table where the row key reflects the source.

Anybody have a strong argument one way or the other?

Thanks!
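
A minimal sketch of the single-table option, in Python, assuming each source is assigned a small integer id stored as a fixed-width 2-byte prefix of the row key. The names `make_row_key` and `source_scan_range` are hypothetical helpers, not HBase API. The point is that reading one source becomes a scan over a start/stop row range in a single table, with the start row inclusive and the stop row exclusive.

```python
from __future__ import annotations

import struct

# Hypothetical layout: <2-byte big-endian source id><entity key bytes>.
# Big-endian encoding keeps each source's rows contiguous in HBase's
# byte-sorted key space, and the fixed width means source 1 and source 10
# never share a prefix.


def make_row_key(source_id: int, entity_key: bytes) -> bytes:
    """Build a row key that starts with the source id."""
    return struct.pack(">H", source_id) + entity_key


def source_scan_range(source_id: int) -> tuple[bytes, bytes]:
    """(start_row inclusive, stop_row exclusive) covering one source's rows."""
    start = struct.pack(">H", source_id)
    if source_id == 0xFFFF:
        # No larger 2-byte prefix exists; an empty stop row scans to the end.
        return start, b""
    return start, struct.pack(">H", source_id + 1)


start, stop = source_scan_range(7)
print(start, stop)  # b'\x00\x07' b'\x00\x08'
```

A variable-length text prefix such as "1" would also match rows for source "10", which the fixed-width prefix avoids. One caveat under these assumptions: if the part after the prefix increases monotonically (a timestamp, say), each source's new writes all land at the end of that source's key range.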

Re: more tables or more rows

Posted by J Mohamed Zahoor <jm...@gmail.com>.
Hi MK,

Some suggestions here

http://www.hbasecon.com/sessions/lightning-talk-real-performance-gains-with-real-time-data/

./zahoor

On Wed, Aug 8, 2012 at 5:44 PM, M. Karthikeyan
<m....@ericsson.com> wrote:

> [...]

RE: more tables or more rows

Posted by "M. Karthikeyan" <m....@ericsson.com>.
Hi,
A slightly related question:
We have time series data continuously flowing into the system that has to be stored in HBase.
We have a retention policy to keep data for 90 days, so data older than 90 days has to be deleted from HBase every midnight.
There are two ways of doing this that we know of:
1) Since bulk deletes could be costly and dropping an entire table is easier, we could have day-wise tables and drop the oldest day's table.
2) This post http://permalink.gmane.org/gmane.comp.java.hadoop.hbase.user/9603 suggests that we can have a single table and use the TTL feature for ageing out data.
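
The two options above can be sketched side by side. This is a hedged illustration, not HBase API: `day_table_name` and `tables_to_drop` assume a hypothetical `events_YYYYMMDD` naming scheme for option 1, and `is_expired` approximates the TTL rule of option 2, where a column family's TTL is given in seconds and cell timestamps are in milliseconds.

```python
from __future__ import annotations

from datetime import date, timedelta

RETENTION_DAYS = 90
# Column-family TTLs in HBase are given in seconds: 90 days = 7,776,000 s.
TTL_SECONDS = RETENTION_DAYS * 24 * 60 * 60


def day_table_name(day: date, prefix: str = "events_") -> str:
    """Option 1 (hypothetical naming scheme): one table per calendar day."""
    return f"{prefix}{day:%Y%m%d}"


def tables_to_drop(today: date, existing: list[str],
                   prefix: str = "events_") -> list[str]:
    """Option 1: day tables that fall outside the last RETENTION_DAYS days."""
    keep = {day_table_name(today - timedelta(days=i), prefix)
            for i in range(RETENTION_DAYS)}
    return sorted(name for name in existing
                  if name.startswith(prefix) and name not in keep)


def is_expired(cell_ts_ms: int, now_ms: int) -> bool:
    """Option 2: a cell is past its TTL once its age exceeds TTL_SECONDS.

    Cell timestamps are milliseconds since the epoch.
    """
    return now_ms - cell_ts_ms > TTL_SECONDS * 1000


# On 2012-08-08 the 90-day window runs from 2012-05-11 to 2012-08-08.
print(tables_to_drop(date(2012, 8, 8), ["events_20120510", "events_20120511"]))
# ['events_20120510']
```

The trade-off this illustrates: dropping a day table removes its data in one step, but a query spanning several days has to read several tables. With a TTL, expired cells stop being returned by reads right away, yet their disk space is only reclaimed when compactions rewrite the store files.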

May I request someone to briefly list the pros and cons of each option?
PS: We expect around 200 million records per day, and each record would be approx. 500 bytes.
Thanks & Regards
MK
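
For scale, the figures in the PS work out as follows (decimal units). The replication factor of 3, the HDFS default, is an assumption not stated in the thread, and the sketch ignores compression and HBase's per-cell overhead (row key, column family, qualifier and timestamp are stored with every cell), so treat the on-disk figure as a rough estimate only.

```python
RECORDS_PER_DAY = 200_000_000
BYTES_PER_RECORD = 500
RETENTION_DAYS = 90
REPLICATION = 3  # assumed HDFS replication factor (the HDFS default)

daily_bytes = RECORDS_PER_DAY * BYTES_PER_RECORD  # 100 GB per day
retained_bytes = daily_bytes * RETENTION_DAYS     # 9 TB in the 90-day window
raw_disk_bytes = retained_bytes * REPLICATION     # 27 TB, ignoring compression and per-cell overhead
writes_per_second = RECORDS_PER_DAY / 86_400      # about 2,315 on average

print(f"{daily_bytes / 1e9:.0f} GB/day, {retained_bytes / 1e12:.0f} TB retained, "
      f"{raw_disk_bytes / 1e12:.0f} TB with replication, "
      f"~{writes_per_second:.0f} writes/s")
```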


-----Original Message-----
From: Mohammad Tariq [mailto:dontariq@gmail.com] 
Sent: 08 August 2012 03:19
To: user@hbase.apache.org
Subject: Re: more tables or more rows

[...]

Re: more tables or more rows

Posted by Mohammad Tariq <do...@gmail.com>.
Hello sir,

    It is absolutely fine to have as many tables as we like. My point
was that if we have a large number of tables, it might add some
overhead in locating the user region, as there will be a huge amount
of mapping from "user tables" to "region servers". Also, the client will
have to cache more of this information, which takes up additional
memory. So I suggested having a small number of large tables rather
than a large number of small tables, if the data is similar.

Regards,
    Mohammad Tariq
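
A toy model of the client-side cache described above, assuming the client keeps, per table, the sorted start keys of that table's regions and the server hosting each one. `RegionLocationCache` is a hypothetical class, not the HBase client's implementation. It suggests the cache grows with the total number of regions, and since every table has at least one region, 30 small tables need at least 30 entries even when their combined data would fit in fewer regions of a single table.

```python
from __future__ import annotations

import bisect


class RegionLocationCache:
    """Toy model of a client-side region location cache (not HBase code).

    For each table it keeps the sorted start keys of the table's regions and
    the server hosting each region. It assumes, as in HBase, that a table's
    first region starts at the empty key b"".
    """

    def __init__(self) -> None:
        self._start_keys: dict[str, list[bytes]] = {}
        self._servers: dict[str, list[str]] = {}

    def add_region(self, table: str, start_key: bytes, server: str) -> None:
        keys = self._start_keys.setdefault(table, [])
        servers = self._servers.setdefault(table, [])
        i = bisect.bisect_left(keys, start_key)
        keys.insert(i, start_key)
        servers.insert(i, server)

    def locate(self, table: str, row: bytes) -> str:
        """Return the server for the region whose start key is the last <= row."""
        keys = self._start_keys[table]
        i = bisect.bisect_right(keys, row) - 1
        if i < 0:
            raise KeyError(row)
        return self._servers[table][i]

    def entries(self) -> int:
        """Total cached region locations across all tables."""
        return sum(len(keys) for keys in self._start_keys.values())


# One table split into 3 regions vs. 30 single-region tables.
one_table = RegionLocationCache()
for start_key, server in [(b"", "rs1"), (b"\x00\x10", "rs2"), (b"\x00\x20", "rs3")]:
    one_table.add_region("events", start_key, server)

many_tables = RegionLocationCache()
for n in range(30):
    many_tables.add_region(f"source_{n}", b"", "rs1")

print(one_table.entries(), many_tables.entries())  # 3 30
```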


On Tue, Aug 7, 2012 at 5:30 PM, Eric Czech <er...@nextbigsound.com> wrote:
> [...]

more tables or more rows

Posted by Eric Czech <er...@nextbigsound.com>.
Thanks Mohammad,

By saying the major purpose is to host very large tables (implying a
smaller number of them), are you referring to anything other than the
memstores per column family taking up sizable portions of physical memory?
Are there other components or design aspects that make using large numbers
of tables inadvisable?
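
One way to put numbers on the memstore concern is a back-of-the-envelope bound: the heap share reserved for memstores divided by (flush size × column families) gives roughly how many regions a single region server can keep under active writes. The values below are assumptions for illustration (16 GB heap, 40% memstore share, 128 MB flush size). On 0.9x-era releases these correspond to settings such as `hbase.regionserver.global.memstore.upperLimit` and `hbase.hregion.memstore.flush.size`, but check the defaults for the version in use.

```python
def max_active_regions(heap_mb: int, memstore_fraction: float,
                       flush_size_mb: int, column_families: int) -> int:
    """Rough count of regions one server can keep under active writes.

    Assumes every region is being written to and that each column family's
    memstore grows to about the flush size before it is flushed, so each
    region needs flush_size_mb * column_families of the memstore budget.
    """
    memstore_budget_mb = heap_mb * memstore_fraction
    per_region_mb = flush_size_mb * column_families
    return int(memstore_budget_mb // per_region_mb)


# Assumed example values, not taken from the thread.
print(max_active_regions(heap_mb=16 * 1024, memstore_fraction=0.4,
                         flush_size_mb=128, column_families=1))  # 51
```

With these assumed numbers, if all of them were hosted on one region server, 30 tables that are all taking writes would account for 30 of roughly 51 actively written regions before any table has split, whereas a single table starts with one region and only splits as it grows.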

On Sun, Aug 5, 2012 at 5:55 PM, Mohammad Tariq <do...@gmail.com> wrote:
> [...]

Re: more tables or more rows

Posted by Mohammad Tariq <do...@gmail.com>.
Hello sir,

      Going for a single table, with the source reflected in the row key,
would be a better choice than 30+ tables if the data from all the sources
is not very different. Since you are considering HBase as your data
store, it wouldn't be wise to have several small tables. The major purpose
of HBase is to host very large tables that may go beyond billions of rows
and millions of columns.

Regards,
    Mohammad Tariq


On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech <er...@nextbigsound.com> wrote:
> [...]