Posted to user@hbase.apache.org by Vimal Jain <vk...@gmail.com> on 2013/06/28 07:20:16 UTC

How many column families in one table ?

Hi,
How many column families should an HBase table have? Is there any
read/write performance issue if we have more column families?
I have designed one table with around 14 column families, each
having on average 6 qualifiers.
Is this a good design?

-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Michael Segel <mi...@hotmail.com>.
Beyond the physical limitations (cost constraints) there's a logical one in terms of design. 

I just did a talk at the CHUG on schema design and the key was to understand how and why one should use column families. 

From a logical design perspective, you want to limit the data within a CF to data that you grab all at once. That is, when you do your scan / get, you want to minimize the number of column families that you have to hit. 

So you need to think about how you approach organizing your data. 

The best example of this is an order entry system where the column families are broken out into Order Entry, Pick Slips, Shipping and Invoices. 

While they all use the same key (customer number | order number), the data for each part of the process, from order entry through fulfillment, is accessed separately. 

So even in this example, you have 4 column families in use for this one table. 
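
To make the grouping concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the HBase API, and the family names, qualifiers and values are made up for illustration. It shows one row keyed by customer number | order number, with each stage of the order in its own column family, and a read that touches only the family it needs.

```python
from collections import defaultdict

# Toy model: row key -> column family -> qualifier -> value.
# All family and qualifier names here are hypothetical.
table = defaultdict(lambda: defaultdict(dict))

def put(row_key, family, qualifier, value):
    """Store one cell under the given row, family and qualifier."""
    table[row_key][family][qualifier] = value

def get(row_key, families):
    """Return only the requested column families for one row."""
    row = table.get(row_key, {})
    return {f: dict(row[f]) for f in families if f in row}

key = "cust42|order1001"  # customer number | order number
put(key, "order_entry", "item", "widget")
put(key, "order_entry", "qty", 3)
put(key, "pick_slip", "bin", "A-17")
put(key, "shipping", "carrier", "ACME Freight")
put(key, "invoice", "total", 29.97)

# The shipping department reads only the shipping family.
shipping_view = get(key, ["shipping"])
print(shipping_view)
```

The point of the sketch is only the access pattern: a reader that needs one stage of the order never has to look at the other three families.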

HTH

-Mike

On Jun 28, 2013, at 7:27 AM, Ted Yu <yu...@gmail.com> wrote:

> http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning


Re: How many column families in one table ?

Posted by Inder Pall <in...@gmail.com>.
Thanks, Kevin

inder
"you are the average of 5 people you spend the most time with"

Re: How many column families in one table ?

Posted by Kevin O'dell <ke...@cloudera.com>.
Hi Inder,

  Here is an excellent blog post which is a little dated:

http://www.larsgeorge.com/2009/11/hbase-vs-bigtable-comparison.html?m=1

Re: How many column families in one table ?

Posted by Inder Pall <in...@gmail.com>.
Kevin

Would love to hear your thoughts on HBase not being Bigtable.

Thanks

inder
"you are the average of 5 people you spend the most time with"

Re: How many column families in one table ?

Posted by Kevin O'dell <ke...@cloudera.com>.
Hi Vimal,

  It really depends on your usage pattern but HBase != Bigtable.

Re: How many column families in one table ?

Posted by Kevin O'dell <ke...@cloudera.com>.
Pablo,

  That is correct.


On Mon, Aug 5, 2013 at 10:00 AM, Pablo Medina <pa...@gmail.com> wrote:

> Lars,
>
> when you say 'when one memstore needs to be flushed all other column
> families are flushed', are you referring to other column families of the
> same table, right?



-- 
Kevin O'Dell
Systems Engineer, Cloudera

Re: How many column families in one table ?

Posted by Pablo Medina <pa...@gmail.com>.
Lars,

When you say 'when one memstore needs to be flushed all other column
families are flushed', you are referring to the other column families of
the same table, right?





Re: How many column families in one table ?

Posted by Rohit Kelkar <ro...@gmail.com>.
Regarding the slow scan: only fetch the columns/qualifiers that you need.
It may be that you are fetching a whole lot of data that you don't need.
Try scan.addColumn() and let us know.
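
As a rough illustration of why restricting the scan helps, here is a small sketch in plain Python. It is a toy model, not the HBase client: the cell contents and the `scan` helper are hypothetical. In the Java client the equivalent restriction is what `Scan.addColumn(family, qualifier)` expresses.

```python
# Toy store: each cell is keyed by (row, family, qualifier).
# Row names, qualifiers and payload sizes are invented for this example.
cells = {
    ("row1", "cf", "name"): b"alice",
    ("row1", "cf", "blob"): b"x" * 1000,
    ("row2", "cf", "name"): b"bob",
    ("row2", "cf", "blob"): b"y" * 1000,
}

def scan(columns=None):
    """Yield (key, value) for every cell, or only for the
    (family, qualifier) pairs listed in columns."""
    for (row, family, qualifier), value in sorted(cells.items()):
        if columns is None or (family, qualifier) in columns:
            yield (row, family, qualifier), value

def bytes_returned(result):
    """Total number of value bytes a scan hands back."""
    return sum(len(value) for _, value in result)

full = bytes_returned(scan())
narrow = bytes_returned(scan(columns={("cf", "name")}))
print(full, narrow)
```

In this toy data the unrestricted scan returns 2008 value bytes and the restricted one returns 8, because the large serialized blobs are never fetched.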

- R


Re: How many column families in one table ?

Posted by lars hofhansl <la...@apache.org>.
BigTable has one more level of abstraction: Locality Groups.
A Column Family in HBase is both a Column Family and a Locality Group: it is a group of columns *and* it defines storage parameters (compression, versions, TTL, etc.).

As to how many make sense: it depends.
If you can group your columns such that a scan is often limited to a single Column Family, you'll get a huge benefit from using more Column Families.
The main consideration with many Column Families is that each has its own store files, and hence scanning involves more seeking for each Column Family included in a scan.

They are also flushed together; when one memstore (which is per Column Family) needs to be flushed, all other Column Families are also flushed, leading to many small files until they are compacted. If all your Column Families are roughly the same size, this is less of a problem. It's also possible to mitigate this by tweaking the compaction policies.
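
The flush behaviour described above can be sketched with a small simulation in plain Python. This is a toy model under stated assumptions, not HBase code: the threshold, the family names and the write sizes are all hypothetical, and it assumes (as described above) that a flush triggered by one memstore writes out every family's memstore.

```python
FLUSH_THRESHOLD = 100  # hypothetical memstore limit, arbitrary units

def simulate(writes_per_round, rounds):
    """Return the sizes of the store files each family ends up with.

    Flushing is modelled as region-wide: when any one memstore reaches
    the threshold, every family's memstore is written out as a new file.
    """
    memstores = {family: 0 for family in writes_per_round}
    store_files = {family: [] for family in writes_per_round}
    for _ in range(rounds):
        for family, size in writes_per_round.items():
            memstores[family] += size
        if any(size >= FLUSH_THRESHOLD for size in memstores.values()):
            for family in memstores:
                if memstores[family] > 0:
                    store_files[family].append(memstores[family])
                memstores[family] = 0
    return store_files

# One busy family and one nearly idle family.
skewed = simulate({"big": 50, "small": 1}, rounds=10)
# Two families that grow at the same rate.
even = simulate({"a": 50, "b": 50}, rounds=10)
print(skewed)
print(even)
```

With the skewed workload the idle family ends up with five tiny files of 2 units each, one per flush forced by the busy family. With the even workload both families produce five full-sized files, which is why similar-sized families make this less of a problem.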


-- Lars




Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Hi,
I have tested read performance after reducing the number of column families
from 14 to 3, and yes, there is an improvement.
Meanwhile, I was going through the paper published by Google on BigTable.
It says:

"It is our intent that the number of distinct column
families in a table be small (in the hundreds at most), and
that families rarely change during operation."

So is that a theoretical value ( 100 CFs ), or is it possible but not with
the current version of HBase?





-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Viral Bajaria <vi...@gmail.com>.
On Mon, Jul 1, 2013 at 10:06 AM, Vimal Jain <vk...@gmail.com> wrote:

> Sorry for the typo .. please ignore previous mail.. Here is the corrected
> one..
> 1)I have around 140 columns for each row , out of 140 , around 100 columns
> hold java primitive data type , remaining 40 columns  contain serialized
> java object as byte array(Inside each object is an ArrayList). Yes , I do
> delete data but the frequency is very less ( 1 out of 5K operations ). I
> dont run any compaction.
>

This answers the type of data in each cell, not the size of the data. Can
you figure out the average size of the data that you insert? For example,
what is the length of the byte array? Also, for the Java primitives, is
it an 8-byte long? A 4-byte int?
In addition to that, what is in the row key? How long is it in bytes?
Same for the column families: can you share their names? How about the
qualifiers?
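
One reason these lengths matter is that every cell is stored as its own key-value pair that carries the row key, family name and qualifier along with the value. The sketch below, in plain Python, estimates the per-cell size. It assumes the classic KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte key type); treat those constants as an assumption to check against your HBase version, and note that the row key and names are hypothetical.

```python
# Fixed per-cell bytes, assuming the classic KeyValue layout:
# key length (4) + value length (4) + row length (2)
# + family length (1) + timestamp (8) + key type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1

def cell_size(row_key, family, qualifier, value):
    """Approximate serialized size in bytes of one cell.

    All arguments are bytes objects. Compression and block
    encoding are ignored in this estimate.
    """
    return (FIXED_OVERHEAD + len(row_key) + len(family)
            + len(qualifier) + len(value))

row_key = b"cust42|order1001"          # hypothetical 16-byte row key
value = (3).to_bytes(4, "big")         # a 4-byte int

long_names = cell_size(row_key, b"order_entry", b"quantity", value)
short_names = cell_size(row_key, b"o", b"q", value)
print(long_names, short_names)
```

For a 4-byte value the estimate is 59 bytes with the long names and 42 bytes with one-character names, so with small values most of what is stored and read is key material rather than data.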

If you have disabled major compactions, you should run one every few days
(if not once a day) to consolidate the number of files that each scan will
have to open.
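
As a sketch of what that consolidation buys, the plain Python below models store files as sorted lists and a major compaction as a merge of all of them into one. It is a simplification under stated assumptions: the helper names are hypothetical, and versions, deletes and TTLs are ignored.

```python
import heapq

def major_compact(store_files):
    """Merge several sorted store files into a single sorted file.

    Each store file is a list of (row_key, value) pairs sorted by
    row_key. The result is a list containing one merged file.
    """
    return [list(heapq.merge(*store_files))]

def files_opened_by_scan(store_files):
    """In this model a scan has to open every store file."""
    return len(store_files)

# Three small files left behind by three flushes.
before = [
    [("row1", "a"), ("row4", "d")],
    [("row2", "b"), ("row5", "e")],
    [("row3", "c")],
]
after = major_compact(before)
print(files_opened_by_scan(before), files_opened_by_scan(after))
```

Before compaction the scan touches three files; afterwards it touches one, and the rows are still in key order.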

> 2) I had ran scan keeping in mind the CPU,IO and other system related
> parameters.I found them to be normal with system load being 0.1-0.3.
>

How many disks do you have in your box ? Have you ever benchmarked the
hardware ?

Thanks,
Viral

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Sorry for the typo; please ignore the previous mail. Here is the corrected
one.
1) I have around 140 columns for each row. Of those 140, around 100 columns
hold Java primitive data types, and the remaining 40 columns contain
serialized Java objects as byte arrays (inside each object is an ArrayList).
Yes, I do delete data, but the frequency is very low (1 out of 5K
operations). I don't run any compactions.
2) I ran the scan while keeping an eye on CPU, I/O and other system-related
parameters. I found them to be normal, with system load at 0.1-0.3.
3) Yes, I have 3 versions per cell (the default value).



On Mon, Jul 1, 2013 at 10:33 PM, Vimal Jain <vk...@gmail.com> wrote:

> Hi Lars,
> 1)I have around 140 columns for each row , out of 140 , around 100 rows
> are holds java primitive data type , remaining 40 rows contains serialized
> java object as byte array. Yes , I do delete data but the frequency is very
> less ( 1 out of 5K operations ). I dont run any compaction.
> 2) I had ran scan keeping in mind the CPU,IO and other system related
> parameters.I found them to be normal with system load being 0.1-0.3.
> 3) Yes i have 3 versions of cell ( default value).



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Hi Lars,
1) I have around 140 columns for each row. Of those 140, around 100 rows
hold Java primitive data types, and the remaining 40 rows contain serialized
Java objects as byte arrays. Yes, I do delete data, but the frequency is very
low (1 out of 5K operations). I don't run any compactions.
2) I ran the scan while keeping an eye on CPU, I/O and other system-related
parameters. I found them to be normal, with system load at 0.1-0.3.
3) Yes, I have 3 versions per cell (the default value).


On Mon, Jul 1, 2013 at 9:08 PM, lars hofhansl <la...@apache.org> wrote:

> The performance you're seeing is definitely not typical. 'couple of
> further questions:
> - How large are your KVs (columns)?- Do you delete data? Do you run major
> compactions?
> - Can you measure: CPU, IO, context switches, etc, during the scanning?
> - Do you have many versions of the columns?
>
>
> Note that HBase is a key value store, i.e. the storage is sparse. Each
> column is represented by its own key value pair, and HBase has to do the
> work to reassemble the data.
>
>
> -- Lars



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by lars hofhansl <la...@apache.org>.
The performance you're seeing is definitely not typical. A couple of further questions:
- How large are your KVs (columns)?
- Do you delete data? Do you run major compactions?
- Can you measure CPU, IO, context switches, etc. during the scanning?
- Do you have many versions of the columns?


Note that HBase is a key value store, i.e. the storage is sparse. Each column is represented by its own key value pair, and HBase has to do the work to reassemble the data.
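
A small Python sketch of what "reassembling" means here: cells arrive as individual (row, family, qualifier, timestamp, value) entries and have to be grouped back into rows. The sample cells and the function name are hypothetical and only illustrate the idea; this is not HBase's actual implementation.

```python
from itertools import groupby

# Each cell is stored as its own (row, family, qualifier, timestamp, value) entry.
# Hypothetical sample data, already sorted by row key as a scan would return it.
cells = [
    ("row1", "cf1", "a", 3, b"x"),
    ("row1", "cf1", "b", 3, b"y"),
    ("row1", "cf2", "c", 2, b"z"),
    ("row2", "cf1", "a", 5, b"w"),
]


def assemble_rows(sorted_cells):
    """Group individually stored cells back into one dict per row."""
    rows = {}
    for row_key, group in groupby(sorted_cells, key=lambda c: c[0]):
        rows[row_key] = {f"{fam}:{qual}": value for _, fam, qual, _, value in group}
    return rows


rows = assemble_rows(cells)
print(rows["row1"])
print(rows["row2"])  # row2 simply has no cf1:b or cf2:c cell; nothing is stored for them
```

With roughly 1.6 million rows of about 140 columns each, as described in this thread, that grouping work applies to something like 224 million cells per stored version.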


-- Lars



________________________________
 From: Vimal Jain <vk...@gmail.com>
To: user@hbase.apache.org 
Sent: Monday, July 1, 2013 4:44 AM
Subject: Re: How many column families in one table ?
 

Hi,
We had some hardware constraints along with the fact that our total data
size was in GBs.
Thats why to start with Hbase ,  we first began  with pseudo distributed
mode and thought if required we would upgrade to fully distributed mode.



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Hi,
We had some hardware constraints, and our total data size was only in the
GBs.
That's why, to start with HBase, we began in pseudo-distributed mode,
thinking that we would upgrade to fully distributed mode if required.



On Mon, Jul 1, 2013 at 5:09 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. I have configured Hbase in pseudo distributed mode on top of HDFS.
>
> What was the reason for using pseudo distributed mode in production setup ?
>
> Cheers



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Ted Yu <yu...@gmail.com>.
bq. I have configured Hbase in pseudo distributed mode on top of HDFS.

What was the reason for using pseudo-distributed mode in a production setup?

Cheers

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain <vk...@gmail.com> wrote:

> Thanks Dhaval/Michael/Ted/Otis for your replies.
> Actually , i asked this question because i am seeing some performance
> degradation in my production Hbase setup.
> I have configured Hbase in pseudo distributed mode on top of HDFS.
> I have created 17 Column families :( . I am actually using 14 out of these
> 17 column families.
> Each column family has around on average 8-10 column qualifiers so total
> around 140 columns are there for each row key.
> I have around 1.6 millions rows in the table.
> To completely scan the table for all 140 columns  , it takes around 30-40
> minutes.
> Is it normal or Should i redesign my table schema ( probably merging 4-5
> column families into one , so that at the end i have just 3-4 cf ) ?

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Hi Lars,
I am using Hadoop 1.1.2 and HBase 0.94.7.
Yes, I have enabled scanner caching with a value of 10K, but performance is
still not good. :(


On Mon, Jul 1, 2013 at 4:48 PM, lars hofhansl <la...@apache.org> wrote:

> Which version of HBase?
> Did you enable scanner caching? Otherwise each call to next() is a RPC
> roundtrip and you are basically measuring your networks RTT.
>
> -- Lars



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by lars hofhansl <la...@apache.org>.
Which version of HBase?
Did you enable scanner caching? Otherwise each call to next() is an RPC round trip and you are basically measuring your network's RTT.
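
To put rough numbers on that, the Python sketch below counts the round trips needed for a full scan at different caching values. It assumes each RPC returns up to `caching` rows and that an uncached scan fetches one row per call; the 1 ms round-trip time is a made-up figure, and both function names are hypothetical.

```python
import math


def scan_rpc_count(total_rows: int, caching: int) -> int:
    """Number of next() round trips if each RPC returns up to `caching` rows."""
    return math.ceil(total_rows / caching)


def rtt_seconds(total_rows: int, caching: int, rtt_ms: float) -> float:
    """Time spent purely on network round trips, ignoring all server-side work."""
    return scan_rpc_count(total_rows, caching) * rtt_ms / 1000.0


rows = 1_600_000  # table size mentioned in this thread
for caching in (1, 100, 10_000):
    print(caching, scan_rpc_count(rows, caching), rtt_seconds(rows, caching, rtt_ms=1.0))
```

Under these assumptions an uncached scan spends about 1,600 seconds on round trips alone, while a caching value of 10,000 needs only 160 calls. If a scan is still slow at that setting, the time is probably going somewhere other than the network.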

-- Lars


________________________________
 From: Vimal Jain <vk...@gmail.com>
To: user@hbase.apache.org 
Sent: Monday, July 1, 2013 4:11 AM
Subject: Re: How many column families in one table ?
 

Can someone please reply ?
Also what is  the typical read/write speed of hbase and how much deviation
would be there in my scenario mentioned above (14 cf , total 140 columns ) ?
I am asking this because i am not simply printing out the scanned values ,
instead i am applying some logic on the data retrieved per row basis. So
was just curious to find if that small logic in my code is contributing
towards the long time taken to scan the table.


-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Can someone please reply?
Also, what is the typical read/write speed of HBase, and how much deviation
should I expect in the scenario mentioned above (14 CFs, 140 columns in total)?
I am asking because I am not simply printing out the scanned values;
instead, I am applying some logic to the data retrieved for each row. So I
was curious whether that small piece of logic in my code is contributing
to the long time taken to scan the table.
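
One way to answer that question is to time the two parts separately. The Python sketch below is only an illustration of the approach: `timed_scan` is a hypothetical helper, the generator stands in for a real scanner, and the lambda stands in for the real per-row logic.

```python
import time


def timed_scan(rows, process):
    """Return (seconds spent fetching rows, seconds spent in `process`)."""
    fetch_s = 0.0
    process_s = 0.0
    it = iter(rows)
    while True:
        t0 = time.perf_counter()
        try:
            row = next(it)  # stands in for the scanner's next() call
        except StopIteration:
            break
        t1 = time.perf_counter()
        process(row)  # the per-row application logic
        t2 = time.perf_counter()
        fetch_s += t1 - t0
        process_s += t2 - t1
    return fetch_s, process_s


# Hypothetical stand-ins: a generator instead of a real scanner,
# and a trivial function instead of the real per-row logic.
fetch_s, process_s = timed_scan(({"id": i} for i in range(1000)), lambda row: row["id"] * 2)
print(f"fetch: {fetch_s:.4f}s, process: {process_s:.4f}s")
```

If the fetch time dominates, the per-row logic is not the problem and the scan itself is worth investigating.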


On Mon, Jul 1, 2013 at 2:41 PM, Vimal Jain <vk...@gmail.com> wrote:

> I scanned it during normal traffic hours.There was no I/O load on the
> server.
> I dont see any GC locks too.
> Also i have given 1.5G to RS , 512M to each Master and Zookeeper.
>
> One correction in the post above :
> Actual time to scan whole table is even more , it takes 10 mins to scan
> 0.1 million rows ( so total of 2.5 hours to scan 1.6 million rows) .
> The time i mentioned in previous post was for different type of
> lookup.Please ignore that.



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
I scanned it during normal traffic hours. There was no I/O load on the
server.
I don't see any GC locks either.
Also, I have given 1.5G to the RS, and 512M each to the Master and ZooKeeper.

One correction to the post above:
The actual time to scan the whole table is even longer: it takes 10 minutes
to scan 0.1 million rows (so more than 2.5 hours in total to scan 1.6 million
rows).
The time I mentioned in the previous post was for a different type of
lookup. Please ignore that.
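
For reference, a straight linear extrapolation of those figures can be sketched in Python as below. It assumes scan time grows linearly with row count, which may not hold in practice; the function names are hypothetical, and the 140 columns per row come from earlier in this thread.

```python
def full_scan_minutes(sample_rows: int, sample_minutes: float, total_rows: int) -> float:
    """Linear extrapolation of scan time from a measured sample."""
    return sample_minutes * total_rows / sample_rows


def rows_per_second(sample_rows: int, sample_minutes: float) -> float:
    """Observed scan throughput for the measured sample."""
    return sample_rows / (sample_minutes * 60)


total = full_scan_minutes(100_000, 10, 1_600_000)  # figures from this thread
rate = rows_per_second(100_000, 10)
print(f"{total:.0f} min (~{total / 60:.1f} h), {rate:.0f} rows/s, {rate * 140:.0f} cells/s")
```

That works out to about 160 minutes for the full table, at roughly 167 rows (about 23,000 cells) per second.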


On Mon, Jul 1, 2013 at 2:24 PM, Viral Bajaria <vi...@gmail.com>wrote:

> When you did the scan, did you check what the bottleneck was ? Was it I/O ?
> Did you see any GC locks ? How much RAM are you giving to your RS ?
>
> -Viral
>
> On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain <vk...@gmail.com> wrote:
>
> > To completely scan the table for all 140 columns  , it takes around 30-40
> > minutes.
> >
>



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Viral Bajaria <vi...@gmail.com>.
When you did the scan, did you check what the bottleneck was ? Was it I/O ?
Did you see any GC locks ? How much RAM are you giving to your RS ?

-Viral

On Mon, Jul 1, 2013 at 1:44 AM, Vimal Jain <vk...@gmail.com> wrote:

> To completely scan the table for all 140 columns  , it takes around 30-40
> minutes.
>

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Thanks Dhaval/Michael/Ted/Otis for your replies.
Actually, I asked this question because I am seeing some performance
degradation in my production HBase setup.
I have configured HBase in pseudo-distributed mode on top of HDFS.
I have created 17 column families :( . I am actually using 14 of these
17 column families.
Each column family has on average 8-10 column qualifiers, so in total
there are around 140 columns for each row key.
I have around 1.6 million rows in the table.
To completely scan the table for all 140 columns, it takes around 30-40
minutes.
Is that normal, or should I redesign my table schema (probably merging 4-5
column families into one, so that in the end I have just 3-4 CFs)?



On Sat, Jun 29, 2013 at 12:06 AM, Otis Gospodnetic <
otis.gospodnetic@gmail.com> wrote:

> Hm, works for me -
>
> http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning
>
> Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42
>
> Otis
> --
> Solr & ElasticSearch Support -- http://sematext.com/
> Performance Monitoring -- http://sematext.com/spm



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hm, works for me -
http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning

Shorter version: http://search-hadoop.com/m/qOx8l15Z1q42

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Fri, Jun 28, 2013 at 8:40 AM, Vimal Jain <vk...@gmail.com> wrote:
> Hi All ,
> Thanks for your replies.
>
> Ted,
> Thanks for the link, but its not working . :(
>
>
> On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu <yu...@gmail.com> wrote:
>
>> Vimal:
>> Please also refer to:
>>
>> http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning
>>
>> On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel <michael_segel@hotmail.com> wrote:
>>
>> > Short answer... As few as possible.
>> >
>> > 14 CF doesn't make too much sense.
>> >
>> > Sent from a remote device. Please excuse any typos...
>> >
>> > Mike Segel
>> >
>> > On Jun 28, 2013, at 12:20 AM, Vimal Jain <vk...@gmail.com> wrote:
>> >
>> > > Hi,
>> > > How many column families should be there in an hbase table ? Is there any
>> > > performance issue in read/write if we have more column families ?
>> > > I have designed one table with around 14 column families in it with each
>> > > having on average 6 qualifiers.
>> > > Is it a good design ?
>> > >
>> > > --
>> > > Thanks and Regards,
>> > > Vimal Jain
>> >
>>
>
>
>
> --
> Thanks and Regards,
> Vimal Jain

Re: How many column families in one table ?

Posted by Vimal Jain <vk...@gmail.com>.
Hi All,
Thanks for your replies.

Ted,
Thanks for the link, but it's not working. :(


On Fri, Jun 28, 2013 at 5:57 PM, Ted Yu <yu...@gmail.com> wrote:

> Vimal:
> Please also refer to:
>
> http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning
>
> On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel <michael_segel@hotmail.com> wrote:
>
> > Short answer... As few as possible.
> >
> > 14 CF doesn't make too much sense.
> >
> > Sent from a remote device. Please excuse any typos...
> >
> > Mike Segel
> >
> > On Jun 28, 2013, at 12:20 AM, Vimal Jain <vk...@gmail.com> wrote:
> >
> > > Hi,
> > > How many column families should be there in an hbase table ? Is there any
> > > performance issue in read/write if we have more column families ?
> > > I have designed one table with around 14 column families in it with each
> > > having on average 6 qualifiers.
> > > Is it a good design ?
> > >
> > > --
> > > Thanks and Regards,
> > > Vimal Jain
> >
>



-- 
Thanks and Regards,
Vimal Jain

Re: How many column families in one table ?

Posted by Ted Yu <yu...@gmail.com>.
Vimal:
Please also refer to:
http://search-hadoop.com/m/qOx8l15Z1q42/column+families+fb&subj=Re+HBase+Column+Family+Limit+Reasoning

On Fri, Jun 28, 2013 at 1:37 PM, Michel Segel <mi...@hotmail.com> wrote:

> Short answer... As few as possible.
>
> 14 CF doesn't make too much sense.
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jun 28, 2013, at 12:20 AM, Vimal Jain <vk...@gmail.com> wrote:
>
> > Hi,
> > How many column families should be there in an hbase table ? Is there any
> > performance issue in read/write if we have more column families ?
> > I have designed one table with around 14 column families in it with each
> > having on average 6 qualifiers.
> > Is it a good design ?
> >
> > --
> > Thanks and Regards,
> > Vimal Jain
>

Re: How many column families in one table ?

Posted by Michel Segel <mi...@hotmail.com>.
Short answer... As few as possible.

14 CFs don't make much sense.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Jun 28, 2013, at 12:20 AM, Vimal Jain <vk...@gmail.com> wrote:

> Hi,
> How many column families should be there in an hbase table ? Is there any
> performance issue in read/write if we have more column families ?
> I have designed one table with around 14 column families in it with each
> having on average 6 qualifiers.
> Is it a good design ?
> 
> -- 
> Thanks and Regards,
> Vimal Jain