Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2009/03/09 09:23:20 UTC

Many columns in 0.19

Hi, all

    I remember it was not recommended to add many columns (column
qualifiers) in HBase 0.18.
Does HBase 0.19.0 still have this limitation?

Thank you for your cooperation,
M.

Re: Many columns in 0.19

Posted by Michael Dagaev <mi...@gmail.com>.
Thanks, Amandeep.
I am probably mistaken. The limit exists only for column families,
not for column qualifiers. Is that correct?

On Mon, Mar 9, 2009 at 10:30 AM, Amandeep Khurana <am...@gmail.com> wrote:
> Do you mean column families?
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz

Re: Many columns in 0.19

Posted by Amandeep Khurana <am...@gmail.com>.
Do you mean column families?


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz


On Mon, Mar 9, 2009 at 1:28 AM, Ryan Rawson <ry...@gmail.com> wrote:

> Sadly this is still a limit.
>
> 0.20 should make things much better.
>
> -ryan

RE: Many columns in 0.19

Posted by Jonathan Gray <jl...@streamy.com>.
That is within the acceptable range for 0.19. It's not recommended to go past 15 or so families on 0.19; it should be better on 0.20, but, as Ryan said, each family is stored separately, so you'll always be adding more overhead to the table with each additional family. For 0.19, the acceptable total of column qualifiers/labels is several thousand; I've got row-families with about 50,000 columns and don't see any problems. The hope, however, is to be in the millions for 0.20.
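
A toy Python model can make the family-versus-qualifier distinction concrete. It is a sketch under stated assumptions, not HBase code: `ToyRegion` and its methods are hypothetical names, and each family's store stands in for the MapFile(s) that 0.19 keeps per family. Adding qualifiers only adds keys inside an existing store, while every additional family adds a store.

```python
class ToyRegion:
    """Toy model of one region: a separate store per column family.

    Not the HBase API. Each store stands in for the per-family MapFile(s)
    that HBase 0.19 keeps on HDFS.
    """

    def __init__(self, families):
        # Families are fixed by the schema, so one store per family is created up front.
        self.stores = {family: {} for family in families}

    def put(self, row, family, qualifier, value):
        # A new qualifier is just a new key inside the family's existing store.
        self.stores[family][(row, qualifier)] = value

    def store_count(self):
        return len(self.stores)

    def cell_count(self):
        return sum(len(store) for store in self.stores.values())


region = ToyRegion(["data"])
for i in range(50_000):  # about the per-family column count reported above
    region.put("row1", "data", f"col{i}", b"x")
print(region.store_count(), region.cell_count())  # 1 50000
```

However many qualifiers are written, the store count stays at 1; only a schema with more families raises it.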


> -----Original Message-----
> From: Michael Seibold [mailto:seibold@in.tum.de]
> Sent: Monday, March 09, 2009 2:06 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Many columns in 0.19
> 
> Hi,
> 
> so how many column families per table and how many columns (=labels) per
> column family are feasible in HBase 0.19.0?
> 
> column families per table: ?
> columns per column family: ?
> 
> I want to use:
> --------------------
> column families per table: 10
> columns (=labels) per column family: 100 - 150
> 
> Will I hit HDFS performance issues with that?
> 
> Kind regards,
> Michael


Re: Many columns in 0.19

Posted by Michael Seibold <se...@in.tum.de>.
Hi,

so how many column families per table and how many columns (=labels) per
column family are feasible in HBase 0.19.0?

column families per table: ?
columns per column family: ?

I want to use:
--------------------
column families per table: 10
columns (=labels) per column family: 100 - 150

Will I hit HDFS performance issues with that?

Kind regards,
Michael


On Mon, 09-03-2009 at 01:50 -0800, Ryan Rawson wrote:
> Don't forget, each column family is another file on disk, and another open file.
> Every column family is stored in its own MapFile, and that increases the
> load on HDFS.
> 
> This particular restriction won't ever really go away (unless we introduce
> locality groups; even then, each locality group = N families = 1 file), but
> in 0.20 it should be more feasible to have thousands of columns per family,
> or more.
> 
> -ryan
> 
> On Mon, Mar 9, 2009 at 1:47 AM, Michael Dagaev <mi...@gmail.com> wrote:
> 
> > Thank you, Ryan


RE: Many columns in 0.19

Posted by "Puri, Aseem" <As...@Honeywell.com>.
Thanks, JG, for sharing your knowledge.

-Aseem Puri

-----Original Message-----
From: Jonathan Gray [mailto:jlist@streamy.com] 
Sent: Thursday, March 12, 2009 12:37 AM
To: hbase-user@hadoop.apache.org
Subject: RE: Many columns in 0.19

That is correct.  If you have 20 regions of a table which contains 10
families, you will have 200 HStores, 200 Memcaches, and some number of
HStoreFiles.

Families are very expensive in that way, almost like creating another table
except it can be read at the same time (in the same request) with other
families and under the same row key.

JG



RE: Many columns in 0.19

Posted by Jonathan Gray <jl...@streamy.com>.
That is correct.  If you have 20 regions of a table which contains 10
families, you will have 200 HStores, 200 Memcaches, and some number of
HStoreFiles.

Families are very expensive in that way, almost like creating another table
except it can be read at the same time (in the same request) with other
families and under the same row key.
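
The arithmetic above can be written down as a small hypothetical helper (`hstore_layout` is an illustrative name, not an HBase API). It assumes one HStore per (region, family) pair and exactly one Memcache per HStore, as described; the HStoreFile count is left as an input because it depends on how many flushes and compactions have happened.

```python
def hstore_layout(regions, families, store_files_per_store=None):
    """Count HStores, Memcaches and HStoreFiles for one table (toy arithmetic).

    Assumes one HStore per (region, family) pair and one Memcache per HStore.
    HStoreFile counts vary with flushes and compactions, so they are an input;
    by default every store is assumed to have no flushed files yet.
    """
    hstores = regions * families
    if store_files_per_store is None:
        store_files_per_store = [0] * hstores
    if len(store_files_per_store) != hstores:
        raise ValueError("expected one HStoreFile count per HStore")
    return {
        "hstores": hstores,
        "memcaches": hstores,
        "hstorefiles": sum(store_files_per_store),
    }


print(hstore_layout(20, 10))  # {'hstores': 200, 'memcaches': 200, 'hstorefiles': 0}
```

Passing, say, one file for 150 stores and three for the other 50 gives 300 HStoreFiles, which is the "some number" that depends on write and compaction history.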

JG

> -----Original Message-----
> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> Sent: Wednesday, March 11, 2009 6:03 AM
> To: hbase-user@hadoop.apache.org
> Subject: RE: Many columns in 0.19
> 
> 
> Thanks JG and Schubert for sharing your knowledge.
> 
> One thing I want to ask: an HRegionServer hosts lots of regions. Does
> that mean every region has its own HStores, and every HStore has one
> Memcache? As in the earlier example, if there are 20 regions in an
> HRegionServer and 10 column families, there are 10 HStores per region.
> So do we have 20*10 HStores in total?
> 
> Please tell me exactly what's happening; I am a little bit confused.
> 
> -Aseem Puri


RE: Many columns in 0.19

Posted by "Puri, Aseem" <As...@Honeywell.com>.
Thanks, JG and Schubert, for sharing your knowledge.

One thing I want to ask: an HRegionServer hosts lots of regions. Does
that mean every region has its own HStores, and every HStore has one
Memcache? As in the earlier example, if there are 20 regions in an
HRegionServer and 10 column families, there are 10 HStores per region.
So do we have 20*10 HStores in total?

Please tell me exactly what's happening; I am a little bit confused.

-Aseem Puri

-----Original Message-----
From: schubert zhang [mailto:zsongbo@gmail.com] 
Sent: Wednesday, March 11, 2009 8:36 AM
To: hbase-user@hadoop.apache.org
Subject: Re: Many columns in 0.19

Cool, the HFile solution is what is described in the Bigtable paper; it will
be more efficient than MapFile. We are looking forward to 0.20.0, including
Bloom filters.
Thanks.


Re: Many columns in 0.19

Posted by schubert zhang <zs...@gmail.com>.
Cool, the HFile solution is what is described in the Bigtable paper; it will be
more efficient than MapFile. We are looking forward to 0.20.0, including Bloom
filters.
Thanks.

On Wed, Mar 11, 2009 at 2:28 AM, Jonathan Gray <jl...@streamy.com> wrote:

> Aseem,
>
> Almost!
>
> You will have 10 HStores as you say.  Each of those HStores is made up of a
> single Memcache instance and zero or many MapFiles on HDFS.  Default block
> size in HDFS is 64MB, not 64k, so it could be a single block or many.
>
> Writes are done into the Memcache.  That is periodically flushed to HDFS
> creating a single HStoreFile.  Multiple flushes will then yield multiple
> HSFs.  Compactions and major compactions are run periodically to combine
> these files into a single HStoreFile, for efficiency.
>
> In the upcoming 0.20 release we will move to a new HDFS file format called
> HFile.  Within HFile, our data will be broken up into ~64k blocks
> (configurable) but still stored in HDFS in 64M blocks (again,
> configurable).
>
> JG

RE: Many columns in 0.19

Posted by Jonathan Gray <jl...@streamy.com>.
Aseem,

Almost!

You will have 10 HStores as you say.  Each of those HStores is made up of a
single Memcache instance and zero or many MapFiles on HDFS.  Default block
size in HDFS is 64MB, not 64k, so it could be a single block or many.

Writes are done into the Memcache.  That is periodically flushed to HDFS
creating a single HStoreFile.  Multiple flushes will then yield multiple
HSFs.  Compactions and major compactions are run periodically to combine
these files into a single HStoreFile, for efficiency.
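
The write path described above (writes go to the Memcache, each flush produces one new store file, compaction merges them) can be sketched in Python. This is a toy model with assumed names (`ToyHStore`, `flush_threshold`): real HBase decides when to flush from its own configuration rather than an entry count, and it retains multiple versions per cell, whereas this sketch keeps only the newest value per key.

```python
class ToyHStore:
    """Toy model of one HStore: a Memcache plus flushed, sorted store files."""

    def __init__(self, flush_threshold=3):
        # Hypothetical trigger: flush once the Memcache holds this many distinct keys.
        self.flush_threshold = flush_threshold
        self.memcache = {}
        self.store_files = []  # each file is a sorted list of (key, value); oldest first

    def put(self, key, value):
        self.memcache[key] = value  # all writes land in the Memcache first
        if len(self.memcache) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Each flush writes exactly one new immutable, sorted store file.
        if self.memcache:
            self.store_files.append(sorted(self.memcache.items()))
            self.memcache = {}

    def compact(self):
        # Merge all files into one; iterating oldest to newest lets newer values win.
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())] if merged else []

    def get(self, key):
        # Check the Memcache first, then store files from newest to oldest.
        if key in self.memcache:
            return self.memcache[key]
        for store_file in reversed(self.store_files):
            for k, v in store_file:
                if k == key:
                    return v
        return None


store = ToyHStore(flush_threshold=2)
for key, value in [("a", "1"), ("b", "2"), ("a", "3"), ("c", "4")]:
    store.put(key, value)
print(len(store.store_files))  # 2 flushes -> 2 store files
store.compact()
print(store.store_files)  # [[('a', '3'), ('b', '2'), ('c', '4')]]
```

Reads must consult every store file until compaction collapses them, which is why compaction helps read efficiency.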

In the upcoming 0.20 release we will move to a new HDFS file format called
HFile.  Within HFile, our data will be broken up into ~64k blocks
(configurable) but still stored in HDFS in 64M blocks (again, configurable).
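
As rough arithmetic for the two block sizes above, a hypothetical helper can count how many ~64k HFile blocks and 64M HDFS blocks a store file of a given size spans. It ignores HFile index and trailer overhead and assumes blocks fill exactly, so treat the numbers as estimates; `block_counts` is an illustrative name, not an HBase API.

```python
HFILE_BLOCK_BYTES = 64 * 1024          # ~64k HFile block (configurable)
HDFS_BLOCK_BYTES = 64 * 1024 * 1024    # 64M HDFS block (configurable)


def ceil_div(a, b):
    # Integer ceiling division without going through floats.
    return -(-a // b)


def block_counts(file_bytes, hfile_block=HFILE_BLOCK_BYTES, hdfs_block=HDFS_BLOCK_BYTES):
    """Estimate (HFile blocks, HDFS blocks) spanned by a store file of file_bytes.

    Ignores HFile index/trailer overhead and assumes every block is filled exactly.
    """
    return ceil_div(file_bytes, hfile_block), ceil_div(file_bytes, hdfs_block)


print(HDFS_BLOCK_BYTES // HFILE_BLOCK_BYTES)  # 1024 HFile blocks per HDFS block
print(block_counts(200 * 1024 * 1024))        # (3200, 4)
```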

JG

> -----Original Message-----
> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> Sent: Monday, March 09, 2009 9:34 PM
> To: hbase-user@hadoop.apache.org
> Subject: RE: Many columns in 0.19
> 
> Hi
> 
> Thanks for the help.
> 
> So for a table with 10 column families, there are 10 HStores in a
> region, and correspondingly 10 MapFiles. Each MapFile is in turn made
> up of 64K blocks stored by HDFS.
> 
> Am I right?
> 
> -Aseem Puri



RE: Many columns in 0.19

Posted by "Puri, Aseem" <As...@Honeywell.com>.
Hi

Thanks for the help.

So for a table with 10 column families, there are 10 HStores in a region
and, corresponding to them, 10 map files. Each MapFile in turn has blocks
of 64K inside it, which are stored by HDFS.

Am I right?

-Aseem Puri



-----Original Message-----
From: Jonathan Gray [mailto:jlist@streamy.com] 
Sent: Monday, March 09, 2009 7:24 PM
To: hbase-user@hadoop.apache.org
Subject: RE: Many columns in 0.19

A Table is made up of 1 to N HRegions and defined by its Column Families.

Each HRegion is made up of an HStore per column family.  Each HStore is then
made up of a single Memcache and 0 to M HStoreFiles.

So, the HStore is one column family in one region.  It houses that family's
Memcache and HStoreFiles for that particular region.

And yes, Bigtable stores one family of a region in one SSTable.  The only
caveat to that is that they offer "Locality Groups", as mentioned by Ryan,
that group different families together in a single SSTable (or HStore in our
case).  Changes in 0.20 leave the door open for HBase to also implement them,
but they are not currently on the roadmap.

Hope that helps.

JG



RE: Many columns in 0.19

Posted by Jonathan Gray <jl...@streamy.com>.
A Table is made up of 1 to N HRegions and defined by its Column Families.

Each HRegion is made up of an HStore per column family.  Each HStore is then
made up of a single Memcache and 0 to M HStoreFiles.

So, the HStore is one column family in one region.  It houses that family's
Memcache and HStoreFiles for that particular region.
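To make the hierarchy above concrete, here is a minimal Java sketch that models it with plain objects. The names StoreLayout, Region and Store are hypothetical and are not HBase's real HRegion/HStore classes; the Memcache is reduced to an in-memory map and the HStoreFiles to a list of file paths, purely as simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the table/region/store layout described in this thread.
// Hypothetical classes for illustration only; not HBase's own code.
final class StoreLayout {

    // One store per (region, family): a single Memcache plus 0..M store files.
    static final class Store {
        final String family;
        final Map<String, String> memcache = new LinkedHashMap<>(); // simplified in-memory edits
        final List<String> storeFiles = new ArrayList<>();           // paths of flushed files

        Store(String family) {
            this.family = family;
        }
    }

    // A region holds exactly one store for each column family of its table.
    static final class Region {
        final Map<String, Store> stores = new LinkedHashMap<>();

        Region(List<String> families) {
            for (String family : families) {
                stores.put(family, new Store(family));
            }
        }
    }

    // A table is defined by its families and split into 1..N regions.
    static List<Region> buildTable(List<String> families, int regionCount) {
        List<Region> regions = new ArrayList<>();
        for (int i = 0; i < regionCount; i++) {
            regions.add(new Region(families));
        }
        return regions;
    }

    // Stores across the whole table: regions x families.
    static int totalStores(List<Region> regions) {
        int total = 0;
        for (Region region : regions) {
            total += region.stores.size();
        }
        return total;
    }

    public static void main(String[] args) {
        List<Region> table = buildTable(List.of("info", "anchor", "contents"), 4);
        System.out.println(totalStores(table)); // 4 regions x 3 families = 12
    }
}
```

In this model, a table with 10 families (as in Aseem's follow-up) gives every region 10 stores, regardless of how many qualifiers each family holds.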

And yes, Bigtable stores one family of a region in one SSTable.  The only
caveat to that is that they offer "Locality Groups", as mentioned by Ryan,
that group different families together in a single SSTable (or HStore in our
case).  Changes in 0.20 leave the door open for HBase to also implement them,
but they are not currently on the roadmap.
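Locality groups can be sketched the same way. This is a hypothetical illustration only (HBase 0.19 has no locality groups, and LocalityGroups/storesPerRegion are invented names): it assumes that families assigned to the same group share one store per region, while any family left ungrouped keeps a store of its own, as each family does in HBase today.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Bigtable-style locality groups; not an HBase feature.
final class LocalityGroups {

    // Count the stores (and so the file sets) one region would need.
    // groupOf maps a family to its group; families missing from the map
    // are treated as standalone, one store each.
    static int storesPerRegion(List<String> families, Map<String, String> groupOf) {
        Set<String> explicitGroups = new HashSet<>();
        int ungrouped = 0;
        for (String family : families) {
            String group = groupOf.get(family);
            if (group == null) {
                ungrouped++;
            } else {
                explicitGroups.add(group);
            }
        }
        return explicitGroups.size() + ungrouped;
    }

    public static void main(String[] args) {
        List<String> families = List.of("info", "anchor", "contents", "meta");
        // No groups: one store per family.
        System.out.println(storesPerRegion(families, Map.of())); // 4
        // Three small families share one group; "contents" stays alone.
        System.out.println(storesPerRegion(families,
                Map.of("info", "small", "anchor", "small", "meta", "small"))); // 2
    }
}
```

Grouping only reduces the number of stores; as Ryan notes, each group would still be at least one file per region.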

Hope that helps.

JG

> -----Original Message-----
> From: Puri, Aseem [mailto:Aseem.Puri@Honeywell.com]
> Sent: Monday, March 09, 2009 3:22 AM
> To: hbase-user@hadoop.apache.org
> Subject: RE: Many columns in 0.19
> 
> 
> Hi
> 
> I was reading the Google Bigtable article. Many things in HBase are similar
> to Bigtable, but I can't understand the concept of HStore. Does HStore
> mean one column family in one map file?
> 
> Does Bigtable also store one column family in one SSTable?
> 
> -Aseem
> 


RE: Many columns in 0.19

Posted by "Puri, Aseem" <As...@Honeywell.com>.
Hi

I was reading the Google Bigtable article. Many things in HBase are similar
to Bigtable, but I can't understand the concept of HStore. Does HStore
mean one column family in one map file?

Does Bigtable also store one column family in one SSTable?

-Aseem

-----Original Message-----
From: Ryan Rawson [mailto:ryanobjc@gmail.com] 
Sent: Monday, March 09, 2009 3:20 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Many columns in 0.19

Don't forget, each column family is another file on disk, and another open
file. Every column family is stored in its own mapfile, and that increases
the load on HDFS.

This particular restriction won't ever really go away (unless we introduce
locality groups; even then, each locality group = N families = 1 file), but
in 0.20 it should be more feasible to have thousands of columns per family,
or more.

-ryan


Re: Many columns in 0.19

Posted by Ryan Rawson <ry...@gmail.com>.
Don't forget, each column family is another file on disk, and another open file.
Every column family is stored in its own mapfile, and that increases the
load on HDFS.

This particular restriction won't ever really go away (unless we introduce
locality groups; even then, each locality group = N families = 1 file), but
in 0.20 it should be more feasible to have thousands of columns per family,
or more.
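A back-of-the-envelope Java sketch of this point, assuming each store holds some average number of files. The 50 regions and 2 files per store below are made-up figures, not measurements, and OpenFileEstimate/storeFiles are hypothetical names. It counts each store file as one HDFS file; a Hadoop MapFile is actually a directory holding a data file and an index file, so the real count could be higher.

```java
// Hypothetical estimate of store files on one region server:
// every (region, family) pair is a store, and each store holds some files.
final class OpenFileEstimate {

    static long storeFiles(int regions, int families, int filesPerStore) {
        return (long) regions * families * filesPerStore;
    }

    public static void main(String[] args) {
        int regions = 50;      // assumed number of regions on the server
        int filesPerStore = 2; // assumed average after flushes/compactions
        // 1,000 families with one qualifier each: every family adds files.
        System.out.println(storeFiles(regions, 1000, filesPerStore)); // 100000
        // 1 family with 1,000 qualifiers: qualifiers live in the same files.
        System.out.println(storeFiles(regions, 1, filesPerStore)); // 100
    }
}
```

The comparison shows why the limit applies to families rather than qualifiers: adding qualifiers makes rows wider inside existing files, while adding families multiplies the files themselves.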

-ryan

On Mon, Mar 9, 2009 at 1:47 AM, Michael Dagaev <mi...@gmail.com> wrote:

> Thank you, Ryan

Re: Many columns in 0.19

Posted by Michael Dagaev <mi...@gmail.com>.
Thank you, Ryan

On Mon, Mar 9, 2009 at 10:28 AM, Ryan Rawson <ry...@gmail.com> wrote:
> Sadly this is still a limit.
>
> 0.20 should make things much better.
>
> -ryan

Re: Many columns in 0.19

Posted by Ryan Rawson <ry...@gmail.com>.
Sadly this is still a limit.

0.20 should make things much better.

-ryan

On Mon, Mar 9, 2009 at 12:23 AM, Michael Dagaev <mi...@gmail.com> wrote:

> Hi , all
>
>    I remember it was not recommended to add many columns (column
> qualifiers) in Hbase 0.18
> Does Hbase 0.19.0 still have this limitation?
>
> Thank you for your cooperation,
> M.
>