Posted to user@hbase.apache.org by anil gupta <an...@gmail.com> on 2012/08/14 20:15:48 UTC

Disk space usage of HFilev1 vs HFilev2

Hi All,

I recently upgraded my cluster from HBase 0.90 to HBase 0.92. One replica of
one table used to take 90 GB in 0.90, but the same table takes 45 GB in
0.92 (HFilev2). The table has 1 column family, and each row stores 300-400
bytes of values across 20-30 columns.
I am interested in knowing about any disk usage optimizations done in HFilev2.
Please share any relevant documents that would help me understand the
reduction in disk space usage.

-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Thanks a lot for the quick test, Harsh. This will certainly help me. I'll see
if I am missing something in my comparison of HDFS usage between HBase 0.90
and HBase 0.92.

Thanks Again,
Anil

On Tue, Aug 14, 2012 at 2:42 PM, Harsh J <ha...@cloudera.com> wrote:

> Not wanting to have this thread too end up as a mystery-result on the
> web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> into test tables on 0.90 and 0.92 both, flushed them, major_compact'ed
> them (waited for completion and drop in IO write activity) and then
> measured them to find this:
>
> 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
>
> So… not much of a difference. It is still your data that counts. I
> believe what Anil may have had were merely additional, un-compacted
> stores?
>
> P.s. Note that my 'test' table were all defaults. That is, merely
> "create 'test', 'col1'", nothing else, so the block indexes must've
> probably gotten created for every row, as thats at 64k by default,
> while my rows are all 100k each.
>
> On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com> wrote:
> > Hi Kevin,
> >
> > If it's not possible to store table in HFilev1 in HBase 0.92 then my last
> > option will be to do store data on pseudo-distributed or standalone
> cluster
> > for the comparison.
> > The advantage with the current installation is that its a fully
> distributed
> > cluster with around 33 million records in a table. So, it would give me a
> > better estimate.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <kevin.odell@cloudera.com
> >wrote:
> >
> >> Do you not have a pseudo cluster for testing anywhere?
> >>
> >> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com>
> wrote:
> >>
> >> > Hi Jerry,
> >> >
> >> > I am wiling to do that but the problem is that i wiped off the
> HBase0.90
> >> > cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i
> >> can
> >> > store a file in HFilev1 in 0.92 then i can do the comparison.
> >> >
> >> > Thanks,
> >> > Anil Gupta
> >> >
> >> > On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com>
> wrote:
> >> >
> >> > > Hi Anil:
> >> > >
> >> > > Maybe you can try to compare the two HFile implementation directly?
> Let
> >> > say
> >> > > write 1000 rows into HFile v1 format and then into HFile v2 format.
> You
> >> > can
> >> > > then compare the size of the two directly?
> >> > >
> >> > > HTH,
> >> > >
> >> > > Jerry
> >> > >
> >> > > On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
> >> > wrote:
> >> > >
> >> > > > Hi Zahoor,
> >> > > >
> >> > > > Then it seems like i might have missed something when doing hdfs
> >> usage
> >> > > > estimation of HBase. I usually do hadoop fs -dus
> /hbase/$TABLE_NAME
> >> for
> >> > > > getting the hdfs usage of a table. Is this the right way? Since i
> >> wiped
> >> > > of
> >> > > > the HBase0.90 cluster so now i cannot look into hdfs usage of it.
> Is
> >> it
> >> > > > possible to store a table in HFileV1 instead of HFileV2 in
> HBase0.92?
> >> > > > In this way i can do a fair comparison.
> >> > > >
> >> > > > Thanks,
> >> > > > Anil Gupta
> >> > > >
> >> > > > On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com>
> wrote:
> >> > > >
> >> > > > > Hi Anil,
> >> > > > >
> >> > > > > I really doubt that there is 50% drop in file sizes... As far
> as i
> >> > > know..
> >> > > > > there is no drastic space conserving feature in V2. Just as  an
> >> after
> >> > > > > thought.. do a major compact and check the sizes.
> >> > > > >
> >> > > > > ./Zahoor
> >> > > > > http://blog.zahoor.in
> >> > > > >
> >> > > > >
> >> > > > > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
> >> > wrote:
> >> > > > >
> >> > > > > > l
> >> > > > >
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Thanks & Regards,
> >> > > > Anil Gupta
> >> > > >
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks & Regards,
> >> > Anil Gupta
> >> >
> >>
> >>
> >>
> >> --
> >> Kevin O'Dell
> >> Customer Operations Engineer, Cloudera
> >>
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>
>
> --
> Harsh J
>



-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by lars hofhansl <lh...@yahoo.com>.
I think the memstoreTS is stored with each KV until it can be proven to be no longer needed (because there are no older open scanners), in which case it is not written during the next compaction and is assumed to be 0.
"Mild passing interest" :)  Yep.



________________________________
 From: Stack <st...@duboce.net>
To: user@hbase.apache.org; lars hofhansl <lh...@yahoo.com> 
Sent: Tuesday, August 28, 2012 11:54 AM
Subject: Re: Disk space usage of HFilev1 vs HFilev2
 
On Tue, Aug 28, 2012 at 11:42 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Are we terribly concerned about 3.5% of extra disk usage?
> HFileV2 was designed to be more main memory efficient, which is in much shorter supply than disk space (bloom filters and index blocks are interspersed with data blocks and loaded when needed, etc)
>

I wouldn't get my knickers in a twist about 3.5% but a mild, passing
interest, yes.

> The stored MemstoreTS was introduced in 0.92, which also introduced HFileV2.
>

Yeah, IIRC, its a metadata field added to each storefile.

St.Ack

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Stack <st...@duboce.net>.
On Tue, Aug 28, 2012 at 11:42 AM, lars hofhansl <lh...@yahoo.com> wrote:
> Are we terribly concerned about 3.5% of extra disk usage?
> HFileV2 was designed to be more main memory efficient, which is in much shorter supply than disk space (bloom filters and index blocks are interspersed with data blocks and loaded when needed, etc)
>

I wouldn't get my knickers in a twist about 3.5% but a mild, passing
interest, yes.

> The stored MemstoreTS was introduced in 0.92, which also introduced HFileV2.
>

Yeah, IIRC, it's a metadata field added to each storefile.

St.Ack

Re: Disk space usage of HFilev1 vs HFilev2

Posted by lars hofhansl <lh...@yahoo.com>.
Are we terribly concerned about 3.5% of extra disk usage?
HFileV2 was designed to be more main-memory efficient, since memory is in much shorter supply than disk space (bloom filters and index blocks are interspersed with data blocks and loaded only when needed, etc.).

The stored MemstoreTS was introduced in 0.92, which also introduced HFileV2.


-- Lars



________________________________
 From: Matt Corgan <mc...@hotpads.com>
To: user@hbase.apache.org 
Sent: Tuesday, August 28, 2012 11:24 AM
Subject: Re: Disk space usage of HFilev1 vs HFilev2
 
Could it be the addition of the memstoreTS?  i forget if that is in v1 as
well.

Matt

On Tue, Aug 28, 2012 at 7:37 AM, Stack <st...@duboce.net> wrote:

> On Mon, Aug 27, 2012 at 8:30 PM, anil gupta <an...@gmail.com> wrote:
> > Hi All,
> >
> > Here are the steps i followed to load the table with HFilev1 format:
> > 1. Set the property hfile.format.version to 1.
> > 2. Updated the conf across the cluster.
> > 3. Restarted the cluster.
> > 4. Ran the bulk loader.
> >
> > Table has 34 million records and one column family.
> > Results:
> > HDFS space for one replica of table in HFilev2:39.8 GB
> > HDFS space for one replica of table in HFilev1:38.4 GB
> >
> > Ironically, as per the above results HFileV1 is taking 3.5% lesser space
> > than HFileV2 format. I also skimmed through the code and i saw references
> > to "hfile.format.version" in HFile.java class.
> >
>
> It would be interesting to know what makes up the 3.5% difference?
> More metadata on the end of the file on v2?
>
> St.Ack
>

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Matt Corgan <mc...@hotpads.com>.
Could it be the addition of the memstoreTS? I forget if that is in v1 as
well.

Matt

On Tue, Aug 28, 2012 at 7:37 AM, Stack <st...@duboce.net> wrote:

> On Mon, Aug 27, 2012 at 8:30 PM, anil gupta <an...@gmail.com> wrote:
> > Hi All,
> >
> > Here are the steps i followed to load the table with HFilev1 format:
> > 1. Set the property hfile.format.version to 1.
> > 2. Updated the conf across the cluster.
> > 3. Restarted the cluster.
> > 4. Ran the bulk loader.
> >
> > Table has 34 million records and one column family.
> > Results:
> > HDFS space for one replica of table in HFilev2:39.8 GB
> > HDFS space for one replica of table in HFilev1:38.4 GB
> >
> > Ironically, as per the above results HFileV1 is taking 3.5% lesser space
> > than HFileV2 format. I also skimmed through the code and i saw references
> > to "hfile.format.version" in HFile.java class.
> >
>
> It would be interesting to know what makes up the 3.5% difference?
> More metadata on the end of the file on v2?
>
> St.Ack
>

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Stack <st...@duboce.net>.
On Mon, Aug 27, 2012 at 8:30 PM, anil gupta <an...@gmail.com> wrote:
> Hi All,
>
> Here are the steps i followed to load the table with HFilev1 format:
> 1. Set the property hfile.format.version to 1.
> 2. Updated the conf across the cluster.
> 3. Restarted the cluster.
> 4. Ran the bulk loader.
>
> Table has 34 million records and one column family.
> Results:
> HDFS space for one replica of table in HFilev2:39.8 GB
> HDFS space for one replica of table in HFilev1:38.4 GB
>
> Ironically, as per the above results HFileV1 is taking 3.5% lesser space
> than HFileV2 format. I also skimmed through the code and i saw references
> to "hfile.format.version" in HFile.java class.
>

It would be interesting to know what makes up the 3.5% difference.
More metadata at the end of the file in v2?
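One way to poke at it (a sketch; the file path is a placeholder and the options are from memory, so double-check them against your build): the HFile pretty-printer can dump a store file's trailer and metadata, which would let a v1 and a v2 file be compared side by side:

  # dump the metadata/trailer of one store file under the table directory
  hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/MY_TABLE/<region>/<cf>/<storefile>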

St.Ack

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi All,

Here are the steps I followed to load the table in HFilev1 format:
1. Set the property hfile.format.version to 1.
2. Updated the conf across the cluster.
3. Restarted the cluster.
4. Ran the bulk loader.

The table has 34 million records and one column family.
Results:
HDFS space for one replica of the table in HFilev2: 39.8 GB
HDFS space for one replica of the table in HFilev1: 38.4 GB

Ironically, as per the above results, HFileV1 takes about 3.5% less space
than the HFileV2 format. I also skimmed through the code and saw references
to "hfile.format.version" in the HFile.java class.

Thanks,
Anil Gupta

On Mon, Aug 27, 2012 at 1:32 PM, Kevin O'dell <ke...@cloudera.com>wrote:

> Anil,
>
>   Please let us know how well this works.
>
> On Mon, Aug 27, 2012 at 4:19 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi Guys,
> >
> > I was digging through the hbase-default.xml file and i found this
> property
> > relates HFile handling:
> > </property>
> >     <property>
> >       <name>hfile.format.version</name>
> >       <value>2</value>
> >       <description>
> >           The HFile format version to use for new files. Set this to 1 to
> > test
> >           backwards-compatibility. The default value of this option
> should
> > be
> >           consistent with FixedFileTrailer.MAX_VERSION.
> >       </description>
> >   </property>
> >
> > I believe setting this to 1 would help me carry out my test. Now we know
> > how to store data in HFileV1 in HBase0.92 :) . I'll post the result once
> i
> > try this out.
> >
> > Thanks,
> > Anil
> >
> >
> > On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <jm...@gmail.com>
> > wrote:
> >
> > > Cool. Now we have something on the records :-)
> > >
> > > ./Zahoor@iPad
> > >
> > > On 15-Aug-2012, at 3:12 AM, Harsh J <ha...@cloudera.com> wrote:
> > >
> > > > Not wanting to have this thread too end up as a mystery-result on the
> > > > web, I did some tests. I loaded 10k rows (of 100 KB random chars
> each)
> > > > into test tables on 0.90 and 0.92 both, flushed them,
> major_compact'ed
> > > > them (waited for completion and drop in IO write activity) and then
> > > > measured them to find this:
> > > >
> > > > 0.92 takes a total of 1049661190 bytes under its /hbase/test
> directory.
> > > > 0.90 takes a total of 1049467570 bytes under its /hbase/test
> directory.
> > > >
> > > > So… not much of a difference. It is still your data that counts. I
> > > > believe what Anil may have had were merely additional, un-compacted
> > > > stores?
> > > >
> > > > P.s. Note that my 'test' table were all defaults. That is, merely
> > > > "create 'test', 'col1'", nothing else, so the block indexes must've
> > > > probably gotten created for every row, as thats at 64k by default,
> > > > while my rows are all 100k each.
> > > >
> > > > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com>
> > > wrote:
> > > >> Hi Kevin,
> > > >>
> > > >> If it's not possible to store table in HFilev1 in HBase 0.92 then my
> > > last
> > > >> option will be to do store data on pseudo-distributed or standalone
> > > cluster
> > > >> for the comparison.
> > > >> The advantage with the current installation is that its a fully
> > > distributed
> > > >> cluster with around 33 million records in a table. So, it would give
> > me
> > > a
> > > >> better estimate.
> > > >>
> > > >> Thanks,
> > > >> Anil Gupta
> > > >>
> > > >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <
> > kevin.odell@cloudera.com
> > > >wrote:
> > > >>
> > > >>> Do you not have a pseudo cluster for testing anywhere?
> > > >>>
> > > >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <anilgupta84@gmail.com
> >
> > > wrote:
> > > >>>
> > > >>>> Hi Jerry,
> > > >>>>
> > > >>>> I am wiling to do that but the problem is that i wiped off the
> > > HBase0.90
> > > >>>> cluster. Is there a way to store a table in HFilev1 in HBase0.92?
> > If i
> > > >>> can
> > > >>>> store a file in HFilev1 in 0.92 then i can do the comparison.
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Anil Gupta
> > > >>>>
> > > >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com>
> > > wrote:
> > > >>>>
> > > >>>>> Hi Anil:
> > > >>>>>
> > > >>>>> Maybe you can try to compare the two HFile implementation
> directly?
> > > Let
> > > >>>> say
> > > >>>>> write 1000 rows into HFile v1 format and then into HFile v2
> format.
> > > You
> > > >>>> can
> > > >>>>> then compare the size of the two directly?
> > > >>>>>
> > > >>>>> HTH,
> > > >>>>>
> > > >>>>> Jerry
> > > >>>>>
> > > >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <
> anilgupta84@gmail.com
> > >
> > > >>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Zahoor,
> > > >>>>>>
> > > >>>>>> Then it seems like i might have missed something when doing hdfs
> > > >>> usage
> > > >>>>>> estimation of HBase. I usually do hadoop fs -dus
> > /hbase/$TABLE_NAME
> > > >>> for
> > > >>>>>> getting the hdfs usage of a table. Is this the right way? Since
> i
> > > >>> wiped
> > > >>>>> of
> > > >>>>>> the HBase0.90 cluster so now i cannot look into hdfs usage of
> it.
> > Is
> > > >>> it
> > > >>>>>> possible to store a table in HFileV1 instead of HFileV2 in
> > > HBase0.92?
> > > >>>>>> In this way i can do a fair comparison.
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Anil Gupta
> > > >>>>>>
> > > >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com>
> > wrote:
> > > >>>>>>
> > > >>>>>>> Hi Anil,
> > > >>>>>>>
> > > >>>>>>> I really doubt that there is 50% drop in file sizes... As far
> as
> > i
> > > >>>>> know..
> > > >>>>>>> there is no drastic space conserving feature in V2. Just as  an
> > > >>> after
> > > >>>>>>> thought.. do a major compact and check the sizes.
> > > >>>>>>>
> > > >>>>>>> ./Zahoor
> > > >>>>>>> http://blog.zahoor.in
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <anilgupta84@gmail.com
> >
> > > >>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> l
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Thanks & Regards,
> > > >>>>>> Anil Gupta
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Thanks & Regards,
> > > >>>> Anil Gupta
> > > >>>>
> > > >>>
> > > >>>
> > > >>>
> > > >>> --
> > > >>> Kevin O'Dell
> > > >>> Customer Operations Engineer, Cloudera
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks & Regards,
> > > >> Anil Gupta
> > > >
> > > >
> > > >
> > > > --
> > > > Harsh J
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>



-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Kevin O'dell <ke...@cloudera.com>.
Anil,

  Please let us know how well this works.

On Mon, Aug 27, 2012 at 4:19 PM, anil gupta <an...@gmail.com> wrote:

> Hi Guys,
>
> I was digging through the hbase-default.xml file and i found this property
> relates HFile handling:
> </property>
>     <property>
>       <name>hfile.format.version</name>
>       <value>2</value>
>       <description>
>           The HFile format version to use for new files. Set this to 1 to
> test
>           backwards-compatibility. The default value of this option should
> be
>           consistent with FixedFileTrailer.MAX_VERSION.
>       </description>
>   </property>
>
> I believe setting this to 1 would help me carry out my test. Now we know
> how to store data in HFileV1 in HBase0.92 :) . I'll post the result once i
> try this out.
>
> Thanks,
> Anil
>
>
> On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <jm...@gmail.com>
> wrote:
>
> > Cool. Now we have something on the records :-)
> >
> > ./Zahoor@iPad
> >
> > On 15-Aug-2012, at 3:12 AM, Harsh J <ha...@cloudera.com> wrote:
> >
> > > Not wanting to have this thread too end up as a mystery-result on the
> > > web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> > > into test tables on 0.90 and 0.92 both, flushed them, major_compact'ed
> > > them (waited for completion and drop in IO write activity) and then
> > > measured them to find this:
> > >
> > > 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> > > 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> > >
> > > So… not much of a difference. It is still your data that counts. I
> > > believe what Anil may have had were merely additional, un-compacted
> > > stores?
> > >
> > > P.s. Note that my 'test' table were all defaults. That is, merely
> > > "create 'test', 'col1'", nothing else, so the block indexes must've
> > > probably gotten created for every row, as thats at 64k by default,
> > > while my rows are all 100k each.
> > >
> > > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com>
> > wrote:
> > >> Hi Kevin,
> > >>
> > >> If it's not possible to store table in HFilev1 in HBase 0.92 then my
> > last
> > >> option will be to do store data on pseudo-distributed or standalone
> > cluster
> > >> for the comparison.
> > >> The advantage with the current installation is that its a fully
> > distributed
> > >> cluster with around 33 million records in a table. So, it would give
> me
> > a
> > >> better estimate.
> > >>
> > >> Thanks,
> > >> Anil Gupta
> > >>
> > >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <
> kevin.odell@cloudera.com
> > >wrote:
> > >>
> > >>> Do you not have a pseudo cluster for testing anywhere?
> > >>>
> > >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com>
> > wrote:
> > >>>
> > >>>> Hi Jerry,
> > >>>>
> > >>>> I am wiling to do that but the problem is that i wiped off the
> > HBase0.90
> > >>>> cluster. Is there a way to store a table in HFilev1 in HBase0.92?
> If i
> > >>> can
> > >>>> store a file in HFilev1 in 0.92 then i can do the comparison.
> > >>>>
> > >>>> Thanks,
> > >>>> Anil Gupta
> > >>>>
> > >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com>
> > wrote:
> > >>>>
> > >>>>> Hi Anil:
> > >>>>>
> > >>>>> Maybe you can try to compare the two HFile implementation directly?
> > Let
> > >>>> say
> > >>>>> write 1000 rows into HFile v1 format and then into HFile v2 format.
> > You
> > >>>> can
> > >>>>> then compare the size of the two directly?
> > >>>>>
> > >>>>> HTH,
> > >>>>>
> > >>>>> Jerry
> > >>>>>
> > >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <anilgupta84@gmail.com
> >
> > >>>> wrote:
> > >>>>>
> > >>>>>> Hi Zahoor,
> > >>>>>>
> > >>>>>> Then it seems like i might have missed something when doing hdfs
> > >>> usage
> > >>>>>> estimation of HBase. I usually do hadoop fs -dus
> /hbase/$TABLE_NAME
> > >>> for
> > >>>>>> getting the hdfs usage of a table. Is this the right way? Since i
> > >>> wiped
> > >>>>> of
> > >>>>>> the HBase0.90 cluster so now i cannot look into hdfs usage of it.
> Is
> > >>> it
> > >>>>>> possible to store a table in HFileV1 instead of HFileV2 in
> > HBase0.92?
> > >>>>>> In this way i can do a fair comparison.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Anil Gupta
> > >>>>>>
> > >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com>
> wrote:
> > >>>>>>
> > >>>>>>> Hi Anil,
> > >>>>>>>
> > >>>>>>> I really doubt that there is 50% drop in file sizes... As far as
> i
> > >>>>> know..
> > >>>>>>> there is no drastic space conserving feature in V2. Just as  an
> > >>> after
> > >>>>>>> thought.. do a major compact and check the sizes.
> > >>>>>>>
> > >>>>>>> ./Zahoor
> > >>>>>>> http://blog.zahoor.in
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
> > >>>> wrote:
> > >>>>>>>
> > >>>>>>>> l
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> --
> > >>>>>> Thanks & Regards,
> > >>>>>> Anil Gupta
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> --
> > >>>> Thanks & Regards,
> > >>>> Anil Gupta
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Kevin O'Dell
> > >>> Customer Operations Engineer, Cloudera
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> Thanks & Regards,
> > >> Anil Gupta
> > >
> > >
> > >
> > > --
> > > Harsh J
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi Guys,

I was digging through the hbase-default.xml file and I found this property
related to HFile handling:
    <property>
      <name>hfile.format.version</name>
      <value>2</value>
      <description>
          The HFile format version to use for new files. Set this to 1 to
          test backwards-compatibility. The default value of this option
          should be consistent with FixedFileTrailer.MAX_VERSION.
      </description>
    </property>

I believe setting this to 1 will let me carry out my test. Now we know
how to store data in HFileV1 in HBase 0.92 :). I'll post the results once I
try this out.

Thanks,
Anil


On Wed, Aug 15, 2012 at 5:09 AM, J Mohamed Zahoor <jm...@gmail.com> wrote:

> Cool. Now we have something on the records :-)
>
> ./Zahoor@iPad
>
> On 15-Aug-2012, at 3:12 AM, Harsh J <ha...@cloudera.com> wrote:
>
> > Not wanting to have this thread too end up as a mystery-result on the
> > web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> > into test tables on 0.90 and 0.92 both, flushed them, major_compact'ed
> > them (waited for completion and drop in IO write activity) and then
> > measured them to find this:
> >
> > 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> > 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> >
> > So… not much of a difference. It is still your data that counts. I
> > believe what Anil may have had were merely additional, un-compacted
> > stores?
> >
> > P.s. Note that my 'test' table were all defaults. That is, merely
> > "create 'test', 'col1'", nothing else, so the block indexes must've
> > probably gotten created for every row, as thats at 64k by default,
> > while my rows are all 100k each.
> >
> > On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com>
> wrote:
> >> Hi Kevin,
> >>
> >> If it's not possible to store table in HFilev1 in HBase 0.92 then my
> last
> >> option will be to do store data on pseudo-distributed or standalone
> cluster
> >> for the comparison.
> >> The advantage with the current installation is that its a fully
> distributed
> >> cluster with around 33 million records in a table. So, it would give me
> a
> >> better estimate.
> >>
> >> Thanks,
> >> Anil Gupta
> >>
> >> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <kevin.odell@cloudera.com
> >wrote:
> >>
> >>> Do you not have a pseudo cluster for testing anywhere?
> >>>
> >>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com>
> wrote:
> >>>
> >>>> Hi Jerry,
> >>>>
> >>>> I am wiling to do that but the problem is that i wiped off the
> HBase0.90
> >>>> cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i
> >>> can
> >>>> store a file in HFilev1 in 0.92 then i can do the comparison.
> >>>>
> >>>> Thanks,
> >>>> Anil Gupta
> >>>>
> >>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com>
> wrote:
> >>>>
> >>>>> Hi Anil:
> >>>>>
> >>>>> Maybe you can try to compare the two HFile implementation directly?
> Let
> >>>> say
> >>>>> write 1000 rows into HFile v1 format and then into HFile v2 format.
> You
> >>>> can
> >>>>> then compare the size of the two directly?
> >>>>>
> >>>>> HTH,
> >>>>>
> >>>>> Jerry
> >>>>>
> >>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
> >>>> wrote:
> >>>>>
> >>>>>> Hi Zahoor,
> >>>>>>
> >>>>>> Then it seems like i might have missed something when doing hdfs
> >>> usage
> >>>>>> estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME
> >>> for
> >>>>>> getting the hdfs usage of a table. Is this the right way? Since i
> >>> wiped
> >>>>> of
> >>>>>> the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is
> >>> it
> >>>>>> possible to store a table in HFileV1 instead of HFileV2 in
> HBase0.92?
> >>>>>> In this way i can do a fair comparison.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Anil Gupta
> >>>>>>
> >>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Anil,
> >>>>>>>
> >>>>>>> I really doubt that there is 50% drop in file sizes... As far as i
> >>>>> know..
> >>>>>>> there is no drastic space conserving feature in V2. Just as  an
> >>> after
> >>>>>>> thought.. do a major compact and check the sizes.
> >>>>>>>
> >>>>>>> ./Zahoor
> >>>>>>> http://blog.zahoor.in
> >>>>>>>
> >>>>>>>
> >>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
> >>>> wrote:
> >>>>>>>
> >>>>>>>> l
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Thanks & Regards,
> >>>>>> Anil Gupta
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Thanks & Regards,
> >>>> Anil Gupta
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Kevin O'Dell
> >>> Customer Operations Engineer, Cloudera
> >>>
> >>
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> Anil Gupta
> >
> >
> >
> > --
> > Harsh J
>



-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by J Mohamed Zahoor <jm...@gmail.com>.
Cool. Now we have something on the record :-)

./Zahoor@iPad

On 15-Aug-2012, at 3:12 AM, Harsh J <ha...@cloudera.com> wrote:

> Not wanting to have this thread too end up as a mystery-result on the
> web, I did some tests. I loaded 10k rows (of 100 KB random chars each)
> into test tables on 0.90 and 0.92 both, flushed them, major_compact'ed
> them (waited for completion and drop in IO write activity) and then
> measured them to find this:
> 
> 0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
> 0.90 takes a total of 1049467570 bytes under its /hbase/test directory.
> 
> So… not much of a difference. It is still your data that counts. I
> believe what Anil may have had were merely additional, un-compacted
> stores?
> 
> P.s. Note that my 'test' table were all defaults. That is, merely
> "create 'test', 'col1'", nothing else, so the block indexes must've
> probably gotten created for every row, as thats at 64k by default,
> while my rows are all 100k each.
> 
> On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com> wrote:
>> Hi Kevin,
>> 
>> If it's not possible to store table in HFilev1 in HBase 0.92 then my last
>> option will be to do store data on pseudo-distributed or standalone cluster
>> for the comparison.
>> The advantage with the current installation is that its a fully distributed
>> cluster with around 33 million records in a table. So, it would give me a
>> better estimate.
>> 
>> Thanks,
>> Anil Gupta
>> 
>> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <ke...@cloudera.com>wrote:
>> 
>>> Do you not have a pseudo cluster for testing anywhere?
>>> 
>>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com> wrote:
>>> 
>>>> Hi Jerry,
>>>> 
>>>> I am wiling to do that but the problem is that i wiped off the HBase0.90
>>>> cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i
>>> can
>>>> store a file in HFilev1 in 0.92 then i can do the comparison.
>>>> 
>>>> Thanks,
>>>> Anil Gupta
>>>> 
>>>> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com> wrote:
>>>> 
>>>>> Hi Anil:
>>>>> 
>>>>> Maybe you can try to compare the two HFile implementation directly? Let
>>>> say
>>>>> write 1000 rows into HFile v1 format and then into HFile v2 format. You
>>>> can
>>>>> then compare the size of the two directly?
>>>>> 
>>>>> HTH,
>>>>> 
>>>>> Jerry
>>>>> 
>>>>> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
>>>> wrote:
>>>>> 
>>>>>> Hi Zahoor,
>>>>>> 
>>>>>> Then it seems like i might have missed something when doing hdfs
>>> usage
>>>>>> estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME
>>> for
>>>>>> getting the hdfs usage of a table. Is this the right way? Since i
>>> wiped
>>>>> of
>>>>>> the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is
>>> it
>>>>>> possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
>>>>>> In this way i can do a fair comparison.
>>>>>> 
>>>>>> Thanks,
>>>>>> Anil Gupta
>>>>>> 
>>>>>> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Anil,
>>>>>>> 
>>>>>>> I really doubt that there is 50% drop in file sizes... As far as i
>>>>> know..
>>>>>>> there is no drastic space conserving feature in V2. Just as  an
>>> after
>>>>>>> thought.. do a major compact and check the sizes.
>>>>>>> 
>>>>>>> ./Zahoor
>>>>>>> http://blog.zahoor.in
>>>>>>> 
>>>>>>> 
>>>>>>> On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> l
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Thanks & Regards,
>>>>>> Anil Gupta
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks & Regards,
>>>> Anil Gupta
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Kevin O'Dell
>>> Customer Operations Engineer, Cloudera
>>> 
>> 
>> 
>> 
>> --
>> Thanks & Regards,
>> Anil Gupta
> 
> 
> 
> -- 
> Harsh J

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Harsh J <ha...@cloudera.com>.
Not wanting this thread to also end up as a mystery result on the web, I
did some tests. I loaded 10k rows (of 100 KB random chars each) into test
tables on both 0.90 and 0.92, flushed them, major_compact'ed them (waited
for completion and for the drop in I/O write activity), and then measured
them to find this:

0.92 takes a total of 1049661190 bytes under its /hbase/test directory.
0.90 takes a total of 1049467570 bytes under its /hbase/test directory.

So… not much of a difference. It is still your data that counts. I
believe what Anil may have had were merely additional, un-compacted
stores?

P.S. Note that my 'test' table used all defaults. That is, merely
"create 'test', 'col1'", nothing else, so block index entries probably got
created for every row, as the block size is 64k by default while my rows
are all 100k each.
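For anyone who wants to repeat this, the test boiled down to roughly the
following (a sketch; load the rows with whatever client you like, and the
names below are the defaults mentioned above):

  # in the hbase shell, on each of the 0.90 and 0.92 clusters
  create 'test', 'col1'
  # ... load 10k rows of ~100 KB random chars each ...
  flush 'test'
  major_compact 'test'

  # once compaction I/O settles, from the command line
  hadoop fs -dus /hbase/test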

On Wed, Aug 15, 2012 at 2:25 AM, anil gupta <an...@gmail.com> wrote:
> Hi Kevin,
>
> If it's not possible to store table in HFilev1 in HBase 0.92 then my last
> option will be to do store data on pseudo-distributed or standalone cluster
> for the comparison.
> The advantage with the current installation is that its a fully distributed
> cluster with around 33 million records in a table. So, it would give me a
> better estimate.
>
> Thanks,
> Anil Gupta
>
> On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <ke...@cloudera.com>wrote:
>
>> Do you not have a pseudo cluster for testing anywhere?
>>
>> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com> wrote:
>>
>> > Hi Jerry,
>> >
>> > I am wiling to do that but the problem is that i wiped off the HBase0.90
>> > cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i
>> can
>> > store a file in HFilev1 in 0.92 then i can do the comparison.
>> >
>> > Thanks,
>> > Anil Gupta
>> >
>> > On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com> wrote:
>> >
>> > > Hi Anil:
>> > >
>> > > Maybe you can try to compare the two HFile implementation directly? Let
>> > say
>> > > write 1000 rows into HFile v1 format and then into HFile v2 format. You
>> > can
>> > > then compare the size of the two directly?
>> > >
>> > > HTH,
>> > >
>> > > Jerry
>> > >
>> > > On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
>> > wrote:
>> > >
>> > > > Hi Zahoor,
>> > > >
>> > > > Then it seems like i might have missed something when doing hdfs
>> usage
>> > > > estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME
>> for
>> > > > getting the hdfs usage of a table. Is this the right way? Since i
>> wiped
>> > > of
>> > > > the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is
>> it
>> > > > possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
>> > > > In this way i can do a fair comparison.
>> > > >
>> > > > Thanks,
>> > > > Anil Gupta
>> > > >
>> > > > On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
>> > > >
>> > > > > Hi Anil,
>> > > > >
>> > > > > I really doubt that there is 50% drop in file sizes... As far as i
>> > > know..
>> > > > > there is no drastic space conserving feature in V2. Just as  an
>> after
>> > > > > thought.. do a major compact and check the sizes.
>> > > > >
>> > > > > ./Zahoor
>> > > > > http://blog.zahoor.in
>> > > > >
>> > > > >
>> > > > > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
>> > wrote:
>> > > > >
>> > > > > > l
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Thanks & Regards,
>> > > > Anil Gupta
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks & Regards,
>> > Anil Gupta
>> >
>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
>>
>
>
>
> --
> Thanks & Regards,
> Anil Gupta



-- 
Harsh J

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi Kevin,

If it's not possible to store a table in HFilev1 in HBase 0.92, then my last
option will be to store the data on a pseudo-distributed or standalone cluster
for the comparison.
The advantage of the current installation is that it's a fully distributed
cluster with around 33 million records in the table, so it would give me a
better estimate.

Thanks,
Anil Gupta

On Tue, Aug 14, 2012 at 1:48 PM, Kevin O'dell <ke...@cloudera.com>wrote:

> Do you not have a pseudo cluster for testing anywhere?
>
> On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi Jerry,
> >
> > I am wiling to do that but the problem is that i wiped off the HBase0.90
> > cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i
> can
> > store a file in HFilev1 in 0.92 then i can do the comparison.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com> wrote:
> >
> > > Hi Anil:
> > >
> > > Maybe you can try to compare the two HFile implementation directly? Let
> > say
> > > write 1000 rows into HFile v1 format and then into HFile v2 format. You
> > can
> > > then compare the size of the two directly?
> > >
> > > HTH,
> > >
> > > Jerry
> > >
> > > On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
> > wrote:
> > >
> > > > Hi Zahoor,
> > > >
> > > > Then it seems like i might have missed something when doing hdfs
> usage
> > > > estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME
> for
> > > > getting the hdfs usage of a table. Is this the right way? Since i
> wiped
> > > of
> > > > the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is
> it
> > > > possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
> > > > In this way i can do a fair comparison.
> > > >
> > > > Thanks,
> > > > Anil Gupta
> > > >
> > > > On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
> > > >
> > > > > Hi Anil,
> > > > >
> > > > > I really doubt that there is 50% drop in file sizes... As far as i
> > > know..
> > > > > there is no drastic space conserving feature in V2. Just as  an
> after
> > > > > thought.. do a major compact and check the sizes.
> > > > >
> > > > > ./Zahoor
> > > > > http://blog.zahoor.in
> > > > >
> > > > >
> > > > > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
> > wrote:
> > > > >
> > > > > > l
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Regards,
> > > > Anil Gupta
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>



-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Kevin O'dell <ke...@cloudera.com>.
Do you not have a pseudo cluster for testing anywhere?

On Tue, Aug 14, 2012 at 4:46 PM, anil gupta <an...@gmail.com> wrote:

> Hi Jerry,
>
> I am wiling to do that but the problem is that i wiped off the HBase0.90
> cluster. Is there a way to store a table in HFilev1 in HBase0.92? If i can
> store a file in HFilev1 in 0.92 then i can do the comparison.
>
> Thanks,
> Anil Gupta
>
> On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com> wrote:
>
> > Hi Anil:
> >
> > Maybe you can try to compare the two HFile implementation directly? Let
> say
> > write 1000 rows into HFile v1 format and then into HFile v2 format. You
> can
> > then compare the size of the two directly?
> >
> > HTH,
> >
> > Jerry
> >
> > On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com>
> wrote:
> >
> > > Hi Zahoor,
> > >
> > > Then it seems like i might have missed something when doing hdfs usage
> > > estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME for
> > > getting the hdfs usage of a table. Is this the right way? Since i wiped
> > of
> > > the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is it
> > > possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
> > > In this way i can do a fair comparison.
> > >
> > > Thanks,
> > > Anil Gupta
> > >
> > > On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
> > >
> > > > Hi Anil,
> > > >
> > > > I really doubt that there is 50% drop in file sizes... As far as i
> > know..
> > > > there is no drastic space conserving feature in V2. Just as  an after
> > > > thought.. do a major compact and check the sizes.
> > > >
> > > > ./Zahoor
> > > > http://blog.zahoor.in
> > > >
> > > >
> > > > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com>
> wrote:
> > > >
> > > > > l
> > > >
> > > >
> > >
> > >
> > > --
> > > Thanks & Regards,
> > > Anil Gupta
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi Jerry,

I am willing to do that, but the problem is that I wiped off the HBase 0.90
cluster. Is there a way to store a table in HFilev1 in HBase 0.92? If I can
store a file in HFilev1 in 0.92 then I can do the comparison.

Thanks,
Anil Gupta

On Tue, Aug 14, 2012 at 1:28 PM, Jerry Lam <ch...@gmail.com> wrote:

> Hi Anil:
>
> Maybe you can try to compare the two HFile implementation directly? Let say
> write 1000 rows into HFile v1 format and then into HFile v2 format. You can
> then compare the size of the two directly?
>
> HTH,
>
> Jerry
>
> On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi Zahoor,
> >
> > Then it seems like i might have missed something when doing hdfs usage
> > estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME for
> > getting the hdfs usage of a table. Is this the right way? Since i wiped
> of
> > the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is it
> > possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
> > In this way i can do a fair comparison.
> >
> > Thanks,
> > Anil Gupta
> >
> > On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
> >
> > > Hi Anil,
> > >
> > > I really doubt that there is 50% drop in file sizes... As far as i
> know..
> > > there is no drastic space conserving feature in V2. Just as  an after
> > > thought.. do a major compact and check the sizes.
> > >
> > > ./Zahoor
> > > http://blog.zahoor.in
> > >
> > >
> > > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com> wrote:
> > >
> > > > l
> > >
> > >
> >
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
> >
>



-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by Jerry Lam <ch...@gmail.com>.
Hi Anil:

Maybe you can try to compare the two HFile implementations directly? Let's
say you write 1000 rows into HFile v1 format and then into HFile v2 format;
you can then compare the sizes of the two directly.

HTH,

Jerry

On Tue, Aug 14, 2012 at 3:36 PM, anil gupta <an...@gmail.com> wrote:

> Hi Zahoor,
>
> Then it seems like i might have missed something when doing hdfs usage
> estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME for
> getting the hdfs usage of a table. Is this the right way? Since i wiped of
> the HBase0.90 cluster so now i cannot look into hdfs usage of it. Is it
> possible to store a table in HFileV1 instead of HFileV2 in HBase0.92?
> In this way i can do a fair comparison.
>
> Thanks,
> Anil Gupta
>
> On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:
>
> > Hi Anil,
> >
> > I really doubt that there is 50% drop in file sizes... As far as i know..
> > there is no drastic space conserving feature in V2. Just as  an after
> > thought.. do a major compact and check the sizes.
> >
> > ./Zahoor
> > http://blog.zahoor.in
> >
> >
> > On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com> wrote:
> >
> > > l
> >
> >
>
>
> --
> Thanks & Regards,
> Anil Gupta
>

Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi Zahoor,

Then it seems like I might have missed something when doing the HDFS usage
estimation of HBase. I usually do hadoop fs -dus /hbase/$TABLE_NAME to get
the HDFS usage of a table. Is this the right way? Since I wiped off the
HBase 0.90 cluster, I can no longer look into its HDFS usage. Is it
possible to store a table in HFileV1 instead of HFileV2 in HBase 0.92?
That way I can do a fair comparison.
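For the record, these are the kinds of checks I run (a sketch; the table
name is a placeholder, and the per-directory listing is just to spot
regions that still carry extra, un-compacted store files before comparing
totals):

  hadoop fs -dus /hbase/$TABLE_NAME    # total bytes for one replica of the table
  hadoop fs -du /hbase/$TABLE_NAME     # per-region breakdown under the table directory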

Thanks,
Anil Gupta

On Tue, Aug 14, 2012 at 12:13 PM, jmozah <jm...@gmail.com> wrote:

> Hi Anil,
>
> I really doubt that there is 50% drop in file sizes... As far as i know..
> there is no drastic space conserving feature in V2. Just as  an after
> thought.. do a major compact and check the sizes.
>
> ./Zahoor
> http://blog.zahoor.in
>
>
> On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com> wrote:
>
> > l
>
>


-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by jmozah <jm...@gmail.com>.
Hi Anil,

I really doubt that there is a 50% drop in file sizes... As far as I know, there is no drastic space-conserving feature in V2. Just as an afterthought: do a major compaction and check the sizes.

./Zahoor
http://blog.zahoor.in


On 15-Aug-2012, at 12:31 AM, anil gupta <an...@gmail.com> wrote:

> l


Re: Disk space usage of HFilev1 vs HFilev2

Posted by anil gupta <an...@gmail.com>.
Hi Zahoor,

I mean the HDFS space taken by one replica of the table in HBase 0.90 was 90 GB,
whereas the HDFS space taken for the same table in HBase 0.92 is 45 GB. So
I am interested in knowing how HFilev2 takes around 50% less HDFS space. No
compression was enabled for these tables, there were no schema changes, and
the same data-set is used.

Actually, I have to provide hardware estimates for the HBase cluster, and a
difference of 50% in disk usage between HFilev1 and HFilev2 makes a big
difference in my estimates. So I am just trying to make sure that less disk
space will be required if we use HFilev2.

Thanks,
Anil

On Tue, Aug 14, 2012 at 11:50 AM, jmozah <jm...@gmail.com> wrote:

> Hi
>
> I am not very sure about the storage savings you are talking about, But
> there is definitely savings in RAM as there is block level index and bloom
> filter  instead of file level. More here
>
> http://www.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
> http://hbase.apache.org/book.html#d540e10937
>
> Was compression enabled in 0.90? is it enabled now in 0.92?
>
> ./zahoor
>
>
> On 14-Aug-2012, at 11:45 PM, anil gupta <an...@gmail.com> wrote:
>
> > Hi All,
> >
> > I recently updated my cluster from HBase 0.90 to HBase 0.92. One replica
> of
> > one table used to take 90 GB in 0.90 but the same table takes 45 GB in
> > 0.92(HFilev2). The table has 1 column family and each row stores data of
> > 300-400 bytes(this is the size of values) in 20-30 column.
> > I am interested in knowing of any disk usage optimization done in
> HFilev2?
> > Please share if you know of any relevant document to understand the
> > reduction in disk space usage?
> >
> > --
> > Thanks & Regards,
> > Anil Gupta
>
>


-- 
Thanks & Regards,
Anil Gupta

Re: Disk space usage of HFilev1 vs HFilev2

Posted by jmozah <jm...@gmail.com>.
Hi

I am not very sure about the storage savings you are talking about, but there are definitely savings in RAM, since there are block-level indexes and bloom filters instead of file-level ones. More here:

http://www.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
http://hbase.apache.org/book.html#d540e10937

Was compression enabled in 0.90? Is it enabled now in 0.92?

./zahoor


On 14-Aug-2012, at 11:45 PM, anil gupta <an...@gmail.com> wrote:

> Hi All,
> 
> I recently updated my cluster from HBase 0.90 to HBase 0.92. One replica of
> one table used to take 90 GB in 0.90 but the same table takes 45 GB in
> 0.92(HFilev2). The table has 1 column family and each row stores data of
> 300-400 bytes(this is the size of values) in 20-30 column.
> I am interested in knowing of any disk usage optimization done in HFilev2?
> Please share if you know of any relevant document to understand the
> reduction in disk space usage?
> 
> -- 
> Thanks & Regards,
> Anil Gupta