Posted to user@hbase.apache.org by mete <ef...@gmail.com> on 2012/04/07 22:21:03 UTC

hbase table size

Hello folks,

I am trying to import a CSV file that is around 10 GB into HBase. After the
import, I check the size of the folder with the hadoop fs -du command, and
it is a little above 100 GB.
I did not configure any compression. I have tried both sequential import
through the API and creating an HFile and bulk loading it into HBase, but
the size is nearly the same. Does this sound normal?
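For reference, the sequential import is just a plain loop of Puts through the
client API, roughly along the lines of the sketch below. The table, family and
qualifier names are only illustrative, and I split each CSV line into one
HBase column per field:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import java.io.BufferedReader;
import java.io.FileReader;

public class CsvImport {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "csv_import");   // illustrative table name
    table.setAutoFlush(false);                       // buffer puts on the client side

    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    while ((line = in.readLine()) != null) {
      String[] fields = line.split(",");
      Put put = new Put(Bytes.toBytes(fields[0]));   // row key taken from the first field
      for (int i = 1; i < fields.length; i++) {
        // one HBase column per CSV field
        put.add(Bytes.toBytes("data"), Bytes.toBytes("col" + i), Bytes.toBytes(fields[i]));
      }
      table.put(put);
    }
    in.close();
    table.flushCommits();
    table.close();
  }
}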

Kind Regards.
Mete

Re: hbase table size

Posted by mete <ef...@gmail.com>.
Hello and thanks for the insight.
I think I misused it a little bit. I was extracting the CSV columns and
storing each one in a different HBase column (which I did not need at all;
everything is looked up by row key anyway).
For the next run I will put the entire line into a single column and
compress the family as well.
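
A minimal sketch of what I mean, assuming the table "csv_import2" already
exists with a compressed column family "d" (these names and the qualifier "l"
are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleCellPerRow {
  // writes one CSV line as a single cell instead of one cell per field
  static void write(HTable table, String csvLine) throws Exception {
    String rowKey = csvLine.split(",", 2)[0];        // row key from the first field (illustrative)
    Put put = new Put(Bytes.toBytes(rowKey));
    put.add(Bytes.toBytes("d"), Bytes.toBytes("l"), Bytes.toBytes(csvLine));
    table.put(put);
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "csv_import2");  // placeholder table with compressed family "d"
    write(table, "row-00001,foo,bar,baz");           // stands in for a real CSV line
    table.close();
  }
}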

Cheers

On Mon, Apr 9, 2012 at 1:04 PM, Ioan Eugen Stan <st...@gmail.com> wrote:

> 2012/4/7 mete <ef...@gmail.com>:
> > Hello folks,
> >
> > I am trying to import a CSV file that is around 10 GB into HBase. After
> > the import, I check the size of the folder with the hadoop fs -du
> > command, and it is a little above 100 GB.
> > I did not configure any compression. I have tried both sequential import
> > through the API and creating an HFile and bulk loading it into HBase,
> > but the size is nearly the same. Does this sound normal?
> >
> > Kind Regards.
> > Mete
>
>
> Hi Mete,
>
> Start with compression. It's the easiest solution. Also try to make
> your column family name a single character, e.g. "C" or "D", and keep
> your qualifiers as short as possible. This will also save some space.
>
> Regards,
> --
> Ioan Eugen Stan
> http://ieugen.blogspot.com/
>

Re: hbase table size

Posted by Ioan Eugen Stan <st...@gmail.com>.
2012/4/7 mete <ef...@gmail.com>:
> Hello folks,
>
> I am trying to import a CSV file that is around 10 GB into HBase. After the
> import, I check the size of the folder with the hadoop fs -du command, and
> it is a little above 100 GB.
> I did not configure any compression. I have tried both sequential import
> through the API and creating an HFile and bulk loading it into HBase, but
> the size is nearly the same. Does this sound normal?
>
> Kind Regards.
> Mete


Hi Mete,

Start with compression. It's the easiest solution. Also try to make
your column family name a single character, e.g. "C" or "D", and keep
your qualifiers as short as possible. This will also save some space.
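
For example, something along these lines when creating the table (a sketch
against the 0.92-era client API; the table name is a placeholder, and Snappy
assumes the native codec is installed on the region servers, otherwise use GZ):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class CreateCompressedTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor desc = new HTableDescriptor("csv_import2"); // placeholder table name
    HColumnDescriptor family = new HColumnDescriptor("d");       // single-character family name
    family.setCompressionType(Compression.Algorithm.SNAPPY);     // or LZO / GZ
    desc.addFamily(family);
    admin.createTable(desc);
  }
}

Compression here is applied to the store files on disk, so it does not change
what the client sends or reads back.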

Regards,
-- 
Ioan Eugen Stan
http://ieugen.blogspot.com/

Re: hbase table size

Posted by lars hofhansl <lh...@yahoo.com>.
10 GB -> 100 GB sounds about right. Of course it depends on the relative size of the keys and the values.

HBase needs to store the entire coordinates (row key, column family, qualifier, timestamp) for each KeyValue (i.e. each column), whereas the CSV file only stores the values.
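
To put rough numbers on it (these sizes are made up for illustration, not
measured from the data in this thread): say a row has a 20-byte row key and
ten fields of about 5 bytes each. In the CSV that row is on the order of
20 + 10*5 + 10 delimiters, roughly 80 bytes. Stored as ten KeyValues, every
cell repeats the 20-byte row key and carries the family name, a qualifier, an
8-byte timestamp and about 12 more bytes of length/type framing, so each
5-byte value easily becomes 50-60 bytes on disk and the row ends up around
500-600 bytes before any HFile overhead. That is the same order of magnitude
as the blow-up reported above.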


You can try Snappy or LZO compression if CPU cost is the primary consideration, or GZ if disk/IO is more important.
Also, 0.94+ comes with key prefix compression (data block encoding), which will help a lot in many cases.
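
Both can be set per column family. A minimal sketch against the 0.94-era
client API, with placeholder table and family names (prefix encoding is the
DataBlockEncoding.PREFIX setting):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;
import org.apache.hadoop.hbase.io.hfile.Compression;

public class EnablePrefixEncoding {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // rebuild the family descriptor with compression plus prefix encoding
    HColumnDescriptor family = new HColumnDescriptor("d");     // placeholder family name
    family.setCompressionType(Compression.Algorithm.SNAPPY);
    family.setDataBlockEncoding(DataBlockEncoding.PREFIX);     // 0.94+ only

    // apply it to an existing table (disable/enable around the schema change)
    admin.disableTable("csv_import");                          // placeholder table name
    admin.modifyColumn("csv_import", family);
    admin.enableTable("csv_import");
  }
}

Existing store files only pick up the new compression/encoding when they get
rewritten, e.g. by a major compaction.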


-- Lars



________________________________
 From: mete <ef...@gmail.com>
To: user@hbase.apache.org 
Sent: Saturday, April 7, 2012 1:21 PM
Subject: hbase table size
 
Hello folks,

I am trying to import a CSV file that is around 10 GB into HBase. After the
import, I check the size of the folder with the hadoop fs -du command, and
it is a little above 100 GB.
I did not configure any compression. I have tried both sequential import
through the API and creating an HFile and bulk loading it into HBase, but
the size is nearly the same. Does this sound normal?

Kind Regards.
Mete