Posted to user@hbase.apache.org by Denis Kreis <de...@gmail.com> on 2011/11/22 19:45:31 UTC

Waste of disk space

Hi,

I loaded a 2GB log file using importtsv. Each row has 54 values, which are
all stored in one column family. The disk space consumed on HDFS is about
46GB. Is it normal?
I am using HBase on HDFS in pseudo-distributed mode.
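
For context, the data shape is roughly the following: each of the 54 fields
in a log line becomes its own cell under the single column family. Here is a
minimal sketch of one such row written through the client API (the table name
"logs", the family "d", the rowkey and the fieldNN qualifiers are made-up
placeholders; the actual load went through importtsv):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LogRowExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "logs");  // placeholder table name

            // One parsed log line -> one row with 54 cells, all in family "d".
            Put put = new Put(Bytes.toBytes("2011-11-22T19:45:31.123#host42"));
            for (int i = 0; i < 54; i++) {
                // Every field is stored as a separate KeyValue, so the rowkey,
                // the family name and the qualifier are repeated for each one.
                put.add(Bytes.toBytes("d"),
                        Bytes.toBytes("field" + i),
                        Bytes.toBytes("value" + i));
            }
            table.put(put);
            table.close();
        }
    }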

Thanks
Denis

Re: Waste of disk space

Posted by Denis Kreis <de...@gmail.com>.
Thank you!

2011/11/22 Doug Meil <do...@explorysmedical.com>:
>
> "Normal" depends a lot of the KeyValues that get generated.
>
> See the KeyValue section in here..
>
> http://hbase.apache.org/book.html#store
>
> ... Because the usage has a lot to do with the rowkey length, the CF
> name-length, attribute lengths, and whether you're using compression for
> the CF.
>
>
>
>
> On 11/22/11 1:45 PM, "Denis Kreis" <de...@gmail.com> wrote:
>
>>Hi,
>>
>>I loaded a 2GB log file using importtsv. Each row has 54 values, which are
>>all stored in one column family. The disk space consumed on HDFS is about
>>46GB. Is it normal?
>>I am using HBase on HDFS in pseudo-distributed mode.
>>
>>Thanks
>>Denis
>
>
>

Re: Waste of disk space

Posted by Doug Meil <do...@explorysmedical.com>.
"Normal" depends a lot of the KeyValues that get generated.

See the KeyValue section in here..

http://hbase.apache.org/book.html#store

... Because the usage has a lot to do with the rowkey length, the CF
name-length, attribute lengths, and whether you're using compression for
the CF.
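
For a back-of-the-envelope feel for it (the lengths below are made-up
assumptions, not your actual data): each cell is written as a KeyValue that
repeats the full rowkey, family name and qualifier, plus a fixed amount of
framing, and with 54 cells per row that framing is paid 54 times:

    public class KeyValueOverhead {
        public static void main(String[] args) {
            // Assumed lengths -- substitute your real ones.
            int rowKeyLen    = 30;   // bytes in the rowkey
            int familyLen    = 10;   // bytes in the column family name
            int qualifierLen = 12;   // bytes in the column qualifier
            int valueLen     = 15;   // bytes in the stored value
            int cellsPerRow  = 54;

            // KeyValue layout (see the KeyValue section of the book):
            //   keylength(4) + valuelength(4) + key + value
            //   key = rowlength(2) + row + familylength(1) + family
            //         + qualifier + timestamp(8) + keytype(1)
            int keyLen  = 2 + rowKeyLen + 1 + familyLen + qualifierLen + 8 + 1;
            int cellLen = 4 + 4 + keyLen + valueLen;

            int rowLen     = cellsPerRow * cellLen;
            int payloadLen = cellsPerRow * valueLen;

            System.out.println("bytes per cell on disk : " + cellLen);
            System.out.println("bytes per row on disk  : " + rowLen);
            System.out.println("payload bytes per row  : " + payloadLen);
            System.out.println("blow-up factor         : "
                               + (double) rowLen / payloadLen);
        }
    }

With short values and long rowkeys/qualifiers the blow-up factor grows
quickly, which is why short family and qualifier names and compression on
the CF make such a difference.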




On 11/22/11 1:45 PM, "Denis Kreis" <de...@gmail.com> wrote:

>Hi,
>
>I loaded a 2GB log file using importtsv. Each row has 54 values, which are
>all stored in one column family. The disk space consumed on HDFS is about
>46GB. Is it normal?
>I am using HBase on HDFS in pseudo-distributed mode.
>
>Thanks
>Denis