Posted to user@hbase.apache.org by Hari Sreekumar <hs...@clickable.com> on 2010/11/10 06:21:31 UTC

Data taking up too much space when put into HBase

Hi,

     Data seems to be taking up too much space when I put it into HBase. For
example, I have a 2 GB text file which seems to take up ~70 GB when I dump it
into HBase. I have the block size set to 64 MB and replication=3, which I think
is the likely reason for this expansion. But if that is the case, how can I
prevent it? Decreasing the block size would have a negative impact on
performance, so is there a way I can increase the average size of the
HBase-created files to be comparable to 64 MB? Right now they are ~5 MB on
average. Or is something else entirely at work here?

thanks,
hari
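
A quick way to pin down where the ~70 GB figure comes from is to compare the
table directory's logical size with the space it actually consumes on HDFS,
since the consumed figure already includes replication. Below is a minimal
sketch, not part of the original thread, assuming the Hadoop FileSystem Java
client and using the /hbase/Webevent path that appears later in the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.ContentSummary;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TableSpaceCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();    // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);
            Path tableDir = new Path("/hbase/Webevent");  // table directory under the HBase root

            ContentSummary cs = fs.getContentSummary(tableDir);
            long logical  = cs.getLength();               // sum of file lengths, one copy of the data
            long consumed = cs.getSpaceConsumed();        // raw disk usage, replication included

            System.out.printf("logical: %.1f MB, consumed incl. replication: %.1f MB%n",
                    logical / 1e6, consumed / 1e6);
        }
    }

If the consumed figure is roughly three times the logical one, replication
accounts for most of the gap; the rest comes from the per-cell key overhead
discussed in the replies below.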

Re: Data taking up too much space when put into HBase

Posted by Debashis Saha <de...@gmail.com>.
Just to add a note to J-D's comment:

You want more than one column family (say CF-A and CF-B) only when most of
your application (or one set of use cases) reads the information stored in
CF-A and does not care about the information in CF-B. In that case, separating
the less-used information into a different column family reduces the read
overhead for the most common application use case.
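
A minimal sketch of that split, not from the thread, using the Java admin API
of that era with made-up table and family names (exact constructors vary
slightly between HBase versions):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class CreateSplitFamilyTable {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

            HTableDescriptor table = new HTableDescriptor("WebeventExample"); // hypothetical name
            table.addFamily(new HColumnDescriptor("core")); // CF-A: columns most reads touch
            table.addFamily(new HColumnDescriptor("raw"));  // CF-B: bulky, rarely read payload

            admin.createTable(table);
        }
    }

With a single-family table the same code would simply add one
HColumnDescriptor; the second family only pays off when the common reads
genuinely never touch that set of columns.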

-Debashis

On Thu, Nov 11, 2010 at 12:04 PM, Jeff Whiting <je...@qualtrics.com> wrote:

> Just to clarify, each column family is stored separately from each other.
>  But within a column family each rowkey => key / value is stored
> independently.  I was under the impression that a rowkey would point to
> multiple key / value pairs within the column family stores.  Am I
> understanding everything correctly?
>
> So looking at http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture under
> "Physical Storage View" it looks like multiple key / values are stored under
> one rowkey.  However it should show the rowkey repeated for each time stamp
> key / value combination.  If that is true then I understand why compression
> is so important (lots of redundant data).
>
> ~Jeff
>
>
> On 11/9/2010 10:46 PM, Jean-Daniel Cryans wrote:
>
>> Each value is stored with its full key, e.g. row key + family +
>> qualifier + timestamp + offsets. You don't give any information
>> regarding how you stored the data, but if you have large enough keys
>> then it should easily explain the bloat.
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar<hs...@clickable.com>
>>  wrote:
>>
>>> Hi,
>>>
>>>     Data seems to be taking up too much space when I put into HBase. e.g,
>>> I
>>> have a 2 GB text file which seems to be taking up ~70 GB when I dump into
>>> HBase. I have block size set to 64 MB and replication=3, which I think is
>>> the possible reason for this expansion. But if that is the case, how can
>>> I
>>> prevent it? Decreasing the block size will have a negative impact on
>>> performance, so is there a way I can increase the average size on
>>> HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB
>>> on
>>> average. Or is this an entirely different thing at work here?
>>>
>>> thanks,
>>> hari
>>>
>>>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> jeffw@qualtrics.com
>
>


-- 
- DEBASHIS SAHA

2519 Honeysuckle Ln
Rolling Meadows, IL 60008, USA

1-(847) 925 - 5071 (H);
1-(312)-731- 6414 (M)
--~<O>~--

Re: Data taking up too much space when put into HBase

Posted by Jeff Whiting <je...@qualtrics.com>.
Just to clarify, each column family is stored separately from the others.  But within a column
family, each rowkey => key / value pair is stored independently.  I was under the impression that a
rowkey would point to multiple key / value pairs within the column family stores.  Am I
understanding everything correctly?

So looking at http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture under "Physical Storage View",
it looks like multiple key / values are stored under one rowkey.  However, it should really show the
rowkey repeated for each timestamp / key / value combination.  If that is true, then I understand
why compression is so important (lots of redundant data).
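
That redundancy is easy to see from the client side: every cell is serialized
with its row key, family, qualifier and timestamp attached. A rough sketch,
not from the thread, using the KeyValue class from the 0.90-era client API
with made-up names:

    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CellOverhead {
        public static void main(String[] args) {
            byte[] row       = Bytes.toBytes("customer-000000123-20101109-session-42"); // long row key
            byte[] family    = Bytes.toBytes("Event");
            byte[] qualifier = Bytes.toBytes("pageUrl");
            byte[] value     = Bytes.toBytes("42");   // tiny payload

            KeyValue kv = new KeyValue(row, family, qualifier, System.currentTimeMillis(), value);

            // getLength() is the serialized size of the whole cell: lengths, row, family,
            // qualifier, timestamp, type and value. With a small value it is dominated by
            // the repeated key parts, which is exactly what compression wins back.
            System.out.println("value bytes: " + value.length
                    + ", serialized cell bytes: " + kv.getLength());
        }
    }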

~Jeff

On 11/9/2010 10:46 PM, Jean-Daniel Cryans wrote:
> Each value is stored with its full key, e.g. row key + family +
> qualifier + timestamp + offsets. You don't give any information
> regarding how you stored the data, but if you have large enough keys
> then it should easily explain the bloat.
>
> J-D
>
> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar<hs...@clickable.com>  wrote:
>> Hi,
>>
>>      Data seems to be taking up too much space when I put into HBase. e.g, I
>> have a 2 GB text file which seems to be taking up ~70 GB when I dump into
>> HBase. I have block size set to 64 MB and replication=3, which I think is
>> the possible reason for this expansion. But if that is the case, how can I
>> prevent it? Decreasing the block size will have a negative impact on
>> performance, so is there a way I can increase the average size on
>> HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB on
>> average. Or is this an entirely different thing at work here?
>>
>> thanks,
>> hari
>>

-- 
Jeff Whiting
Qualtrics Senior Software Engineer
jeffw@qualtrics.com


Re: Data taking up too much space when put into HBase

Posted by Hari Sreekumar <hs...@clickable.com>.
Ah, that's a great piece of info J-D! I had 4 families just as a logical
division. I don't think I'm really using the fact that we have 4 different
families anywhere. Thanks a lot for the information.

thanks,
hari

On Thu, Nov 11, 2010 at 10:45 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Oh I see, you are using 4 families. An important thing to know (and
> it's not super obvious) is that the regions flush on the total size of
> the memstore across all families (there's one memstore per family,
> learn more here
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html).
>
> This is actually a deficiency which will be solved in the context of
> https://issues.apache.org/jira/browse/HBASE-3149
>
> Generally, I rarely see any reason to use more than 1 family. You
> really have to be in a case where the stored data is very different in
> nature and requires specific family-level configurations. Here across
> our 100+ tables, only 3-4 have more than one family and I'm sure that
> number should be lower.
>
> J-D
>
> On Thu, Nov 11, 2010 at 12:54 AM, Hari Sreekumar
> <hs...@clickable.com> wrote:
> > Here's the output of lsr on one of the tables:
> >
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1102232448
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:33
> > /hbase/Webevent/1102232448/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Channel
> > -rw-r--r--   3 hadoop supergroup   16943616 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Channel/7714679806810147132
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Customer
> > -rw-r--r--   3 hadoop supergroup   19089809 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Customer/228422950590673569
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Event
> > -rw-r--r--   3 hadoop supergroup   96925019 2010-11-11 13:33
> > /hbase/Webevent/1102232448/Event/3246797304454611713
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1102232448/User
> > -rw-r--r--   3 hadoop supergroup  176008329 2010-11-11 13:33
> > /hbase/Webevent/1102232448/User/6713166405821540696
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1102232448/http
> > -rw-r--r--   3 hadoop supergroup   36644077 2010-11-11 13:34
> > /hbase/Webevent/1102232448/http/5528514474393215140
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/1181349092
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:40
> > /hbase/Webevent/1181349092/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Channel
> > -rw-r--r--   3 hadoop supergroup   14203831 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Channel/1711324265142021994
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Customer
> > -rw-r--r--   3 hadoop supergroup   14091927 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Customer/3269372098573435637
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Event
> > -rw-r--r--   3 hadoop supergroup   80842368 2010-11-11 13:40
> > /hbase/Webevent/1181349092/Event/1632526964097525926
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> > /hbase/Webevent/1181349092/User
> > -rw-r--r--   3 hadoop supergroup  146490419 2010-11-11 13:40
> > /hbase/Webevent/1181349092/User/723684665063798772
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> > /hbase/Webevent/1181349092/http
> > -rw-r--r--   3 hadoop supergroup   27612664 2010-11-11 13:41
> > /hbase/Webevent/1181349092/http/3591070734425406504
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> > /hbase/Webevent/124990928
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:28
> > /hbase/Webevent/124990928/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/124990928/Channel
> > -rw-r--r--   3 hadoop supergroup   23700865 2010-11-11 13:35
> > /hbase/Webevent/124990928/Channel/3488091559288595522
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/124990928/Customer
> > -rw-r--r--   3 hadoop supergroup   23572454 2010-11-11 13:35
> > /hbase/Webevent/124990928/Customer/522070966307001888
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> > /hbase/Webevent/124990928/Event
> > -rw-r--r--   3 hadoop supergroup  126857284 2010-11-11 13:35
> > /hbase/Webevent/124990928/Event/8659573512216796018
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> > /hbase/Webevent/124990928/User
> > -rw-r--r--   3 hadoop supergroup  229590074 2010-11-11 13:36
> > /hbase/Webevent/124990928/User/4169913968975354294
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> > /hbase/Webevent/124990928/http
> > -rw-r--r--   3 hadoop supergroup   43849622 2010-11-11 13:36
> > /hbase/Webevent/124990928/http/798925777717846362
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> > /hbase/Webevent/13518424
> > -rw-r--r--   3 hadoop supergroup       2316 2010-11-11 13:22
> > /hbase/Webevent/13518424/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> > /hbase/Webevent/13518424/Channel
> > -rw-r--r--   3 hadoop supergroup   11192244 2010-11-11 13:24
> > /hbase/Webevent/13518424/Channel/6283534518250465269
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> > /hbase/Webevent/13518424/Customer
> > -rw-r--r--   3 hadoop supergroup   16335757 2010-11-11 13:24
> > /hbase/Webevent/13518424/Customer/8233555538562313638
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> > /hbase/Webevent/13518424/Event
> > -rw-r--r--   3 hadoop supergroup   86782869 2010-11-11 13:24
> > /hbase/Webevent/13518424/Event/7296313542067955537
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> > /hbase/Webevent/13518424/User
> > -rw-r--r--   3 hadoop supergroup  157614762 2010-11-11 13:25
> > /hbase/Webevent/13518424/User/5713897981539665344
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> > /hbase/Webevent/13518424/http
> > -rw-r--r--   3 hadoop supergroup   31036461 2010-11-11 13:25
> > /hbase/Webevent/13518424/http/3276765473089850908
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> > /hbase/Webevent/1397796225
> > -rw-r--r--   3 hadoop supergroup       2144 2010-11-11 13:22
> > /hbase/Webevent/1397796225/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/1397796225/Channel
> > -rw-r--r--   3 hadoop supergroup    3937460 2010-11-11 13:30
> > /hbase/Webevent/1397796225/Channel/3684194843745008101
> > -rw-r--r--   3 hadoop supergroup   13426908 2010-11-11 13:27
> > /hbase/Webevent/1397796225/Channel/5763776518727398923
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/1397796225/Customer
> > -rw-r--r--   3 hadoop supergroup    9358001 2010-11-11 13:30
> > /hbase/Webevent/1397796225/Customer/2373893879659383981
> > -rw-r--r--   3 hadoop supergroup   15152448 2010-11-11 13:23
> > /hbase/Webevent/1397796225/Customer/5404281688196690956
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/1397796225/Event
> > -rw-r--r--   3 hadoop supergroup   49691275 2010-11-11 13:30
> > /hbase/Webevent/1397796225/Event/1611219478516160819
> > -rw-r--r--   3 hadoop supergroup   80075191 2010-11-11 13:23
> > /hbase/Webevent/1397796225/Event/4491108423840726530
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/1397796225/User
> > -rw-r--r--   3 hadoop supergroup  235564578 2010-11-11 13:31
> > /hbase/Webevent/1397796225/User/1070607442453415896
> > -rw-r--r--   3 hadoop supergroup  145355910 2010-11-11 13:23
> > /hbase/Webevent/1397796225/User/6446151707620200218
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/1397796225/http
> > -rw-r--r--   3 hadoop supergroup   46665707 2010-11-11 13:31
> > /hbase/Webevent/1397796225/http/2613117415168100829
> > -rw-r--r--   3 hadoop supergroup   28997988 2010-11-11 13:24
> > /hbase/Webevent/1397796225/http/7620282531029987336
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/1568886745
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
> > /hbase/Webevent/1568886745/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Channel
> > -rw-r--r--   3 hadoop supergroup   22384663 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Channel/3092028782443043693
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Customer
> > -rw-r--r--   3 hadoop supergroup   24060024 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Customer/2143995643997658656
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Event
> > -rw-r--r--   3 hadoop supergroup  111172989 2010-11-11 13:32
> > /hbase/Webevent/1568886745/Event/606180646892333139
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1568886745/User
> > -rw-r--r--   3 hadoop supergroup  201627486 2010-11-11 13:32
> > /hbase/Webevent/1568886745/User/1159084185112718235
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1568886745/http
> > -rw-r--r--   3 hadoop supergroup   42824881 2010-11-11 13:33
> > /hbase/Webevent/1568886745/http/3005498889980823864
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/1585185360
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:32
> > /hbase/Webevent/1585185360/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1585185360/Channel
> > -rw-r--r--   3 hadoop supergroup   13146621 2010-11-11 13:38
> > /hbase/Webevent/1585185360/Channel/2384148253824087933
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1585185360/Customer
> > -rw-r--r--   3 hadoop supergroup   17772527 2010-11-11 13:38
> > /hbase/Webevent/1585185360/Customer/7079893521022823531
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> > /hbase/Webevent/1585185360/Event
> > -rw-r--r--   3 hadoop supergroup   97860459 2010-11-11 13:38
> > /hbase/Webevent/1585185360/Event/4129421247504808018
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> > /hbase/Webevent/1585185360/User
> > -rw-r--r--   3 hadoop supergroup  177262872 2010-11-11 13:39
> > /hbase/Webevent/1585185360/User/5689647586095222756
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> > /hbase/Webevent/1585185360/http
> > -rw-r--r--   3 hadoop supergroup   38392938 2010-11-11 13:39
> > /hbase/Webevent/1585185360/http/1513015171284860625
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/1679169023
> > -rw-r--r--   3 hadoop supergroup       1970 2010-11-11 13:31
> > /hbase/Webevent/1679169023/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Channel
> > -rw-r--r--   3 hadoop supergroup   16691718 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Channel/3995013105248642215
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Customer
> > -rw-r--r--   3 hadoop supergroup   18627546 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Customer/2697135409291299740
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Event
> > -rw-r--r--   3 hadoop supergroup   97721412 2010-11-11 13:37
> > /hbase/Webevent/1679169023/Event/5517850771377063599
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1679169023/User
> > -rw-r--r--   3 hadoop supergroup  177198181 2010-11-11 13:37
> > /hbase/Webevent/1679169023/User/1664697801534568988
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1679169023/http
> > -rw-r--r--   3 hadoop supergroup   35558386 2010-11-11 13:38
> > /hbase/Webevent/1679169023/http/2236900881608337670
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/1837902643
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
> > /hbase/Webevent/1837902643/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Channel
> > -rw-r--r--   3 hadoop supergroup   12956819 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Channel/7551397343290053516
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Customer
> > -rw-r--r--   3 hadoop supergroup   18017948 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Customer/1637842838964675843
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Event
> > -rw-r--r--   3 hadoop supergroup   99238886 2010-11-11 13:33
> > /hbase/Webevent/1837902643/Event/4961580175946952300
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1837902643/User
> > -rw-r--r--   3 hadoop supergroup  179431668 2010-11-11 13:33
> > /hbase/Webevent/1837902643/User/8513763763938668916
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1837902643/http
> > -rw-r--r--   3 hadoop supergroup   35275755 2010-11-11 13:34
> > /hbase/Webevent/1837902643/http/1801439100480395261
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> > /hbase/Webevent/1840258192
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:25
> > /hbase/Webevent/1840258192/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1840258192/Channel
> > -rw-r--r--   3 hadoop supergroup   15810928 2010-11-11 13:34
> > /hbase/Webevent/1840258192/Channel/8758451310929982789
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/1840258192/Customer
> > -rw-r--r--   3 hadoop supergroup   16184063 2010-11-11 13:34
> > /hbase/Webevent/1840258192/Customer/8209107027540853853
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/1840258192/Event
> > -rw-r--r--   3 hadoop supergroup   89893065 2010-11-11 13:34
> > /hbase/Webevent/1840258192/Event/2507733338503153306
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/1840258192/User
> > -rw-r--r--   3 hadoop supergroup  162202298 2010-11-11 13:35
> > /hbase/Webevent/1840258192/User/3877054643528147835
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/1840258192/http
> > -rw-r--r--   3 hadoop supergroup   30458950 2010-11-11 13:35
> > /hbase/Webevent/1840258192/http/7057895626422451135
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> > /hbase/Webevent/1857066524
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:28
> > /hbase/Webevent/1857066524/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> > /hbase/Webevent/1857066524/Channel
> > -rw-r--r--   3 hadoop supergroup   17158229 2010-11-11 13:28
> > /hbase/Webevent/1857066524/Channel/660294007043817390
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> > /hbase/Webevent/1857066524/Customer
> > -rw-r--r--   3 hadoop supergroup   17982120 2010-11-11 13:28
> > /hbase/Webevent/1857066524/Customer/8154314358497892797
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> > /hbase/Webevent/1857066524/Event
> > -rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:28
> > /hbase/Webevent/1857066524/Event/8608458148878068560
> > -rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:36
> > /hbase/Webevent/1857066524/Event/8753716512365715611
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
> > /hbase/Webevent/1857066524/User
> > -rw-r--r--   3 hadoop supergroup  188208796 2010-11-11 13:28
> > /hbase/Webevent/1857066524/User/5807656088473870598
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
> > /hbase/Webevent/1857066524/http
> > -rw-r--r--   3 hadoop supergroup   35830676 2010-11-11 13:29
> > /hbase/Webevent/1857066524/http/4192260931766222885
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/1954991296
> > -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:31
> > /hbase/Webevent/1954991296/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Channel
> > -rw-r--r--   3 hadoop supergroup   14723821 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Channel/1271796192395132719
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Customer
> > -rw-r--r--   3 hadoop supergroup   16998002 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Customer/1871613240079217431
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Event
> > -rw-r--r--   3 hadoop supergroup   90132913 2010-11-11 13:38
> > /hbase/Webevent/1954991296/Event/8627908912432238564
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1954991296/User
> > -rw-r--r--   3 hadoop supergroup  163362248 2010-11-11 13:38
> > /hbase/Webevent/1954991296/User/8343583184278031381
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> > /hbase/Webevent/1954991296/http
> > -rw-r--r--   3 hadoop supergroup   37650515 2010-11-11 13:38
> > /hbase/Webevent/1954991296/http/783502764043910698
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:16
> > /hbase/Webevent/387441199
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:16
> > /hbase/Webevent/387441199/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> > /hbase/Webevent/387441199/Channel
> > -rw-r--r--   3 hadoop supergroup    8751094 2010-11-11 13:22
> > /hbase/Webevent/387441199/Channel/6907788666949153760
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> > /hbase/Webevent/387441199/Customer
> > -rw-r--r--   3 hadoop supergroup   16526400 2010-11-11 13:23
> > /hbase/Webevent/387441199/Customer/52924882214004995
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> > /hbase/Webevent/387441199/Event
> > -rw-r--r--   3 hadoop supergroup   96466783 2010-11-11 13:23
> > /hbase/Webevent/387441199/Event/991918398642333797
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> > /hbase/Webevent/387441199/User
> > -rw-r--r--   3 hadoop supergroup  173755411 2010-11-11 13:23
> > /hbase/Webevent/387441199/User/3697716047653972271
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> > /hbase/Webevent/387441199/http
> > -rw-r--r--   3 hadoop supergroup   29164625 2010-11-11 13:19
> > /hbase/Webevent/387441199/http/2172660655272329198
> > -rw-r--r--   3 hadoop supergroup    3505176 2010-11-11 13:22
> > /hbase/Webevent/387441199/http/9190482934578742068
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/480045516
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
> > /hbase/Webevent/480045516/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/480045516/Channel
> > -rw-r--r--   3 hadoop supergroup   14777812 2010-11-11 13:37
> > /hbase/Webevent/480045516/Channel/2328066899305806515
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/480045516/Customer
> > -rw-r--r--   3 hadoop supergroup   18953627 2010-11-11 13:37
> > /hbase/Webevent/480045516/Customer/2078047623290175963
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/480045516/Event
> > -rw-r--r--   3 hadoop supergroup  104229664 2010-11-11 13:37
> > /hbase/Webevent/480045516/Event/910211247163239598
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/480045516/User
> > -rw-r--r--   3 hadoop supergroup  189096799 2010-11-11 13:37
> > /hbase/Webevent/480045516/User/5717389634644419119
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> > /hbase/Webevent/480045516/http
> > -rw-r--r--   3 hadoop supergroup   36533404 2010-11-11 13:37
> > /hbase/Webevent/480045516/http/8604372036650962237
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:40
> > /hbase/Webevent/601109706/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706/Channel
> > -rw-r--r--   3 hadoop supergroup   14155967 2010-11-11 13:40
> > /hbase/Webevent/601109706/Channel/1819667230290028427
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706/Customer
> > -rw-r--r--   3 hadoop supergroup   14563111 2010-11-11 13:40
> > /hbase/Webevent/601109706/Customer/7336170720169514891
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706/Event
> > -rw-r--r--   3 hadoop supergroup   82278013 2010-11-11 13:40
> > /hbase/Webevent/601109706/Event/5064894617590864583
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706/User
> > -rw-r--r--   3 hadoop supergroup  149299853 2010-11-11 13:40
> > /hbase/Webevent/601109706/User/5997879119834564841
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> > /hbase/Webevent/601109706/http
> > -rw-r--r--   3 hadoop supergroup   29266049 2010-11-11 13:40
> > /hbase/Webevent/601109706/http/3987271255931462679
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> > /hbase/Webevent/666508206
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:25
> > /hbase/Webevent/666508206/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/666508206/Channel
> > -rw-r--r--   3 hadoop supergroup   22727461 2010-11-11 13:33
> > /hbase/Webevent/666508206/Channel/9154587641511700292
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> > /hbase/Webevent/666508206/Customer
> > -rw-r--r--   3 hadoop supergroup   23277615 2010-11-11 13:33
> > /hbase/Webevent/666508206/Customer/3760018687145755911
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/666508206/Event
> > -rw-r--r--   3 hadoop supergroup  111133668 2010-11-11 13:33
> > /hbase/Webevent/666508206/Event/3598650053650721687
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/666508206/User
> > -rw-r--r--   3 hadoop supergroup  201631388 2010-11-11 13:34
> > /hbase/Webevent/666508206/User/3597127170470234124
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/666508206/http
> > -rw-r--r--   3 hadoop supergroup   39920111 2010-11-11 13:34
> > /hbase/Webevent/666508206/http/1455502897668123089
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> > /hbase/Webevent/717393157
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:32
> > /hbase/Webevent/717393157/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/717393157/Channel
> > -rw-r--r--   3 hadoop supergroup    7937724 2010-11-11 13:34
> > /hbase/Webevent/717393157/Channel/4038125755496042580
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/717393157/Customer
> > -rw-r--r--   3 hadoop supergroup   14666396 2010-11-11 13:34
> > /hbase/Webevent/717393157/Customer/8406371944316504992
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> > /hbase/Webevent/717393157/Event
> > -rw-r--r--   3 hadoop supergroup   85611423 2010-11-11 13:34
> > /hbase/Webevent/717393157/Event/127456153926503346
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/717393157/User
> > -rw-r--r--   3 hadoop supergroup  154335622 2010-11-11 13:34
> > /hbase/Webevent/717393157/User/7421172344231467438
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> > /hbase/Webevent/717393157/http
> > -rw-r--r--   3 hadoop supergroup   28943243 2010-11-11 13:35
> > /hbase/Webevent/717393157/http/7543152081662309456
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/902882312
> > -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
> > /hbase/Webevent/902882312/.regioninfo
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> > /hbase/Webevent/902882312/Channel
> > -rw-r--r--   3 hadoop supergroup    9541469 2010-11-11 13:30
> > /hbase/Webevent/902882312/Channel/3254461494206070427
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/902882312/Customer
> > -rw-r--r--   3 hadoop supergroup   16270772 2010-11-11 13:30
> > /hbase/Webevent/902882312/Customer/3583245475353507819
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/902882312/Event
> > -rw-r--r--   3 hadoop supergroup   90805116 2010-11-11 13:31
> > /hbase/Webevent/902882312/Event/1032140072520109551
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/902882312/User
> > -rw-r--r--   3 hadoop supergroup  164990613 2010-11-11 13:31
> > /hbase/Webevent/902882312/User/5112158281218703912
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> > /hbase/Webevent/902882312/http
> > -rw-r--r--   3 hadoop supergroup   38405659 2010-11-11 13:31
> > /hbase/Webevent/902882312/http/5928256232381135445
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> > /hbase/Webevent/compaction.dir
> > drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> > /hbase/Webevent/compaction.dir/1857066524
> > -rw-r--r--   3 hadoop supergroup  153276719 2010-11-11 13:36
> > /hbase/Webevent/compaction.dir/1857066524/8008135349377513409
> >
> > There are many smaller files of sizes < 20 MB which might actually be
> > taking up 64*3=192 MB after replication. And even for the larger files, a
> > file of 129 MB would use up 3 blocks, right? Or is it somehow optimized to
> > minimize space usage?
> >
> > On Wed, Nov 10, 2010 at 11:07 PM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> Can you pastebin the output of the lsr command on the table's dir?
> >>
> >> Thx
> >>
> >> J-D
> >>
> >> On Tue, Nov 9, 2010 at 10:54 PM, Hari Sreekumar
> >> <hs...@clickable.com> wrote:
> >> > I checked the "browse filesystem" link in the web interface (50070).
> >> > HBase creates a directory named after the table, and in that directory
> >> > there are files which are 5-6 MB in size, on average. Some are in KBs,
> >> > and there are some of 12-13 MB size, but most are around 6 MB. I was
> >> > thinking these files are stored in 64 MB blocks, leading to the space
> >> > usage.
> >> >
> >> > hari
> >> >
> >> > On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <
> >> jdcryans@apache.org>wrote:
> >> >
> >> >> I'm pretty sure that's not how it's reported by the "du" command, but
> >> >> I wouldn't expect to see files of 5MB on average. Can you be more
> >> >> specific?
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <
> >> hsreekumar@clickable.com>
> >> >> wrote:
> >> >> > Ah, so the bloat is not because of the files being 5-6 MB in size?
> >> >> > Wouldn't a 6 MB file occupy 64 MB if I set block size as 64 MB?
> >> >> >
> >> >> > hari
> >> >> >
> >> >> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <
> >> >> jdcryans@apache.org>wrote:
> >> >> >
> >> >> >> Each value is stored with its full key, e.g. row key + family +
> >> >> >> qualifier + timestamp + offsets. You don't give any information
> >> >> >> regarding how you stored the data, but if you have large enough
> >> >> >> keys then it should easily explain the bloat.
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <
> >> >> hsreekumar@clickable.com>
> >> >> >> wrote:
> >> >> >> > Hi,
> >> >> >> >
> >> >> >> >     Data seems to be taking up too much space when I put into
> >> >> >> > HBase. e.g, I have a 2 GB text file which seems to be taking up
> >> >> >> > ~70 GB when I dump into HBase. I have block size set to 64 MB
> >> >> >> > and replication=3, which I think is the possible reason for this
> >> >> >> > expansion. But if that is the case, how can I prevent it?
> >> >> >> > Decreasing the block size will have a negative impact on
> >> >> >> > performance, so is there a way I can increase the average size
> >> >> >> > on HBase-created files to be comparable to 64 MB. Right now they
> >> >> >> > are ~5 MB on average. Or is this an entirely different thing at
> >> >> >> > work here?
> >> >> >> >
> >> >> >> > thanks,
> >> >> >> > hari
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>
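
On the open question in the quoted thread above about whether a 5 MB or 129 MB
store file "uses up" whole 64 MB blocks: HDFS does not pre-allocate or pad
blocks, so a file's disk usage is its actual length times the replication
factor, with only the block count rounded up. A back-of-the-envelope sketch,
not from the thread, with illustrative numbers:

    public class BlockMath {
        public static void main(String[] args) {
            long blockSize   = 64L << 20;  // 64 MB HDFS block size
            int  replication = 3;
            long fileLen     = 129L << 20; // e.g. a 129 MB store file

            long blocks    = (fileLen + blockSize - 1) / blockSize; // 3 blocks of metadata
            long diskBytes = fileLen * replication;                 // ~387 MB on disk, not 576 MB

            System.out.printf("blocks: %d, disk usage: ~%d MB (not %d MB)%n",
                    blocks, diskBytes >> 20, (blocks * blockSize * replication) >> 20);
        }
    }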

Re: Data taking up too much space when put into HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Oh I see, you are using 4 families. An important thing to know (and
it's not super obvious) is that a region flushes based on the total
size of the memstores across all of its families (there's one memstore
per family; learn more here
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html).

This is actually a deficiency which will be addressed in the context of
https://issues.apache.org/jira/browse/HBASE-3149

Generally, I rarely see any reason to use more than 1 family. You
really have to be in a case where the stored data is very different in
nature and requires specific family-level configurations. Here, across
our 100+ tables, only 3-4 have more than one family, and I'm sure that
number should be lower.
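
To make the flush interaction concrete: the threshold below is shared by the
whole region, so with several families each flush writes several store files
whose combined size is around the threshold, rather than one threshold-sized
file per family. A small sketch, not from the thread; the property name is
from the 0.90-era configs and the numbers are illustrative only:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FlushMath {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();

            // The region flushes when the *sum* of all its memstores reaches this value.
            long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 64L * 1024 * 1024);
            int families = 5; // the Webevent listing in this thread shows five families

            // If writes spread evenly across families, each memstore holds only a fraction
            // of the threshold when the flush fires, so its store files come out small.
            System.out.printf("region flush threshold: %d MB, per-family flush: ~%d MB%n",
                    flushSize >> 20, (flushSize / families) >> 20);
        }
    }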

J-D

On Thu, Nov 11, 2010 at 12:54 AM, Hari Sreekumar
<hs...@clickable.com> wrote:
> Here's the output of lsr on one of the tables:
>
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1102232448
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:33
> /hbase/Webevent/1102232448/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1102232448/Channel
> -rw-r--r--   3 hadoop supergroup   16943616 2010-11-11 13:33
> /hbase/Webevent/1102232448/Channel/7714679806810147132
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1102232448/Customer
> -rw-r--r--   3 hadoop supergroup   19089809 2010-11-11 13:33
> /hbase/Webevent/1102232448/Customer/228422950590673569
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1102232448/Event
> -rw-r--r--   3 hadoop supergroup   96925019 2010-11-11 13:33
> /hbase/Webevent/1102232448/Event/3246797304454611713
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1102232448/User
> -rw-r--r--   3 hadoop supergroup  176008329 2010-11-11 13:33
> /hbase/Webevent/1102232448/User/6713166405821540696
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1102232448/http
> -rw-r--r--   3 hadoop supergroup   36644077 2010-11-11 13:34
> /hbase/Webevent/1102232448/http/5528514474393215140
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/1181349092
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:40
> /hbase/Webevent/1181349092/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/1181349092/Channel
> -rw-r--r--   3 hadoop supergroup   14203831 2010-11-11 13:40
> /hbase/Webevent/1181349092/Channel/1711324265142021994
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/1181349092/Customer
> -rw-r--r--   3 hadoop supergroup   14091927 2010-11-11 13:40
> /hbase/Webevent/1181349092/Customer/3269372098573435637
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/1181349092/Event
> -rw-r--r--   3 hadoop supergroup   80842368 2010-11-11 13:40
> /hbase/Webevent/1181349092/Event/1632526964097525926
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> /hbase/Webevent/1181349092/User
> -rw-r--r--   3 hadoop supergroup  146490419 2010-11-11 13:40
> /hbase/Webevent/1181349092/User/723684665063798772
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> /hbase/Webevent/1181349092/http
> -rw-r--r--   3 hadoop supergroup   27612664 2010-11-11 13:41
> /hbase/Webevent/1181349092/http/3591070734425406504
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> /hbase/Webevent/124990928
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:28
> /hbase/Webevent/124990928/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/124990928/Channel
> -rw-r--r--   3 hadoop supergroup   23700865 2010-11-11 13:35
> /hbase/Webevent/124990928/Channel/3488091559288595522
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/124990928/Customer
> -rw-r--r--   3 hadoop supergroup   23572454 2010-11-11 13:35
> /hbase/Webevent/124990928/Customer/522070966307001888
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> /hbase/Webevent/124990928/Event
> -rw-r--r--   3 hadoop supergroup  126857284 2010-11-11 13:35
> /hbase/Webevent/124990928/Event/8659573512216796018
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> /hbase/Webevent/124990928/User
> -rw-r--r--   3 hadoop supergroup  229590074 2010-11-11 13:36
> /hbase/Webevent/124990928/User/4169913968975354294
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> /hbase/Webevent/124990928/http
> -rw-r--r--   3 hadoop supergroup   43849622 2010-11-11 13:36
> /hbase/Webevent/124990928/http/798925777717846362
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> /hbase/Webevent/13518424
> -rw-r--r--   3 hadoop supergroup       2316 2010-11-11 13:22
> /hbase/Webevent/13518424/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> /hbase/Webevent/13518424/Channel
> -rw-r--r--   3 hadoop supergroup   11192244 2010-11-11 13:24
> /hbase/Webevent/13518424/Channel/6283534518250465269
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> /hbase/Webevent/13518424/Customer
> -rw-r--r--   3 hadoop supergroup   16335757 2010-11-11 13:24
> /hbase/Webevent/13518424/Customer/8233555538562313638
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> /hbase/Webevent/13518424/Event
> -rw-r--r--   3 hadoop supergroup   86782869 2010-11-11 13:24
> /hbase/Webevent/13518424/Event/7296313542067955537
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> /hbase/Webevent/13518424/User
> -rw-r--r--   3 hadoop supergroup  157614762 2010-11-11 13:25
> /hbase/Webevent/13518424/User/5713897981539665344
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> /hbase/Webevent/13518424/http
> -rw-r--r--   3 hadoop supergroup   31036461 2010-11-11 13:25
> /hbase/Webevent/13518424/http/3276765473089850908
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> /hbase/Webevent/1397796225
> -rw-r--r--   3 hadoop supergroup       2144 2010-11-11 13:22
> /hbase/Webevent/1397796225/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/1397796225/Channel
> -rw-r--r--   3 hadoop supergroup    3937460 2010-11-11 13:30
> /hbase/Webevent/1397796225/Channel/3684194843745008101
> -rw-r--r--   3 hadoop supergroup   13426908 2010-11-11 13:27
> /hbase/Webevent/1397796225/Channel/5763776518727398923
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/1397796225/Customer
> -rw-r--r--   3 hadoop supergroup    9358001 2010-11-11 13:30
> /hbase/Webevent/1397796225/Customer/2373893879659383981
> -rw-r--r--   3 hadoop supergroup   15152448 2010-11-11 13:23
> /hbase/Webevent/1397796225/Customer/5404281688196690956
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/1397796225/Event
> -rw-r--r--   3 hadoop supergroup   49691275 2010-11-11 13:30
> /hbase/Webevent/1397796225/Event/1611219478516160819
> -rw-r--r--   3 hadoop supergroup   80075191 2010-11-11 13:23
> /hbase/Webevent/1397796225/Event/4491108423840726530
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/1397796225/User
> -rw-r--r--   3 hadoop supergroup  235564578 2010-11-11 13:31
> /hbase/Webevent/1397796225/User/1070607442453415896
> -rw-r--r--   3 hadoop supergroup  145355910 2010-11-11 13:23
> /hbase/Webevent/1397796225/User/6446151707620200218
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/1397796225/http
> -rw-r--r--   3 hadoop supergroup   46665707 2010-11-11 13:31
> /hbase/Webevent/1397796225/http/2613117415168100829
> -rw-r--r--   3 hadoop supergroup   28997988 2010-11-11 13:24
> /hbase/Webevent/1397796225/http/7620282531029987336
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/1568886745
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
> /hbase/Webevent/1568886745/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/1568886745/Channel
> -rw-r--r--   3 hadoop supergroup   22384663 2010-11-11 13:32
> /hbase/Webevent/1568886745/Channel/3092028782443043693
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/1568886745/Customer
> -rw-r--r--   3 hadoop supergroup   24060024 2010-11-11 13:32
> /hbase/Webevent/1568886745/Customer/2143995643997658656
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/1568886745/Event
> -rw-r--r--   3 hadoop supergroup  111172989 2010-11-11 13:32
> /hbase/Webevent/1568886745/Event/606180646892333139
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1568886745/User
> -rw-r--r--   3 hadoop supergroup  201627486 2010-11-11 13:32
> /hbase/Webevent/1568886745/User/1159084185112718235
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1568886745/http
> -rw-r--r--   3 hadoop supergroup   42824881 2010-11-11 13:33
> /hbase/Webevent/1568886745/http/3005498889980823864
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/1585185360
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:32
> /hbase/Webevent/1585185360/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1585185360/Channel
> -rw-r--r--   3 hadoop supergroup   13146621 2010-11-11 13:38
> /hbase/Webevent/1585185360/Channel/2384148253824087933
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1585185360/Customer
> -rw-r--r--   3 hadoop supergroup   17772527 2010-11-11 13:38
> /hbase/Webevent/1585185360/Customer/7079893521022823531
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> /hbase/Webevent/1585185360/Event
> -rw-r--r--   3 hadoop supergroup   97860459 2010-11-11 13:38
> /hbase/Webevent/1585185360/Event/4129421247504808018
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> /hbase/Webevent/1585185360/User
> -rw-r--r--   3 hadoop supergroup  177262872 2010-11-11 13:39
> /hbase/Webevent/1585185360/User/5689647586095222756
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
> /hbase/Webevent/1585185360/http
> -rw-r--r--   3 hadoop supergroup   38392938 2010-11-11 13:39
> /hbase/Webevent/1585185360/http/1513015171284860625
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/1679169023
> -rw-r--r--   3 hadoop supergroup       1970 2010-11-11 13:31
> /hbase/Webevent/1679169023/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/1679169023/Channel
> -rw-r--r--   3 hadoop supergroup   16691718 2010-11-11 13:37
> /hbase/Webevent/1679169023/Channel/3995013105248642215
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/1679169023/Customer
> -rw-r--r--   3 hadoop supergroup   18627546 2010-11-11 13:37
> /hbase/Webevent/1679169023/Customer/2697135409291299740
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/1679169023/Event
> -rw-r--r--   3 hadoop supergroup   97721412 2010-11-11 13:37
> /hbase/Webevent/1679169023/Event/5517850771377063599
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1679169023/User
> -rw-r--r--   3 hadoop supergroup  177198181 2010-11-11 13:37
> /hbase/Webevent/1679169023/User/1664697801534568988
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1679169023/http
> -rw-r--r--   3 hadoop supergroup   35558386 2010-11-11 13:38
> /hbase/Webevent/1679169023/http/2236900881608337670
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/1837902643
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
> /hbase/Webevent/1837902643/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1837902643/Channel
> -rw-r--r--   3 hadoop supergroup   12956819 2010-11-11 13:33
> /hbase/Webevent/1837902643/Channel/7551397343290053516
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1837902643/Customer
> -rw-r--r--   3 hadoop supergroup   18017948 2010-11-11 13:33
> /hbase/Webevent/1837902643/Customer/1637842838964675843
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/1837902643/Event
> -rw-r--r--   3 hadoop supergroup   99238886 2010-11-11 13:33
> /hbase/Webevent/1837902643/Event/4961580175946952300
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1837902643/User
> -rw-r--r--   3 hadoop supergroup  179431668 2010-11-11 13:33
> /hbase/Webevent/1837902643/User/8513763763938668916
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1837902643/http
> -rw-r--r--   3 hadoop supergroup   35275755 2010-11-11 13:34
> /hbase/Webevent/1837902643/http/1801439100480395261
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> /hbase/Webevent/1840258192
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:25
> /hbase/Webevent/1840258192/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1840258192/Channel
> -rw-r--r--   3 hadoop supergroup   15810928 2010-11-11 13:34
> /hbase/Webevent/1840258192/Channel/8758451310929982789
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/1840258192/Customer
> -rw-r--r--   3 hadoop supergroup   16184063 2010-11-11 13:34
> /hbase/Webevent/1840258192/Customer/8209107027540853853
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/1840258192/Event
> -rw-r--r--   3 hadoop supergroup   89893065 2010-11-11 13:34
> /hbase/Webevent/1840258192/Event/2507733338503153306
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/1840258192/User
> -rw-r--r--   3 hadoop supergroup  162202298 2010-11-11 13:35
> /hbase/Webevent/1840258192/User/3877054643528147835
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/1840258192/http
> -rw-r--r--   3 hadoop supergroup   30458950 2010-11-11 13:35
> /hbase/Webevent/1840258192/http/7057895626422451135
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> /hbase/Webevent/1857066524
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:28
> /hbase/Webevent/1857066524/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> /hbase/Webevent/1857066524/Channel
> -rw-r--r--   3 hadoop supergroup   17158229 2010-11-11 13:28
> /hbase/Webevent/1857066524/Channel/660294007043817390
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
> /hbase/Webevent/1857066524/Customer
> -rw-r--r--   3 hadoop supergroup   17982120 2010-11-11 13:28
> /hbase/Webevent/1857066524/Customer/8154314358497892797
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> /hbase/Webevent/1857066524/Event
> -rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:28
> /hbase/Webevent/1857066524/Event/8608458148878068560
> -rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:36
> /hbase/Webevent/1857066524/Event/8753716512365715611
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
> /hbase/Webevent/1857066524/User
> -rw-r--r--   3 hadoop supergroup  188208796 2010-11-11 13:28
> /hbase/Webevent/1857066524/User/5807656088473870598
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
> /hbase/Webevent/1857066524/http
> -rw-r--r--   3 hadoop supergroup   35830676 2010-11-11 13:29
> /hbase/Webevent/1857066524/http/4192260931766222885
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/1954991296
> -rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:31
> /hbase/Webevent/1954991296/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1954991296/Channel
> -rw-r--r--   3 hadoop supergroup   14723821 2010-11-11 13:38
> /hbase/Webevent/1954991296/Channel/1271796192395132719
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1954991296/Customer
> -rw-r--r--   3 hadoop supergroup   16998002 2010-11-11 13:38
> /hbase/Webevent/1954991296/Customer/1871613240079217431
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1954991296/Event
> -rw-r--r--   3 hadoop supergroup   90132913 2010-11-11 13:38
> /hbase/Webevent/1954991296/Event/8627908912432238564
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1954991296/User
> -rw-r--r--   3 hadoop supergroup  163362248 2010-11-11 13:38
> /hbase/Webevent/1954991296/User/8343583184278031381
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
> /hbase/Webevent/1954991296/http
> -rw-r--r--   3 hadoop supergroup   37650515 2010-11-11 13:38
> /hbase/Webevent/1954991296/http/783502764043910698
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:16
> /hbase/Webevent/387441199
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:16
> /hbase/Webevent/387441199/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> /hbase/Webevent/387441199/Channel
> -rw-r--r--   3 hadoop supergroup    8751094 2010-11-11 13:22
> /hbase/Webevent/387441199/Channel/6907788666949153760
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> /hbase/Webevent/387441199/Customer
> -rw-r--r--   3 hadoop supergroup   16526400 2010-11-11 13:23
> /hbase/Webevent/387441199/Customer/52924882214004995
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
> /hbase/Webevent/387441199/Event
> -rw-r--r--   3 hadoop supergroup   96466783 2010-11-11 13:23
> /hbase/Webevent/387441199/Event/991918398642333797
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
> /hbase/Webevent/387441199/User
> -rw-r--r--   3 hadoop supergroup  173755411 2010-11-11 13:23
> /hbase/Webevent/387441199/User/3697716047653972271
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
> /hbase/Webevent/387441199/http
> -rw-r--r--   3 hadoop supergroup   29164625 2010-11-11 13:19
> /hbase/Webevent/387441199/http/2172660655272329198
> -rw-r--r--   3 hadoop supergroup    3505176 2010-11-11 13:22
> /hbase/Webevent/387441199/http/9190482934578742068
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/480045516
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
> /hbase/Webevent/480045516/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/480045516/Channel
> -rw-r--r--   3 hadoop supergroup   14777812 2010-11-11 13:37
> /hbase/Webevent/480045516/Channel/2328066899305806515
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/480045516/Customer
> -rw-r--r--   3 hadoop supergroup   18953627 2010-11-11 13:37
> /hbase/Webevent/480045516/Customer/2078047623290175963
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/480045516/Event
> -rw-r--r--   3 hadoop supergroup  104229664 2010-11-11 13:37
> /hbase/Webevent/480045516/Event/910211247163239598
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/480045516/User
> -rw-r--r--   3 hadoop supergroup  189096799 2010-11-11 13:37
> /hbase/Webevent/480045516/User/5717389634644419119
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
> /hbase/Webevent/480045516/http
> -rw-r--r--   3 hadoop supergroup   36533404 2010-11-11 13:37
> /hbase/Webevent/480045516/http/8604372036650962237
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:40
> /hbase/Webevent/601109706/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706/Channel
> -rw-r--r--   3 hadoop supergroup   14155967 2010-11-11 13:40
> /hbase/Webevent/601109706/Channel/1819667230290028427
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706/Customer
> -rw-r--r--   3 hadoop supergroup   14563111 2010-11-11 13:40
> /hbase/Webevent/601109706/Customer/7336170720169514891
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706/Event
> -rw-r--r--   3 hadoop supergroup   82278013 2010-11-11 13:40
> /hbase/Webevent/601109706/Event/5064894617590864583
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706/User
> -rw-r--r--   3 hadoop supergroup  149299853 2010-11-11 13:40
> /hbase/Webevent/601109706/User/5997879119834564841
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
> /hbase/Webevent/601109706/http
> -rw-r--r--   3 hadoop supergroup   29266049 2010-11-11 13:40
> /hbase/Webevent/601109706/http/3987271255931462679
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
> /hbase/Webevent/666508206
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:25
> /hbase/Webevent/666508206/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/666508206/Channel
> -rw-r--r--   3 hadoop supergroup   22727461 2010-11-11 13:33
> /hbase/Webevent/666508206/Channel/9154587641511700292
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
> /hbase/Webevent/666508206/Customer
> -rw-r--r--   3 hadoop supergroup   23277615 2010-11-11 13:33
> /hbase/Webevent/666508206/Customer/3760018687145755911
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/666508206/Event
> -rw-r--r--   3 hadoop supergroup  111133668 2010-11-11 13:33
> /hbase/Webevent/666508206/Event/3598650053650721687
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/666508206/User
> -rw-r--r--   3 hadoop supergroup  201631388 2010-11-11 13:34
> /hbase/Webevent/666508206/User/3597127170470234124
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/666508206/http
> -rw-r--r--   3 hadoop supergroup   39920111 2010-11-11 13:34
> /hbase/Webevent/666508206/http/1455502897668123089
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
> /hbase/Webevent/717393157
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:32
> /hbase/Webevent/717393157/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/717393157/Channel
> -rw-r--r--   3 hadoop supergroup    7937724 2010-11-11 13:34
> /hbase/Webevent/717393157/Channel/4038125755496042580
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/717393157/Customer
> -rw-r--r--   3 hadoop supergroup   14666396 2010-11-11 13:34
> /hbase/Webevent/717393157/Customer/8406371944316504992
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
> /hbase/Webevent/717393157/Event
> -rw-r--r--   3 hadoop supergroup   85611423 2010-11-11 13:34
> /hbase/Webevent/717393157/Event/127456153926503346
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/717393157/User
> -rw-r--r--   3 hadoop supergroup  154335622 2010-11-11 13:34
> /hbase/Webevent/717393157/User/7421172344231467438
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
> /hbase/Webevent/717393157/http
> -rw-r--r--   3 hadoop supergroup   28943243 2010-11-11 13:35
> /hbase/Webevent/717393157/http/7543152081662309456
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/902882312
> -rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
> /hbase/Webevent/902882312/.regioninfo
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
> /hbase/Webevent/902882312/Channel
> -rw-r--r--   3 hadoop supergroup    9541469 2010-11-11 13:30
> /hbase/Webevent/902882312/Channel/3254461494206070427
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/902882312/Customer
> -rw-r--r--   3 hadoop supergroup   16270772 2010-11-11 13:30
> /hbase/Webevent/902882312/Customer/3583245475353507819
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/902882312/Event
> -rw-r--r--   3 hadoop supergroup   90805116 2010-11-11 13:31
> /hbase/Webevent/902882312/Event/1032140072520109551
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/902882312/User
> -rw-r--r--   3 hadoop supergroup  164990613 2010-11-11 13:31
> /hbase/Webevent/902882312/User/5112158281218703912
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
> /hbase/Webevent/902882312/http
> -rw-r--r--   3 hadoop supergroup   38405659 2010-11-11 13:31
> /hbase/Webevent/902882312/http/5928256232381135445
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
> /hbase/Webevent/compaction.dir
> drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
> /hbase/Webevent/compaction.dir/1857066524
> -rw-r--r--   3 hadoop supergroup  153276719 2010-11-11 13:36
> /hbase/Webevent/compaction.dir/1857066524/8008135349377513409
>
> There are many smaller files of sizes < 20 MB which might be actually taking
> up 64*3=192 MB after replication. And even for the larger files, a file of
> 129 MB would use up 3 blocks right? Or is it somehow optimized to minimize
> space usage?
>
> On Wed, Nov 10, 2010 at 11:07 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Can you pastebin the output of the lsr command on the table's dir?
>>
>> Thx
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 10:54 PM, Hari Sreekumar
>> <hs...@clickable.com> wrote:
>> > I checked the "browse filesystem" link in the web interface (50070).
>> HBase
>> > creates a directly named after the table ,and in the directory, there are
>> > files which are 5-6 MB in size, on average. Some are in kbs, and there
>> are
>> > some of 12-13 MB size, but most are around  6 MB. I was thinking these
>> files
>> > are stored in 64 MB blocks, leading to the space usage.
>> >
>> > hari
>> >
>> > On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> I'm pretty sure that's not how it's reported by the "du" command, but
>> >> I wouldn't expect to see files of 5MB on average. Can you be more
>> >> specific?
>> >>
>> >> J-D
>> >>
>> >> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <
>> hsreekumar@clickable.com>
>> >> wrote:
>> >> > Ah, so the bloat is not because of the files being 5-6 MB in size?
>> >> Wouldn't
>> >> > a 6 MB file occupy 64 MB if I set block size as 64 MB?
>> >> >
>> >> > hari
>> >> >
>> >> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <
>> >> jdcryans@apache.org>wrote:
>> >> >
>> >> >> Each value is stored with it's full key e.g. row key + family +
>> >> >> qualifier + timestamp + offsets. You don't give any information
>> >> >> regarding how you stored the data, but if you have large enough keys
>> >> >> then it should easily explain the bloat.
>> >> >>
>> >> >> J-D
>> >> >>
>> >> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <
>> >> hsreekumar@clickable.com>
>> >> >> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> >     Data seems to be taking up too much space when I put into
>> HBase.
>> >> e.g,
>> >> >> I
>> >> >> > have a 2 GB text file which seems to be taking up ~70 GB when I
>> dump
>> >> into
>> >> >> > HBase. I have block size set to 64 MB and replication=3, which I
>> think
>> >> is
>> >> >> > the possible reason for this expansion. But if that is the case,
>> how
>> >> can
>> >> >> I
>> >> >> > prevent it? Decreasing the block size will have a negative impact
>> on
>> >> >> > performance, so is there a way I can increase the average size on
>> >> >> > HBase-created  files to be comparable to 64 MB. Right now they are
>> ~5
>> >> MB
>> >> >> on
>> >> >> > average. Or is this an entirely different thing at work here?
>> >> >> >
>> >> >> > thanks,
>> >> >> > hari
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>
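
For reference, on the question quoted above about whether a ~6 MB store file
occupies a full 64 MB block: an HDFS block size is an upper bound on how a
file is split, not a pre-allocated unit, so a 6 MB file consumes roughly
6 MB times the replication factor on disk, and a 129 MB file spans 3 blocks
but still only consumes ~129 MB times replication. A quick way to check this
(a minimal sketch using the stock Hadoop shell and fsck tools against the
table directory from the listing) is:

  # logical size, block count and average replication for the table dir
  hadoop fsck /hbase/Webevent -files -blocks | tail -20

  # per-file logical sizes (pre-replication)
  hadoop fs -du /hbase/Webevent

  # aggregate logical size, to compare against "DFS Used" (which includes replication)
  hadoop fs -dus /hbase/Webevent
  hadoop dfsadmin -report | grep -i 'DFS Used'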

Re: Memory leak in LZO native code?

Posted by Todd Lipcon <to...@cloudera.com>.
On Tue, Nov 16, 2010 at 11:55 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> It's the version at https://github.com/toddlipcon/hadoop-lzo that got
> upped to 0.4.7 because of this fix (download and build yourself). You need
> this version to go with CDH3b3. I don't know how this relates to the ASF
> release / trunk of HBase. This version is a fork from
> https://github.com/kevinweil/hadoop-lzo, which is what I used before (on
> ASF HBase and CDH3b2).
>
>
It's a "fork" in the GitHub sense, but in reality, if you look at the history,
both Kevin and I contribute regularly to the project and merge each other's
changes.

-Todd

>
>
> On 16 nov 2010, at 19:47, Sean Bigdatafun wrote:
>
> Hi Todd,
>
> Can you please give the URL of this fix?
>
> Thanks,
> Sean
>
> On Sat, Nov 13, 2010 at 9:10 PM, Todd Lipcon <todd@cloudera.com<mailto:
> todd@cloudera.com>> wrote:
>
> Hi Friso,
>
> I think I identified the issue. As you suspected, we were unnecessarily
> allocating a lot of native byte buffers in the LZO code where we weren't
> before.
>
> I just pushed a fix to my LZO repository and bumped the version number to
> 0.4.7.
>
> If you have a chance to test this on a dev environment that would be great.
> I will try to test myself this week. (unfortunately I wasn't able to
> reproduce the issue yet)
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <todd@cloudera.com<mailto:
> todd@cloudera.com>> wrote:
>
> Hey Friso,
>
> Thanks so much for the details. I am starting to imagine it could indeed
> be
> a codec leak - especially since you have some cells which are into the
> MB,
> maybe it's expanding some buffers to 64MB.
>
> Let me try to do some tests to reproduce it here in the next week or so.
>
> Anyone else seen this issue?
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:
>
> Hi Todd,
>
> I am afraid I no longer have the broken setup around, because we really
> need a working one right now. We need to demo at a conference next week
> and
> until after that, all changes are frozen both on dev and prod (so we can
> use
> dev as fall back). Later on I could maybe try some more things on our
> dev
> boxes.
>
> If you are doing a repro, here's the stuff you'd probably want to know:
> The workload is write only. No reads happening at the same time. No
> other
> active clients. It is an initial import of data. We do insertions in a
> MR
> job from the reducers. The total volume is about 11 billion puts across
> roughly 450K rows per table (we have a many columns per row data model)
> across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values
> range
> from a small number of KBs generally to MBs in rare cases. The row keys
> have
> a time-related part at the start, so I know the keyspace in advance, so
> I
> create the empty tables with pre-created regions (40 regions) across the
> keyspace to get decent distribution from the start of the job. In order
> to
> not overload HBase, I run the job with only 15 reducers, so there are
> max 15
> concurrent clients active. Other settnigs: max file size is 1GB, HFile
> block
> size is default 64K, client side buffer is 16M, memstore flush size is
> 128M,
> compaction threshold is 5, blocking store files is 9, mem store upper
> limit
> is 20%, lower limit 15%, block cache 40%. During the run, the RSes never
> report more than 5GB of heap usage from the UI, which makes sense,
> because
> block cache is not touched. On a healthy run with somewhat conservative
> settings right now, HBase reports on average about 380K requests per
> second
> in the master UI.
>
> The cluster has 8 workers running TT, DN, RS and another JVM process for
> our own software that sits in front of HBase. Workers are dual quad
> cores
> with 64GB RAM and 10x 600GB disks (we decided to scale the amount of
> seeks
> we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get
> 1GB
> of heap, TT and DN also. RS gets 16GB of heap and our own software too.
> We
> run 8 mappers and 4 reducers per node. So at absolute max, we should
> have
> 46GB of allocated heap. That leaves 18GB for JVM overhead, native
> allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is
> CentOS,
> but I didn't do the installs myself.
>
> I tried numerous different settings both more extreme and more
> conservative to get the thing working, but in the end it always ends up
> swapping. I should have tried a run without LZO, of course, but I was
> out of
> time by then.
>
>
>
> Cheers,
> Friso
>
>
>
> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
>
> Hrm, any chance you can run with a smaller heap and get a jmap dump?
> The
> eclipse MAT tool is also super nice for looking at this stuff if
> indeed
> they
> are java objects.
>
> What kind of workload are you using? Read mostly? Write mostly? Mixed?
> I
> will try to repro.
>
> -Todd
>
> On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:
>
> I figured the same. I also did a run with CMS instead of G1. Same
> results.
>
> I also did a run with the RS heap tuned down to 12GB and 8GB, but
> given
> enough time the process still grows over 40GB in size.
>
>
> Friso
>
>
>
> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>
> Can you try running this with CMS GC instead of G1GC? G1 still has
> some
> bugs... 64M sounds like it might be G1 "regions"?
>
> -Todd
>
> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:
>
> Hi All,
>
> (This is all about CDH3, so I am not sure whether it should go on
> this
> list, but I figure it is at least interesting for people trying the
> same.)
>
> I've recently tried CDH3 on a new cluster from RPMs with the
> hadoop-lzo
> fork from https://github.com/toddlipcon/hadoop-lzo. Everything
> works
> like
> a charm initially, but after some time (minutes to max one hour),
> the
> RS
> JVM
> process memory grows to more than twice the given heap size and
> beyond.
> I
> have seen a RS with 16GB heap that grows to 55GB virtual size. At
> some
> point, everything start swapping and GC times go into the minutes
> and
> everything dies or is considered dead by the master.
>
> I did a pmap -x on the RS process and that shows a lot of allocated
> blocks
> of about 64M by the process. There about 500 of these, which is
> 32GB
> in
> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
> blocks
> of about 1M on top are probably thread stacks). Unfortunately,
> Linux
> shows
> the native heap as anon blocks, so I can not link it to a specific
> lib
> or
> something.
>
> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL,
> the
> one
> which has the reinit() support). I run Java 6u21 with the G1
> garbage
> collector, which has been running fine for some weeks now. Full
> command
> line
> is:
> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
> -XX:+UseCompressedOops
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/export/logs/hbase/gc-hbase.log
> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase
> -Dhbase.r
>
> I searched the HBase source for something that could point to
> native
> heap
> usage (like ByteBuffer#allocateDirect(...)), but I could not find
> anything.
> Thread count is about 185 (I have 100 handlers), so nothing strange
> there as
> well.
>
> Question is, could this be HBase or is this a problem with the
> hadoop-lzo?
>
> I have currently downgraded to a version known to work, because we
> have
> a
> demo coming up. But still interested in the answer.
>
>
>
> Regards,
> Friso
>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
>
>
>
> --
> --Sean
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Memory leak in LZO native code?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
It's the version at https://github.com/toddlipcon/hadoop-lzo that was bumped to 0.4.7 because of this fix (download and build it yourself). You need this version to go with CDH3b3. I don't know how this relates to the ASF release / trunk of HBase. This version is a fork of https://github.com/kevinweil/hadoop-lzo, which is what I used before (on ASF HBase and CDH3b2).
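
For anyone building it for the first time, the sequence is roughly the
following (a sketch, assuming the standard ant build in that repository and a
64-bit Linux box; the exact jar name and install paths depend on your setup):

  git clone https://github.com/toddlipcon/hadoop-lzo.git
  cd hadoop-lzo
  # needs the lzo development headers (e.g. lzo-devel) and a JDK
  JAVA_HOME=/usr/java/default ant clean compile-native tar
  # copy the jar and the native libs to where HBase/Hadoop look for them
  cp build/hadoop-lzo-*.jar /usr/lib/hbase/lib/
  cp build/native/Linux-amd64-64/lib/* /usr/lib/hbase/lib/native/Linux-amd64-64/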

Friso


On 16 nov 2010, at 19:47, Sean Bigdatafun wrote:

Hi Todd,

Can you please give the URL of this fix?

Thanks,
Sean

On Sat, Nov 13, 2010 at 9:10 PM, Todd Lipcon <to...@cloudera.com>> wrote:

Hi Friso,

I think I identified the issue. As you suspected, we were unnecessarily
allocating a lot of native byte buffers in the LZO code where we weren't
before.

I just pushed a fix to my LZO repository and bumped the version number to
0.4.7.

If you have a chance to test this on a dev environment that would be great.
I will try to test myself this week. (unfortunately I wasn't able to
reproduce the issue yet)

Thanks
-Todd

On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <to...@cloudera.com>> wrote:

Hey Friso,

Thanks so much for the details. I am starting to imagine it could indeed
be
a codec leak - especially since you have some cells which are into the
MB,
maybe it's expanding some buffers to 64MB.

Let me try to do some tests to reproduce it here in the next week or so.

Anyone else seen this issue?

Thanks
-Todd

On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:

Hi Todd,

I am afraid I no longer have the broken setup around, because we really
need a working one right now. We need to demo at a conference next week
and
until after that, all changes are frozen both on dev and prod (so we can
use
dev as fall back). Later on I could maybe try some more things on our
dev
boxes.

If you are doing a repro, here's the stuff you'd probably want to know:
The workload is write only. No reads happening at the same time. No
other
active clients. It is an initial import of data. We do insertions in a
MR
job from the reducers. The total volume is about 11 billion puts across
roughly 450K rows per table (we have a many columns per row data model)
across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values
range
from a small number of KBs generally to MBs in rare cases. The row keys
have
a time-related part at the start, so I know the keyspace in advance, so
I
create the empty tables with pre-created regions (40 regions) across the
keyspace to get decent distribution from the start of the job. In order
to
not overload HBase, I run the job with only 15 reducers, so there are
max 15
concurrent clients active. Other settnigs: max file size is 1GB, HFile
block
size is default 64K, client side buffer is 16M, memstore flush size is
128M,
compaction threshold is 5, blocking store files is 9, mem store upper
limit
is 20%, lower limit 15%, block cache 40%. During the run, the RSes never
report more than 5GB of heap usage from the UI, which makes sense,
because
block cache is not touched. On a healthy run with somewhat conservative
settings right now, HBase reports on average about 380K requests per
second
in the master UI.

The cluster has 8 workers running TT, DN, RS and another JVM process for
our own software that sits in front of HBase. Workers are dual quad
cores
with 64GB RAM and 10x 600GB disks (we decided to scale the amount of
seeks
we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get
1GB
of heap, TT and DN also. RS gets 16GB of heap and our own software too.
We
run 8 mappers and 4 reducers per node. So at absolute max, we should
have
46GB of allocated heap. That leaves 18GB for JVM overhead, native
allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is
CentOS,
but I didn't do the installs myself.

I tried numerous different settings both more extreme and more
conservative to get the thing working, but in the end it always ends up
swapping. I should have tried a run without LZO, of course, but I was
out of
time by then.



Cheers,
Friso



On 12 nov 2010, at 07:06, Todd Lipcon wrote:

Hrm, any chance you can run with a smaller heap and get a jmap dump?
The
eclipse MAT tool is also super nice for looking at this stuff if
indeed
they
are java objects.

What kind of workload are you using? Read mostly? Write mostly? Mixed?
I
will try to repro.

-Todd

On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:

I figured the same. I also did a run with CMS instead of G1. Same
results.

I also did a run with the RS heap tuned down to 12GB and 8GB, but
given
enough time the process still grows over 40GB in size.


Friso



On 12 nov 2010, at 01:55, Todd Lipcon wrote:

Can you try running this with CMS GC instead of G1GC? G1 still has
some
bugs... 64M sounds like it might be G1 "regions"?

-Todd

On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com<ma...@xebia.com>> wrote:

Hi All,

(This is all about CDH3, so I am not sure whether it should go on
this
list, but I figure it is at least interesting for people trying the
same.)

I've recently tried CDH3 on a new cluster from RPMs with the
hadoop-lzo
fork from https://github.com/toddlipcon/hadoop-lzo. Everything
works
like
a charm initially, but after some time (minutes to max one hour),
the
RS
JVM
process memory grows to more than twice the given heap size and
beyond.
I
have seen a RS with 16GB heap that grows to 55GB virtual size. At
some
point, everything start swapping and GC times go into the minutes
and
everything dies or is considered dead by the master.

I did a pmap -x on the RS process and that shows a lot of allocated
blocks
of about 64M by the process. There about 500 of these, which is
32GB
in
total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
blocks
of about 1M on top are probably thread stacks). Unfortunately,
Linux
shows
the native heap as anon blocks, so I can not link it to a specific
lib
or
something.

I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL,
the
one
which has the reinit() support). I run Java 6u21 with the G1
garbage
collector, which has been running fine for some weeks now. Full
command
line
is:
java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
-XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
-XX:+UseCompressedOops
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-Xloggc:/export/logs/hbase/gc-hbase.log
-Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
-Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
-Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
-Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase
-Dhbase.r

I searched the HBase source for something that could point to
native
heap
usage (like ByteBuffer#allocateDirect(...)), but I could not find
anything.
Thread count is about 185 (I have 100 handlers), so nothing strange
there as
well.

Question is, could this be HBase or is this a problem with the
hadoop-lzo?

I have currently downgraded to a version known to work, because we
have
a
demo coming up. But still interested in the answer.



Regards,
Friso




--
Todd Lipcon
Software Engineer, Cloudera




--
Todd Lipcon
Software Engineer, Cloudera




--
Todd Lipcon
Software Engineer, Cloudera




--
Todd Lipcon
Software Engineer, Cloudera




--
--Sean


Re: Memory leak in LZO native code?

Posted by Sean Bigdatafun <se...@gmail.com>.
Hi Todd,

Can you please give the URL of this fix?

Thanks,
Sean

On Sat, Nov 13, 2010 at 9:10 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hi Friso,
>
> I think I identified the issue. As you suspected, we were unnecessarily
> allocating a lot of native byte buffers in the LZO code where we weren't
> before.
>
> I just pushed a fix to my LZO repository and bumped the version number to
> 0.4.7.
>
> If you have a chance to test this on a dev environment that would be great.
> I will try to test myself this week. (unfortunately I wasn't able to
> reproduce the issue yet)
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <to...@cloudera.com> wrote:
>
> > Hey Friso,
> >
> > Thanks so much for the details. I am starting to imagine it could indeed
> be
> > a codec leak - especially since you have some cells which are into the
> MB,
> > maybe it's expanding some buffers to 64MB.
> >
> > Let me try to do some tests to reproduce it here in the next week or so.
> >
> > Anyone else seen this issue?
> >
> > Thanks
> > -Todd
> >
> > On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
> > fvanvollenhoven@xebia.com> wrote:
> >
> >> Hi Todd,
> >>
> >> I am afraid I no longer have the broken setup around, because we really
> >> need a working one right now. We need to demo at a conference next week
> and
> >> until after that, all changes are frozen both on dev and prod (so we can
> use
> >> dev as fall back). Later on I could maybe try some more things on our
> dev
> >> boxes.
> >>
> >> If you are doing a repro, here's the stuff you'd probably want to know:
> >> The workload is write only. No reads happening at the same time. No
> other
> >> active clients. It is an initial import of data. We do insertions in a
> MR
> >> job from the reducers. The total volume is about 11 billion puts across
> >> roughly 450K rows per table (we have a many columns per row data model)
> >> across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values
> range
> >> from a small number of KBs generally to MBs in rare cases. The row keys
> have
> >> a time-related part at the start, so I know the keyspace in advance, so
> I
> >> create the empty tables with pre-created regions (40 regions) across the
> >> keyspace to get decent distribution from the start of the job. In order
> to
> >> not overload HBase, I run the job with only 15 reducers, so there are
> max 15
> >> concurrent clients active. Other settnigs: max file size is 1GB, HFile
> block
> >> size is default 64K, client side buffer is 16M, memstore flush size is
> 128M,
> >> compaction threshold is 5, blocking store files is 9, mem store upper
> limit
> >> is 20%, lower limit 15%, block cache 40%. During the run, the RSes never
> >> report more than 5GB of heap usage from the UI, which makes sense,
> because
> >> block cache is not touched. On a healthy run with somewhat conservative
> >> settings right now, HBase reports on average about 380K requests per
> second
> >> in the master UI.
> >>
> >> The cluster has 8 workers running TT, DN, RS and another JVM process for
> >> our own software that sits in front of HBase. Workers are dual quad
> cores
> >> with 64GB RAM and 10x 600GB disks (we decided to scale the amount of
> seeks
> >> we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get
> 1GB
> >> of heap, TT and DN also. RS gets 16GB of heap and our own software too.
> We
> >> run 8 mappers and 4 reducers per node. So at absolute max, we should
> have
> >> 46GB of allocated heap. That leaves 18GB for JVM overhead, native
> >> allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is
> CentOS,
> >> but I didn't do the installs myself.
> >>
> >> I tried numerous different settings both more extreme and more
> >> conservative to get the thing working, but in the end it always ends up
> >> swapping. I should have tried a run without LZO, of course, but I was
> out of
> >> time by then.
> >>
> >>
> >>
> >> Cheers,
> >> Friso
> >>
> >>
> >>
> >> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
> >>
> >> > Hrm, any chance you can run with a smaller heap and get a jmap dump?
> The
> >> > eclipse MAT tool is also super nice for looking at this stuff if
> indeed
> >> they
> >> > are java objects.
> >> >
> >> > What kind of workload are you using? Read mostly? Write mostly? Mixed?
> I
> >> > will try to repro.
> >> >
> >> > -Todd
> >> >
> >> > On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
> >> > fvanvollenhoven@xebia.com> wrote:
> >> >
> >> >> I figured the same. I also did a run with CMS instead of G1. Same
> >> results.
> >> >>
> >> >> I also did a run with the RS heap tuned down to 12GB and 8GB, but
> given
> >> >> enough time the process still grows over 40GB in size.
> >> >>
> >> >>
> >> >> Friso
> >> >>
> >> >>
> >> >>
> >> >> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
> >> >>
> >> >>> Can you try running this with CMS GC instead of G1GC? G1 still has
> >> some
> >> >>> bugs... 64M sounds like it might be G1 "regions"?
> >> >>>
> >> >>> -Todd
> >> >>>
> >> >>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> >> >>> fvanvollenhoven@xebia.com> wrote:
> >> >>>
> >> >>>> Hi All,
> >> >>>>
> >> >>>> (This is all about CDH3, so I am not sure whether it should go on
> >> this
> >> >>>> list, but I figure it is at least interesting for people trying the
> >> >> same.)
> >> >>>>
> >> >>>> I've recently tried CDH3 on a new cluster from RPMs with the
> >> hadoop-lzo
> >> >>>> fork from https://github.com/toddlipcon/hadoop-lzo. Everything
> works
> >> >> like
> >> >>>> a charm initially, but after some time (minutes to max one hour),
> the
> >> RS
> >> >> JVM
> >> >>>> process memory grows to more than twice the given heap size and
> >> beyond.
> >> >> I
> >> >>>> have seen a RS with 16GB heap that grows to 55GB virtual size. At
> >> some
> >> >>>> point, everything start swapping and GC times go into the minutes
> and
> >> >>>> everything dies or is considered dead by the master.
> >> >>>>
> >> >>>> I did a pmap -x on the RS process and that shows a lot of allocated
> >> >> blocks
> >> >>>> of about 64M by the process. There about 500 of these, which is
> 32GB
> >> in
> >> >>>> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
> >> >> blocks
> >> >>>> of about 1M on top are probably thread stacks). Unfortunately,
> Linux
> >> >> shows
> >> >>>> the native heap as anon blocks, so I can not link it to a specific
> >> lib
> >> >> or
> >> >>>> something.
> >> >>>>
> >> >>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL,
> the
> >> >> one
> >> >>>> which has the reinit() support). I run Java 6u21 with the G1
> garbage
> >> >>>> collector, which has been running fine for some weeks now. Full
> >> command
> >> >> line
> >> >>>> is:
> >> >>>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >> >>>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC
> -XX:+UseCompressedOops
> >> >>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >> >>>> -Xloggc:/export/logs/hbase/gc-hbase.log
> >> >>>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >> >>>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> >> >>>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >> >>>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase
> -Dhbase.r
> >> >>>>
> >> >>>> I searched the HBase source for something that could point to
> native
> >> >> heap
> >> >>>> usage (like ByteBuffer#allocateDirect(...)), but I could not find
> >> >> anything.
> >> >>>> Thread count is about 185 (I have 100 handlers), so nothing strange
> >> >> there as
> >> >>>> well.
> >> >>>>
> >> >>>> Question is, could this be HBase or is this a problem with the
> >> >> hadoop-lzo?
> >> >>>>
> >> >>>> I have currently downgraded to a version known to work, because we
> >> have
> >> >> a
> >> >>>> demo coming up. But still interested in the answer.
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> Regards,
> >> >>>> Friso
> >> >>>>
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Todd Lipcon
> >> >>> Software Engineer, Cloudera
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Todd Lipcon
> >> > Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
--Sean

Re: Memory leak in LZO native code?

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Friso,

I think I identified the issue. As you suspected, we were unnecessarily
allocating a lot of native byte buffers in the LZO code where we weren't
before.

I just pushed a fix to my LZO repository and bumped the version number to
0.4.7.
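
To illustrate the kind of change involved (a simplified sketch, not the
actual hadoop-lzo patch; the class and method names are made up): the leaky
pattern is allocating a fresh direct ByteBuffer on every codec (re)init,
which consumes native memory outside the Java heap, while the fix is to
reuse the existing buffer whenever it is already large enough.

  import java.nio.ByteBuffer;

  class DirectBufferReuseSketch {
    private ByteBuffer workBuf;

    // Leaky pattern: a new direct buffer per call; the old ones linger as
    // native memory until their owners happen to be garbage collected.
    ByteBuffer leakyInit(int capacity) {
      workBuf = ByteBuffer.allocateDirect(capacity);
      return workBuf;
    }

    // Fixed pattern: only allocate when the current buffer is too small,
    // otherwise just clear and reuse it.
    ByteBuffer reuseInit(int capacity) {
      if (workBuf == null || workBuf.capacity() < capacity) {
        workBuf = ByteBuffer.allocateDirect(capacity);
      }
      workBuf.clear();
      return workBuf;
    }
  }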

If you have a chance to test this in a dev environment, that would be great.
I will try to test it myself this week. (Unfortunately I haven't been able to
reproduce the issue yet.)

Thanks
-Todd

On Fri, Nov 12, 2010 at 4:09 PM, Todd Lipcon <to...@cloudera.com> wrote:

> Hey Friso,
>
> Thanks so much for the details. I am starting to imagine it could indeed be
> a codec leak - especially since you have some cells which are into the MB,
> maybe it's expanding some buffers to 64MB.
>
> Let me try to do some tests to reproduce it here in the next week or so.
>
> Anyone else seen this issue?
>
> Thanks
> -Todd
>
> On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
>
>> Hi Todd,
>>
>> I am afraid I no longer have the broken setup around, because we really
>> need a working one right now. We need to demo at a conference next week and
>> until after that, all changes are frozen both on dev and prod (so we can use
>> dev as fall back). Later on I could maybe try some more things on our dev
>> boxes.
>>
>> If you are doing a repro, here's the stuff you'd probably want to know:
>> The workload is write only. No reads happening at the same time. No other
>> active clients. It is an initial import of data. We do insertions in a MR
>> job from the reducers. The total volume is about 11 billion puts across
>> roughly 450K rows per table (we have a many columns per row data model)
>> across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values range
>> from a small number of KBs generally to MBs in rare cases. The row keys have
>> a time-related part at the start, so I know the keyspace in advance, so I
>> create the empty tables with pre-created regions (40 regions) across the
>> keyspace to get decent distribution from the start of the job. In order to
>> not overload HBase, I run the job with only 15 reducers, so there are max 15
>> concurrent clients active. Other settnigs: max file size is 1GB, HFile block
>> size is default 64K, client side buffer is 16M, memstore flush size is 128M,
>> compaction threshold is 5, blocking store files is 9, mem store upper limit
>> is 20%, lower limit 15%, block cache 40%. During the run, the RSes never
>> report more than 5GB of heap usage from the UI, which makes sense, because
>> block cache is not touched. On a healthy run with somewhat conservative
>> settings right now, HBase reports on average about 380K requests per second
>> in the master UI.
>>
>> The cluster has 8 workers running TT, DN, RS and another JVM process for
>> our own software that sits in front of HBase. Workers are dual quad cores
>> with 64GB RAM and 10x 600GB disks (we decided to scale the amount of seeks
>> we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get 1GB
>> of heap, TT and DN also. RS gets 16GB of heap and our own software too. We
>> run 8 mappers and 4 reducers per node. So at absolute max, we should have
>> 46GB of allocated heap. That leaves 18GB for JVM overhead, native
>> allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is CentOS,
>> but I didn't do the installs myself.
>>
>> I tried numerous different settings both more extreme and more
>> conservative to get the thing working, but in the end it always ends up
>> swapping. I should have tried a run without LZO, of course, but I was out of
>> time by then.
>>
>>
>>
>> Cheers,
>> Friso
>>
>>
>>
>> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
>>
>> > Hrm, any chance you can run with a smaller heap and get a jmap dump? The
>> > eclipse MAT tool is also super nice for looking at this stuff if indeed
>> they
>> > are java objects.
>> >
>> > What kind of workload are you using? Read mostly? Write mostly? Mixed? I
>> > will try to repro.
>> >
>> > -Todd
>> >
>> > On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
>> > fvanvollenhoven@xebia.com> wrote:
>> >
>> >> I figured the same. I also did a run with CMS instead of G1. Same
>> results.
>> >>
>> >> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
>> >> enough time the process still grows over 40GB in size.
>> >>
>> >>
>> >> Friso
>> >>
>> >>
>> >>
>> >> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>> >>
>> >>> Can you try running this with CMS GC instead of G1GC? G1 still has
>> some
>> >>> bugs... 64M sounds like it might be G1 "regions"?
>> >>>
>> >>> -Todd
>> >>>
>> >>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
>> >>> fvanvollenhoven@xebia.com> wrote:
>> >>>
>> >>>> Hi All,
>> >>>>
>> >>>> (This is all about CDH3, so I am not sure whether it should go on
>> this
>> >>>> list, but I figure it is at least interesting for people trying the
>> >> same.)
>> >>>>
>> >>>> I've recently tried CDH3 on a new cluster from RPMs with the
>> hadoop-lzo
>> >>>> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works
>> >> like
>> >>>> a charm initially, but after some time (minutes to max one hour), the
>> RS
>> >> JVM
>> >>>> process memory grows to more than twice the given heap size and
>> beyond.
>> >> I
>> >>>> have seen a RS with 16GB heap that grows to 55GB virtual size. At
>> some
>> >>>> point, everything start swapping and GC times go into the minutes and
>> >>>> everything dies or is considered dead by the master.
>> >>>>
>> >>>> I did a pmap -x on the RS process and that shows a lot of allocated
>> >> blocks
>> >>>> of about 64M by the process. There about 500 of these, which is 32GB
>> in
>> >>>> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
>> >> blocks
>> >>>> of about 1M on top are probably thread stacks). Unfortunately, Linux
>> >> shows
>> >>>> the native heap as anon blocks, so I can not link it to a specific
>> lib
>> >> or
>> >>>> something.
>> >>>>
>> >>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
>> >> one
>> >>>> which has the reinit() support). I run Java 6u21 with the G1 garbage
>> >>>> collector, which has been running fine for some weeks now. Full
>> command
>> >> line
>> >>>> is:
>> >>>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
>> >>>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
>> >>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> >>>> -Xloggc:/export/logs/hbase/gc-hbase.log
>> >>>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
>> >>>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
>> >>>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
>> >>>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>> >>>>
>> >>>> I searched the HBase source for something that could point to native
>> >> heap
>> >>>> usage (like ByteBuffer#allocateDirect(...)), but I could not find
>> >> anything.
>> >>>> Thread count is about 185 (I have 100 handlers), so nothing strange
>> >> there as
>> >>>> well.
>> >>>>
>> >>>> Question is, could this be HBase or is this a problem with the
>> >> hadoop-lzo?
>> >>>>
>> >>>> I have currently downgraded to a version known to work, because we
>> have
>> >> a
>> >>>> demo coming up. But still interested in the answer.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Regards,
>> >>>> Friso
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>> Todd Lipcon
>> >>> Software Engineer, Cloudera
>> >>
>> >>
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>>
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Memory leak in LZO native code?

Posted by Todd Lipcon <to...@cloudera.com>.
Hey Friso,

Thanks so much for the details. I am starting to suspect it could indeed be
a codec leak - especially since you have some cells that run into the MBs,
maybe it's expanding some buffers to 64MB.

Let me try to do some tests to reproduce it here in the next week or so.

Anyone else seen this issue?

Thanks
-Todd

On Fri, Nov 12, 2010 at 1:19 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> Hi Todd,
>
> I am afraid I no longer have the broken setup around, because we really
> need a working one right now. We need to demo at a conference next week and
> until after that, all changes are frozen both on dev and prod (so we can use
> dev as fall back). Later on I could maybe try some more things on our dev
> boxes.
>
> If you are doing a repro, here's the stuff you'd probably want to know:
> The workload is write only. No reads happening at the same time. No other
> active clients. It is an initial import of data. We do insertions in a MR
> job from the reducers. The total volume is about 11 billion puts across
> roughly 450K rows per table (we have a many columns per row data model)
> across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values range
> from a small number of KBs generally to MBs in rare cases. The row keys have
> a time-related part at the start, so I know the keyspace in advance, so I
> create the empty tables with pre-created regions (40 regions) across the
> keyspace to get decent distribution from the start of the job. In order to
> not overload HBase, I run the job with only 15 reducers, so there are max 15
> concurrent clients active. Other settnigs: max file size is 1GB, HFile block
> size is default 64K, client side buffer is 16M, memstore flush size is 128M,
> compaction threshold is 5, blocking store files is 9, mem store upper limit
> is 20%, lower limit 15%, block cache 40%. During the run, the RSes never
> report more than 5GB of heap usage from the UI, which makes sense, because
> block cache is not touched. On a healthy run with somewhat conservative
> settings right now, HBase reports on average about 380K requests per second
> in the master UI.
>
> The cluster has 8 workers running TT, DN, RS and another JVM process for
> our own software that sits in front of HBase. Workers are dual quad cores
> with 64GB RAM and 10x 600GB disks (we decided to scale the amount of seeks
> we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get 1GB
> of heap, TT and DN also. RS gets 16GB of heap and our own software too. We
> run 8 mappers and 4 reducers per node. So at absolute max, we should have
> 46GB of allocated heap. That leaves 18GB for JVM overhead, native
> allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is CentOS,
> but I didn't do the installs myself.
>
> I tried numerous different settings both more extreme and more conservative
> to get the thing working, but in the end it always ends up swapping. I
> should have tried a run without LZO, of course, but I was out of time by
> then.
>
>
>
> Cheers,
> Friso
>
>
>
> On 12 nov 2010, at 07:06, Todd Lipcon wrote:
>
> > Hrm, any chance you can run with a smaller heap and get a jmap dump? The
> > eclipse MAT tool is also super nice for looking at this stuff if indeed
> they
> > are java objects.
> >
> > What kind of workload are you using? Read mostly? Write mostly? Mixed? I
> > will try to repro.
> >
> > -Todd
> >
> > On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
> > fvanvollenhoven@xebia.com> wrote:
> >
> >> I figured the same. I also did a run with CMS instead of G1. Same
> results.
> >>
> >> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
> >> enough time the process still grows over 40GB in size.
> >>
> >>
> >> Friso
> >>
> >>
> >>
> >> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
> >>
> >>> Can you try running this with CMS GC instead of G1GC? G1 still has some
> >>> bugs... 64M sounds like it might be G1 "regions"?
> >>>
> >>> -Todd
> >>>
> >>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> >>> fvanvollenhoven@xebia.com> wrote:
> >>>
> >>>> Hi All,
> >>>>
> >>>> (This is all about CDH3, so I am not sure whether it should go on this
> >>>> list, but I figure it is at least interesting for people trying the
> >> same.)
> >>>>
> >>>> I've recently tried CDH3 on a new cluster from RPMs with the
> hadoop-lzo
> >>>> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works
> >> like
> >>>> a charm initially, but after some time (minutes to max one hour), the
> RS
> >> JVM
> >>>> process memory grows to more than twice the given heap size and
> beyond.
> >> I
> >>>> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
> >>>> point, everything start swapping and GC times go into the minutes and
> >>>> everything dies or is considered dead by the master.
> >>>>
> >>>> I did a pmap -x on the RS process and that shows a lot of allocated
> >> blocks
> >>>> of about 64M by the process. There about 500 of these, which is 32GB
> in
> >>>> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
> >> blocks
> >>>> of about 1M on top are probably thread stacks). Unfortunately, Linux
> >> shows
> >>>> the native heap as anon blocks, so I can not link it to a specific lib
> >> or
> >>>> something.
> >>>>
> >>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
> >> one
> >>>> which has the reinit() support). I run Java 6u21 with the G1 garbage
> >>>> collector, which has been running fine for some weeks now. Full
> command
> >> line
> >>>> is:
> >>>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >>>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> >>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >>>> -Xloggc:/export/logs/hbase/gc-hbase.log
> >>>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >>>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> >>>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >>>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
> >>>>
> >>>> I searched the HBase source for something that could point to native
> >> heap
> >>>> usage (like ByteBuffer#allocateDirect(...)), but I could not find
> >> anything.
> >>>> Thread count is about 185 (I have 100 handlers), so nothing strange
> >> there as
> >>>> well.
> >>>>
> >>>> Question is, could this be HBase or is this a problem with the
> >> hadoop-lzo?
> >>>>
> >>>> I have currently downgraded to a version known to work, because we
> have
> >> a
> >>>> demo coming up. But still interested in the answer.
> >>>>
> >>>>
> >>>>
> >>>> Regards,
> >>>> Friso
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Todd Lipcon
> >>> Software Engineer, Cloudera
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Memory leak in LZO native code?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi Todd,

I am afraid I no longer have the broken setup around, because we really need a working one right now. We need to demo at a conference next week and until after that, all changes are frozen both on dev and prod (so we can use dev as fall back). Later on I could maybe try some more things on our dev boxes.

If you are doing a repro, here's the stuff you'd probably want to know:
The workload is write only. No reads happening at the same time. No other active clients. It is an initial import of data. We do insertions in an MR job from the reducers. The total volume is about 11 billion puts across roughly 450K rows per table (we have a many columns per row data model) across 15 tables, all use LZO. Qualifiers are some 50 bytes. Values range from a small number of KBs generally to MBs in rare cases. The row keys have a time-related part at the start, so I know the keyspace in advance, so I create the empty tables with pre-created regions (40 regions) across the keyspace to get decent distribution from the start of the job. In order to not overload HBase, I run the job with only 15 reducers, so there are max 15 concurrent clients active. Other settings: max file size is 1GB, HFile block size is default 64K, client side buffer is 16M, memstore flush size is 128M, compaction threshold is 5, blocking store files is 9, mem store upper limit is 20%, lower limit 15%, block cache 40%. During the run, the RSes never report more than 5GB of heap usage from the UI, which makes sense, because block cache is not touched. On a healthy run with somewhat conservative settings right now, HBase reports on average about 380K requests per second in the master UI.
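
For context, the pre-split table creation looks roughly like this (a sketch
against the 0.90-era client API; the table name, family name and key range
are placeholders, the real split points come from the time-prefixed keyspace
described above):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.io.hfile.Compression;
  import org.apache.hadoop.hbase.util.Bytes;

  public class PresplitTableSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);

      HTableDescriptor desc = new HTableDescriptor("mytable");
      HColumnDescriptor fam = new HColumnDescriptor("d");
      fam.setCompressionType(Compression.Algorithm.LZO);  // LZO on the family
      desc.addFamily(fam);

      // Pre-create 40 regions by evenly splitting the known key range,
      // so the initial import spreads across all region servers.
      admin.createTable(desc,
          Bytes.toBytes("20100101"), Bytes.toBytes("20101231"), 40);
    }
  }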

The cluster has 8 workers running TT, DN, RS and another JVM process for our own software that sits in front of HBase. Workers are dual quad cores with 64GB RAM and 10x 600GB disks (we decided to scale the amount of seeks we can do concurrently). Disks are quite fast: 10K RPM. MR task VMs get 1GB of heap, TT and DN also. RS gets 16GB of heap and our own software too. We run 8 mappers and 4 reducers per node. So at absolute max, we should have 46GB of allocated heap. That leaves 18GB for JVM overhead, native allocations and OS. We run Linux 2.6.18-194.11.4.el5. I think it is CentOS, but I didn't do the installs myself.

I tried numerous different settings both more extreme and more conservative to get the thing working, but in the end it always ends up swapping. I should have tried a run without LZO, of course, but I was out of time by then.



Cheers,
Friso



On 12 nov 2010, at 07:06, Todd Lipcon wrote:

> Hrm, any chance you can run with a smaller heap and get a jmap dump? The
> eclipse MAT tool is also super nice for looking at this stuff if indeed they
> are java objects.
> 
> What kind of workload are you using? Read mostly? Write mostly? Mixed? I
> will try to repro.
> 
> -Todd
> 
> On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
> 
>> I figured the same. I also did a run with CMS instead of G1. Same results.
>> 
>> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
>> enough time the process still grows over 40GB in size.
>> 
>> 
>> Friso
>> 
>> 
>> 
>> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>> 
>>> Can you try running this with CMS GC instead of G1GC? G1 still has some
>>> bugs... 64M sounds like it might be G1 "regions"?
>>> 
>>> -Todd
>>> 
>>> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
>>> fvanvollenhoven@xebia.com> wrote:
>>> 
>>>> Hi All,
>>>> 
>>>> (This is all about CDH3, so I am not sure whether it should go on this
>>>> list, but I figure it is at least interesting for people trying the
>> same.)
>>>> 
>>>> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
>>>> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works
>> like
>>>> a charm initially, but after some time (minutes to max one hour), the RS
>> JVM
>>>> process memory grows to more than twice the given heap size and beyond.
>> I
>>>> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
>>>> point, everything start swapping and GC times go into the minutes and
>>>> everything dies or is considered dead by the master.
>>>> 
>>>> I did a pmap -x on the RS process and that shows a lot of allocated
>> blocks
>>>> of about 64M by the process. There about 500 of these, which is 32GB in
>>>> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
>> blocks
>>>> of about 1M on top are probably thread stacks). Unfortunately, Linux
>> shows
>>>> the native heap as anon blocks, so I can not link it to a specific lib
>> or
>>>> something.
>>>> 
>>>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
>> one
>>>> which has the reinit() support). I run Java 6u21 with the G1 garbage
>>>> collector, which has been running fine for some weeks now. Full command
>> line
>>>> is:
>>>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
>>>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
>>>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>>>> -Xloggc:/export/logs/hbase/gc-hbase.log
>>>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
>>>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
>>>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
>>>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>>>> 
>>>> I searched the HBase source for something that could point to native
>> heap
>>>> usage (like ByteBuffer#allocateDirect(...)), but I could not find
>> anything.
>>>> Thread count is about 185 (I have 100 handlers), so nothing strange
>> there as
>>>> well.
>>>> 
>>>> Question is, could this be HBase or is this a problem with the
>> hadoop-lzo?
>>>> 
>>>> I have currently downgraded to a version known to work, because we have
>> a
>>>> demo coming up. But still interested in the answer.
>>>> 
>>>> 
>>>> 
>>>> Regards,
>>>> Friso
>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: Memory leak in LZO native code?

Posted by Todd Lipcon <to...@cloudera.com>.
Hrm, any chance you can run with a smaller heap and get a jmap dump? The
Eclipse MAT tool is also super nice for looking at this stuff if indeed they
are Java objects.
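
For reference, the usual commands to grab that dump and a quick histogram
(standard JDK and Linux tools; <pid> is whatever jps reports for the
HRegionServer process):

  jps | grep HRegionServer                                 # find the RS pid
  jmap -histo:live <pid> | head -30                        # per-class heap histogram
  jmap -dump:live,format=b,file=/tmp/rs-heap.hprof <pid>   # full dump to open in MAT
  pmap -x <pid> | sort -n -k3 | tail                       # largest native mappings, for comparison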

What kind of workload are you using? Read mostly? Write mostly? Mixed? I
will try to repro.

-Todd

On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> I figured the same. I also did a run with CMS instead of G1. Same results.
>
> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
> enough time the process still grows over 40GB in size.
>
>
> Friso
>
>
>
> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>
> > Can you try running this with CMS GC instead of G1GC? G1 still has some
> > bugs... 64M sounds like it might be G1 "regions"?
> >
> > -Todd
> >
> > On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> > fvanvollenhoven@xebia.com> wrote:
> >
> >> Hi All,
> >>
> >> (This is all about CDH3, so I am not sure whether it should go on this
> >> list, but I figure it is at least interesting for people trying the
> same.)
> >>
> >> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
> >> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works
> like
> >> a charm initially, but after some time (minutes to max one hour), the RS
> JVM
> >> process memory grows to more than twice the given heap size and beyond.
> I
> >> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
> >> point, everything start swapping and GC times go into the minutes and
> >> everything dies or is considered dead by the master.
> >>
> >> I did a pmap -x on the RS process and that shows a lot of allocated
> blocks
> >> of about 64M by the process. There about 500 of these, which is 32GB in
> >> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
> blocks
> >> of about 1M on top are probably thread stacks). Unfortunately, Linux
> shows
> >> the native heap as anon blocks, so I can not link it to a specific lib
> or
> >> something.
> >>
> >> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
> one
> >> which has the reinit() support). I run Java 6u21 with the G1 garbage
> >> collector, which has been running fine for some weeks now. Full command
> line
> >> is:
> >> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >> -Xloggc:/export/logs/hbase/gc-hbase.log
> >> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> >> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
> >>
> >> I searched the HBase source for something that could point to native
> heap
> >> usage (like ByteBuffer#allocateDirect(...)), but I could not find
> anything.
> >> Thread count is about 185 (I have 100 handlers), so nothing strange
> there as
> >> well.
> >>
> >> Question is, could this be HBase or is this a problem with the
> hadoop-lzo?
> >>
> >> I have currently downgraded to a version known to work, because we have
> a
> >> demo coming up. But still interested in the answer.
> >>
> >>
> >>
> >> Regards,
> >> Friso
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Memory leak in LZO native code?

Posted by Ted Yu <yu...@gmail.com>.
Have you used YourKit ?
It can show you the class instances which consume the most heap memory.

On Thu, Nov 11, 2010 at 8:41 PM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> I figured the same. I also did a run with CMS instead of G1. Same results.
>
> I also did a run with the RS heap tuned down to 12GB and 8GB, but given
> enough time the process still grows over 40GB in size.
>
>
> Friso
>
>
>
> On 12 nov 2010, at 01:55, Todd Lipcon wrote:
>
> > Can you try running this with CMS GC instead of G1GC? G1 still has some
> > bugs... 64M sounds like it might be G1 "regions"?
> >
> > -Todd
> >
> > On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> > fvanvollenhoven@xebia.com> wrote:
> >
> >> Hi All,
> >>
> >> (This is all about CDH3, so I am not sure whether it should go on this
> >> list, but I figure it is at least interesting for people trying the
> same.)
> >>
> >> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
> >> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works
> like
> >> a charm initially, but after some time (minutes to max one hour), the RS
> JVM
> >> process memory grows to more than twice the given heap size and beyond.
> I
> >> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
> >> point, everything start swapping and GC times go into the minutes and
> >> everything dies or is considered dead by the master.
> >>
> >> I did a pmap -x on the RS process and that shows a lot of allocated
> blocks
> >> of about 64M by the process. There about 500 of these, which is 32GB in
> >> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the
> blocks
> >> of about 1M on top are probably thread stacks). Unfortunately, Linux
> shows
> >> the native heap as anon blocks, so I can not link it to a specific lib
> or
> >> something.
> >>
> >> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the
> one
> >> which has the reinit() support). I run Java 6u21 with the G1 garbage
> >> collector, which has been running fine for some weeks now. Full command
> line
> >> is:
> >> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> >> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> >> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> >> -Xloggc:/export/logs/hbase/gc-hbase.log
> >> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> >> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> >> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> >> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
> >>
> >> I searched the HBase source for something that could point to native
> heap
> >> usage (like ByteBuffer#allocateDirect(...)), but I could not find
> anything.
> >> Thread count is about 185 (I have 100 handlers), so nothing strange
> there as
> >> well.
> >>
> >> Question is, could this be HBase or is this a problem with the
> hadoop-lzo?
> >>
> >> I have currently downgraded to a version known to work, because we have
> a
> >> demo coming up. But still interested in the answer.
> >>
> >>
> >>
> >> Regards,
> >> Friso
> >>
> >>
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>
>

Re: Memory leak in LZO native code?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
I figured the same. I also did a run with CMS instead of G1. Same results.

I also did a run with the RS heap tuned down to 12GB and 8GB, but given enough time the process still grows to over 40GB in size.


Friso



On 12 nov 2010, at 01:55, Todd Lipcon wrote:

> Can you try running this with CMS GC instead of G1GC? G1 still has some
> bugs... 64M sounds like it might be G1 "regions"?
> 
> -Todd
> 
> On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
> 
>> Hi All,
>> 
>> (This is all about CDH3, so I am not sure whether it should go on this
>> list, but I figure it is at least interesting for people trying the same.)
>> 
>> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
>> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like
>> a charm initially, but after some time (minutes to max one hour), the RS JVM
>> process memory grows to more than twice the given heap size and beyond. I
>> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
>> point, everything start swapping and GC times go into the minutes and
>> everything dies or is considered dead by the master.
>> 
>> I did a pmap -x on the RS process and that shows a lot of allocated blocks
>> of about 64M by the process. There about 500 of these, which is 32GB in
>> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the blocks
>> of about 1M on top are probably thread stacks). Unfortunately, Linux shows
>> the native heap as anon blocks, so I can not link it to a specific lib or
>> something.
>> 
>> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one
>> which has the reinit() support). I run Java 6u21 with the G1 garbage
>> collector, which has been running fine for some weeks now. Full command line
>> is:
>> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
>> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
>> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
>> -Xloggc:/export/logs/hbase/gc-hbase.log
>> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
>> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
>> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
>> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>> 
>> I searched the HBase source for something that could point to native heap
>> usage (like ByteBuffer#allocateDirect(...)), but I could not find anything.
>> Thread count is about 185 (I have 100 handlers), so nothing strange there as
>> well.
>> 
>> Question is, could this be HBase or is this a problem with the hadoop-lzo?
>> 
>> I have currently downgraded to a version known to work, because we have a
>> demo coming up. But still interested in the answer.
>> 
>> 
>> 
>> Regards,
>> Friso
>> 
>> 
> 
> 
> -- 
> Todd Lipcon
> Software Engineer, Cloudera


Re: Memory leak in LZO native code?

Posted by Todd Lipcon <to...@cloudera.com>.
Can you try running this with CMS GC instead of G1GC? G1 still has some
bugs... 64M sounds like it might be G1 "regions"?

-Todd

On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven <
fvanvollenhoven@xebia.com> wrote:

> Hi All,
>
> (This is all about CDH3, so I am not sure whether it should go on this
> list, but I figure it is at least interesting for people trying the same.)
>
> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo
> fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like
> a charm initially, but after some time (minutes to max one hour), the RS JVM
> process memory grows to more than twice the given heap size and beyond. I
> have seen a RS with 16GB heap that grows to 55GB virtual size. At some
> point, everything start swapping and GC times go into the minutes and
> everything dies or is considered dead by the master.
>
> I did a pmap -x on the RS process and that shows a lot of allocated blocks
> of about 64M by the process. There about 500 of these, which is 32GB in
> total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the blocks
> of about 1M on top are probably thread stacks). Unfortunately, Linux shows
> the native heap as anon blocks, so I can not link it to a specific lib or
> something.
>
> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one
> which has the reinit() support). I run Java 6u21 with the G1 garbage
> collector, which has been running fine for some weeks now. Full command line
> is:
> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError
> -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops
> -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -Xloggc:/export/logs/hbase/gc-hbase.log
> -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64
> -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase
> -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log
> -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>
> I searched the HBase source for something that could point to native heap
> usage (like ByteBuffer#allocateDirect(...)), but I could not find anything.
> Thread count is about 185 (I have 100 handlers), so nothing strange there as
> well.
>
> Question is, could this be HBase or is this a problem with the hadoop-lzo?
>
> I have currently downgraded to a version known to work, because we have a
> demo coming up. But still interested in the answer.
>
>
>
> Regards,
> Friso
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
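
For reference, switching the region server from G1 to CMS amounts to swapping the collector flags in the command line quoted above. A possible variant is sketched below; it is illustrative only, and the occupancy fraction is a placeholder to be tuned per cluster (the trailing "..." stands for the unchanged -D properties):

java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly \
     -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -Xloggc:/export/logs/hbase/gc-hbase.log ...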

Re: Memory leak in LZO native code?

Posted by Ryan Rawson <ry...@gmail.com>.
Hey,

We are not allocating anything direct byte buffer-y inside HBase code,
so it seems like there is a bug in either the LZO connector
library or the LZO native connector library. Chunks of data around
the size of 64k sound like HFile blocks... I don't have the LZO code
in front of me now, I'll have to look later.

Right now we are using an older version of the hadoop-gpl-compression
library, so maybe the newer one is buggy?

-ryan

On Thu, Nov 11, 2010 at 2:07 AM, Friso van Vollenhoven
<fv...@xebia.com> wrote:
> Hi All,
>
> (This is all about CDH3, so I am not sure whether it should go on this list, but I figure it is at least interesting for people trying the same.)
>
> I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like a charm initially, but after some time (minutes to max one hour), the RS JVM process memory grows to more than twice the given heap size and beyond. I have seen a RS with 16GB heap that grows to 55GB virtual size. At some point, everything start swapping and GC times go into the minutes and everything dies or is considered dead by the master.
>
> I did a pmap -x on the RS process and that shows a lot of allocated blocks of about 64M by the process. There about 500 of these, which is 32GB in total. See: http://pastebin.com/8pgzPf7b (bottom of the file, the blocks of about 1M on top are probably thread stacks). Unfortunately, Linux shows the native heap as anon blocks, so I can not link it to a specific lib or something.
>
> I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one which has the reinit() support). I run Java 6u21 with the G1 garbage collector, which has been running fine for some weeks now. Full command line is:
> java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/export/logs/hbase/gc-hbase.log -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64 -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r
>
> I searched the HBase source for something that could point to native heap usage (like ByteBuffer#allocateDirect(...)), but I could not find anything. Thread count is about 185 (I have 100 handlers), so nothing strange there as well.
>
> Question is, could this be HBase or is this a problem with the hadoop-lzo?
>
> I have currently downgraded to a version known to work, because we have a demo coming up. But still interested in the answer.
>
>
>
> Regards,
> Friso
>
>

Memory leak in LZO native code?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
Hi All,

(This is all about CDH3, so I am not sure whether it should go on this list, but I figure it is at least interesting for people trying the same.)

I've recently tried CDH3 on a new cluster from RPMs with the hadoop-lzo fork from https://github.com/toddlipcon/hadoop-lzo. Everything works like a charm initially, but after some time (minutes to at most an hour), the RS JVM process memory grows to more than twice the given heap size and keeps growing. I have seen an RS with a 16GB heap grow to 55GB virtual size. At some point, everything starts swapping, GC times go into the minutes, and everything either dies or is considered dead by the master.

I did a pmap -x on the RS process, which shows a lot of blocks of about 64M allocated by the process. There are about 500 of these, which is 32GB in total. See: http://pastebin.com/8pgzPf7b (bottom of the file; the blocks of about 1M at the top are probably thread stacks). Unfortunately, Linux shows the native heap as anon blocks, so I cannot link it to a specific lib.

I am running the latest CDH3 and hadoop-lzo 0.4.6 (from said URL, the one which has the reinit() support). I run Java 6u21 with the G1 garbage collector, which has been running fine for some weeks now. Full command line is:
java -Xmx16000m -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/export/logs/hbase/gc-hbase.log -Djava.library.path=/home/inr/java-lib/hbase/native/Linux-amd64-64 -Djava.net.preferIPv4Stack=true -Dhbase.log.dir=/export/logs/hbase -Dhbase.log.file=hbase-hbase-regionserver-w3r1.inrdb.ripe.net.log -Dhbase.home.dir=/usr/lib/hbase/bin/.. -Dhbase.id.str=hbase -Dhbase.r

I searched the HBase source for something that could point to native heap usage (like ByteBuffer#allocateDirect(...)), but I could not find anything. Thread count is about 185 (I have 100 handlers), so nothing strange there either.

The question is: could this be HBase, or is this a problem with hadoop-lzo?

I have currently downgraded to a version known to work, because we have a demo coming up, but I am still interested in the answer.



Regards,
Friso
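
As a side note on ruling out direct ByteBuffers: on Java 7 and later the JVM exposes its direct and mapped buffer pools via JMX, so a tiny utility like the sketch below (purely illustrative; the class name is made up and it is not part of HBase or hadoop-lzo) can confirm whether direct buffers account for the extra memory. If the pools stay small while the process RSS keeps growing, the allocations are coming from native (JNI) code instead.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

// Illustrative only: reports how much memory the JVM's direct and mapped
// buffer pools hold (requires Java 7+). Small numbers here combined with a
// growing RSS point at native (JNI) allocations rather than direct buffers.
public class DirectBufferReport {
    public static void main(String[] args) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}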


Re: Data taking up too much space when put into HBase

Posted by Hari Sreekumar <hs...@clickable.com>.
Here's the output of lsr on one of the tables:

drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1102232448
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:33
/hbase/Webevent/1102232448/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1102232448/Channel
-rw-r--r--   3 hadoop supergroup   16943616 2010-11-11 13:33
/hbase/Webevent/1102232448/Channel/7714679806810147132
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1102232448/Customer
-rw-r--r--   3 hadoop supergroup   19089809 2010-11-11 13:33
/hbase/Webevent/1102232448/Customer/228422950590673569
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1102232448/Event
-rw-r--r--   3 hadoop supergroup   96925019 2010-11-11 13:33
/hbase/Webevent/1102232448/Event/3246797304454611713
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1102232448/User
-rw-r--r--   3 hadoop supergroup  176008329 2010-11-11 13:33
/hbase/Webevent/1102232448/User/6713166405821540696
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1102232448/http
-rw-r--r--   3 hadoop supergroup   36644077 2010-11-11 13:34
/hbase/Webevent/1102232448/http/5528514474393215140
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/1181349092
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:40
/hbase/Webevent/1181349092/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/1181349092/Channel
-rw-r--r--   3 hadoop supergroup   14203831 2010-11-11 13:40
/hbase/Webevent/1181349092/Channel/1711324265142021994
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/1181349092/Customer
-rw-r--r--   3 hadoop supergroup   14091927 2010-11-11 13:40
/hbase/Webevent/1181349092/Customer/3269372098573435637
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/1181349092/Event
-rw-r--r--   3 hadoop supergroup   80842368 2010-11-11 13:40
/hbase/Webevent/1181349092/Event/1632526964097525926
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
/hbase/Webevent/1181349092/User
-rw-r--r--   3 hadoop supergroup  146490419 2010-11-11 13:40
/hbase/Webevent/1181349092/User/723684665063798772
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
/hbase/Webevent/1181349092/http
-rw-r--r--   3 hadoop supergroup   27612664 2010-11-11 13:41
/hbase/Webevent/1181349092/http/3591070734425406504
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
/hbase/Webevent/124990928
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:28
/hbase/Webevent/124990928/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/124990928/Channel
-rw-r--r--   3 hadoop supergroup   23700865 2010-11-11 13:35
/hbase/Webevent/124990928/Channel/3488091559288595522
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/124990928/Customer
-rw-r--r--   3 hadoop supergroup   23572454 2010-11-11 13:35
/hbase/Webevent/124990928/Customer/522070966307001888
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
/hbase/Webevent/124990928/Event
-rw-r--r--   3 hadoop supergroup  126857284 2010-11-11 13:35
/hbase/Webevent/124990928/Event/8659573512216796018
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
/hbase/Webevent/124990928/User
-rw-r--r--   3 hadoop supergroup  229590074 2010-11-11 13:36
/hbase/Webevent/124990928/User/4169913968975354294
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
/hbase/Webevent/124990928/http
-rw-r--r--   3 hadoop supergroup   43849622 2010-11-11 13:36
/hbase/Webevent/124990928/http/798925777717846362
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
/hbase/Webevent/13518424
-rw-r--r--   3 hadoop supergroup       2316 2010-11-11 13:22
/hbase/Webevent/13518424/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
/hbase/Webevent/13518424/Channel
-rw-r--r--   3 hadoop supergroup   11192244 2010-11-11 13:24
/hbase/Webevent/13518424/Channel/6283534518250465269
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
/hbase/Webevent/13518424/Customer
-rw-r--r--   3 hadoop supergroup   16335757 2010-11-11 13:24
/hbase/Webevent/13518424/Customer/8233555538562313638
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
/hbase/Webevent/13518424/Event
-rw-r--r--   3 hadoop supergroup   86782869 2010-11-11 13:24
/hbase/Webevent/13518424/Event/7296313542067955537
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
/hbase/Webevent/13518424/User
-rw-r--r--   3 hadoop supergroup  157614762 2010-11-11 13:25
/hbase/Webevent/13518424/User/5713897981539665344
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
/hbase/Webevent/13518424/http
-rw-r--r--   3 hadoop supergroup   31036461 2010-11-11 13:25
/hbase/Webevent/13518424/http/3276765473089850908
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
/hbase/Webevent/1397796225
-rw-r--r--   3 hadoop supergroup       2144 2010-11-11 13:22
/hbase/Webevent/1397796225/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/1397796225/Channel
-rw-r--r--   3 hadoop supergroup    3937460 2010-11-11 13:30
/hbase/Webevent/1397796225/Channel/3684194843745008101
-rw-r--r--   3 hadoop supergroup   13426908 2010-11-11 13:27
/hbase/Webevent/1397796225/Channel/5763776518727398923
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/1397796225/Customer
-rw-r--r--   3 hadoop supergroup    9358001 2010-11-11 13:30
/hbase/Webevent/1397796225/Customer/2373893879659383981
-rw-r--r--   3 hadoop supergroup   15152448 2010-11-11 13:23
/hbase/Webevent/1397796225/Customer/5404281688196690956
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/1397796225/Event
-rw-r--r--   3 hadoop supergroup   49691275 2010-11-11 13:30
/hbase/Webevent/1397796225/Event/1611219478516160819
-rw-r--r--   3 hadoop supergroup   80075191 2010-11-11 13:23
/hbase/Webevent/1397796225/Event/4491108423840726530
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/1397796225/User
-rw-r--r--   3 hadoop supergroup  235564578 2010-11-11 13:31
/hbase/Webevent/1397796225/User/1070607442453415896
-rw-r--r--   3 hadoop supergroup  145355910 2010-11-11 13:23
/hbase/Webevent/1397796225/User/6446151707620200218
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/1397796225/http
-rw-r--r--   3 hadoop supergroup   46665707 2010-11-11 13:31
/hbase/Webevent/1397796225/http/2613117415168100829
-rw-r--r--   3 hadoop supergroup   28997988 2010-11-11 13:24
/hbase/Webevent/1397796225/http/7620282531029987336
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/1568886745
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
/hbase/Webevent/1568886745/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/1568886745/Channel
-rw-r--r--   3 hadoop supergroup   22384663 2010-11-11 13:32
/hbase/Webevent/1568886745/Channel/3092028782443043693
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/1568886745/Customer
-rw-r--r--   3 hadoop supergroup   24060024 2010-11-11 13:32
/hbase/Webevent/1568886745/Customer/2143995643997658656
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/1568886745/Event
-rw-r--r--   3 hadoop supergroup  111172989 2010-11-11 13:32
/hbase/Webevent/1568886745/Event/606180646892333139
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1568886745/User
-rw-r--r--   3 hadoop supergroup  201627486 2010-11-11 13:32
/hbase/Webevent/1568886745/User/1159084185112718235
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1568886745/http
-rw-r--r--   3 hadoop supergroup   42824881 2010-11-11 13:33
/hbase/Webevent/1568886745/http/3005498889980823864
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/1585185360
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:32
/hbase/Webevent/1585185360/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1585185360/Channel
-rw-r--r--   3 hadoop supergroup   13146621 2010-11-11 13:38
/hbase/Webevent/1585185360/Channel/2384148253824087933
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1585185360/Customer
-rw-r--r--   3 hadoop supergroup   17772527 2010-11-11 13:38
/hbase/Webevent/1585185360/Customer/7079893521022823531
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
/hbase/Webevent/1585185360/Event
-rw-r--r--   3 hadoop supergroup   97860459 2010-11-11 13:38
/hbase/Webevent/1585185360/Event/4129421247504808018
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
/hbase/Webevent/1585185360/User
-rw-r--r--   3 hadoop supergroup  177262872 2010-11-11 13:39
/hbase/Webevent/1585185360/User/5689647586095222756
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:39
/hbase/Webevent/1585185360/http
-rw-r--r--   3 hadoop supergroup   38392938 2010-11-11 13:39
/hbase/Webevent/1585185360/http/1513015171284860625
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/1679169023
-rw-r--r--   3 hadoop supergroup       1970 2010-11-11 13:31
/hbase/Webevent/1679169023/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/1679169023/Channel
-rw-r--r--   3 hadoop supergroup   16691718 2010-11-11 13:37
/hbase/Webevent/1679169023/Channel/3995013105248642215
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/1679169023/Customer
-rw-r--r--   3 hadoop supergroup   18627546 2010-11-11 13:37
/hbase/Webevent/1679169023/Customer/2697135409291299740
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/1679169023/Event
-rw-r--r--   3 hadoop supergroup   97721412 2010-11-11 13:37
/hbase/Webevent/1679169023/Event/5517850771377063599
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1679169023/User
-rw-r--r--   3 hadoop supergroup  177198181 2010-11-11 13:37
/hbase/Webevent/1679169023/User/1664697801534568988
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1679169023/http
-rw-r--r--   3 hadoop supergroup   35558386 2010-11-11 13:38
/hbase/Webevent/1679169023/http/2236900881608337670
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/1837902643
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:30
/hbase/Webevent/1837902643/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1837902643/Channel
-rw-r--r--   3 hadoop supergroup   12956819 2010-11-11 13:33
/hbase/Webevent/1837902643/Channel/7551397343290053516
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1837902643/Customer
-rw-r--r--   3 hadoop supergroup   18017948 2010-11-11 13:33
/hbase/Webevent/1837902643/Customer/1637842838964675843
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/1837902643/Event
-rw-r--r--   3 hadoop supergroup   99238886 2010-11-11 13:33
/hbase/Webevent/1837902643/Event/4961580175946952300
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1837902643/User
-rw-r--r--   3 hadoop supergroup  179431668 2010-11-11 13:33
/hbase/Webevent/1837902643/User/8513763763938668916
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1837902643/http
-rw-r--r--   3 hadoop supergroup   35275755 2010-11-11 13:34
/hbase/Webevent/1837902643/http/1801439100480395261
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
/hbase/Webevent/1840258192
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:25
/hbase/Webevent/1840258192/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1840258192/Channel
-rw-r--r--   3 hadoop supergroup   15810928 2010-11-11 13:34
/hbase/Webevent/1840258192/Channel/8758451310929982789
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/1840258192/Customer
-rw-r--r--   3 hadoop supergroup   16184063 2010-11-11 13:34
/hbase/Webevent/1840258192/Customer/8209107027540853853
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/1840258192/Event
-rw-r--r--   3 hadoop supergroup   89893065 2010-11-11 13:34
/hbase/Webevent/1840258192/Event/2507733338503153306
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/1840258192/User
-rw-r--r--   3 hadoop supergroup  162202298 2010-11-11 13:35
/hbase/Webevent/1840258192/User/3877054643528147835
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/1840258192/http
-rw-r--r--   3 hadoop supergroup   30458950 2010-11-11 13:35
/hbase/Webevent/1840258192/http/7057895626422451135
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
/hbase/Webevent/1857066524
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:28
/hbase/Webevent/1857066524/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
/hbase/Webevent/1857066524/Channel
-rw-r--r--   3 hadoop supergroup   17158229 2010-11-11 13:28
/hbase/Webevent/1857066524/Channel/660294007043817390
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:28
/hbase/Webevent/1857066524/Customer
-rw-r--r--   3 hadoop supergroup   17982120 2010-11-11 13:28
/hbase/Webevent/1857066524/Customer/8154314358497892797
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
/hbase/Webevent/1857066524/Event
-rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:28
/hbase/Webevent/1857066524/Event/8608458148878068560
-rw-r--r--   3 hadoop supergroup  103807737 2010-11-11 13:36
/hbase/Webevent/1857066524/Event/8753716512365715611
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
/hbase/Webevent/1857066524/User
-rw-r--r--   3 hadoop supergroup  188208796 2010-11-11 13:28
/hbase/Webevent/1857066524/User/5807656088473870598
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:29
/hbase/Webevent/1857066524/http
-rw-r--r--   3 hadoop supergroup   35830676 2010-11-11 13:29
/hbase/Webevent/1857066524/http/4192260931766222885
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/1954991296
-rw-r--r--   3 hadoop supergroup       2318 2010-11-11 13:31
/hbase/Webevent/1954991296/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1954991296/Channel
-rw-r--r--   3 hadoop supergroup   14723821 2010-11-11 13:38
/hbase/Webevent/1954991296/Channel/1271796192395132719
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1954991296/Customer
-rw-r--r--   3 hadoop supergroup   16998002 2010-11-11 13:38
/hbase/Webevent/1954991296/Customer/1871613240079217431
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1954991296/Event
-rw-r--r--   3 hadoop supergroup   90132913 2010-11-11 13:38
/hbase/Webevent/1954991296/Event/8627908912432238564
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1954991296/User
-rw-r--r--   3 hadoop supergroup  163362248 2010-11-11 13:38
/hbase/Webevent/1954991296/User/8343583184278031381
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:38
/hbase/Webevent/1954991296/http
-rw-r--r--   3 hadoop supergroup   37650515 2010-11-11 13:38
/hbase/Webevent/1954991296/http/783502764043910698
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:16
/hbase/Webevent/387441199
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:16
/hbase/Webevent/387441199/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
/hbase/Webevent/387441199/Channel
-rw-r--r--   3 hadoop supergroup    8751094 2010-11-11 13:22
/hbase/Webevent/387441199/Channel/6907788666949153760
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
/hbase/Webevent/387441199/Customer
-rw-r--r--   3 hadoop supergroup   16526400 2010-11-11 13:23
/hbase/Webevent/387441199/Customer/52924882214004995
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:23
/hbase/Webevent/387441199/Event
-rw-r--r--   3 hadoop supergroup   96466783 2010-11-11 13:23
/hbase/Webevent/387441199/Event/991918398642333797
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:24
/hbase/Webevent/387441199/User
-rw-r--r--   3 hadoop supergroup  173755411 2010-11-11 13:23
/hbase/Webevent/387441199/User/3697716047653972271
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:22
/hbase/Webevent/387441199/http
-rw-r--r--   3 hadoop supergroup   29164625 2010-11-11 13:19
/hbase/Webevent/387441199/http/2172660655272329198
-rw-r--r--   3 hadoop supergroup    3505176 2010-11-11 13:22
/hbase/Webevent/387441199/http/9190482934578742068
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/480045516
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
/hbase/Webevent/480045516/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/480045516/Channel
-rw-r--r--   3 hadoop supergroup   14777812 2010-11-11 13:37
/hbase/Webevent/480045516/Channel/2328066899305806515
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/480045516/Customer
-rw-r--r--   3 hadoop supergroup   18953627 2010-11-11 13:37
/hbase/Webevent/480045516/Customer/2078047623290175963
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/480045516/Event
-rw-r--r--   3 hadoop supergroup  104229664 2010-11-11 13:37
/hbase/Webevent/480045516/Event/910211247163239598
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/480045516/User
-rw-r--r--   3 hadoop supergroup  189096799 2010-11-11 13:37
/hbase/Webevent/480045516/User/5717389634644419119
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:37
/hbase/Webevent/480045516/http
-rw-r--r--   3 hadoop supergroup   36533404 2010-11-11 13:37
/hbase/Webevent/480045516/http/8604372036650962237
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:40
/hbase/Webevent/601109706/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706/Channel
-rw-r--r--   3 hadoop supergroup   14155967 2010-11-11 13:40
/hbase/Webevent/601109706/Channel/1819667230290028427
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706/Customer
-rw-r--r--   3 hadoop supergroup   14563111 2010-11-11 13:40
/hbase/Webevent/601109706/Customer/7336170720169514891
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706/Event
-rw-r--r--   3 hadoop supergroup   82278013 2010-11-11 13:40
/hbase/Webevent/601109706/Event/5064894617590864583
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706/User
-rw-r--r--   3 hadoop supergroup  149299853 2010-11-11 13:40
/hbase/Webevent/601109706/User/5997879119834564841
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:40
/hbase/Webevent/601109706/http
-rw-r--r--   3 hadoop supergroup   29266049 2010-11-11 13:40
/hbase/Webevent/601109706/http/3987271255931462679
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:25
/hbase/Webevent/666508206
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:25
/hbase/Webevent/666508206/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/666508206/Channel
-rw-r--r--   3 hadoop supergroup   22727461 2010-11-11 13:33
/hbase/Webevent/666508206/Channel/9154587641511700292
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:33
/hbase/Webevent/666508206/Customer
-rw-r--r--   3 hadoop supergroup   23277615 2010-11-11 13:33
/hbase/Webevent/666508206/Customer/3760018687145755911
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/666508206/Event
-rw-r--r--   3 hadoop supergroup  111133668 2010-11-11 13:33
/hbase/Webevent/666508206/Event/3598650053650721687
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/666508206/User
-rw-r--r--   3 hadoop supergroup  201631388 2010-11-11 13:34
/hbase/Webevent/666508206/User/3597127170470234124
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/666508206/http
-rw-r--r--   3 hadoop supergroup   39920111 2010-11-11 13:34
/hbase/Webevent/666508206/http/1455502897668123089
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:32
/hbase/Webevent/717393157
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:32
/hbase/Webevent/717393157/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/717393157/Channel
-rw-r--r--   3 hadoop supergroup    7937724 2010-11-11 13:34
/hbase/Webevent/717393157/Channel/4038125755496042580
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/717393157/Customer
-rw-r--r--   3 hadoop supergroup   14666396 2010-11-11 13:34
/hbase/Webevent/717393157/Customer/8406371944316504992
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:34
/hbase/Webevent/717393157/Event
-rw-r--r--   3 hadoop supergroup   85611423 2010-11-11 13:34
/hbase/Webevent/717393157/Event/127456153926503346
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/717393157/User
-rw-r--r--   3 hadoop supergroup  154335622 2010-11-11 13:34
/hbase/Webevent/717393157/User/7421172344231467438
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:35
/hbase/Webevent/717393157/http
-rw-r--r--   3 hadoop supergroup   28943243 2010-11-11 13:35
/hbase/Webevent/717393157/http/7543152081662309456
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/902882312
-rw-r--r--   3 hadoop supergroup       2317 2010-11-11 13:30
/hbase/Webevent/902882312/.regioninfo
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:30
/hbase/Webevent/902882312/Channel
-rw-r--r--   3 hadoop supergroup    9541469 2010-11-11 13:30
/hbase/Webevent/902882312/Channel/3254461494206070427
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/902882312/Customer
-rw-r--r--   3 hadoop supergroup   16270772 2010-11-11 13:30
/hbase/Webevent/902882312/Customer/3583245475353507819
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/902882312/Event
-rw-r--r--   3 hadoop supergroup   90805116 2010-11-11 13:31
/hbase/Webevent/902882312/Event/1032140072520109551
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/902882312/User
-rw-r--r--   3 hadoop supergroup  164990613 2010-11-11 13:31
/hbase/Webevent/902882312/User/5112158281218703912
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:31
/hbase/Webevent/902882312/http
-rw-r--r--   3 hadoop supergroup   38405659 2010-11-11 13:31
/hbase/Webevent/902882312/http/5928256232381135445
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:41
/hbase/Webevent/compaction.dir
drwxr-xr-x   - hadoop supergroup          0 2010-11-11 13:36
/hbase/Webevent/compaction.dir/1857066524
-rw-r--r--   3 hadoop supergroup  153276719 2010-11-11 13:36
/hbase/Webevent/compaction.dir/1857066524/8008135349377513409

There are many smaller files of sizes < 20 MB which might actually be taking
up 64*3=192 MB after replication. And even for the larger files, a file of
129 MB would use up 3 blocks, right? Or is it somehow optimized to minimize
space usage?

On Wed, Nov 10, 2010 at 11:07 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Can you pastebin the output of the lsr command on the table's dir?
>
> Thx
>
> J-D
>
> On Tue, Nov 9, 2010 at 10:54 PM, Hari Sreekumar
> <hs...@clickable.com> wrote:
> > I checked the "browse filesystem" link in the web interface (50070).
> HBase
> > creates a directly named after the table ,and in the directory, there are
> > files which are 5-6 MB in size, on average. Some are in kbs, and there
> are
> > some of 12-13 MB size, but most are around  6 MB. I was thinking these
> files
> > are stored in 64 MB blocks, leading to the space usage.
> >
> > hari
> >
> > On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> I'm pretty sure that's not how it's reported by the "du" command, but
> >> I wouldn't expect to see files of 5MB on average. Can you be more
> >> specific?
> >>
> >> J-D
> >>
> >> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <
> hsreekumar@clickable.com>
> >> wrote:
> >> > Ah, so the bloat is not because of the files being 5-6 MB in size?
> >> Wouldn't
> >> > a 6 MB file occupy 64 MB if I set block size as 64 MB?
> >> >
> >> > hari
> >> >
> >> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <
> >> jdcryans@apache.org>wrote:
> >> >
> >> >> Each value is stored with it's full key e.g. row key + family +
> >> >> qualifier + timestamp + offsets. You don't give any information
> >> >> regarding how you stored the data, but if you have large enough keys
> >> >> then it should easily explain the bloat.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <
> >> hsreekumar@clickable.com>
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> >     Data seems to be taking up too much space when I put into
> HBase.
> >> e.g,
> >> >> I
> >> >> > have a 2 GB text file which seems to be taking up ~70 GB when I
> dump
> >> into
> >> >> > HBase. I have block size set to 64 MB and replication=3, which I
> think
> >> is
> >> >> > the possible reason for this expansion. But if that is the case,
> how
> >> can
> >> >> I
> >> >> > prevent it? Decreasing the block size will have a negative impact
> on
> >> >> > performance, so is there a way I can increase the average size on
> >> >> > HBase-created  files to be comparable to 64 MB. Right now they are
> ~5
> >> MB
> >> >> on
> >> >> > average. Or is this an entirely different thing at work here?
> >> >> >
> >> >> > thanks,
> >> >> > hari
> >> >> >
> >> >>
> >> >
> >>
> >
>
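
On the block question above: an HDFS block only occupies the bytes actually written to it, so a 6 MB HFile consumes roughly 6 MB times the replication factor, not a full 64 MB block per replica, and a 129 MB file simply spans three blocks without padding the last one. One way to check this is to compare the logical size of the table directory with the raw space consumed; the sketch below is illustrative only, with the table path taken from the listing above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative only: compares the logical size of a table directory with the
// raw space it consumes on HDFS (which includes replication). With
// replication=3, raw should be roughly 3x logical if no block padding occurs.
public class TableSpaceCheck {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        ContentSummary cs = fs.getContentSummary(new Path("/hbase/Webevent"));
        System.out.println("logical bytes:       " + cs.getLength());
        System.out.println("raw bytes (x repl.): " + cs.getSpaceConsumed());
    }
}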

Re: Data taking up too much space when put into HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Can you pastebin the output of the lsr command on the table's dir?

Thx

J-D

On Tue, Nov 9, 2010 at 10:54 PM, Hari Sreekumar
<hs...@clickable.com> wrote:
> I checked the "browse filesystem" link in the web interface (50070). HBase
> creates a directly named after the table ,and in the directory, there are
> files which are 5-6 MB in size, on average. Some are in kbs, and there are
> some of 12-13 MB size, but most are around  6 MB. I was thinking these files
> are stored in 64 MB blocks, leading to the space usage.
>
> hari
>
> On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> I'm pretty sure that's not how it's reported by the "du" command, but
>> I wouldn't expect to see files of 5MB on average. Can you be more
>> specific?
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <hs...@clickable.com>
>> wrote:
>> > Ah, so the bloat is not because of the files being 5-6 MB in size?
>> Wouldn't
>> > a 6 MB file occupy 64 MB if I set block size as 64 MB?
>> >
>> > hari
>> >
>> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <
>> jdcryans@apache.org>wrote:
>> >
>> >> Each value is stored with it's full key e.g. row key + family +
>> >> qualifier + timestamp + offsets. You don't give any information
>> >> regarding how you stored the data, but if you have large enough keys
>> >> then it should easily explain the bloat.
>> >>
>> >> J-D
>> >>
>> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <
>> hsreekumar@clickable.com>
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> >     Data seems to be taking up too much space when I put into HBase.
>> e.g,
>> >> I
>> >> > have a 2 GB text file which seems to be taking up ~70 GB when I dump
>> into
>> >> > HBase. I have block size set to 64 MB and replication=3, which I think
>> is
>> >> > the possible reason for this expansion. But if that is the case, how
>> can
>> >> I
>> >> > prevent it? Decreasing the block size will have a negative impact on
>> >> > performance, so is there a way I can increase the average size on
>> >> > HBase-created  files to be comparable to 64 MB. Right now they are ~5
>> MB
>> >> on
>> >> > average. Or is this an entirely different thing at work here?
>> >> >
>> >> > thanks,
>> >> > hari
>> >> >
>> >>
>> >
>>
>

Re: Data taking up too much space when put into HBase

Posted by Hari Sreekumar <hs...@clickable.com>.
I checked the "browse filesystem" link in the web interface (50070). HBase
creates a directory named after the table, and in that directory there are
files which are 5-6 MB in size on average. Some are in KBs, and there are
some of 12-13 MB in size, but most are around 6 MB. I was thinking these files
are stored in 64 MB blocks, leading to the space usage.

hari

On Wed, Nov 10, 2010 at 11:56 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> I'm pretty sure that's not how it's reported by the "du" command, but
> I wouldn't expect to see files of 5MB on average. Can you be more
> specific?
>
> J-D
>
> On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <hs...@clickable.com>
> wrote:
> > Ah, so the bloat is not because of the files being 5-6 MB in size?
> Wouldn't
> > a 6 MB file occupy 64 MB if I set block size as 64 MB?
> >
> > hari
> >
> > On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <
> jdcryans@apache.org>wrote:
> >
> >> Each value is stored with it's full key e.g. row key + family +
> >> qualifier + timestamp + offsets. You don't give any information
> >> regarding how you stored the data, but if you have large enough keys
> >> then it should easily explain the bloat.
> >>
> >> J-D
> >>
> >> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <
> hsreekumar@clickable.com>
> >> wrote:
> >> > Hi,
> >> >
> >> >     Data seems to be taking up too much space when I put into HBase.
> e.g,
> >> I
> >> > have a 2 GB text file which seems to be taking up ~70 GB when I dump
> into
> >> > HBase. I have block size set to 64 MB and replication=3, which I think
> is
> >> > the possible reason for this expansion. But if that is the case, how
> can
> >> I
> >> > prevent it? Decreasing the block size will have a negative impact on
> >> > performance, so is there a way I can increase the average size on
> >> > HBase-created  files to be comparable to 64 MB. Right now they are ~5
> MB
> >> on
> >> > average. Or is this an entirely different thing at work here?
> >> >
> >> > thanks,
> >> > hari
> >> >
> >>
> >
>

Re: Data taking up too much space when put into HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I'm pretty sure that's not how it's reported by the "du" command, but
I wouldn't expect to see files of 5MB on average. Can you be more
specific?

J-D

On Tue, Nov 9, 2010 at 9:58 PM, Hari Sreekumar <hs...@clickable.com> wrote:
> Ah, so the bloat is not because of the files being 5-6 MB in size? Wouldn't
> a 6 MB file occupy 64 MB if I set block size as 64 MB?
>
> hari
>
> On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Each value is stored with it's full key e.g. row key + family +
>> qualifier + timestamp + offsets. You don't give any information
>> regarding how you stored the data, but if you have large enough keys
>> then it should easily explain the bloat.
>>
>> J-D
>>
>> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <hs...@clickable.com>
>> wrote:
>> > Hi,
>> >
>> >     Data seems to be taking up too much space when I put into HBase. e.g,
>> I
>> > have a 2 GB text file which seems to be taking up ~70 GB when I dump into
>> > HBase. I have block size set to 64 MB and replication=3, which I think is
>> > the possible reason for this expansion. But if that is the case, how can
>> I
>> > prevent it? Decreasing the block size will have a negative impact on
>> > performance, so is there a way I can increase the average size on
>> > HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB
>> on
>> > average. Or is this an entirely different thing at work here?
>> >
>> > thanks,
>> > hari
>> >
>>
>
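
To see what "du" actually reports, the standard Hadoop shell commands below (shown only as an illustration) print real byte counts rather than padded block sizes:

hadoop fs -dus /hbase/Webevent    # total logical bytes under the table directory
hadoop fs -du  /hbase/Webevent    # per-region breakdown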

Re: Data taking up too much space when put into HBase

Posted by Hari Sreekumar <hs...@clickable.com>.
Ah, so the bloat is not because of the files being 5-6 MB in size? Wouldn't
a 6 MB file occupy 64 MB if I set block size as 64 MB?

hari

On Wed, Nov 10, 2010 at 11:16 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Each value is stored with it's full key e.g. row key + family +
> qualifier + timestamp + offsets. You don't give any information
> regarding how you stored the data, but if you have large enough keys
> then it should easily explain the bloat.
>
> J-D
>
> On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <hs...@clickable.com>
> wrote:
> > Hi,
> >
> >     Data seems to be taking up too much space when I put into HBase. e.g,
> I
> > have a 2 GB text file which seems to be taking up ~70 GB when I dump into
> > HBase. I have block size set to 64 MB and replication=3, which I think is
> > the possible reason for this expansion. But if that is the case, how can
> I
> > prevent it? Decreasing the block size will have a negative impact on
> > performance, so is there a way I can increase the average size on
> > HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB
> on
> > average. Or is this an entirely different thing at work here?
> >
> > thanks,
> > hari
> >
>

Re: Data taking up too much space when put into HBase

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Each value is stored with its full key, e.g. row key + family +
qualifier + timestamp + offsets. You don't give any information
regarding how you stored the data, but if you have large enough keys
then that could easily explain the bloat.

J-D

On Tue, Nov 9, 2010 at 9:21 PM, Hari Sreekumar <hs...@clickable.com> wrote:
> Hi,
>
>     Data seems to be taking up too much space when I put into HBase. e.g, I
> have a 2 GB text file which seems to be taking up ~70 GB when I dump into
> HBase. I have block size set to 64 MB and replication=3, which I think is
> the possible reason for this expansion. But if that is the case, how can I
> prevent it? Decreasing the block size will have a negative impact on
> performance, so is there a way I can increase the average size on
> HBase-created  files to be comparable to 64 MB. Right now they are ~5 MB on
> average. Or is this an entirely different thing at work here?
>
> thanks,
> hari
>
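
To make the key overhead concrete: ignoring compression, each cell in an HFile is stored as its value bytes plus the full key (row key, family, qualifier, an 8-byte timestamp and a type byte) plus a few fixed-size length fields. The rough estimate below is only a sketch following that layout; the example row key, family, qualifier and value lengths are hypothetical, not taken from this thread:

// Rough per-cell size estimate for an uncompressed HBase KeyValue.
// Layout: 4-byte key length + 4-byte value length, then the key
// (2-byte row length, row, 1-byte family length, family, qualifier,
// 8-byte timestamp, 1-byte type), then the value.
public class CellSizeEstimate {
    static long estimate(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        int keyLen = 2 + rowLen + 1 + familyLen + qualifierLen + 8 + 1;
        return 4 + 4 + keyLen + valueLen;
    }

    public static void main(String[] args) {
        // Hypothetical example: 40-byte row key, family "Event",
        // 20-byte qualifier, 10-byte value.
        long perCell = estimate(40, "Event".length(), 20, 10);
        System.out.println("bytes per cell: " + perCell);
        // With many small values the key overhead dominates, which is why a
        // modest text file can expand to many times its size once every field
        // becomes its own cell, before replication is even counted.
    }
}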