Posted to user@accumulo.apache.org by Anthony Fox <ad...@gmail.com> on 2012/12/05 16:55:19 UTC

Re: tservers running out of heap space

So, after removing the bloom filter, I get no OOMs with multiple scanners,
but my column-family-only scans are quite slow.  Are there any settings you
can recommend to enable the CF bloom filters that won't cause OOMs?
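
For concreteness, enabling the CF bloom filters comes down to a couple of
table properties.  A rough sketch via the Java client API, with made-up
instance, credentials, and table name, and property names taken from the
1.4-era docs:

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;

    public class EnableCfBloom {
        public static void main(String[] args) throws Exception {
            // Made-up connection details -- substitute your own.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                .getConnector("root", "secret".getBytes());

            // Turn bloom filters on for the table.
            conn.tableOperations().setProperty("mytable",
                "table.bloom.enabled", "true");

            // Key the bloom filter on (row, column family) so seeks for a
            // specific row and CF can skip files that cannot contain it.
            // Setting table.bloom.enabled back to "false" is the "removing
            // the bloom filter" step mentioned above.
            conn.tableOperations().setProperty("mytable",
                "table.bloom.key.functor",
                "org.apache.accumulo.core.file.keyfunctor.ColumnFamilyFunctor");
        }
    }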

Thanks,
Anthony


On Thu, Nov 29, 2012 at 3:50 PM, Anthony Fox <ad...@gmail.com> wrote:

> Ok, a bit more info.  I set -XX:+HeapDumpOnOutOfMemoryError and took a
> look at the heap dump.  The thread that caused the OOM is reading a column
> family bloom filter from the CacheableBlockFile.  The class taking up the
> memory is long[] which seems to be consistent with a bloom filter.  Does
> this sound right?  Any guidance on settings to tweak related to bloom
> filters to alleviate this issue?
>
>
> On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <ad...@gmail.com> wrote:
>
>> Since the scan involves an intersecting iterator, it has to scan the
>> entire row range.  Also, it's not even very many concurrent clients -
>> between 5 and 10.  Should I turn compression off on this table, or is that
>> a bad idea in general?
>>
>>
>> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <ke...@deenlo.com> wrote:
>>
>>>
>>>
>>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <ad...@gmail.com> wrote:
>>>
>>>> We're not on 1.4 yet, unfortunately.  Are there any config params I can
>>>> tweak to manipulate the compressor pool?
>>>
>>>
>>> Not that I know of, but it's been a while since I looked at that.
>>>
>>>
>>>>
>>>>
>>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <ke...@deenlo.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <ad...@gmail.com> wrote:
>>>>>
>>>>>> Compacting down to a single file is not feasible - there's about 70G
>>>>>> in 255 tablets across 15 tablet servers.  Is there another way to tune the
>>>>>> compressor pool or another mechanism to verify that this is the issue?
>>>>>
>>>>>
>>>>> I suppose another way to test this would be to run a lot of concurrent
>>>>> scans, but not enough to kill the tserver.  Then get a heap dump of the
>>>>> tserver and see if it contains a lot of 128k or 256k (can't remember the
>>>>> exact size) byte arrays that are referenced by the compressor pool.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <ke...@deenlo.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <ad...@gmail.com> wrote:
>>>>>>>
>>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>>> against Accumulo.  Running single scans works just fine but when I ramp up
>>>>>>>> the number of simultaneous clients, my tablet servers die due to running
>>>>>>>> out of heap space.  I've tried raising max heap to 4G, which should be more
>>>>>>>> than enough, but I still see this error.  I've tried with
>>>>>>>> table.cache.block.enable=false, table.cache.index.enable=false, and
>>>>>>>> table.scan.cache.enable=false, as well as with all combinations of
>>>>>>>> caching enabled.
>>>>>>>>
>>>>>>>> My scans involve a custom intersecting iterator that maintains no
>>>>>>>> more state than the top key and value.  The scans also do a bit of
>>>>>>>> aggregation on column qualifiers but the result is small and the number of
>>>>>>>> returned entries is only in the dozens.  The size of each returned value is
>>>>>>>> only around 500 bytes.
>>>>>>>>
>>>>>>>> Any ideas why this may be happening or where to look for further
>>>>>>>> info?
>>>>>>>>
>>>>>>>
>>>>>>> One known issue is Hadoop's compressor pool.  If you have a tablet
>>>>>>> with 8 files and you query 10 terms, you will allocate 80 decompressors.
>>>>>>> Each decompressor uses 128K.  If you have 10 concurrent queries, 10 terms,
>>>>>>> and 10 files, then you will allocate 1000 decompressors.  These
>>>>>>> decompressors come from a pool that never shrinks.  So if you allocate 1000
>>>>>>> at the same time, they will stay around.
>>>>>>>
>>>>>>> Try compacting your table down to one file and rerun your query just
>>>>>>> to see if that helps.  If it does, then that's an important clue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Anthony
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
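
For scale, the compressor-pool numbers quoted above work out to 10 queries x
10 terms x 10 files = 1000 decompressors, and at roughly 128 KB of buffer each
that is on the order of 125 MB of heap that the pool keeps holding after the
burst, since it never shrinks.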

Re: tservers running out of heap space

Posted by Eric Newton <er...@gmail.com>.
Some options:

* Decrease table.bloom.size
* Increase table.bloom.error.rate
* Decrease the number of files that can be opened at once
* Increase the size of your JVM (may require more hardware :-)
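
For the first three of those, a rough sketch of applying them through the
Java client API (connection details and table name are made up, property
names are per the 1.4-era docs, and the values are only illustrative):

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;

    public class TuneBloomMemory {
        public static void main(String[] args) throws Exception {
            // Made-up connection details -- substitute your own.
            Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
                .getConnector("root", "secret".getBytes());

            // A smaller filter uses less heap per open file
            // (table.bloom.size is a key count; the 1.4 default is 1048576).
            conn.tableOperations().setProperty("mytable",
                "table.bloom.size", "262144");

            // A higher acceptable false-positive rate also shrinks the filter
            // (the default error rate is 0.5%).
            conn.tableOperations().setProperty("mytable",
                "table.bloom.error.rate", "5%");

            // Fewer files per tablet means fewer bloom filters resident at
            // once; tserver.scan.files.open.max is the tserver-wide knob on
            // how many files scans may hold open.
            conn.tableOperations().setProperty("mytable",
                "table.file.max", "7");
        }
    }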

Can you tell us a little more about the column family and the size of the BF
that you are getting?  o.a.a.core.file.rfile.PrintInfo can get you the size
of the bloom filter in a file.
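
If it helps, PrintInfo is usually run through the accumulo launcher against
one of the table's rfiles in HDFS, roughly:

    $ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo \
        /accumulo/tables/<tableId>/default_tablet/<somefile>.rf

(The path is a placeholder; if memory serves, the bloom filter shows up as
one of the meta blocks in the output.)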

-Eric




On Wed, Dec 5, 2012 at 10:55 AM, Anthony Fox <ad...@gmail.com> wrote:

> So, after removing the bloom filter, I get no OOMs with multiple scanners,
> but my column-family-only scans are quite slow.  Are there any settings you
> can recommend to enable the CF bloom filters that won't cause OOMs?
>
> Thanks,
> Anthony
>
>
> On Thu, Nov 29, 2012 at 3:50 PM, Anthony Fox <ad...@gmail.com> wrote:
>
>> Ok, a bit more info.  I set -XX:+HeapDumpOnOutOfMemoryError and took a
>> look at the heap dump.  The thread that caused the OOM is reading a column
>> family bloom filter from the CacheableBlockFile.  The class taking up the
>> memory is long[] which seems to be consistent with a bloom filter.  Does
>> this sound right?  Any guidance on settings to tweak related to bloom
>> filters to alleviate this issue?
>>
>>
>> On Thu, Nov 29, 2012 at 2:24 PM, Anthony Fox <ad...@gmail.com> wrote:
>>
>>> Since the scan involves an intersecting iterator, it has to scan the
>>> entire row range.  Also, it's not even very many concurrent clients -
>>> between 5 and 10.  Should I turn compression off on this table, or is that
>>> a bad idea in general?
>>>
>>>
>>> On Thu, Nov 29, 2012 at 2:22 PM, Keith Turner <ke...@deenlo.com> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Nov 29, 2012 at 2:09 PM, Anthony Fox <ad...@gmail.com> wrote:
>>>>
>>>>> We're not on 1.4 yet, unfortunately.  Are there any config params I
>>>>> can tweak to manipulate the compressor pool?
>>>>
>>>>
>>>> Not that I know of, but it's been a while since I looked at that.
>>>>
>>>>
>>>>>
>>>>>
>>>>> On Thu, Nov 29, 2012 at 1:49 PM, Keith Turner <ke...@deenlo.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 29, 2012 at 12:20 PM, Anthony Fox <ad...@gmail.com> wrote:
>>>>>>
>>>>>>> Compacting down to a single file is not feasible - there's about 70G
>>>>>>> in 255 tablets across 15 tablet servers.  Is there another way to tune the
>>>>>>> compressor pool or another mechanism to verify that this is the issue?
>>>>>>
>>>>>>
>>>>>> I suppose another way to test this would be to run a lot of
>>>>>> concurrent scans, but not enough to kill the tserver.  Then get a heap dump
>>>>>> of the tserver and see if it contains a lot of 128k or 256k (can't
>>>>>> remember the exact size) byte arrays that are referenced by the compressor pool.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Nov 29, 2012 at 12:09 PM, Keith Turner <ke...@deenlo.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Nov 29, 2012 at 11:14 AM, Anthony Fox <adfaccuser@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> I am experiencing some issues running multiple parallel scans
>>>>>>>>> against Accumulo.  Running single scans works just fine but when I ramp up
>>>>>>>>> the number of simultaneous clients, my tablet servers die due to running
>>>>>>>>> out of heap space.  I've tried raising max heap to 4G, which should be more
>>>>>>>>> than enough, but I still see this error.  I've tried with
>>>>>>>>> table.cache.block.enable=false, table.cache.index.enable=false, and
>>>>>>>>> table.scan.cache.enable=false, as well as with all combinations of
>>>>>>>>> caching enabled.
>>>>>>>>>
>>>>>>>>> My scans involve a custom intersecting iterator that maintains no
>>>>>>>>> more state than the top key and value.  The scans also do a bit of
>>>>>>>>> aggregation on column qualifiers but the result is small and the number of
>>>>>>>>> returned entries is only in the dozens.  The size of each returned value is
>>>>>>>>> only around 500 bytes.
>>>>>>>>>
>>>>>>>>> Any ideas why this may be happening or where to look for further
>>>>>>>>> info?
>>>>>>>>>
>>>>>>>>
>>>>>>>> One known issue is Hadoop's compressor pool.  If you have a tablet
>>>>>>>> with 8 files and you query 10 terms, you will allocate 80 decompressors.
>>>>>>>> Each decompressor uses 128K.  If you have 10 concurrent queries, 10 terms,
>>>>>>>> and 10 files, then you will allocate 1000 decompressors.  These
>>>>>>>> decompressors come from a pool that never shrinks.  So if you allocate 1000
>>>>>>>> at the same time, they will stay around.
>>>>>>>>
>>>>>>>> Try compacting your table down to one file and rerun your query
>>>>>>>> just to see if that helps.  If it does, then that's an important clue.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Anthony
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>