Posted to user@cassandra.apache.org by Mick Semb Wever <mc...@apache.org> on 2012/03/11 20:47:56 UTC

OOM opening bloom filter

Using cassandra-1.0.6, one node fails to start.

java.lang.OutOfMemoryError: Java heap space
	at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:104)
	at org.apache.cassandra.utils.obs.OpenBitSet.<init>(OpenBitSet.java:92)
	at org.apache.cassandra.utils.BloomFilterSerializer.deserialize(BloomFilterSerializer.java:55)
	at org.apache.cassandra.io.sstable.SSTableReader.loadBloomFilter(SSTableReader.java:308)
	at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:168)
	at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:205)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)


This happens with (our normal) -Xmx12g setting.
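
For reference, a heavily simplified, hypothetical sketch of the failing
path in the trace (not the actual 1.0.6 source). The point is only that
the filter's backing bit array is allocated on-heap, in full, when the
sstable is opened, so 20G of filters needs roughly 20G of heap before
startup can finish:

    import java.io.DataInput;
    import java.io.IOException;

    // Hypothetical simplification of BloomFilterSerializer.deserialize
    // and OpenBitSet.<init> from the trace above, NOT the real 1.0.6 code.
    public class BloomFilterLoadSketch {
        static long[] loadFilterBits(DataInput in) throws IOException {
            long numBits = in.readLong();                // filter size, written at flush time
            int numWords = (int) ((numBits + 63) >>> 6); // 64 bits per long
            return new long[numWords];                   // eager allocation; the OOM fires here
        }
    }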

How did this bloom filter get too big?
Is the best option to keep trying with larger Xmx settings until i can
start up the node, and then run a `nodetool scrub`?

~mck


-- 
"Don't use Outlook. Outlook is really just a security hole with a small
e-mail client attached to it." Brian Trosko 

| http://github.com/finn-no | http://tech.finn.no |

Re: OOM opening bloom filter

Posted by Mick Semb Wever <mc...@apache.org>.

> How much smaller did the BF get to?

After pending compactions completed today (i'm presuming the fp_ratio
has now been applied to all sstables in the keyspace), total bloom
filter size has gone from 20G+ down to 1G. This node is now running
comfortably on Xmx4G (used heap ~1.5G).


~mck


-- 
"A Microsoft Certified System Engineer is to information technology as a
McDonalds Certified Food Specialist is to the culinary arts." Michael
Bacarella 

| http://github.com/finn-no | http://tech.finn.no |

Re: OOM opening bloom filter

Posted by aaron morton <aa...@thelastpickle.com>.
Thanks for the update. 

How much smaller did the BF get to?

A

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 13/03/2012, at 8:24 AM, Mick Semb Wever wrote:

> 
>>>>> It's my understanding then for this use case that bloom filters are of
>>>>> little importance and that i can
> 
> 
> Ok. To summarise our actions to get us out of this situation, in hope
> that it may help others one day, we did the following actions:
> 
> 1) upgrade to 1.0.7
> 2) set fp_ratio=0.99
> 3) set index_interval=4096
> 4) restarted the node with Xmx30G
> 5) run `nodetool scrub` 
>      and monitor total size of bf files
>      using `du -hc *-Filter.db | grep total`
> 6) restart node with original Xmx setting once total bf size is under
>      (scrub was running for >12hrs)
>      (remaining bloom filters can be rebuilt later from normal compact)
> 
> Hopefully it will also eventuate that this cluster can run with a more
> normal Xmx4G rather than the previous Xmx12G.
> 
> (2) and (3) are very much dependent on our setup using hadoop, where
> all reads are get_range_slice with 16k rows per request. Both could be
> tuned more precisely, but they're the numbers that worked first up.
> 
> ~mck
> 
> -- 
> "When there is no enemy within, the enemies outside can't hurt you."
> African proverb 
> 
> | http://github.com/finn-no | http://tech.finn.no |


Re: OOM opening bloom filter

Posted by Mick Semb Wever <mc...@apache.org>.
> > > > It's my understanding then for this use case that bloom filters are of
> > > > little importance and that i can


Ok. To summarise our actions to get us out of this situation, in hope
that it may help others one day, we did the following actions:

 1) upgrade to 1.0.7
 2) set fp_ratio=0.99
 3) set index_interval=4096
 4) restarted the node with Xmx30G
 5) run `nodetool scrub` 
      and monitor total size of bf files
      using `du -hc *-Filter.db | grep total`
 6) restart node with original Xmx setting once total bf size is under
      (scrub was running for >12hrs)
      (remaining bloom filters can be rebuilt later from normal compact)

Hopefully it will also eventuate that this cluster can run with a more
normal Xmx4G rather than the previous Xmx12G.

(2) and (3) are very much dependent on our setup using hadoop, where
all reads are get_range_slice with 16k rows per request. Both could be
tuned more precisely, but they're the numbers that worked first up.

~mck

-- 
"When there is no enemy within, the enemies outside can't hurt you."
African proverb 

| http://github.com/finn-no | http://tech.finn.no |

Re: OOM opening bloom filter

Posted by aaron morton <aa...@thelastpickle.com>.
>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>> 
Yes.
AFAIK there is only one positioning seek (which will use the bloom filter) at the start of a get_range_slice request. After that the iterators step over the rows in the -Data files.

For the same reason caches may be considered a little less useful.  
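
Very roughly, in made-up pseudo-Java (hypothetical names, not
Cassandra's actual read path), the difference looks like this:

    import java.util.Iterator;

    // Hypothetical sketch with made-up names, just to illustrate where
    // the bloom filter does and does not help.
    public class ReadPathSketch {
        interface SSTable {
            boolean bloomMightContain(byte[] key);    // in-memory check, no I/O
            Row read(byte[] key);                     // costs a disk seek
            Iterator<Row> scanFrom(byte[] startKey);  // sequential iteration
        }
        static class Row {}

        // Point read: the filter is consulted per sstable per key, so for
        // random keys each negative answer saves a disk seek.
        static Row get(Iterable<SSTable> sstables, byte[] key) {
            for (SSTable t : sstables) {
                if (t.bloomMightContain(key)) {
                    Row row = t.read(key);
                    if (row != null) return row;
                }
            }
            return null;
        }

        // get_range_slice: one positioning seek at the start of the request,
        // then the iterators just step over rows in the -Data files; the
        // bloom filter is out of the picture for the rest of the scan.
        static void rangeSlice(Iterable<SSTable> sstables, byte[] start, int count) {
            for (SSTable t : sstables) {
                Iterator<Row> rows = t.scanFrom(start);
                for (int i = 0; i < count && rows.hasNext(); i++) {
                    rows.next();
                }
            }
        }
    }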

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 12/03/2012, at 12:44 PM, Mick Semb Wever wrote:

> On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
>> Are you doing RF=1? 
> 
> That is correct. So are your calculations then :-)
> 
> 
>>> very small, <1k. Data from this cf is only read via hadoop jobs in batch
>>> reads of 16k rows at a time.
>> [snip]
>>> It's my understanding then for this use case that bloom filters are of
>>> little importance and that i can
>> 
>> Depends. I'm not familiar enough with how the hadoop integration works
>> so someone else will have to comment, but if your hadoop jobs are just
>> performing normal reads of keys via thrift and the keys they are
>> grabbing are not in token order, those reads would be effectively
>> random and bloom filters should still be highly relevant to the number
>> of I/O operations you need to perform.
> 
> They are thrift get_range_slice reads of 16k rows per request.
> Hadoop reads are based on tokens, but in my use case the keys are also
> ordered and this cluster is using BOP (the ByteOrderedPartitioner).
> 
> ~mck
> 
> -- 
> "Living on Earth is expensive, but it does include a free trip around
> the sun every year." Unknown 
> 
> | http://github.com/finn-no | http://tech.finn.no |


Re: OOM opening bloom filter

Posted by Mick Semb Wever <mc...@apache.org>.
On Sun, 2012-03-11 at 15:36 -0700, Peter Schuller wrote:
> Are you doing RF=1? 

That is correct. So are your calculations then :-)


> > very small, <1k. Data from this cf is only read via hadoop jobs in batch
> > reads of 16k rows at a time.
> [snip]
> > It's my understanding then for this use case that bloom filters are of
> > little importance and that i can
> 
> Depends. I'm not familiar enough with how the hadoop integration works
> so someone else will have to comment, but if your hadoop jobs are just
> performing normal reads of keys via thrift and the keys they are
> grabbing are not in token order, those reads would be effectively
> random and bloom filters should still be highly relevant to the number
> of I/O operations you need to perform.

They are thrift get_range_slice reads of 16k rows per request.
Hadoop reads are based on tokens, but in my use case the keys are also
ordered and this cluster is using BOP (the ByteOrderedPartitioner).

~mck

-- 
"Living on Earth is expensive, but it does include a free trip around
the sun every year." Unknown 

| http://github.com/finn-no | http://tech.finn.no |

Re: OOM opening bloom filter

Posted by Peter Schuller <pe...@infidyne.com>.
> This particular cf has up to ~10 billion rows over 3 nodes. Each row is

With default settings, 143 million keys roughly gives you 2^31 bits of
bloom filter. Or put another way, you get about 1 GB of bloom filters
per 570 million keys, if I'm not mistaken. If you have 10 billion
rows, that should be roughly 20 gigs, plus any overhead caused by rows
appearing in multiple sstables.
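
A quick back-of-envelope in code, for anyone who wants to check the
arithmetic (illustrative only; Cassandra's real BloomCalculations
rounds to whole buckets and hash counts):

    // The 15 bits/key figure is just 2^31 bits / 143M keys;
    // everything else follows from it.
    public class BloomFilterSizing {
        public static void main(String[] args) {
            double bitsPerKey = Math.pow(2, 31) / 143_000_000d;     // ~15.0
            double keysPerGB  = 8d * (1L << 30) / bitsPerKey;       // ~570 million
            double gbFor10B   = 1e10 * bitsPerKey / 8 / (1L << 30); // ~17.5 GB
            System.out.printf("bits/key=%.1f  keys/GB=%.0fM  10B rows=%.1f GB%n",
                              bitsPerKey, keysPerGB / 1e6, gbFor10B);
        }
    }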

Are you doing RF=1? That would explain how you're fitting this into 3
nodes with a heap size of 12 gb. If not, I'm probably making a mistake
in my calculation :)

> very small, <1k. Data from this cf is only read via hadoop jobs in batch
> reads of 16k rows at a time.
[snip]
> It's my understanding then for this use case that bloom filters are of
> little importance and that i can

Depends. I'm not familiar enough with how the hadoop integration works
so someone else will have to comment, but if your hadoop jobs are just
performing normal reads of keys via thrift and the keys they are
grabbing are not in token order, those reads would be effectively
random and bloom filters should still be highly relevant to the number
of I/O operations you need to perform.

>  - upgrade to 1.0.7
>  - set fp_ratio=0.99
>  - set index_interval=1024
>
> This should alleviate most of the memory problems.
> Is this correct?

Provided that you do indeed not need the BFs, then yeah. For the
record, I have not yet personally tried the fp_ratio setting, but it
certainly should significantly decrease memory use.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: OOM opening bloom filter

Posted by Mick Semb Wever <mc...@apache.org>.
On Sun, 2012-03-11 at 15:06 -0700, Peter Schuller wrote:
> If it is legitimate use of memory, you *may*, depending on your
> workload, want to adjust target bloom filter false positive rates:
> 
>    https://issues.apache.org/jira/browse/CASSANDRA-3497 

This particular cf has up to ~10 billion rows over 3 nodes. Each row is
very small, <1k. Data from this cf is only read via hadoop jobs in batch
reads of 16k rows at a time. 

*-Data.db files are typically ~50G, and *-Filter.db files typically 2G,
although some are 7G.
At the moment there are many pending compactions, but i can't run any
because the node crashes at startup.

It's my understanding then for this use case that bloom filters are of
little importance and that i can 
 - upgrade to 1.0.7
 - set fp_ratio=0.99
 - set index_interval=1024

This should alleviate most of the memory problems.
Is this correct?

~mck

-- 
"It seems that perfection is reached not when there is nothing left to
add, but when there is nothing left to take away" Antoine de Saint
Exupéry (William of Ockham) 

| http://github.com/finn-no | http://tech.finn.no |


Re: OOM opening bloom filter

Posted by Peter Schuller <pe...@infidyne.com>.
> How did this bloom filter get too big?

Bloom filters grow with the number of row keys you have. It is natural
that they grow bigger over time. The question is whether there is
something "wrong" with this node (for example, lots of sstables and
disk space used due to compaction not running, etc) or whether your
cluster is simply increasing its use of row keys over time. You'd want
graphs to be able to see the trends. If you don't have them, I'd start
by comparing this node with other nodes in the cluster and figuring out
whether there is a very significant difference or not.

In any case, a bigger heap will allow you to start up again. But you
should definitely make sure you know what's going on (natural growth
of data vs. some problem) if you want to avoid problems in the future.

If it is legitimate use of memory, you *may*, depending on your
workload, want to adjust target bloom filter false positive rates:

   https://issues.apache.org/jira/browse/CASSANDRA-3497
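
As a first-order model of how big a lever the fp ratio is, the textbook
relation bits-per-key = -ln(p) / (ln 2)^2 can be plugged in (Cassandra's
actual sizing rounds and enforces minimums, so real filters won't shrink
all the way to these numbers):

    public class FpRatioEffect {
        // Textbook bloom filter sizing: bits per key for target fp rate p.
        static double bitsPerKey(double p) {
            return -Math.log(p) / (Math.log(2) * Math.log(2));
        }

        public static void main(String[] args) {
            // p ~= 0.0007 gives ~15 bits/key, consistent with the default
            // sizing discussed elsewhere in this thread; relaxing p shrinks
            // the filter by an order of magnitude or more.
            for (double p : new double[] { 0.0007, 0.01, 0.1, 0.99 }) {
                System.out.printf("p=%.4f -> %.2f bits/key%n", p, bitsPerKey(p));
            }
        }
    }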

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)