Posted to user@cassandra.apache.org by Roland Hänel <ro...@haenel.me> on 2010/04/26 21:16:24 UTC

Cassandra cluster runs into OOM when bulk loading data

I have a cluster of 5 machines building a Cassandra datastore, and I load
bulk data into it using the Java Thrift API. The first ~250GB runs fine;
then one of the nodes starts to throw OutOfMemory exceptions. I'm not using
any row or index caches, and since I only have 5 CFs and about 2.5 GB of RAM
allocated to the JVM (-Xmx2500M), in theory that shouldn't happen. All
inserts are done with consistency level ALL.

I hope that with this I have avoided all the 'usual dummy errors' that lead
to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
difficult to catch the JVM at the right moment because it runs fine for
several hours before this happens.

One thing comes to mind; maybe one of the experts could confirm or reject
this idea for me: is it possible that when one machine slows down a little
(for example because a big compaction is going on), the memtables don't get
flushed to disk as fast as they build up under the continuing bulk import?
That would result in a downward spiral: the system gets slower and slower on
disk I/O, but more and more data keeps arriving over Thrift, until finally
it OOMs.

I'm using the "periodic" commit log sync; maybe this could also create a
situation where the commit log writer is too slow to keep up with the data
intake, resulting in ever-growing memory usage?
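
For what it's worth, here is a rough sketch of the kind of client-side
throttle I could add to the loader so the import can't outrun the cluster.
insertRow() is just a placeholder for my real Thrift call, and the
thread/permit counts are made up:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ThrottledLoader {
    private static final int WRITER_THREADS = 16;  // made-up numbers
    private static final int MAX_IN_FLIGHT  = 64;

    private final ExecutorService writers = Executors.newFixedThreadPool(WRITER_THREADS);
    private final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

    public void load(Iterable<byte[]> rows) throws InterruptedException {
        for (final byte[] row : rows) {
            inFlight.acquire();               // blocks the producer when the cluster lags
            writers.submit(new Runnable() {
                public void run() {
                    try {
                        insertRow(row);       // placeholder for the synchronous Thrift insert
                    } finally {
                        inFlight.release();
                    }
                }
            });
        }
        writers.shutdown();
        writers.awaitTermination(1, TimeUnit.HOURS);
    }

    private void insertRow(byte[] row) {
        // the real loader would call its batch_mutate/insert over Thrift here
    }
}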

Maybe these thoughts are just bullshit. Let me know if so... ;-)

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Roland Hänel <ro...@haenel.me>.
Thanks, Chris.

2010/4/26 Chris Goffinet <go...@digg.com>

> Upgrade to b20 of Sun's JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
> -Chris

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Roland Hänel <ro...@haenel.me>.
There are other threads related to this issue. Most notably, I think we're
hitting

https://issues.apache.org/jira/browse/CASSANDRA-1014

here.


2010/4/27 Schubert Zhang <zs...@gmail.com>

> It seems
>
> ROW-MUTATION-STAGE               32      3349       63897493
>
> is the clue: too many mutation requests are pending.
>
> Yes, I also think Cassandra should add a mechanism to avoid too many
> requests pending in the queue: when the queue is full, just reject the
> request from the client.
>
> It seems https://issues.apache.org/jira/browse/CASSANDRA-685 is what we want.

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Schubert Zhang <zs...@gmail.com>.
It seems

ROW-MUTATION-STAGE               32      3349       63897493

is the clue: too many mutation requests are pending.

Yes, I also think Cassandra should add a mechanism to avoid too many
requests pending in the queue: when the queue is full, just reject the
request from the client.

It seems https://issues.apache.org/jira/browse/CASSANDRA-685 is what we want.
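
Roughly what I mean, as a sketch only (made-up sizes, not Cassandra's actual
executor code): give the stage a bounded queue and an abort policy, so an
overloaded node returns an error instead of buffering mutations until it
runs out of heap:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedStageSketch {
    public static void main(String[] args) {
        ThreadPoolExecutor mutationStage = new ThreadPoolExecutor(
                32, 32,                                  // 32 threads, like ROW-MUTATION-STAGE above
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(4096),  // bounded: at most 4096 pending tasks
                new ThreadPoolExecutor.AbortPolicy());   // reject instead of queueing forever

        try {
            mutationStage.execute(new Runnable() {
                public void run() {
                    // apply the row mutation here
                }
            });
        } catch (RejectedExecutionException e) {
            // queue is full: report the overload back to the client instead of OOMing
        }

        mutationStage.shutdown();
    }
}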



On Tue, Apr 27, 2010 at 8:16 PM, Eric Yu <su...@gmail.com> wrote:

> I wrote a script to record the tpstats output every 5 seconds.
> Here is the output just before the JVM OOMed:
>
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0            280
> STREAM-STAGE                      0         0              0
> RESPONSE-STAGE                    0         0         245573
> ROW-READ-STAGE                    0         0              0
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         1  14290091       65943291
> GMFD                              0         0          26670
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0              0
> ROW-MUTATION-STAGE               32      3349       63897493
> MESSAGE-STREAMING-POOL            0         0              3
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0            420
> FLUSH-WRITER-POOL                 0         0            420
> AE-SERVICE-STAGE                  1         1              4
> HINTED-HANDOFF-POOL               0         0             52

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Eric Yu <su...@gmail.com>.
I wrote a script to record the tpstats output every 5 seconds (a rough Java
sketch of it follows the output below).
Here is the output just before the JVM OOMed:

Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            280
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0         245573
ROW-READ-STAGE                    0         0              0
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1  14290091       65943291
GMFD                              0         0          26670
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0              0
ROW-MUTATION-STAGE               32      3349       63897493
MESSAGE-STREAMING-POOL            0         0              3
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            420
FLUSH-WRITER-POOL                 0         0            420
AE-SERVICE-STAGE                  1         1              4
HINTED-HANDOFF-POOL               0         0             52
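
The script itself is nothing special; a rough Java equivalent would be
something like this (it just shells out to nodetool every 5 seconds and
appends the output to a log file; adjust the nodetool path and host for your
setup):

import java.io.FileOutputStream;
import java.util.Date;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TpstatsRecorder {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                try {
                    // run nodetool and append its output, with a timestamp, to tpstats.log
                    ProcessBuilder pb = new ProcessBuilder("nodetool", "-host", "localhost", "tpstats");
                    pb.redirectErrorStream(true);
                    Process p = pb.start();
                    FileOutputStream log = new FileOutputStream("tpstats.log", true);
                    log.write(("---- " + new Date() + "\n").getBytes());
                    byte[] buf = new byte[4096];
                    int n;
                    while ((n = p.getInputStream().read(buf)) != -1) {
                        log.write(buf, 0, n);
                    }
                    log.close();
                    p.waitFor();
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}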

On Tue, Apr 27, 2010 at 10:53 AM, Chris Goffinet <go...@digg.com> wrote:

> I'll work on doing more tests around this. In 0.5 we used a different data
> structure that required polling. But this does seem problematic.
>
> -Chris

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Chris Goffinet <go...@digg.com>.
I'll work on doing more tests around this. In 0.5 we used a different data structure that required polling. But this does seem problematic. 

-Chris

On Apr 26, 2010, at 7:04 PM, Eric Yu wrote:

> I have the same problem here, and I analyzed the hprof file with MAT; as you said, LinkedBlockingQueue used 2.6GB.
> I think the thread pools of Cassandra should limit the queue size.
> 
> cassandra 0.6.1
> 
> java version
> $ java -version
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
> 
> iostat
> $ iostat -x -l 1
> Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda              81.00  8175.00 224.00 17.00 23984.00  2728.00   221.68     1.01    1.86   0.76  18.20
> 
> tpstats; of course, this node is still alive
> $ ./nodetool -host localhost tpstats  
> Pool Name                    Active   Pending      Completed
> FILEUTILS-DELETE-POOL             0         0           1281
> STREAM-STAGE                      0         0              0
> RESPONSE-STAGE                    0         0      473617241
> ROW-READ-STAGE                    0         0              0
> LB-OPERATIONS                     0         0              0
> MESSAGE-DESERIALIZER-POOL         0         0      718355184
> GMFD                              0         0         132509
> LB-TARGET                         0         0              0
> CONSISTENCY-MANAGER               0         0              0
> ROW-MUTATION-STAGE                0         0      293735704
> MESSAGE-STREAMING-POOL            0         0              6
> LOAD-BALANCER-STAGE               0         0              0
> FLUSH-SORTER-POOL                 0         0              0
> MEMTABLE-POST-FLUSHER             0         0           1870
> FLUSH-WRITER-POOL                 0         0           1870
> AE-SERVICE-STAGE                  0         0              5
> HINTED-HANDOFF-POOL               0         0             21


Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Eric Yu <su...@gmail.com>.
I have the same problem here, and I analyzed the hprof file with MAT; as
you said, LinkedBlockingQueue used 2.6GB.
I think the thread pools of Cassandra should limit the queue size.
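
To illustrate the failure mode with a toy demo (nothing Cassandra-specific):
an executor with an unbounded LinkedBlockingQueue in front of a worker that
is slower than the producer just piles up tasks, and heap, until the JVM
OOMs, which matches what MAT shows:

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    public static void main(String[] args) {
        // one slow worker behind an unbounded queue
        ThreadPoolExecutor stage = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());

        while (true) {
            final byte[] payload = new byte[64 * 1024];   // stand-in for a row mutation
            stage.execute(new Runnable() {
                public void run() {
                    payload[0] = 1;                        // "apply" the mutation
                    try {
                        Thread.sleep(10);                  // worker is slower than the producer
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            // execute() never blocks or rejects, so pending tasks (and their payloads)
            // keep piling up; this demo eventually dies with OutOfMemoryError.
        }
    }
}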

cassandra 0.6.1

java version
$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

iostat
$ iostat -x -l 1
Device:         rrqm/s   wrqm/s   r/s   w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda              81.00  8175.00 224.00 17.00 23984.00  2728.00   221.68     1.01    1.86   0.76  18.20

tpstats; of course, this node is still alive
$ ./nodetool -host localhost tpstats
Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0           1281
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0      473617241
ROW-READ-STAGE                    0         0              0
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0      718355184
GMFD                              0         0         132509
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0              0
ROW-MUTATION-STAGE                0         0      293735704
MESSAGE-STREAMING-POOL            0         0              6
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0           1870
FLUSH-WRITER-POOL                 0         0           1870
AE-SERVICE-STAGE                  0         0              5
HINTED-HANDOFF-POOL               0         0             21


On Tue, Apr 27, 2010 at 3:32 AM, Chris Goffinet <go...@digg.com> wrote:

> Upgrade to b20 of Sun's JVM. This OOM might be related to
> LinkedBlockingQueue issues that were fixed.
>
> -Chris

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Chris Goffinet <go...@digg.com>.
Upgrade to b20 of Sun's JVM. This OOM might be related to
LinkedBlockingQueue issues that were fixed.

-Chris


2010/4/26 Roland Hänel <ro...@haenel.me>

> Cassandra Version 0.6.1
> OpenJDK Server VM (build 14.0-b16, mixed mode)
> Import speed is about 10MB/s for the full cluster; if a compaction is going
> on, the individual node is I/O-limited.
> tpstats: caught me, didn't know this. I will set up a test and try to catch
> a node during the critical time.
>
> Thanks,
> Roland
>

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Roland Hänel <ro...@haenel.me>.
Cassandra Version 0.6.1
OpenJDK Server VM (build 14.0-b16, mixed mode)
Import speed is about 10MB/s for the full cluster; if a compaction is going
on, the individual node is I/O-limited.
tpstats: caught me, I didn't know about this. I will set up a test and try to
catch a node during the critical time.

Thanks,
Roland


2010/4/26 Chris Goffinet <go...@digg.com>

> Which version of Cassandra?
> Which version of Java JVM are you using?
> What do your I/O stats look like when bulk importing?
> When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing up
> during the import?
>
> -Chris

Re: Cassandra cluster runs into OOM when bulk loading data

Posted by Chris Goffinet <go...@digg.com>.
Which version of Cassandra?
Which version of Java JVM are you using?
What do your I/O stats look like when bulk importing?
When you run `nodeprobe -host XXXX tpstats`, is any thread pool backing up
during the import?

-Chris


2010/4/26 Roland Hänel <ro...@haenel.me>

> I have a cluster of 5 machines building a Cassandra datastore, and I load
> bulk data into it using the Java Thrift API. The first ~250GB runs fine;
> then one of the nodes starts to throw OutOfMemory exceptions. I'm not using
> any row or index caches, and since I only have 5 CFs and about 2.5 GB of RAM
> allocated to the JVM (-Xmx2500M), in theory that shouldn't happen. All
> inserts are done with consistency level ALL.
>
> I hope that with this I have avoided all the 'usual dummy errors' that lead
> to OOMs. I have begun to troubleshoot the issue with JMX; however, it's
> difficult to catch the JVM at the right moment because it runs fine for
> several hours before this happens.
>
> One thing comes to mind; maybe one of the experts could confirm or reject
> this idea for me: is it possible that when one machine slows down a little
> (for example because a big compaction is going on), the memtables don't get
> flushed to disk as fast as they build up under the continuing bulk import?
> That would result in a downward spiral: the system gets slower and slower on
> disk I/O, but more and more data keeps arriving over Thrift, until finally
> it OOMs.
>
> I'm using the "periodic" commit log sync; maybe this could also create a
> situation where the commit log writer is too slow to keep up with the data
> intake, resulting in ever-growing memory usage?
>
> Maybe these thoughts are just bullshit. Let me know if so... ;-)
>
>
>