Posted to user@cassandra.apache.org by Jussi Pöri <ju...@androidconsulting.com> on 2010/04/28 16:24:34 UTC

Inserting files to Cassandra timeouts

New try, the previous one went to the wrong place...

Hi all,

I'm trying to run a scenario of adding files from a specific folder to Cassandra. Right now I have 64 files (about 15-20 MB per file), roughly 1 GB of data overall. 
I'm able to insert around 40 files, but after that Cassandra goes into some GC loop and I eventually get a timeout on the client. 
It is not going OOM, it just jams. 

Here are the last entries in the log file:
 INFO [GC inspection] 2010-04-28 10:07:55,297 GCInspector.java (line 110) GC for ParNew: 232 ms, 25731128 reclaimed leaving 553241120 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:09:02,331 GCInspector.java (line 110) GC for ParNew: 2844 ms, 238909856 reclaimed leaving 1435582832 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:09:49,421 GCInspector.java (line 110) GC for ParNew: 30666 ms, 11185824 reclaimed leaving 1679795336 used; max is 4108386304
 INFO [GC inspection] 2010-04-28 10:11:18,090 GCInspector.java (line 110) GC for ParNew: 895 ms, 17921680 reclaimed leaving 1589308456 used; max is 4108386304



I think I must have something wrong in my configuration or in how I use Cassandra, because people here are inserting 10 times more data and it works. 

The column family I'm using:
<ColumnFamily CompareWith="BytesType" Name="Standard1"/>
Basically I'm inserting with the folder name as the row key, the file name as the column name, and the file content as the value. 
I tried with Hector (mainly) and directly with Thrift (insert and batch_mutate). 
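
For reference, a minimal sketch of this kind of insert against the Cassandra 0.6 Thrift API. The keyspace name, host, and port here are placeholder assumptions, not values taken from this thread:

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class InsertFile {
    public static void main(String[] args) throws Exception {
        TSocket socket = new TSocket("localhost", 9160);  // assumed host/port
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(socket));
        socket.open();

        // Read the whole file into memory, as described above.
        File file = new File(args[0]);
        byte[] content = new byte[(int) file.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(file));
        in.readFully(content);
        in.close();

        // Row key = folder name, column name = file name, value = file bytes.
        ColumnPath path = new ColumnPath("Standard1");
        path.setColumn(file.getName().getBytes("UTF-8"));
        client.insert("Keyspace1",                        // assumed keyspace
                file.getParentFile().getName(),           // "Folder_name"
                path, content, System.currentTimeMillis(),
                ConsistencyLevel.ONE);
        socket.close();
    }
}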

In my case, the data does not need to be readable immediately after insert, but I don't know if that helps in any way. 


My environment:
Mac and/or Linux, tested on both
java 1.6.0_17 
Cassandra 0.6.1



  <RpcTimeoutInMillis>60000</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>32</CommitLogRotationThresholdInMB>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>32</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>0.1</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>
  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>32</ConcurrentWrites>
  <CommitLogSync>batch</CommitLogSync>
  <!-- CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS -->
  <CommitLogSyncBatchWindowInMS>1.0</CommitLogSyncBatchWindowInMS>
  <GCGraceSeconds>500</GCGraceSeconds>

JVM_OPTS=" \
        -server \
        -Xms3G \
        -Xmx3G \
        -XX:PermSize=512m \
        -XX:MaxPermSize=800m \
        -XX:MaxNewSize=256m \
        -XX:NewSize=128m \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -XX:SurvivorRatio=128 \
        -XX:MaxTenuringThreshold=0 \
        -XX:+DisableExplicitGC \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"


Re: Inserting files to Cassandra timeouts

Posted by Jussi Pöri <ju...@androidconsulting.com>.
I was thinking this too, but I think the overall insert amount is
not that big.
The data is basically map data, and the files are map tiles, which I can
easily make smaller.
We are currently using this data from multiple nodes (GRID), but we want
to get rid of the file system hassle (basically Samba mounts).

Reads are always done per file (column). This is why I think Cassandra
would be a good fit. At least the read performance is more than good enough
for us.

-jussi


Schubert Zhang wrote:
> I think your file (as a Cassandra column value) is too large.
> And I also think Cassandra is not good at storing files.

Re: Inserting files to Cassandra timeouts

Posted by Schubert Zhang <zs...@gmail.com>.
I think your file (as a Cassandra column value) is too large.
And I also think Cassandra is not good at storing files.


Re: Inserting files to Cassandra timeouts

Posted by Jonathan Ellis <jb...@gmail.com>.
compaction starts but never finishes.

are you inserting all these files into the same row?  don't do that.
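
A rough sketch of avoiding the single giant row: key each file by "folder/filename" so every row stays small, and split the content into chunk columns. The column family, chunk size, and chunk naming below are illustrative assumptions, not something prescribed in this thread:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.ConsistencyLevel;

public class ChunkedInsert {
    static final int CHUNK_SIZE = 1 << 20;  // 1 MB per column (assumed value)

    // One row per file ("folder/filename"), one column per chunk, so no
    // single row grows to the size of a whole folder of 15-20 MB files.
    static void insertFile(Cassandra.Client client, String keyspace,
                           String folder, String fileName, byte[] content)
            throws Exception {
        String rowKey = folder + "/" + fileName;
        long timestamp = System.currentTimeMillis();
        for (int off = 0, n = 0; off < content.length; off += CHUNK_SIZE, n++) {
            int len = Math.min(CHUNK_SIZE, content.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(content, off, chunk, 0, len);

            ColumnPath path = new ColumnPath("Standard1");
            path.setColumn(String.format("chunk-%06d", n).getBytes("UTF-8"));
            client.insert(keyspace, rowKey, path, chunk, timestamp,
                          ConsistencyLevel.ONE);
        }
    }
}

With BytesType comparison the zero-padded chunk names sort in order, so a get_slice over the row returns the chunks ready for reassembly.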


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Inserting files to Cassandra timeouts

Posted by Spacejatsi <sp...@gmail.com>.
I ran the test again, inserting 64 files (15-25 MB per file) with 2 threads, each inserting one file at a time. 
The first 30 files go in relatively fast, but then it jams and finally times out. This tpstats was taken when the first timeout came.

I also tested splitting the files into pieces of at most 5 MB each. That ran fine for about 50% of the data, but then it also jammed. 
Should I be running the inserts against multiple nodes at the same time, so as not to overload one node only?
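
As a rough sketch of spreading the write load, each inserter thread could talk to a different node over its own connection. The hostnames below are placeholders, and a Thrift client must not be shared between threads:

import org.apache.cassandra.thrift.Cassandra;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class SpreadInserts {
    // 9160 is the default Thrift port in 0.6.
    static Cassandra.Client connect(String host) throws Exception {
        TSocket socket = new TSocket(host, 9160);
        socket.open();
        return new Cassandra.Client(new TBinaryProtocol(socket));
    }

    public static void main(String[] args) throws Exception {
        final String[] hosts = { "cass1", "cass2" };  // placeholder hostnames
        for (final String host : hosts) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        Cassandra.Client client = connect(host);
                        // ... insert this thread's share of the files here ...
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
    }
}

This only spreads the coordination work; the partitioner still decides which node ultimately stores each row.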

100305-mac17:bin jussi.pori$ ./nodetool -host localhost tpstats
Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0              8
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0              0
ROW-READ-STAGE                    0         0              0
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0              0
GMFD                              0         0              0
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0              0
ROW-MUTATION-STAGE                2         2             31
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0              5
FLUSH-WRITER-POOL                 0         0              5
AE-SERVICE-STAGE                  0         0              0

This is the system.log from that test:

 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:34,174 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613414173.log
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:35,481 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613415481.log
 INFO [ROW-MUTATION-STAGE:9] 2010-04-30 10:43:36,106 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613415481.log', position=18000316)
 INFO [ROW-MUTATION-STAGE:9] 2010-04-30 10:43:36,107 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@505410915
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:43:36,108 Memtable.java (line 148) Writing Memtable(Standard1)@505410915
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:36,862 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613416862.log
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:39,624 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613419624.log
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:42,555 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613422555.log
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:43:42,603 Memtable.java (line 162) Completed flushing /Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-1-Data.db
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:42,650 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613414173.log)
 INFO [ROW-MUTATION-STAGE:15] 2010-04-30 10:43:42,655 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613422555.log', position=145)
 INFO [ROW-MUTATION-STAGE:15] 2010-04-30 10:43:42,656 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@481059370
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:43:42,657 Memtable.java (line 148) Writing Memtable(Standard1)@481059370
 INFO [GC inspection] 2010-04-30 10:43:49,760 GCInspector.java (line 110) GC for ParNew: 2390 ms, 58993544 reclaimed leaving 614178960 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:43:49,913 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613429913.log
 INFO [GC inspection] 2010-04-30 10:44:07,039 GCInspector.java (line 110) GC for ParNew: 13666 ms, 60792648 reclaimed leaving 816424624 used; max is 4108386304
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:44:07,040 Memtable.java (line 162) Completed flushing /Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-2-Data.db
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:44:08,164 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613415481.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:44:08,164 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613416862.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:44:08,169 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613419624.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:44:10,256 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613450256.log
 INFO [ROW-MUTATION-STAGE:21] 2010-04-30 10:44:10,564 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613450256.log', position=6000316)
 INFO [ROW-MUTATION-STAGE:21] 2010-04-30 10:44:10,565 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@1457398981
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:44:10,565 Memtable.java (line 148) Writing Memtable(Standard1)@1457398981
 INFO [GC inspection] 2010-04-30 10:44:19,393 GCInspector.java (line 110) GC for ParNew: 8633 ms, 56607240 reclaimed leaving 930530096 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:45:02,527 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613502527.log
 INFO [GC inspection] 2010-04-30 10:45:58,307 GCInspector.java (line 110) GC for ParNew: 49727 ms, 76431056 reclaimed leaving 1102511632 used; max is 4108386304
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:47:05,944 Memtable.java (line 162) Completed flushing /Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-3-Data.db
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:47:14,264 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613634264.log
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:48:23,571 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613422555.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:48:23,599 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613429913.log)
 INFO [ROW-MUTATION-STAGE:26] 2010-04-30 10:48:24,348 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613634264.log', position=18000316)
 INFO [ROW-MUTATION-STAGE:26] 2010-04-30 10:48:24,409 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@47327328
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:48:24,409 Memtable.java (line 148) Writing Memtable(Standard1)@47327328
 INFO [GC inspection] 2010-04-30 10:49:27,238 GCInspector.java (line 110) GC for ParNew: 44522 ms, 80667552 reclaimed leaving 1234718736 used; max is 4108386304
 INFO [GC inspection] 2010-04-30 10:50:53,913 GCInspector.java (line 110) GC for ParNew: 73534 ms, 83052224 reclaimed leaving 1414752304 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:51:19,800 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613879800.log
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:52:17,371 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613937371.log
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:52:18,731 Memtable.java (line 162) Completed flushing /Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-4-Data.db
 INFO [COMPACTION-POOL:1] 2010-04-30 10:52:18,880 CompactionManager.java (line 246) Compacting [org.apache.cassandra.io.SSTableReader(path='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-1-Data.db'),org.apache.cassandra.io.SSTableReader(path='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-2-Data.db'),org.apache.cassandra.io.SSTableReader(path='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-3-Data.db'),org.apache.cassandra.io.SSTableReader(path='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-4-Data.db')]
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:52:32,070 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613450256.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:52:32,188 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613502527.log)


 INFO [GC inspection] 2010-04-30 10:53:46,949 GCInspector.java (line 110) GC for ParNew: 13621 ms, 88262512 reclaimed leaving 1581477264 used; max is 4108386304
 INFO [GC inspection] 2010-04-30 10:55:08,644 GCInspector.java (line 110) GC for ParNew: 15073 ms, 20452712 reclaimed leaving 431259232 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:55:08,644 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614108644.log
 INFO [ROW-MUTATION-STAGE:32] 2010-04-30 10:55:12,952 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614108644.log', position=18000316)
 INFO [ROW-MUTATION-STAGE:32] 2010-04-30 10:55:12,953 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@1990541648
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:55:12,967 Memtable.java (line 148) Writing Memtable(Standard1)@1990541648
 INFO [GC inspection] 2010-04-30 10:55:20,718 GCInspector.java (line 110) GC for ParNew: 6685 ms, 31187112 reclaimed leaving 626981312 used; max is 4108386304
 INFO [GC inspection] 2010-04-30 10:55:31,084 GCInspector.java (line 110) GC for ParNew: 10109 ms, 3237800 reclaimed leaving 752990680 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:55:33,247 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614133247.log
 INFO [GC inspection] 2010-04-30 10:55:41,083 GCInspector.java (line 110) GC for ParNew: 1426 ms, 48780568 reclaimed leaving 1053980248 used; max is 4108386304
 INFO [GC inspection] 2010-04-30 10:56:01,084 GCInspector.java (line 110) GC for ParNew: 269 ms, 200003712 reclaimed leaving 1500784512 used; max is 4108386304
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:56:03,609 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614163609.log
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:56:13,219 Memtable.java (line 162) Completed flushing /Users/jussi.pori/Programs/apache-cassandra-0.6.1/data/Clutters/Standard1-5-Data.db
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:56:13,405 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613634264.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:56:13,406 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613879800.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:56:13,407 CommitLog.java (line 407) Discarding obsolete commit log:CommitLogSegment(/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272613937371.log)
 INFO [COMMIT-LOG-WRITER] 2010-04-30 10:56:20,275 CommitLogSegment.java (line 50) Creating new commitlog segment /Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614180275.log
 INFO [ROW-MUTATION-STAGE:6] 2010-04-30 10:56:20,300 ColumnFamilyStore.java (line 357) Standard1 has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/Users/jussi.pori/Programs/apache-cassandra-0.6.1/commitlog/CommitLog-1272614180275.log', position=145)
 INFO [ROW-MUTATION-STAGE:6] 2010-04-30 10:56:20,300 ColumnFamilyStore.java (line 609) Enqueuing flush of Memtable(Standard1)@1034225086
 INFO [FLUSH-WRITER-POOL:1] 2010-04-30 10:56:20,301 Memtable.java (line 148) Writing Memtable(Standard1)@1034225086
 INFO [GC inspection] 2010-04-30 10:56:42,053 GCInspector.java (line 110) GC for ParNew: 21602 ms, 58548344 reclaimed leaving 1670216000 used; max is 4108386304
 INFO [GC inspection] 2010-04-30 10:57:12,785 GCInspector.java (line 110) GC for ParNew: 30465 ms, 3229848 reclaimed leaving 1922229272 used; max is 4108386304



Re: Inserting files to Cassandra timeouts

Posted by Jonathan Ellis <jb...@gmail.com>.
are you seeing memtable flushes and compactions in the log?

what does tpstats look like when it's timing out?

spending 2000ms on GC every 50s indicates that it's not GC causing
your problem.  (especially when all of them are ParNew, which are
completely non-blocking to other threads)


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com