Posted to user@cassandra.apache.org by Ivan Georgiev <yn...@bk.ru> on 2011/02/19 15:16:43 UTC

0.7.2 slow memtables flushing

Hi,

I am testing 0.7.2 on a Windows 2003 x64 system (one node) and I am 
having the following problem.
My insertion speed is relatively slow, so the memtables do not get full 
and the actual flushing is triggered by memtable_flush_after_mins, which 
happens on the hour mark. My problem with 0.7.2 is that when that 
happens, the CPU spikes to 25% overall usage (x4 Xeon) and the operation 
takes anywhere from 2 to 4 minutes, leaving the node unresponsive 
during that time. This has forced me to increase the rpc timeout 
option beyond what I feel comfortable with.
I have run multiple tests with 0.7.0 and 0.7.2 on the same dataset, and 
the results are consistent. During the same operation 0.7.0 takes about 
10 seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a 
log with the timestamps from one such flush in 0.7.2. Please let me 
know if there is anything I can do to speed this up and get results 
similar to 0.7.0.

Regards:
Ivan

This is the log of the operations that took the most time during the flush 
operation. With 0.7.0 and the same number of operations, the flushing 
takes less than 10 seconds.

INFO 01:36:44,906 Enqueuing flush of 
Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467 
operations)
INFO 01:37:47,375 Completed flushing 
C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
INFO 01:37:47,375 Writing 
Memtable-ArchiveFiles.6e616d65@978152661(1619949 bytes, 34467 operations)
INFO 01:37:47,375 Enqueuing flush of 
Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949 
bytes, 34467 operations)
INFO 01:38:51,343 Completed flushing 
C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
INFO 01:38:51,343 Writing 
Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467 
operations)

Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
Absolutely right.  (So, it's really a write-time slowdown, not read-time.)

Created https://issues.apache.org/jira/browse/CASSANDRA-2218 for the fix.

Thanks a lot for tracking that down!

2011/2/22 Ivan Georgiev <yn...@bk.ru>:
> Hi, yes, you are absolutely right, I overlooked that.
> I am sending this directly as I don't want to pollute the mailing list with my
> guesses.
>
> The problem is that my dataset is exactly the same for 0.7.0 and 0.7.2, and my rows
> are not large at all.
> So in all reality, rebuffer() should read only when a row sits on a boundary
> and crosses between buffers.
> And the row size is such that more than 15 rows should easily fit in a buffer. So I am
> sure that flushing a 50 MB file taking more than a minute
> means something is not right.
>
> I made a single change in BRAF.reBuffer(), but I am not sure if it is correct.
> Basically I changed:
>
>  if (bufferOffset > channel.size())
>        {
>            buffer.rewind();
>            bufferEnd = bufferOffset;
>            hitEOF = true;
>
>            return 0;
>  }
>
> to
>
>  if (bufferOffset >= channel.size())
>        {
>            buffer.rewind();
>            bufferEnd = bufferOffset;
>            hitEOF = true;
>
>            return 0;
>  }
>
> This gives me the same performance as in 0.7.0. The flushes in 0.7.2 now
> take about 6 seconds
> which is consistent with 0.7.0.
>
> I made that change because I noticed that bytesRead was returning -1 multiple
> times at the end of the method, which meant
> it had hit the end of file even on the first read. So I augmented the test to
> include that case and avoid unnecessary reads when the buffer
> is at the end of the file (it looks like those reads were the expensive ones).
> Please let me know if that is correct or whether it breaks something.
> Again, sorry if I am wasting your time.
>
> Thanks in advance.
>
> Regards:
> Ivan
>
> On 21.2.2011 г. 22:33 ч., Jonathan Ellis wrote:
>>
>> If you look in that code, the bounds are checked on each write and
>> reBuffer is called from there instead of from seek
>>
>> On Mon, Feb 21, 2011 at 2:21 PM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>>
>>> I meant what was tagged as 0.7.0, at least that is what i used in my
>>> 0.7.0
>>> tests:
>>>
>>> http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.0/
>>>
>>> Ivan
>>>
>>> On 21.2.2011 г. 22:12 ч., ruslan usifov wrote:
>>>
>>> 2011/2/21 Ivan Georgiev<yn...@bk.ru>
>>>>
>>>> That is strange. In 0.7.0 i see this for seek:
>>>>
>>>> public void seek(long pos) throws IOException
>>>> {
>>>> this.curr_ = pos;
>>>> }
>>>
>>> You doesn't see 0.7.0 version, you see version before
>>> cassandra/branches/cassandra-0.7@1052531 (2010-12-24 16:57:07 +0000 (8
>>> weeks
>>> ago))
>>>
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
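
For reference, here is the boundary case Ivan describes above, reduced to a
self-contained sketch. This is simplified, hypothetical code (names and
structure are assumptions), not the actual BufferedRandomAccessFile.reBuffer()
from the Cassandra tree:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReBufferSketch
{
    // Hypothetical, simplified stand-in for a reBuffer()-style refill.
    // With a strict '>' check, the bufferOffset == channel.size() case
    // (a buffer starting exactly at end of file) falls through to a read
    // that can only return -1, doing work for nothing. The '>=' check
    // below returns early instead, which is the shape of the fix tracked
    // in CASSANDRA-2218.
    static int refill(FileChannel channel, ByteBuffer buffer, long bufferOffset) throws IOException
    {
        if (bufferOffset >= channel.size())
        {
            buffer.rewind();
            return 0;                                        // nothing beyond this offset: skip the read
        }
        buffer.clear();
        int bytesRead = channel.read(buffer, bufferOffset);  // positional read into the buffer
        return Math.max(bytesRead, 0);
    }
}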

Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
If you look in that code, the bounds are checked on each write and
reBuffer is called from there instead of from seek

On Mon, Feb 21, 2011 at 2:21 PM, Ivan Georgiev <yn...@bk.ru> wrote:
> I meant what was tagged as 0.7.0, at least that is what i used in my 0.7.0
> tests:
>
> http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.0/
>
> Ivan
>
> On 21.2.2011 г. 22:12 ч., ruslan usifov wrote:
>
> 2011/2/21 Ivan Georgiev <yn...@bk.ru>
>>
>> That is strange. In 0.7.0 i see this for seek:
>>
>> public void seek(long pos) throws IOException
>> {
>> this.curr_ = pos;
>> }
>
> You doesn't see 0.7.0 version, you see version before
> cassandra/branches/cassandra-0.7@1052531 (2010-12-24 16:57:07 +0000 (8 weeks
> ago))
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
I meant what was tagged as 0.7.0; at least, that is what I used in my 
0.7.0 tests:

http://svn.apache.org/repos/asf/cassandra/tags/cassandra-0.7.0/

Ivan

On 21.2.2011 г. 22:12 ч., ruslan usifov wrote:
>
>
> 2011/2/21 Ivan Georgiev <yngwiie@bk.ru>
>
>     That is strange. In 0.7.0 i see this for seek:
>
>     public void seek(long pos) throws IOException
>     {
>     this.curr_ = pos;
>     }
>
>
> You doesn't see 0.7.0 version, you see version before 
> cassandra/branches/cassandra-0.7@1052531 (2010-12-24 16:57:07 +0000 (8 
> weeks ago))


Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
And it's not like it didn't have to rebuffer then, just that the code
organization was different.

On Mon, Feb 21, 2011 at 2:12 PM, ruslan usifov <ru...@gmail.com> wrote:
>
>
> 2011/2/21 Ivan Georgiev <yn...@bk.ru>
>>
>> That is strange. In 0.7.0 i see this for seek:
>>
>> public void seek(long pos) throws IOException
>> {
>> this.curr_ = pos;
>> }
>
> You doesn't see 0.7.0 version, you see version before
> cassandra/branches/cassandra-0.7@1052531 (2010-12-24 16:57:07 +0000 (8 weeks
> ago))
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.2 slow memtables flushing

Posted by ruslan usifov <ru...@gmail.com>.
2011/2/21 Ivan Georgiev <yn...@bk.ru>

> That is strange. In 0.7.0 i see this for seek:
>
> public void seek(long pos) throws IOException
> {
> this.curr_ = pos;
> }
>

You aren't looking at the 0.7.0 version; you're looking at the version before
cassandra/branches/cassandra-0.7@1052531 (2010-12-24 16:57:07 +0000 (8 weeks
ago)).

Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
That is strange. In 0.7.0 I see this for seek:

public void seek(long pos) throws IOException
{
    this.curr_ = pos;
}

Ivan

On 21.2.2011 г. 21:20 ч., Jonathan Ellis wrote:
> BRAF.seek has not changed since 0.7.0.  Here is the implementation:
>
>      public void seek(long newPosition) throws IOException
>      {
>          current = newPosition;
>
>          if (newPosition>= bufferEnd || newPosition<  bufferOffset)
>          {
>              reBuffer(); // this will set bufferEnd for us
>          }
>
>          final int delta = (int) (newPosition - bufferOffset);
>          buffer.position(delta);
>      }
>
> In other words: if we are seeking outside of the part of the file that
> is currently buffered, we have to call rebuffer.
>
> Now, you might wonder why seek is being called at all in SSTW.append.
> Here is the relevant part (also unchanged since 0.7.0):
>
>          // write placeholder for the row size, since we don't know it yet
>          long sizePosition = dataFile.getFilePointer();
>          dataFile.writeLong(-1);
>          // write out row data
>          int columnCount =
> ColumnFamily.serializer().serializeWithIndexes(cf, dataFile);
>          // seek back and write the row size (not including the size Long itself)
>          long endPosition = dataFile.getFilePointer();
>          dataFile.seek(sizePosition);
>          dataFile.writeLong(endPosition - (sizePosition + 8));
>
> So basically once your row size is larger than the BRAF buffer you can
> expect to do a lot of rebuffer calls during flush or compact.  Sounds
> to me like the difference you are seeing is caused by your row size
> and not a regression in Cassandra.
>
> 2011/2/21 Ivan Georgiev<yn...@bk.ru>:
>> Some more digging. This is the code path causing the excessive rebuffer()
>> calls.
>>
>> java.lang.Exception: Stack trace
>>     at java.lang.Thread.dumpStack(Unknown Source)
>>     at
>> org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:204)
>>     at
>> org.apache.cassandra.io.util.BufferedRandomAccessFile.seek(BufferedRandomAccessFile.java:394)
>>     at
>> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:148)
>>     at
>> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:159)
>>     at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>>     at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>>     at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>     at java.lang.Thread.run(Unknown Source)
>>
>> Basically the rebuffer() is being called through seek(). In 0.7.0 seek()
>> does not call rebuffer() while 0.7.1 and on does
>> and that is where the performance degradation is happening in my test case.
>> Any ideas how to alleviate the problem ?
>>
>> Ivan
>>
>> On 21.2.2011 г. 14:07 ч., Ivan Georgiev wrote:
>>> I did some very rough measurements in a desperate attempt to see if I can
>>> find the issue (if there is an issue).
>>> Since I dont know the code base well enough i chose
>>> BufferedRandomAccessFile as my suspect, since it was rewritten from 0.7.0 to
>>> 0.7.1
>>>
>>> I did rough measurements on how many times write() and rebuffer() were
>>> called, and how much time was spent in them. The measurements
>>> were printed out when close() on the file was called. Time measurements
>>> are rounded in seconds. I have included the measurements only for the
>>> large files that were flushed to disk.
>>>
>>> 0.7.0
>>> Total time in reBuffer : 0 seconds.
>>> Total times called reBuffer : 2
>>> Total time in write : 0 seconds.
>>> Total times called write : 579374
>>>
>>> 0.7.2
>>> Time spent in reBuffer : 67 seconds.
>>> Times called reBuffer : 30888
>>> Time spent in write : 0 seconds.
>>> Times called write : 1884107
>>>
>>> Seems like rebuffer is being called a lot more in 0.7.2 and thats where
>>> the performance degradation is coming from.
>>>
>>> rebuffer can end up being called from write itself, but since the time
>>> spent in write is insignificant, that means this excessive
>>> calling of rebuffer in 0.7.2 is not happening from the write path.
>>>
>>> Ivan
>>>
>>>
>>> On 20.2.2011 г. 05:21 ч., Ivan Georgiev wrote:
>>>> Is it possible that the changes to the BufferedRandomAccessFile.java
>>>> could be causing the issue ?
>>>> I think the most notable change there is using ByteBuffer instead of a
>>>> byte[] for the buffer.
>>>>
>>>> I will run more tests and see what comes out of it.
>>>>
>>>> Ivan
>>>>
>>>> On 20.2.2011 г. 05:03 ч., Jonathan Ellis wrote:
>>>>> I guess you'll need to binary search through the 0.7.1 changes to find
>>>>> what made the difference.  I can't think of any obvious candidates.
>>>>>
>>>>> On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev<yn...@bk.ru>   wrote:
>>>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>>>> other variable here. Memory pressure maybe?
>>>>>> Cloud you please elaborate on that one ?
>>>>>> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>>>>>> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas to
>>>>>> 0.7.0
>>>>>> is fine.
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>>>> other variable here. Memory pressure maybe?
>>>>>>>
>>>>>>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>
>>>>>>>   wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am
>>>>>>>> having
>>>>>>>> the following problem.
>>>>>>>> My insertion speed is relatively slow, so the memtables do not get
>>>>>>>> full
>>>>>>>> and
>>>>>>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>>>>>>> happens
>>>>>>>> on the hour mark. My problem with 0.7.2 is that when that happens,
>>>>>>>> the
>>>>>>>> cpu
>>>>>>>> spikes to 25% overall usage (x4 Xeon) and the operation takes
>>>>>>>> anywhere
>>>>>>>> from
>>>>>>>> 2 to 4 minutes, leaving the node not responding during that time.
>>>>>>>> This
>>>>>>>> has
>>>>>>>> forced to me to increase the rpc timeout option to beyond what i feel
>>>>>>>> comfortable with.
>>>>>>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset
>>>>>>>> and
>>>>>>>> the
>>>>>>>> results are consistent. During the same operation 0.7.0 takes about
>>>>>>>> 10
>>>>>>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a
>>>>>>>> log
>>>>>>>> with
>>>>>>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>>>>>>> there
>>>>>>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>>>>>>
>>>>>>>> Regards:
>>>>>>>> Ivan
>>>>>>>>
>>>>>>>> This is the log of the operations which took most time during the
>>>>>>>> flush
>>>>>>>> operation. Using 0.7.0, with the same number of operations the
>>>>>>>> flushing
>>>>>>>> takes less than 10 seconds.
>>>>>>>>
>>>>>>>> INFO 01:36:44,906 Enqueuing flush of
>>>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>>>> operations)
>>>>>>>> INFO 01:37:47,375 Completed flushing
>>>>>>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187
>>>>>>>> bytes)
>>>>>>>> INFO 01:37:47,375 Writing
>>>>>>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>>>>>>> bytes, 34467 operations)
>>>>>>>> INFO 01:37:47,375 Enqueuing flush of
>>>>>>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>>>>>>> bytes,
>>>>>>>> 34467 operations)
>>>>>>>> INFO 01:38:51,343 Completed flushing
>>>>>>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265
>>>>>>>> bytes)
>>>>>>>> INFO 01:38:51,343 Writing
>>>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>>>> operations)
>>>>>>>>
>>>>>
>>>>
>>>
>>
>
>


Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
BRAF.seek has not changed since 0.7.0.  Here is the implementation:

    public void seek(long newPosition) throws IOException
    {
        current = newPosition;

        if (newPosition >= bufferEnd || newPosition < bufferOffset)
        {
            reBuffer(); // this will set bufferEnd for us
        }

        final int delta = (int) (newPosition - bufferOffset);
        buffer.position(delta);
    }

In other words: if we are seeking outside of the part of the file that
is currently buffered, we have to call rebuffer.

Now, you might wonder why seek is being called at all in SSTW.append.
Here is the relevant part (also unchanged since 0.7.0):

        // write placeholder for the row size, since we don't know it yet
        long sizePosition = dataFile.getFilePointer();
        dataFile.writeLong(-1);
        // write out row data
        int columnCount =
ColumnFamily.serializer().serializeWithIndexes(cf, dataFile);
        // seek back and write the row size (not including the size Long itself)
        long endPosition = dataFile.getFilePointer();
        dataFile.seek(sizePosition);
        dataFile.writeLong(endPosition - (sizePosition + 8));

So basically, once your row size is larger than the BRAF buffer, you can
expect a lot of rebuffer calls during flush or compact.  Sounds
to me like the difference you are seeing is caused by your row size
and not by a regression in Cassandra.
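
The write-placeholder-then-backpatch pattern described above, as a standalone
sketch using plain java.io.RandomAccessFile (hypothetical names; a simplified
illustration, not the SSTableWriter code itself):

import java.io.IOException;
import java.io.RandomAccessFile;

public class BackpatchSketch
{
    // Append a row whose length is only known after it has been written:
    // write a placeholder long, write the body, then seek back and patch
    // in the real size. In Cassandra's BRAF, that seek back can land outside
    // the currently buffered region and therefore trigger a reBuffer().
    static void appendRow(RandomAccessFile dataFile, byte[] rowBody) throws IOException
    {
        long sizePosition = dataFile.getFilePointer();
        dataFile.writeLong(-1L);                              // placeholder for the row size
        dataFile.write(rowBody);                              // row data
        long endPosition = dataFile.getFilePointer();
        dataFile.seek(sizePosition);                          // seek back to the placeholder
        dataFile.writeLong(endPosition - (sizePosition + 8)); // size, excluding the long itself
        dataFile.seek(endPosition);                           // restore position for the next row
    }
}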

2011/2/21 Ivan Georgiev <yn...@bk.ru>:
> Some more digging. This is the code path causing the excessive rebuffer()
> calls.
>
> java.lang.Exception: Stack trace
>    at java.lang.Thread.dumpStack(Unknown Source)
>    at
> org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:204)
>    at
> org.apache.cassandra.io.util.BufferedRandomAccessFile.seek(BufferedRandomAccessFile.java:394)
>    at
> org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:148)
>    at
> org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:159)
>    at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
>    at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
>    at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>    at java.lang.Thread.run(Unknown Source)
>
> Basically the rebuffer() is being called through seek(). In 0.7.0 seek()
> does not call rebuffer() while 0.7.1 and on does
> and that is where the performance degradation is happening in my test case.
> Any ideas how to alleviate the problem ?
>
> Ivan
>
> On 21.2.2011 г. 14:07 ч., Ivan Georgiev wrote:
>>
>> I did some very rough measurements in a desperate attempt to see if I can
>> find the issue (if there is an issue).
>> Since I dont know the code base well enough i chose
>> BufferedRandomAccessFile as my suspect, since it was rewritten from 0.7.0 to
>> 0.7.1
>>
>> I did rough measurements on how many times write() and rebuffer() were
>> called, and how much time was spent in them. The measurements
>> were printed out when close() on the file was called. Time measurements
>> are rounded in seconds. I have included the measurements only for the
>> large files that were flushed to disk.
>>
>> 0.7.0
>> Total time in reBuffer : 0 seconds.
>> Total times called reBuffer : 2
>> Total time in write : 0 seconds.
>> Total times called write : 579374
>>
>> 0.7.2
>> Time spent in reBuffer : 67 seconds.
>> Times called reBuffer : 30888
>> Time spent in write : 0 seconds.
>> Times called write : 1884107
>>
>> Seems like rebuffer is being called a lot more in 0.7.2 and thats where
>> the performance degradation is coming from.
>>
>> rebuffer can end up being called from write itself, but since the time
>> spent in write is insignificant, that means this excessive
>> calling of rebuffer in 0.7.2 is not happening from the write path.
>>
>> Ivan
>>
>>
>> On 20.2.2011 г. 05:21 ч., Ivan Georgiev wrote:
>>>
>>> Is it possible that the changes to the BufferedRandomAccessFile.java
>>> could be causing the issue ?
>>> I think the most notable change there is using ByteBuffer instead of a
>>> byte[] for the buffer.
>>>
>>> I will run more tests and see what comes out of it.
>>>
>>> Ivan
>>>
>>> On 20.2.2011 г. 05:03 ч., Jonathan Ellis wrote:
>>>>
>>>> I guess you'll need to binary search through the 0.7.1 changes to find
>>>> what made the difference.  I can't think of any obvious candidates.
>>>>
>>>> On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>>>>
>>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>>>
>>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>>> other variable here. Memory pressure maybe?
>>>>>
>>>>> Cloud you please elaborate on that one ?
>>>>> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>>>>> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas to
>>>>> 0.7.0
>>>>> is fine.
>>>>>
>>>>> Ivan
>>>>>
>>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>>>
>>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>>> other variable here. Memory pressure maybe?
>>>>>>
>>>>>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>
>>>>>>  wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am
>>>>>>> having
>>>>>>> the following problem.
>>>>>>> My insertion speed is relatively slow, so the memtables do not get
>>>>>>> full
>>>>>>> and
>>>>>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>>>>>> happens
>>>>>>> on the hour mark. My problem with 0.7.2 is that when that happens,
>>>>>>> the
>>>>>>> cpu
>>>>>>> spikes to 25% overall usage (x4 Xeon) and the operation takes
>>>>>>> anywhere
>>>>>>> from
>>>>>>> 2 to 4 minutes, leaving the node not responding during that time.
>>>>>>> This
>>>>>>> has
>>>>>>> forced to me to increase the rpc timeout option to beyond what i feel
>>>>>>> comfortable with.
>>>>>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset
>>>>>>> and
>>>>>>> the
>>>>>>> results are consistent. During the same operation 0.7.0 takes about
>>>>>>> 10
>>>>>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a
>>>>>>> log
>>>>>>> with
>>>>>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>>>>>> there
>>>>>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>>>>>
>>>>>>> Regards:
>>>>>>> Ivan
>>>>>>>
>>>>>>> This is the log of the operations which took most time during the
>>>>>>> flush
>>>>>>> operation. Using 0.7.0, with the same number of operations the
>>>>>>> flushing
>>>>>>> takes less than 10 seconds.
>>>>>>>
>>>>>>> INFO 01:36:44,906 Enqueuing flush of
>>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>>> operations)
>>>>>>> INFO 01:37:47,375 Completed flushing
>>>>>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187
>>>>>>> bytes)
>>>>>>> INFO 01:37:47,375 Writing
>>>>>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>>>>>> bytes, 34467 operations)
>>>>>>> INFO 01:37:47,375 Enqueuing flush of
>>>>>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>>>>>> bytes,
>>>>>>> 34467 operations)
>>>>>>> INFO 01:38:51,343 Completed flushing
>>>>>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265
>>>>>>> bytes)
>>>>>>> INFO 01:38:51,343 Writing
>>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>>> operations)
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
Some more digging. This is the code path causing the excessive 
rebuffer() calls.

java.lang.Exception: Stack trace
     at java.lang.Thread.dumpStack(Unknown Source)
     at 
org.apache.cassandra.io.util.BufferedRandomAccessFile.reBuffer(BufferedRandomAccessFile.java:204)
     at 
org.apache.cassandra.io.util.BufferedRandomAccessFile.seek(BufferedRandomAccessFile.java:394)
     at 
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:148)
     at 
org.apache.cassandra.db.Memtable.writeSortedContents(Memtable.java:159)
     at org.apache.cassandra.db.Memtable.access$000(Memtable.java:49)
     at org.apache.cassandra.db.Memtable$1.runMayThrow(Memtable.java:174)
     at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
Source)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
     at java.lang.Thread.run(Unknown Source)

Basically, rebuffer() is being called through seek(). In 0.7.0, seek() 
does not call rebuffer(), while 0.7.1 and later do,
and that is where the performance degradation is happening in my test 
case. Any ideas on how to alleviate the problem?

Ivan

On 21.2.2011 г. 14:07 ч., Ivan Georgiev wrote:
> I did some very rough measurements in a desperate attempt to see if I 
> can find the issue (if there is an issue).
> Since I dont know the code base well enough i chose 
> BufferedRandomAccessFile as my suspect, since it was rewritten from 
> 0.7.0 to 0.7.1
>
> I did rough measurements on how many times write() and rebuffer() were 
> called, and how much time was spent in them. The measurements
> were printed out when close() on the file was called. Time 
> measurements are rounded in seconds. I have included the measurements 
> only for the
> large files that were flushed to disk.
>
> 0.7.0
> Total time in reBuffer : 0 seconds.
> Total times called reBuffer : 2
> Total time in write : 0 seconds.
> Total times called write : 579374
>
> 0.7.2
> Time spent in reBuffer : 67 seconds.
> Times called reBuffer : 30888
> Time spent in write : 0 seconds.
> Times called write : 1884107
>
> Seems like rebuffer is being called a lot more in 0.7.2 and thats 
> where the performance degradation is coming from.
>
> rebuffer can end up being called from write itself, but since the time 
> spent in write is insignificant, that means this excessive
> calling of rebuffer in 0.7.2 is not happening from the write path.
>
> Ivan
>
>
> On 20.2.2011 г. 05:21 ч., Ivan Georgiev wrote:
>> Is it possible that the changes to the BufferedRandomAccessFile.java 
>> could be causing the issue ?
>> I think the most notable change there is using ByteBuffer instead of 
>> a byte[] for the buffer.
>>
>> I will run more tests and see what comes out of it.
>>
>> Ivan
>>
>> On 20.2.2011 г. 05:03 ч., Jonathan Ellis wrote:
>>> I guess you'll need to binary search through the 0.7.1 changes to find
>>> what made the difference.  I can't think of any obvious candidates.
>>>
>>> On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>> other variable here. Memory pressure maybe?
>>>> Cloud you please elaborate on that one ?
>>>> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>>>> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas 
>>>> to 0.7.0
>>>> is fine.
>>>>
>>>> Ivan
>>>>
>>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>>> other variable here. Memory pressure maybe?
>>>>>
>>>>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>    
>>>>> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I 
>>>>>> am having
>>>>>> the following problem.
>>>>>> My insertion speed is relatively slow, so the memtables do not 
>>>>>> get full
>>>>>> and
>>>>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>>>>> happens
>>>>>> on the hour mark. My problem with 0.7.2 is that when that 
>>>>>> happens, the
>>>>>> cpu
>>>>>> spikes to 25% overall usage (x4 Xeon) and the operation takes 
>>>>>> anywhere
>>>>>> from
>>>>>> 2 to 4 minutes, leaving the node not responding during that time. 
>>>>>> This
>>>>>> has
>>>>>> forced to me to increase the rpc timeout option to beyond what i 
>>>>>> feel
>>>>>> comfortable with.
>>>>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same 
>>>>>> dataset and
>>>>>> the
>>>>>> results are consistent. During the same operation 0.7.0 takes 
>>>>>> about 10
>>>>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching 
>>>>>> a log
>>>>>> with
>>>>>> the timestamps from one such flushing of 0.7.2. Please let me 
>>>>>> know if
>>>>>> there
>>>>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>>>>
>>>>>> Regards:
>>>>>> Ivan
>>>>>>
>>>>>> This is the log of the operations which took most time during the 
>>>>>> flush
>>>>>> operation. Using 0.7.0, with the same number of operations the 
>>>>>> flushing
>>>>>> takes less than 10 seconds.
>>>>>>
>>>>>> INFO 01:36:44,906 Enqueuing flush of
>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>> operations)
>>>>>> INFO 01:37:47,375 Completed flushing
>>>>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db 
>>>>>> (5549187 bytes)
>>>>>> INFO 01:37:47,375 Writing
>>>>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>>>>> bytes, 34467 operations)
>>>>>> INFO 01:37:47,375 Enqueuing flush of
>>>>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949 
>>>>>>
>>>>>> bytes,
>>>>>> 34467 operations)
>>>>>> INFO 01:38:51,343 Completed flushing
>>>>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db 
>>>>>> (3820265 bytes)
>>>>>> INFO 01:38:51,343 Writing
>>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>>> operations)
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>
>


Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
I did some very rough measurements in a desperate attempt to see if I 
can find the issue (if there is an issue).
Since I don't know the code base well enough, I chose 
BufferedRandomAccessFile as my suspect, since it was rewritten between 
0.7.0 and 0.7.1.

I made rough measurements of how many times write() and rebuffer() were 
called, and how much time was spent in them. The measurements
were printed out when close() was called on the file. Times 
are rounded to whole seconds. I have included the measurements only for the 
large files that were flushed to disk.

0.7.0
Total time in reBuffer : 0 seconds.
Total times called reBuffer : 2
Total time in write : 0 seconds.
Total times called write : 579374

0.7.2
Time spent in reBuffer : 67 seconds.
Times called reBuffer : 30888
Time spent in write : 0 seconds.
Times called write : 1884107

It seems like rebuffer is being called a lot more in 0.7.2, and that's where 
the performance degradation is coming from.

rebuffer can end up being called from write itself, but since the time 
spent in write is insignificant, this excessive
calling of rebuffer in 0.7.2 is not happening from the write path.

Ivan
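
The kind of rough instrumentation described above might look roughly like
the following (a hypothetical sketch added for illustration, not the actual
measurement patch; the real counters would wrap write() the same way):

import java.io.IOException;

// Hypothetical sketch: count calls to reBuffer(), accumulate wall-clock
// time, and print the totals when close() is called.
public class InstrumentedFileSketch
{
    private long reBufferCalls = 0;
    private long reBufferNanos = 0;

    private void reBuffer() throws IOException
    {
        long start = System.nanoTime();
        try
        {
            // ... refill the buffer here (elided) ...
        }
        finally
        {
            reBufferCalls++;
            reBufferNanos += System.nanoTime() - start;
        }
    }

    public void close() throws IOException
    {
        System.out.println("Total times called reBuffer : " + reBufferCalls);
        System.out.println("Total time in reBuffer : " + (reBufferNanos / 1000000000L) + " seconds.");
    }
}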


On 20.2.2011 г. 05:21 ч., Ivan Georgiev wrote:
> Is it possible that the changes to the BufferedRandomAccessFile.java 
> could be causing the issue ?
> I think the most notable change there is using ByteBuffer instead of a 
> byte[] for the buffer.
>
> I will run more tests and see what comes out of it.
>
> Ivan
>
> On 20.2.2011 г. 05:03 ч., Jonathan Ellis wrote:
>> I guess you'll need to binary search through the 0.7.1 changes to find
>> what made the difference.  I can't think of any obvious candidates.
>>
>> On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>> other variable here. Memory pressure maybe?
>>> Cloud you please elaborate on that one ?
>>> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>>> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas 
>>> to 0.7.0
>>> is fine.
>>>
>>> Ivan
>>>
>>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>>> other variable here. Memory pressure maybe?
>>>>
>>>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>    
>>>> wrote:
>>>>> Hi,
>>>>>
>>>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am 
>>>>> having
>>>>> the following problem.
>>>>> My insertion speed is relatively slow, so the memtables do not get 
>>>>> full
>>>>> and
>>>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>>>> happens
>>>>> on the hour mark. My problem with 0.7.2 is that when that happens, 
>>>>> the
>>>>> cpu
>>>>> spikes to 25% overall usage (x4 Xeon) and the operation takes 
>>>>> anywhere
>>>>> from
>>>>> 2 to 4 minutes, leaving the node not responding during that time. 
>>>>> This
>>>>> has
>>>>> forced to me to increase the rpc timeout option to beyond what i feel
>>>>> comfortable with.
>>>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same 
>>>>> dataset and
>>>>> the
>>>>> results are consistent. During the same operation 0.7.0 takes 
>>>>> about 10
>>>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a 
>>>>> log
>>>>> with
>>>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>>>> there
>>>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>>>
>>>>> Regards:
>>>>> Ivan
>>>>>
>>>>> This is the log of the operations which took most time during the 
>>>>> flush
>>>>> operation. Using 0.7.0, with the same number of operations the 
>>>>> flushing
>>>>> takes less than 10 seconds.
>>>>>
>>>>> INFO 01:36:44,906 Enqueuing flush of
>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>> operations)
>>>>> INFO 01:37:47,375 Completed flushing
>>>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 
>>>>> bytes)
>>>>> INFO 01:37:47,375 Writing
>>>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>>>> bytes, 34467 operations)
>>>>> INFO 01:37:47,375 Enqueuing flush of
>>>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>>>> bytes,
>>>>> 34467 operations)
>>>>> INFO 01:38:51,343 Completed flushing
>>>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 
>>>>> bytes)
>>>>> INFO 01:38:51,343 Writing
>>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>>> operations)
>>>>>
>>>>
>>>
>>
>>
>
>


Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
Is it possible that the changes to BufferedRandomAccessFile.java 
could be causing the issue?
I think the most notable change there is using ByteBuffer instead of a 
byte[] for the buffer.

I will run more tests and see what comes out of it.
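To make the suspicion concrete: reading into a plain byte[] versus reading
into a java.nio.ByteBuffer through the FileChannel can be sketched roughly as
below. This is illustrative only and is not the actual 0.7.0 vs 0.7.1 diff;
the method names and the 64 KB buffer size are assumptions.

    // Illustrative sketch of the two buffering styles, not the real change set.
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;

    public class BufferStylesSketch
    {
        // 0.7.0-style: refill a plain byte[] with RandomAccessFile.read()
        static int refillArray(RandomAccessFile raf, byte[] buf, long offset) throws IOException
        {
            raf.seek(offset);
            return raf.read(buf, 0, buf.length);
        }

        // 0.7.1/0.7.2-style: refill a ByteBuffer through the FileChannel
        static int refillByteBuffer(RandomAccessFile raf, ByteBuffer buf, long offset) throws IOException
        {
            buf.clear();
            return raf.getChannel().read(buf, offset);
        }

        public static void main(String[] args) throws IOException
        {
            RandomAccessFile raf = new RandomAccessFile(args[0], "r");
            System.out.println(refillArray(raf, new byte[64 * 1024], 0));
            System.out.println(refillByteBuffer(raf, ByteBuffer.allocate(64 * 1024), 0));
            raf.close();
        }
    }

Either style ends up issuing the same kind of read; if there is a regression,
it is more likely in the bookkeeping around when a refill is attempted than
in the choice of buffer type itself.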

Ivan

On 20.2.2011 г. 05:03 ч., Jonathan Ellis wrote:
> I guess you'll need to binary search through the 0.7.1 changes to find
> what made the difference.  I can't think of any obvious candidates.
>
> On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>> other variable here. Memory pressure maybe?
>> Cloud you please elaborate on that one ?
>> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas to 0.7.0
>> is fine.
>>
>> Ivan
>>
>> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>>> other variable here. Memory pressure maybe?
>>>
>>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>    wrote:
>>>> Hi,
>>>>
>>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am having
>>>> the following problem.
>>>> My insertion speed is relatively slow, so the memtables do not get full
>>>> and
>>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>>> happens
>>>> on the hour mark. My problem with 0.7.2 is that when that happens, the
>>>> cpu
>>>> spikes to 25% overall usage (x4 Xeon) and the operation takes anywhere
>>>> from
>>>> 2 to 4 minutes, leaving the node not responding during that time. This
>>>> has
>>>> forced to me to increase the rpc timeout option to beyond what i feel
>>>> comfortable with.
>>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset and
>>>> the
>>>> results are consistent. During the same operation 0.7.0 takes about 10
>>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a log
>>>> with
>>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>>> there
>>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>>
>>>> Regards:
>>>> Ivan
>>>>
>>>> This is the log of the operations which took most time during the flush
>>>> operation. Using 0.7.0, with the same number of operations the flushing
>>>> takes less than 10 seconds.
>>>>
>>>> INFO 01:36:44,906 Enqueuing flush of
>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>> operations)
>>>> INFO 01:37:47,375 Completed flushing
>>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
>>>> INFO 01:37:47,375 Writing
>>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>>> bytes, 34467 operations)
>>>> INFO 01:37:47,375 Enqueuing flush of
>>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>>> bytes,
>>>> 34467 operations)
>>>> INFO 01:38:51,343 Completed flushing
>>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
>>>> INFO 01:38:51,343 Writing
>>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>>> operations)
>>>>
>>>
>>
>
>


Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
I guess you'll need to binary search through the 0.7.1 changes to find
what made the difference.  I can't think of any obvious candidates.

On Sat, Feb 19, 2011 at 10:32 AM, Ivan Georgiev <yn...@bk.ru> wrote:
> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>
>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>> other variable here. Memory pressure maybe?
>
> Cloud you please elaborate on that one ?
> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas to 0.7.0
> is fine.
>
> Ivan
>
> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>>
>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>> other variable here. Memory pressure maybe?
>>
>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>>
>>> Hi,
>>>
>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am having
>>> the following problem.
>>> My insertion speed is relatively slow, so the memtables do not get full
>>> and
>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>> happens
>>> on the hour mark. My problem with 0.7.2 is that when that happens, the
>>> cpu
>>> spikes to 25% overall usage (x4 Xeon) and the operation takes anywhere
>>> from
>>> 2 to 4 minutes, leaving the node not responding during that time. This
>>> has
>>> forced to me to increase the rpc timeout option to beyond what i feel
>>> comfortable with.
>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset and
>>> the
>>> results are consistent. During the same operation 0.7.0 takes about 10
>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a log
>>> with
>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>> there
>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>
>>> Regards:
>>> Ivan
>>>
>>> This is the log of the operations which took most time during the flush
>>> operation. Using 0.7.0, with the same number of operations the flushing
>>> takes less than 10 seconds.
>>>
>>> INFO 01:36:44,906 Enqueuing flush of
>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>> operations)
>>> INFO 01:37:47,375 Completed flushing
>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
>>> INFO 01:37:47,375 Writing
>>> Memtable-ArchiveFiles.6e616d65@978152661(1619949
>>> bytes, 34467 operations)
>>> INFO 01:37:47,375 Enqueuing flush of
>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>> bytes,
>>> 34467 operations)
>>> INFO 01:38:51,343 Completed flushing
>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
>>> INFO 01:38:51,343 Writing
>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>> operations)
>>>
>>
>>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
JNA is disabled, and it is a Windows box.
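For context: JNA is typically used here to call native functions such as
mlockall(), so the JVM heap cannot be swapped out, which is why it is a
natural suspect for flush-time stalls and also why it is moot on Windows.
A rough, hypothetical sketch of that kind of call (not Cassandra's actual
CLibrary code; the constant value and class name are assumptions):

    // Hypothetical JNA direct-mapping sketch; illustrative only.
    import com.sun.jna.Native;

    public class MlockallSketch
    {
        private static final int MCL_CURRENT = 1;   // Linux value; assumption

        static
        {
            Native.register("c");   // bind the native method below to libc
        }

        private static native int mlockall(int flags);

        public static void tryMlockall()
        {
            try
            {
                if (mlockall(MCL_CURRENT) == 0)
                    System.out.println("mlockall succeeded; heap will not be swapped");
                else
                    System.out.println("mlockall failed (insufficient privileges?)");
            }
            catch (UnsatisfiedLinkError e)
            {
                // no JNA / no libc, e.g. on a Windows box: nothing to do
                System.out.println("native mlockall unavailable");
            }
        }
    }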

Ivan

On 19.2.2011 г. 19:23 ч., Jake Luciani wrote:
> Are you running with JNA enabled? If so could you try disabling it?
>
> On Sat, Feb 19, 2011 at 11:32 AM, Ivan Georgiev <yngwiie@bk.ru> wrote:
>
>     On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>
>         Flush code didn't change between 0.7.0 and 0.7.2. There must
>         be some
>         other variable here. Memory pressure maybe?
>
>
>     Cloud you please elaborate on that one ?
>     The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
>     By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early
>     betas to 0.7.0 is fine.
>
>     Ivan
>
>
>     On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>
>         Flush code didn't change between 0.7.0 and 0.7.2. There must
>         be some
>         other variable here. Memory pressure maybe?
>
>         On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev <yngwiie@bk.ru> wrote:
>
>             Hi,
>
>             I am testing 0.7.2 on a Windows 2003 x64 system(one node)
>             and I am having
>             the following problem.
>             My insertion speed is relatively slow, so the memtables do
>             not get full and
>             the actual flushing is triggered by
>             memtable_flush_after_mins, this happens
>             on the hour mark. My problem with 0.7.2 is that when that
>             happens, the cpu
>             spikes to 25% overall usage (x4 Xeon) and the operation
>             takes anywhere from
>             2 to 4 minutes, leaving the node not responding during
>             that time. This has
>             forced to me to increase the rpc timeout option to beyond
>             what i feel
>             comfortable with.
>             I have run multiple tests with 0.7.0 and 0.7.2 with the
>             same dataset and the
>             results are consistent. During the same operation 0.7.0
>             takes about 10
>             seconds to complete vs. 2 to 4 minutes for 0.7.2. I am
>             attaching a log with
>             the timestamps from one such flushing of 0.7.2. Please let
>             me know if there
>             is anything i can do to speed up and get results similar
>             to 0.7.0.
>
>             Regards:
>             Ivan
>
>             This is the log of the operations which took most time
>             during the flush
>             operation. Using 0.7.0, with the same number of operations
>             the flushing
>             takes less than 10 seconds.
>
>             INFO 01:36:44,906 Enqueuing flush of
>             Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949
>             bytes, 34467
>             operations)
>             INFO 01:37:47,375 Completed flushing
>             C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db
>             (5549187 bytes)
>             INFO 01:37:47,375 Writing
>             Memtable-ArchiveFiles.6e616d65@978152661(1619949
>             bytes, 34467 operations)
>             INFO 01:37:47,375 Enqueuing flush of
>             Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>             bytes,
>             34467 operations)
>             INFO 01:38:51,343 Completed flushing
>             C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db
>             (3820265 bytes)
>             INFO 01:38:51,343 Writing
>             Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949
>             bytes, 34467
>             operations)
>
>
>
>
>
>
>
> -- 
> http://twitter.com/tjake


Re: 0.7.2 slow memtables flushing

Posted by Jake Luciani <ja...@gmail.com>.
Are you running with JNA enabled? If so, could you try disabling it?

On Sat, Feb 19, 2011 at 11:32 AM, Ivan Georgiev <yn...@bk.ru> wrote:

> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>
>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>> other variable here. Memory pressure maybe?
>>
>
> Cloud you please elaborate on that one ?
> The conditions are exactly the same for the test with 0.7.0 and 0.7.2.
> By the way, 0.7.1 tests are similar to 0.7.2, while 0.7 early betas to
> 0.7.0 is fine.
>
> Ivan
>
>
> On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
>
>> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
>> other variable here. Memory pressure maybe?
>>
>> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>>
>>> Hi,
>>>
>>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am having
>>> the following problem.
>>> My insertion speed is relatively slow, so the memtables do not get full
>>> and
>>> the actual flushing is triggered by memtable_flush_after_mins, this
>>> happens
>>> on the hour mark. My problem with 0.7.2 is that when that happens, the
>>> cpu
>>> spikes to 25% overall usage (x4 Xeon) and the operation takes anywhere
>>> from
>>> 2 to 4 minutes, leaving the node not responding during that time. This
>>> has
>>> forced to me to increase the rpc timeout option to beyond what i feel
>>> comfortable with.
>>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset and
>>> the
>>> results are consistent. During the same operation 0.7.0 takes about 10
>>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a log
>>> with
>>> the timestamps from one such flushing of 0.7.2. Please let me know if
>>> there
>>> is anything i can do to speed up and get results similar to 0.7.0.
>>>
>>> Regards:
>>> Ivan
>>>
>>> This is the log of the operations which took most time during the flush
>>> operation. Using 0.7.0, with the same number of operations the flushing
>>> takes less than 10 seconds.
>>>
>>> INFO 01:36:44,906 Enqueuing flush of
>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>> operations)
>>> INFO 01:37:47,375 Completed flushing
>>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
>>> INFO 01:37:47,375 Writing Memtable-ArchiveFiles.6e616d65@978152661
>>> (1619949
>>> bytes, 34467 operations)
>>> INFO 01:37:47,375 Enqueuing flush of
>>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949
>>> bytes,
>>> 34467 operations)
>>> INFO 01:38:51,343 Completed flushing
>>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
>>> INFO 01:38:51,343 Writing
>>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>>> operations)
>>>
>>>
>>
>>
>


-- 
http://twitter.com/tjake

Re: 0.7.2 slow memtables flushing

Posted by Ivan Georgiev <yn...@bk.ru>.
On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
> other variable here. Memory pressure maybe?

Could you please elaborate on that one?
The conditions are exactly the same for the tests with 0.7.0 and 0.7.2.
By the way, the 0.7.1 tests behave like 0.7.2, while the early 0.7 betas 
through 0.7.0 are fine.

Ivan

On 19.2.2011 г. 16:43 ч., Jonathan Ellis wrote:
> Flush code didn't change between 0.7.0 and 0.7.2. There must be some
> other variable here. Memory pressure maybe?
>
> On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev<yn...@bk.ru>  wrote:
>> Hi,
>>
>> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am having
>> the following problem.
>> My insertion speed is relatively slow, so the memtables do not get full and
>> the actual flushing is triggered by memtable_flush_after_mins, this happens
>> on the hour mark. My problem with 0.7.2 is that when that happens, the cpu
>> spikes to 25% overall usage (x4 Xeon) and the operation takes anywhere from
>> 2 to 4 minutes, leaving the node not responding during that time. This has
>> forced to me to increase the rpc timeout option to beyond what i feel
>> comfortable with.
>> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset and the
>> results are consistent. During the same operation 0.7.0 takes about 10
>> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a log with
>> the timestamps from one such flushing of 0.7.2. Please let me know if there
>> is anything i can do to speed up and get results similar to 0.7.0.
>>
>> Regards:
>> Ivan
>>
>> This is the log of the operations which took most time during the flush
>> operation. Using 0.7.0, with the same number of operations the flushing
>> takes less than 10 seconds.
>>
>> INFO 01:36:44,906 Enqueuing flush of
>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>> operations)
>> INFO 01:37:47,375 Completed flushing
>> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
>> INFO 01:37:47,375 Writing Memtable-ArchiveFiles.6e616d65@978152661(1619949
>> bytes, 34467 operations)
>> INFO 01:37:47,375 Enqueuing flush of
>> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949 bytes,
>> 34467 operations)
>> INFO 01:38:51,343 Completed flushing
>> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
>> INFO 01:38:51,343 Writing
>> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
>> operations)
>>
>
>


Re: 0.7.2 slow memtables flushing

Posted by Jonathan Ellis <jb...@gmail.com>.
Flush code didn't change between 0.7.0 and 0.7.2. There must be some
other variable here. Memory pressure maybe?

On Sat, Feb 19, 2011 at 8:16 AM, Ivan Georgiev <yn...@bk.ru> wrote:
> Hi,
>
> I am testing 0.7.2 on a Windows 2003 x64 system(one node) and I am having
> the following problem.
> My insertion speed is relatively slow, so the memtables do not get full and
> the actual flushing is triggered by memtable_flush_after_mins, this happens
> on the hour mark. My problem with 0.7.2 is that when that happens, the cpu
> spikes to 25% overall usage (x4 Xeon) and the operation takes anywhere from
> 2 to 4 minutes, leaving the node not responding during that time. This has
> forced to me to increase the rpc timeout option to beyond what i feel
> comfortable with.
> I have run multiple tests with 0.7.0 and 0.7.2 with the same dataset and the
> results are consistent. During the same operation 0.7.0 takes about 10
> seconds to complete vs. 2 to 4 minutes for 0.7.2. I am attaching a log with
> the timestamps from one such flushing of 0.7.2. Please let me know if there
> is anything i can do to speed up and get results similar to 0.7.0.
>
> Regards:
> Ivan
>
> This is the log of the operations which took most time during the flush
> operation. Using 0.7.0, with the same number of operations the flushing
> takes less than 10 seconds.
>
> INFO 01:36:44,906 Enqueuing flush of
> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
> operations)
> INFO 01:37:47,375 Completed flushing
> C:\Storage\data\Storage\ArchiveFiles.68617368-f-3-Data.db (5549187 bytes)
> INFO 01:37:47,375 Writing Memtable-ArchiveFiles.6e616d65@978152661(1619949
> bytes, 34467 operations)
> INFO 01:37:47,375 Enqueuing flush of
> Memtable-ArchiveFiles.706172656e74466f6c6465724944@2097700961(1619949 bytes,
> 34467 operations)
> INFO 01:38:51,343 Completed flushing
> C:\Storage\data\Storage\ArchiveFiles.6e616d65-f-3-Data.db (3820265 bytes)
> INFO 01:38:51,343 Writing
> Memtable-ArchiveFiles.6f776e65724944@1225856921(1619949 bytes, 34467
> operations)
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com