Posted to user@cassandra.apache.org by aaron morton <aa...@thelastpickle.com> on 2013/02/01 18:55:07 UTC

Re: CPU hotspot at BloomFilterSerializer#deserialize

> 5. the problematic Data file contains data for only 5 to 10 keys but is large (2.4G)
So very large rows?
What does nodetool cfstats or cfhistograms say about the row sizes?


> 1. what is happening?

I think this is partially large rows and partially the query pattern. This is only roughly correct now, but see http://thelastpickle.com/2011/07/04/Cassandra-Query-Plans/ and my talk here: http://www.datastax.com/events/cassandrasummit2012/presentations

> 3. any more info required to proceed?

Do some tests with different query techniques (a rough sketch follows the list below)…

Get a single named column. 
Get the first 10 columns using the natural column order.
Get the last 10 columns using the reversed order. 
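
Something like this rough sketch with the raw Thrift client exercises those three patterns; the column family name, column name and row key are placeholders to adapt, and you would wrap each call with timing to compare them:

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;

    public class QueryPatternTest
    {
        // Time each call separately to see which pattern hurts.
        static void run(Cassandra.Client client, ByteBuffer rowKey) throws Exception
        {
            ColumnParent cf = new ColumnParent("MyCF"); // placeholder CF name

            // 1) A single named column - the slice-by-names path in the stack trace.
            SlicePredicate byName = new SlicePredicate();
            byName.setColumn_names(Arrays.asList(ByteBuffer.wrap("col1".getBytes("UTF-8"))));
            client.get_slice(rowKey, cf, byName, ConsistencyLevel.ONE);

            // 2) The first 10 columns in natural comparator order (empty start/finish = open range).
            SlicePredicate first10 = new SlicePredicate();
            first10.setSlice_range(new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), false, 10));
            client.get_slice(rowKey, cf, first10, ConsistencyLevel.ONE);

            // 3) The last 10 columns, using reversed order.
            SlicePredicate last10 = new SlicePredicate();
            last10.setSlice_range(new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0), true, 10));
            client.get_slice(rowKey, cf, last10, ConsistencyLevel.ONE);
        }
    }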

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 31/01/2013, at 7:20 PM, Takenori Sato <ts...@cloudian.com> wrote:

> Hi all,
> 
> We have a situation where the CPU load on some of our nodes in the cluster has spiked occasionally since last November, triggered by requests for rows that reside on two specific sstables.
> 
> We confirmed the following (during the spikes):
> 
> version: 1.0.7 (current) <- 0.8.6 <- 0.8.5 <- 0.7.8
> jdk: Oracle 1.6.0
> 
> 1. profiling showed that BloomFilterSerializer#deserialize was the hotspot (70% of the total load across running threads)
> 
> * the stack trace looked like this (simplified)
> 90.4% - org.apache.cassandra.db.ReadVerbHandler.doVerb
> 90.4% - org.apache.cassandra.db.SliceByNamesReadCommand.getRow
> ...
> 90.4% - org.apache.cassandra.db.CollationController.collectTimeOrderedData
> ...
> 89.5% - org.apache.cassandra.db.columniterator.SSTableNamesIterator.read
> ...
> 79.9% - org.apache.cassandra.io.sstable.IndexHelper.defreezeBloomFilter
> 68.9% - org.apache.cassandra.io.sstable.BloomFilterSerializer.deserialize
> 66.7% - java.io.DataInputStream.readLong
> 
> 2. usually, 1 should be so fast that sampling-based profiling cannot detect it
> 
> 3. no pressure on Cassandra's JVM heap, nor on the machine overall
> 
> 4. only a little I/O traffic across our 8 disks/node (up to 100 tps/disk according to "iostat 1 1000")
> 
> 5. the problematic Data file contains data for only 5 to 10 keys but is large (2.4G)
> 
> 6. the problematic Filter file is only 256B (which could be normal)
> 
> 
> So now, I am trying to read the Filter file in the same way BloomFilterSerializer#deserialize does, as closely as I can, in order to see if something is wrong with the file.
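> 
> A rough standalone sketch of that attempt (note: the field layout here,
> an int hash count followed by a word count and that many longs, is only
> my guess from reading the 1.0.7 source, so it may not be exact):
> 
>     import java.io.BufferedInputStream;
>     import java.io.DataInputStream;
>     import java.io.FileInputStream;
>     import java.io.IOException;
> 
>     public class FilterFileReader
>     {
>         public static void main(String[] args) throws IOException
>         {
>             DataInputStream in = new DataInputStream(
>                     new BufferedInputStream(new FileInputStream(args[0])));
>             try
>             {
>                 int hashCount = in.readInt();  // assumed: number of hash functions
>                 int wordCount = in.readInt();  // assumed: number of 64-bit words
>                 long[] words = new long[wordCount];
>                 for (int i = 0; i < wordCount; i++)
>                     words[i] = in.readLong();  // the readLong loop from the profile
>                 System.out.println("hashes=" + hashCount + ", words=" + wordCount);
>             }
>             finally
>             {
>                 in.close();
>             }
>         }
>     }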
> 
> Could you give me some advice on:
> 
> 1. what is happening?
> 2. the best way to simulate the BloomFilterSerializer#deserialize
> 3. any more info required to proceed?
> 
> Thanks,
> Takenori


Re: CPU hotspot at BloomFilterSerializer#deserialize

Posted by "Takenori Sato(Cloudian)" <ts...@cloudian.com>.
Hi,

We found that this issue is specific to 1.0.1 through 1.0.8; it was fixed
in 1.0.9.

https://issues.apache.org/jira/browse/CASSANDRA-4023

So by upgrading, we will see reasonable performance no matter how large
our rows are!
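
For anyone curious about the cost: as we understand it, the row-level bloom
filter was being deserialized one readLong() at a time through the stream
layer on every read (see the original stack trace earlier in the thread). A
toy comparison of per-long reads versus decoding from one wrapped buffer
(our own sketch, not code from the actual patch):

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.LongBuffer;

    public class ReadLongCost
    {
        public static void main(String[] args) throws IOException
        {
            final int N = 500000;        // about the loop count for our 2G row
            byte[] raw = new byte[8 * N];
            long sink = 0;

            // Slow path: one readLong() call per value through a stream.
            long t0 = System.nanoTime();
            DataInputStream dis = new DataInputStream(new ByteArrayInputStream(raw));
            for (int i = 0; i < N; i++)
                sink += dis.readLong();
            System.out.println("per-long: " + (System.nanoTime() - t0) / 1000000 + "ms");

            // Fast path: wrap the bytes once and decode via a LongBuffer.
            t0 = System.nanoTime();
            LongBuffer longs = ByteBuffer.wrap(raw).asLongBuffer();
            for (int i = 0; i < N; i++)
                sink += longs.get(i);
            System.out.println("bulk: " + (System.nanoTime() - t0) / 1000000 + "ms (" + sink + ")");
        }
    }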

Thanks,
Takenori


Re: CPU hotspot at BloomFilterSerializer#deserialize

Posted by aaron morton <aa...@thelastpickle.com>.
> Yes, it contains a big row that goes up to 2GB with more than a million columns.

I've run tests with 10 million small columns and seen reasonable performance. I've not looked at 1 million large columns.

>> - BloomFilterSerializer#deserialize does readLong iteratively at each page
>> of size 4K for a given row, which means it could be 500,000 loops (calls to
>> readLong) for a 2G row (from the 1.0.7 source).
There is only one Bloom filter per row in an SSTable, not one per column index/page. 

It could take a while if there are a lot of sstables in the read. 

nodetool cfhistograms will let you know; run it once to reset the counts, then do your test, then run it again.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com


Re: CPU hotspot at BloomFilterSerializer#deserialize

Posted by Edward Capriolo <ed...@gmail.com>.
It is interesting, the press C* got about having 2 billion columns in a
row. You *can* do it, but it brings to light some realities of what
that means.


Re: CPU hotspot at BloomFilterSerializer#deserialize

Posted by Takenori Sato <ts...@cloudian.com>.
Hi Aaron,

Thanks for your answers. That helped me get the big picture.

Yes, it contains a big row that goes up to 2GB with more than a million
columns.

Let me confirm that I understand correctly.

- The stack trace is from a Slice By Names query, and the deserialization is
at step 3, "Read the row level Bloom Filter", on your blog.

- BloomFilterSerializer#deserialize does readLong iteratively at each page
of size 4K for a given row, which means it could be 500,000 loops (calls to
readLong) for a 2G row (from the 1.0.7 source).
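
(Sanity check on that figure: 2 GiB / 4 KiB per page = 2,147,483,648 / 4,096
= 524,288, so roughly 500,000.)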

Correct?

That makes sense: Slice By Names queries against such a wide row could be a
CPU bottleneck. In fact, in our test environment, a
BloomFilterSerializer#deserialize call in such a case takes more than 10ms,
up to 100ms.

> Get a single named column.
> Get the first 10 columns using the natural column order.
> Get the last 10 columns using the reversed order.

Interesting. So the query pattern could make a difference?

We thought the only solution was to change the data structure (don't use
such a wide row if it is retrieved by Slice By Names queries).
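
(For concreteness, by "change the data structure" we mean something like
bucketing the wide row so each physical row stays bounded. Just an
illustrative sketch; the names and bucket size are made up:)

    public class BucketedKey
    {
        static final long BUCKET_SIZE = 10000; // columns per physical row (tunable)

        // Illustrative only: one logical wide row becomes many bounded physical rows.
        static String bucketedRowKey(String logicalKey, long columnSeq)
        {
            return logicalKey + ":" + (columnSeq / BUCKET_SIZE);
        }

        public static void main(String[] args)
        {
            System.out.println(bucketedRowKey("user123", 1234567)); // -> user123:123
        }
    }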

Anyway, will give it a try!

Best,
Takenori
