Posted to commits@cassandra.apache.org by "Chris Lohfink (JIRA)" <ji...@apache.org> on 2014/05/16 13:18:57 UTC
[jira] [Comment Edited] (CASSANDRA-7247) Provide top ten most frequent keys per column family
[ https://issues.apache.org/jira/browse/CASSANDRA-7247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999621#comment-13999621 ]
Chris Lohfink edited comment on CASSANDRA-7247 at 5/16/14 5:53 AM:
-------------------------------------------------------------------
The problem is that StreamSummary is not thread safe. There is a ConcurrentStreamSummary, but in this implementation I found it to be ~5x slower than a synchronized block around the offer call of the non-thread-safe one. The concurrent version did perform similarly when it was also wrapped in a synchronized block, as the results below show, but since it loses any benefit of being a concurrent implementation once access is serialized, I think the faster implementation is the better choice.
Run on a 2013 Retina MBP with a 500 GB SSD, against trunk:
{code:title=No Changes}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 634450 , 21692, 21692, 0.2, 0.2, 0.2, 0.2, 0.4, 740.1, 29.2, 0.01188
8 threadCount, 886600 , 29762, 29762, 0.3, 0.2, 0.3, 0.4, 1.3, 1007.3, 29.8, 0.01220
16 threadCount, 912050 , 29035, 29035, 0.5, 0.3, 0.9, 2.5, 11.2, 1393.8, 31.4, 0.01162
24 threadCount, 1022250 , 32681, 32681, 0.7, 0.5, 1.0, 2.9, 13.5, 1126.5, 31.3, 0.00923
36 threadCount, 946550 , 30900, 30900, 1.2, 0.8, 1.4, 3.0, 22.5, 1369.2, 30.6, 0.01089
{code}
{code:title=With Patch}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 643900 , 21700, 21700, 0.2, 0.2, 0.2, 0.2, 0.9, 941.1, 29.7, 0.01079
8 threadCount, 942100 , 32300, 32300, 0.2, 0.2, 0.3, 0.3, 1.2, 849.5, 29.2, 0.01519
16 threadCount, 907400 , 30650, 30650, 0.5, 0.3, 0.8, 1.9, 10.7, 1124.0, 29.6, 0.01112
24 threadCount, 1026150 , 31753, 31753, 0.7, 0.5, 0.9, 3.3, 20.6, 1299.0, 32.3, 0.01295
36 threadCount, 980600 , 30077, 30077, 1.2, 0.8, 1.3, 2.7, 24.9, 1394.3, 32.6, 0.01747
{code}
{code:title=ConcurrentStreamSummary with sync}
id, ops , op/s, key/s, mean, med, .95, .99, .999, max, time, stderr
4 threadCount, 494350 , 16643, 16643, 0.2, 0.2, 0.3, 0.3, 1.0, 943.6, 29.7, 0.01286
8 threadCount, 812950 , 26358, 26358, 0.3, 0.2, 0.3, 0.5, 1.4, 1488.9, 30.8, 0.01909
16 threadCount, 877500 , 27396, 27396, 0.6, 0.3, 1.0, 2.2, 12.1, 1299.2, 32.0, 0.01824
24 threadCount, 837550 , 25345, 25345, 0.9, 0.4, 1.2, 3.7, 84.2, 2123.6, 33.0, 0.02437
36 threadCount, 910200 , 28008, 28008, 1.3, 0.6, 2.8, 9.2, 32.2, 1212.8, 32.5, 0.01654
{code}
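For readers who have not seen the pattern, here is a minimal sketch of "a synchronized block around the offer of the non-thread-safe one". It is not the code from patch.diff and it does not use the stream-lib classes: SimpleSummary is a hypothetical stand-in that follows the Space-Saving idea with a plain HashMap, and SynchronizedTopK is an illustrative wrapper name. The only point is that every access to the summary goes through one lock.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a non-thread-safe top-K summary (Space-Saving style). */
final class SimpleSummary<T> {
    private final int capacity;
    private final Map<T, Long> counts = new HashMap<>();

    SimpleSummary(int capacity) {
        this.capacity = capacity;
    }

    /** Not thread safe: mutates the HashMap without any locking. */
    void offer(T item) {
        Long current = counts.get(item);
        if (current != null) {
            counts.put(item, current + 1);
        } else if (counts.size() < capacity) {
            counts.put(item, 1L);
        } else {
            // Space-Saving step: evict the least frequent item and let the
            // newcomer inherit its count plus one (an over-estimate).
            Map.Entry<T, Long> min = null;
            for (Map.Entry<T, Long> e : counts.entrySet()) {
                if (min == null || e.getValue() < min.getValue()) {
                    min = e;
                }
            }
            long inherited = min.getValue() + 1;
            counts.remove(min.getKey());
            counts.put(item, inherited);
        }
    }

    /** Returns copies of the k entries with the highest counts, highest first. */
    List<Map.Entry<T, Long>> topK(int k) {
        List<Map.Entry<T, Long>> entries = new ArrayList<>();
        for (Map.Entry<T, Long> e : counts.entrySet()) {
            entries.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        entries.sort(Map.Entry.<T, Long>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}

/** Illustrative wrapper: serializes all access to the summary with one lock. */
final class SynchronizedTopK<T> {
    private final SimpleSummary<T> summary;

    SynchronizedTopK(int capacity) {
        this.summary = new SimpleSummary<>(capacity);
    }

    void offer(T item) {
        synchronized (summary) {
            summary.offer(item);
        }
    }

    List<Map.Entry<T, Long>> topK(int k) {
        synchronized (summary) {
            return summary.topK(k);
        }
    }
}
```

Writers call offer(key) from any thread and a reader calls topK(10) to take a snapshot. Because both paths take the same lock, a concurrent data structure underneath would add cost without adding parallelism, which is the trade-off described in the comment above.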
> Provide top ten most frequent keys per column family
> ----------------------------------------------------
>
> Key: CASSANDRA-7247
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7247
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Chris Lohfink
> Priority: Minor
> Attachments: patch.diff
>
>
> Since we already have the nice addthis stream library, we can use it to keep track of the most frequent DecoratedKeys that come through the system using StreamSummaries ([nice explanation|http://boundary.com/blog/2013/05/14/approximate-heavy-hitters-the-spacesaving-algorithm/]). Then provide a new metric to access them via JMX.
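As a rough illustration of the "access them via JMX" part, the sketch below exposes a top-keys snapshot as a read-only attribute of a standard MBean, using only the JDK's javax.management API. Everything specific here is an assumption made for the example: the class names, the TopKeys attribute, and the object name org.example.sketch:type=TopKeys,table=demo are hypothetical and are not the names used by Cassandra or by the attached patch.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;
import javax.management.JMException;
import javax.management.ObjectName;

final class TopKeysJmxSketch {

    /** Standard MBean convention: interface name = implementation name + "MBean". */
    public interface TopKeysMBean {
        List<String> getTopKeys();
    }

    /** Hypothetical bean; the supplier would return the current top keys. */
    public static final class TopKeys implements TopKeysMBean {
        private final Supplier<List<String>> source;

        public TopKeys(Supplier<List<String>> source) {
            this.source = source;
        }

        @Override
        public List<String> getTopKeys() {
            // Return a copy so JMX clients never see a list that is still changing.
            return new ArrayList<>(source.get());
        }
    }

    /** Registers the bean on the platform MBean server under the given name. */
    static ObjectName register(TopKeys bean, String name) throws JMException {
        ObjectName objectName = new ObjectName(name);
        ManagementFactory.getPlatformMBeanServer().registerMBean(bean, objectName);
        return objectName;
    }
}
```

Once registered, a JMX client such as jconsole can read the TopKeys attribute. One bean per column family, distinguished by a key property in the object name, would be one way to match the "per column family" requirement.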