Posted to commits@cassandra.apache.org by "Pavel Yaskevich (Commented) (JIRA)" <ji...@apache.org> on 2012/01/04 14:27:38 UTC

[jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment

    [ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179469#comment-13179469 ] 

Pavel Yaskevich commented on CASSANDRA-3623:
--------------------------------------------

The MMappedIO-Performance.docx test for a 10,000 column size is misleading again, because you actually used -S 100000, which is 10 times bigger than the suggested 10,000.

Tested v3 of this patch + 3610 (v3) and 3611 (v4) with disk_access_mode: mmap (and standard) and crc_check_chance: 0.0 on my real machine with 2GB RAM and a Quad-Core AMD Opteron processor under Debian (2.6.35) GNU/Linux.
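
For reference, the access mode is a single line in conf/cassandra.yaml (crc_check_chance is not a yaml setting; it is set per column family through compression_options, as shown in the sequence below):
{noformat}
disk_access_mode: mmap    # "standard" for Test #2
{noformat}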

Used the stress tool to populate the db with "./bin/stress -n 300000 -S 512 -I SnappyCompressor", and right after that ran "update column family Standard1 with compression_options = {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor', 'crc_check_chance':'0.0'};" from the CLI, made sure everything was flushed/compacted, and stopped Cassandra. Please note that the generated data does not entirely fit into the page cache. The exact sequence is spelled out below.
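
To make this reproducible, the full sequence looks like this (the nodetool lines are just one way to force the flush/compaction mentioned above; stress creates Keyspace1/Standard1 by default):
{noformat}
# populate
./bin/stress -n 300000 -S 512 -I SnappyCompressor

# from ./bin/cassandra-cli, after "use Keyspace1;":
update column family Standard1 with compression_options =
    {'sstable_compression' : 'org.apache.cassandra.io.compress.SnappyCompressor',
     'crc_check_chance' : '0.0'};

# flush/compact, then stop Cassandra
./bin/nodetool -h localhost flush
./bin/nodetool -h localhost compact
{noformat}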

Test #1:
  # sync && echo 1 > /proc/sys/vm/drop_caches
  # changed ./conf/cassandra.yaml with "disk_access_mode: mmap"
  # started Cassandra
  # run `./bin/stress -n 300000 -S 512 -o read`
{noformat}
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
16438,1643,1643,0.029554629516972866,10
40997,2455,2455,0.020681908872511097,20
66256,2525,2525,0.020270200720535255,30
90857,2460,2460,0.020607454981504816,41
115779,2492,2492,0.020273372923521386,51
141033,2525,2525,0.020168923734853884,61
166268,2523,2523,0.020269823657618386,72
191018,2475,2475,0.02026589898989899,82
216367,2534,2534,0.020031519981064342,92
241153,2478,2478,0.020092875010086338,102
265959,2480,2480,0.020124244134483594,113
290228,2426,2426,0.019975400716964027,123
300000,977,977,0.012085448219402373,127
{noformat}
  # run step #4 once again to see how the populated page cache affected performance
{noformat}
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
50913,5091,5091,0.0036437844951191247,10
106548,5563,5563,0.003795344657140289,20
164274,5772,5772,0.0050692928662994146,30
220312,5603,5603,0.003771262357685856,40
276125,5581,5581,0.0037274111766076004,50
300000,2387,2387,0.003665089005235602,55
{noformat}


Test #2:
  # sync && echo 1 > /proc/sys/vm/drop_caches
  # changed ./conf/cassandra.yaml with "disk_access_mode: standard"
  # started Cassandra
  # run `./bin/stress -n 300000 -S 512 -o read`
{noformat}
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
36048,3604,3604,0.00862633155792277,10
92134,5608,5608,0.004530007488499804,20
148475,5634,5634,0.004739603485916118,30
204987,5651,5651,0.004508653029445074,40
262779,5779,5779,0.004955564784053157,51
300000,3722,3722,0.004320276188173343,57
{noformat}
  # run step #4 once again to see how the populated page cache affected performance
{noformat}
pavel1:/usr/src/cassandra/tools/stress# ./bin/stress -n 300000 -S 512 -I SnappyCompressor -o read
total,interval_op_rate,interval_key_rate,avg_latency,elapsed_time
50151,5015,5015,0.004033399134613467,10
105726,5557,5557,0.0039961673414304994,20
162237,5651,5651,0.003965387269735096,30
218366,5612,5612,0.003923764898715459,40
274388,5602,5602,0.003912695012673592,50
300000,2561,2561,0.0034509995314696237,55
{noformat}

I did re-run the mmap test on the cold page cache a few times to make sure that this is the real behavior. The tests show that mmap and standard I/O are not really different on my machine, and that mmap'ed I/O actually performs worse on a cold cache; the same effect would hold in any situation with a high number of page faults.
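
For anyone wondering why a cold cache hits mmap harder, here is a minimal standalone Java sketch (not Cassandra code; the file path and sizes are made up) contrasting the two access paths: with mmap the kernel faults pages in lazily on first access, while a standard read() is an explicit syscall whose cold-cache misses the kernel's readahead can amortize.
{noformat}
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapVsStandardRead
{
    public static void main(String[] args) throws Exception
    {
        // Hypothetical data file standing in for an SSTable.
        RandomAccessFile file = new RandomAccessFile("/tmp/data.bin", "r");
        FileChannel channel = file.getChannel();

        // mmap path: pages are faulted in lazily on first access, so with a
        // cold page cache every get() on an untouched page is a major fault.
        MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        byte viaMmap = mapped.get(0);

        // standard path: one explicit read() syscall copies data into our own
        // buffer, and kernel readahead can prefetch the pages that follow.
        ByteBuffer buffer = ByteBuffer.allocate(4096);
        int viaRead = channel.read(buffer, 0);

        System.out.println("mmap byte: " + viaMmap + ", read() returned: " + viaRead + " bytes");
        file.close();
    }
}
{noformat}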

                
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: compression
>             Fix For: 1.1
>
>         Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file-v3.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch, 0002-tests-for-MMaped-Compression-segmented-file-v3.patch, CRC+MMapIO.xlsx, MMappedIO-Performance.docx
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and does not seem to use mmap, hence higher CPU on the nodes and higher latencies on reads.
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
> // TODO refactor this to separate concept of "buffer to avoid lots of read() syscalls" and "compression buffer"
> but I think a separate class for the buffer would be better.
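
As a rough sketch of the separation that TODO suggests (illustrative only, not the attached patch; all names below are made up), one class could own the mmap'ed region that avoids read() syscalls while another owns the buffer that decompressed chunks land in:
{noformat}
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;

// Hypothetical: owns the "buffer to avoid lots of read() syscalls" concern.
class MappedSegment
{
    private final MappedByteBuffer region; // mapped once per segment

    MappedSegment(MappedByteBuffer region)
    {
        this.region = region;
    }

    // A zero-copy view of [offset, offset + length); no syscalls involved.
    ByteBuffer slice(int offset, int length)
    {
        ByteBuffer view = region.duplicate();
        view.position(offset);
        view.limit(offset + length);
        return view.slice();
    }
}

// Hypothetical: owns the "compression buffer" concern, reused across chunks.
class DecompressionBuffer
{
    private final byte[] uncompressed;

    DecompressionBuffer(int chunkLength)
    {
        this.uncompressed = new byte[chunkLength];
    }

    // Placeholder: a real implementation would decompress a chunk (e.g. with
    // Snappy) into 'uncompressed' before handing it out.
    byte[] target()
    {
        return uncompressed;
    }
}
{noformat}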

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira