Posted to commits@cassandra.apache.org by "Vijay (JIRA)" <ji...@apache.org> on 2012/06/28 04:46:42 UTC

[jira] [Created] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Vijay created CASSANDRA-4388:
--------------------------------

             Summary: Use MMap for CompressedSegmentFile with Native Checksum
                 Key: CASSANDRA-4388
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 1.2
            Reporter: Vijay
            Assignee: Vijay


Use MMap for CompressedSegmentFile (Something similar to Cassandra-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Yaskevich updated CASSANDRA-4388:
---------------------------------------

    Affects Version/s:     (was: 1.2)
        Fix Version/s: 1.2
    
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.


        

[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Posted by "Vijay (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-4388:
-----------------------------

    Description: Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.  (was: Use MMap for CompressedSegmentFile (Something similar to Cassandra-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.)
    
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2
>            Reporter: Vijay
>            Assignee: Vijay
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.


        

[jira] [Commented] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419870#comment-13419870 ] 

Pavel Yaskevich commented on CASSANDRA-4388:
--------------------------------------------

I'm concerned about all the data copying that CMFDI introduces: thread-local byte buffers and allocation/hashing by chunk length, which rather defeats the purpose of having that data mmap'ed into process space. Also, your benchmark compares apples to oranges, as CRAR doesn't use the same CRC32 implementation. When you change it to use PureJavaCrc32() or disable checksum checking (since it adds noise to the experiment), it actually shows results equal to (or a bit better than, by ~5-6 secs) CMFDI on the machines I have tested it on.
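
To illustrate the like-for-like point, here is a minimal sketch (not taken from the patch) in which the checksum implementation is passed in, so that both read paths in a benchmark can be given the same one or have checking switched off. ChunkVerifier and verify are hypothetical names used only for illustration; java.util.zip.Checksum and CRC32 are standard JDK classes. Any other Checksum implementation, such as Hadoop's PureJavaCrc32 if it is on the classpath, could be supplied the same way.

```java
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper, not part of Cassandra or of the attached patch.
public class ChunkVerifier {

    /**
     * Verifies the first `length` bytes of `chunk` against `expected`.
     * Passing a null factory disables checking, which takes checksum cost
     * out of a measurement entirely.
     */
    public static boolean verify(byte[] chunk, int length, long expected,
                                 Supplier<Checksum> factory) {
        if (factory == null) {
            return true; // checksum checking disabled
        }
        Checksum checksum = factory.get();
        checksum.update(chunk, 0, length);
        return checksum.getValue() == expected;
    }

    public static void main(String[] args) {
        byte[] chunk = "123456789".getBytes(java.nio.charset.StandardCharsets.US_ASCII);

        // Compute the expected value once, then hand the SAME implementation
        // to every code path that is being compared.
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        long expected = crc.getValue();

        System.out.println(verify(chunk, chunk.length, expected, CRC32::new));
        System.out.println(verify(chunk, chunk.length, expected, null));
    }
}
```

With a single factory argument shared by both readers, any remaining difference in a benchmark comes from the I/O path rather than from the checksum implementation.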

One last thing: the node crashes on reads if you follow these steps:

1). set disk_access_mode: mmap
2). write 100000 -S 512 (to trigger compaction)
3). read (everything ok so far)
4). switch to disk_access_mode: standard
5). read (data still could be read)
6). write 100000 -S 512 (then wait for compactions to finish before shutdown)
7). read (to verify that data is still readable in standard mode after all compactions)
8). repeat steps 1 and 3; the node throws OOM as well as a bunch of the following exceptions

{noformat}
java.lang.AssertionError: Interval min > max
          at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:250)
          at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72)
          at org.apache.cassandra.utils.IntervalTree.build(IntervalTree.java:81)
          at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:177)
          at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:43)
          at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:54)
          at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
          at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:189)
          at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:161)
          at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
          at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:89)
          at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
          at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:141)
          at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:91)
          at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:281)
          at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
          at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1318)
          at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1172)
          at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
          at org.apache.cassandra.db.Table.getRow(Table.java:339)
          at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
          at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:827)
          at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1280)
          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
          at java.lang.Thread.run(Thread.java:662)
{noformat}
 
                
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vijay
>            Assignee: Vijay
>             Fix For: 1.2
>
>         Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.


        

[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-4388:
--------------------------------------

    Reviewer: xedin
    
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2
>            Reporter: Vijay
>            Assignee: Vijay
>         Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.


        

[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum

Posted by "Vijay (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vijay updated CASSANDRA-4388:
-----------------------------

    Attachment: 0001-CASSANDRA-4388.patch

There are two parts to this patch:
1) We have to support MMap to get access to a direct ByteBuffer (DirectBB)
2) Use Hadoop's native API to get native CRC32 & CRC32C

The second part needs the native C library, which is included in the alpha version of the hadoop jar. Hence we might want to wait until it is released (to avoid breaking other dependent code).

Benchmark (the first part is still better):
http://pastebin.com/g4p3Sf6v
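
As a rough illustration of the first part, the sketch below maps a region of a file and checksums it directly from the mapped buffer. It is not the code in the attached patch: MappedChunkChecksum, checksumMapped and checksumHeap are hypothetical names, and it relies on CRC32.update(ByteBuffer), which exists only in Java 8 and later, rather than on Hadoop's native library. On a direct buffer that call typically avoids copying the bytes into a heap array first.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Cassandra or of the attached patch.
public class MappedChunkChecksum {

    /** CRC32 of the bytes [offset, offset + length) of a file, read through mmap. */
    public static long checksumMapped(Path file, long offset, int length) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping is a direct buffer backed by the page cache.
            MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
            CRC32 crc = new CRC32();
            crc.update(region); // Java 8+: consumes the buffer's remaining bytes
            return crc.getValue();
        }
    }

    /** Same checksum over a heap array, for comparison with the mapped path. */
    public static long checksumHeap(byte[] data, int offset, int length) {
        CRC32 crc = new CRC32();
        crc.update(data, offset, length);
        return crc.getValue();
    }
}
```

Both methods should return the same value for the same bytes; the difference is only in whether the data has to be copied onto the heap before it is checksummed.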
                
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4388
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2
>            Reporter: Vijay
>            Assignee: Vijay
>         Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.
