Posted to commits@cassandra.apache.org by "Vijay (JIRA)" <ji...@apache.org> on 2012/06/28 04:46:42 UTC
[jira] [Created] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Vijay created CASSANDRA-4388:
--------------------------------
Summary: Use MMap for CompressedSegmentFile with Native Checksum
Key: CASSANDRA-4388
URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 1.2
Reporter: Vijay
Assignee: Vijay
Use MMap for CompressedSegmentFile (Something similar to Cassandra-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pavel Yaskevich updated CASSANDRA-4388:
---------------------------------------
Affects Version/s: (was: 1.2)
Fix Version/s: 1.2
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
> Key: CASSANDRA-4388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Vijay
> Assignee: Vijay
> Fix For: 1.2
>
> Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.
[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Posted by "Vijay (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vijay updated CASSANDRA-4388:
-----------------------------
Description: Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations. (was: Use MMap for CompressedSegmentFile (Something similar to Cassandra-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.)
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
> Key: CASSANDRA-4388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.2
> Reporter: Vijay
> Assignee: Vijay
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.
[jira] [Commented] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Posted by "Pavel Yaskevich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419870#comment-13419870 ]
Pavel Yaskevich commented on CASSANDRA-4388:
--------------------------------------------
I'm concerned about all the data copying that CMFDI introduces (thread-local byte buffers and allocation/hashing by chunk length), which rather defeats the purpose of having that data mmap'ed into process space. Also, your benchmark compares apples to oranges, as CRAR doesn't use the same CRC32 implementation: when you change CRAR to use PureJavaCrc32() or disable checksum checking (since it adds entropy to the experiment), the benchmark actually shows CRAR results equal to (or a bit better than, by ~5-6 secs) CMFDI on the machines I have tested it on.
One last thing: the node crashes on reads if you follow these steps:
1). set disk_access_mode: mmap
2). write 100000 -S 512 (to trigger compaction)
3). read (everything ok so far)
4). switch to disk_access_mode: standard
5). read (data still could be read)
6). write 100000 -S 512 (then wait for compactions to finish before shutdown)
7). read (to verify that data is still readable in standard mode after all compactions)
8). repeat steps 1 and 3; the node throws an OOM as well as a bunch of the following exceptions
{noformat}
java.lang.AssertionError: Interval min > max
at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:250)
at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72)
at org.apache.cassandra.utils.IntervalTree.build(IntervalTree.java:81)
at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:177)
at org.apache.cassandra.db.AbstractThreadUnsafeSortedColumns.delete(AbstractThreadUnsafeSortedColumns.java:43)
at org.apache.cassandra.db.AbstractColumnContainer.delete(AbstractColumnContainer.java:54)
at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:224)
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:189)
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:161)
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:89)
at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:46)
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:141)
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:91)
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:281)
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:61)
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1318)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1172)
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
at org.apache.cassandra.db.Table.getRow(Table.java:339)
at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:827)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1280)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat}
[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-4388:
--------------------------------------
Reviewer: xedin
> Use MMap for CompressedSegmentFile with Native Checksum
> -------------------------------------------------------
>
> Key: CASSANDRA-4388
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4388
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Affects Versions: 1.2
> Reporter: Vijay
> Assignee: Vijay
> Attachments: 0001-CASSANDRA-4388.patch
>
>
> Use MMap for CompressedSegmentFile (Something similar to CASSANDRA-3623) and use Native Checksum (HDFS-2080) to avoid memcpy and be faster in its calculations.
[jira] [Updated] (CASSANDRA-4388) Use MMap for CompressedSegmentFile with Native Checksum
Posted by "Vijay (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vijay updated CASSANDRA-4388:
-----------------------------
Attachment: 0001-CASSANDRA-4388.patch
There are 2 parts to this patch:
1) We have to support MMap to get access to DirectBB
2) Use hadoop's native API to use native CRC & CRCC
The second part needs the native C library, which is included in the alpha version of the hadoop jar. Hence we might want to wait until it is released (to avoid breaking other dependent code).
Benchmark (the first part is still better):
http://pastebin.com/g4p3Sf6v
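For readers following along, the idea behind the two parts can be sketched roughly as below. This is not the attached patch and does not use hadoop's native checksum API; it is a minimal illustration that assumes Java 8 or later, where java.util.zip.CRC32 accepts a ByteBuffer, so a chunk of a memory-mapped file can be checksummed without first copying it into a heap byte[]. The class and method names (MappedChunkChecksum, map, checksumChunk) are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.CRC32;

// Hypothetical illustration only; not code from the CASSANDRA-4388 patch.
public class MappedChunkChecksum {

    // Map the whole file read-only. The mapping stays valid after the channel is closed.
    public static MappedByteBuffer map(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    // CRC32 of the bytes in [offset, offset + length), read directly from the
    // mapping with no intermediate heap byte[] copy.
    public static long checksumChunk(ByteBuffer mapped, int offset, int length) {
        ByteBuffer chunk = mapped.duplicate(); // own position/limit, shared contents
        chunk.limit(offset + length);
        chunk.position(offset);
        CRC32 crc = new CRC32();
        crc.update(chunk); // ByteBuffer overload, available since Java 8
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        // Write some deterministic bytes to a temporary file.
        byte[] data = new byte[64 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        Path file = Files.createTempFile("chunk", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, data);

        // Checksum one chunk through the mapping and once more from the heap array.
        MappedByteBuffer mapped = map(file);
        long direct = checksumChunk(mapped, 4096, 1024);

        CRC32 heap = new CRC32();
        heap.update(data, 4096, 1024);
        System.out.println("checksums match: " + (direct == heap.getValue()));
    }
}
```

Working on a duplicate() of the mapping leaves the shared buffer's position untouched, so several readers could checksum different chunks of the same mapping without a per-thread copy buffer.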