Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2016/02/05 17:19:40 UTC

[jira] [Commented] (CASSANDRA-10990) Support streaming of older version sstables in 3.0

    [ https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15134391#comment-15134391 ] 

Paulo Motta commented on CASSANDRA-10990:
-----------------------------------------

Initial version is ready for review. Feedback on approach and correctness will be greatly appreciated.

*Patch Overview*

The patch adds support for streaming pre-3.0 sstables, along with a comprehensive test suite. Supporting tables that are not compact static was simple: the patch works around the lack of a serialization header by using a header with no stats, and deserializes the clustering prefix with the old-format deserializer while serializing in the new format.

The main challenge was supporting streaming of compact static tables, because in the new format the static columns must be the first columns in a partition, while in the previous format they can be in any position of the partition. This means that each partition must be traversed to find the static columns and then rewound to read the remaining non-static columns.
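
To illustrate the reordering problem only, here is a minimal sketch of the two-pass traversal. The {{Cell}} class and {{staticsFirst}} method are hypothetical names made up for this example, not classes from the patch, and the partition is modelled as an in-memory list rather than a stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: Cell and staticsFirst are hypothetical,
// not Cassandra classes.
class StaticsFirstSketch {
    static final class Cell {
        final String name;
        final boolean isStatic;

        Cell(String name, boolean isStatic) {
            this.name = name;
            this.isStatic = isStatic;
        }
    }

    // Emits static cells first, then the remaining cells in their original order.
    static List<String> staticsFirst(List<Cell> partition) {
        List<String> out = new ArrayList<>();
        // Pass 1: scan the whole partition looking for static cells.
        for (Cell c : partition) {
            if (c.isStatic) out.add(c.name);
        }
        // "Rewind" and pass 2: read the non-static cells.
        for (Cell c : partition) {
            if (!c.isStatic) out.add(c.name);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> oldFormat = Arrays.asList(
            new Cell("r1", false), new Cell("s1", true),
            new Cell("r2", false), new Cell("s2", true));
        System.out.println(staticsFirst(oldFormat));
    }
}
```

With a list the second pass is free; on a network stream it is not, which is why the rewind needs some form of buffering.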

To solve this, I added a new {{CachedInputStream}} that adds mark/reset functionality to a source stream and allows multiple {{CachedInputStream}} instances with different capacities to be cascaded cooperatively, creating an input stream cache hierarchy. For instance, {{StreamDeserializer}} uses this feature for pre-3.0 sstables: a {{MemoryCachedInputStream}} falls back to a {{FileCachedInputStream}} when it runs out of capacity in memory. The {{FileCachedInputStream}} may write a temporary buffer file to a data directory, and removes it once the file is successfully streamed or if streaming fails.
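
As a rough illustration of the idea (not the code in the branch), the sketch below wraps a source stream, caches bytes read after {{mark()}} in memory up to a capacity, spills the rest to a temporary file, and replays everything on {{reset()}}. The class name {{SpillingRewindableStream}}, the {{spilledToDisk()}} helper and the use of the default temp directory are assumptions made for this example; it also collapses the memory and file layers into one class and supports only a single mark/reset cycle.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch; not the CachedInputStream from the patch.
class SpillingRewindableStream extends InputStream {
    private final InputStream source;
    private final int memoryCapacity;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File spillFile;       // created lazily once the memory cache is full
    private OutputStream spillOut;
    private InputStream replay;   // non-null while cached bytes are being replayed
    private boolean caching;

    SpillingRewindableStream(InputStream source, int memoryCapacity) {
        this.source = source;
        this.memoryCapacity = memoryCapacity;
    }

    @Override public boolean markSupported() { return true; }

    // readLimit is ignored: the cache grows in memory, then on disk.
    @Override public void mark(int readLimit) {
        if (caching || replay != null)
            throw new IllegalStateException("sketch supports a single mark/reset cycle");
        caching = true;
    }

    @Override public int read() throws IOException {
        if (replay != null) {
            int cached = replay.read();
            if (cached >= 0) return cached;
            replay.close();
            replay = null;        // cache exhausted, continue with the source
        }
        int b = source.read();
        if (b >= 0 && caching) cache(b);
        return b;
    }

    private void cache(int b) throws IOException {
        if (memory.size() < memoryCapacity) {
            memory.write(b);
            return;
        }
        if (spillOut == null) {   // memory is full: fall back to a temp file
            spillFile = File.createTempFile("stream-cache", ".tmp");
            spillOut = new BufferedOutputStream(new FileOutputStream(spillFile));
        }
        spillOut.write(b);
    }

    @Override public void reset() throws IOException {
        if (!caching) throw new IOException("reset() called without mark()");
        caching = false;
        InputStream cached = new ByteArrayInputStream(memory.toByteArray());
        if (spillOut != null) {
            spillOut.close();     // flush spilled bytes before reading them back
            spillOut = null;
            cached = new SequenceInputStream(cached, new FileInputStream(spillFile));
        }
        replay = cached;
    }

    // Closes the source and removes the temporary file, if one was created.
    @Override public void close() throws IOException {
        if (spillOut != null) spillOut.close();
        if (replay != null) replay.close();
        if (spillFile != null) Files.deleteIfExists(spillFile.toPath());
        source.close();
    }

    boolean spilledToDisk() { return spillFile != null; }

    public static void main(String[] args) throws IOException {
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        try (SpillingRewindableStream in =
                 new SpillingRewindableStream(new ByteArrayInputStream(data), 4)) {
            in.mark(0);
            for (int i = 0; i < 7; i++) in.read();   // first pass, spills after 4 bytes
            in.reset();
            StringBuilder secondPass = new StringBuilder();
            for (int b = in.read(); b >= 0; b = in.read()) secondPass.append((char) b);
            System.out.println(secondPass + " spilled=" + in.spilledToDisk());
        }
    }
}
```

The sketch deletes the temporary file in {{close()}}, so a {{try}}-with-resources block covers both the success and the failure path.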

This approach allows us to use the {{OldFormatDeserializer}} transparently, so the same code path that reads pre-3.0 sstables is also used to stream them. Note that the {{CachedInputStream}} is only used when streaming pre-3.0 sstables, to provide rewind functionality, and will not affect existing behavior.

Please note that performance was not the objective here; the goal was mainly to support streaming of pre-3.0 sstables. Compact static tables may suffer a slight performance hit due to buffer copying and rewinding, but tables that are not compact static will not be affected, since the stream cache will not be used for them.

*Tests*

* *Unit tests*: Extended {{LegacySSTableTest}} to test streaming of legacy compact sstables from the jb version onward.
** Added a comprehensive test suite for the different {{CachedInputStream}} variants in {{RewindableDataInputStreamPlusTest}}
* *SSTable loader dtests*: Extended {{sstable_generation_loading_test}} to sstableload 2.1 (ka) sstables with different compression settings.
* *Upgrade dtests*: Extended CASSANDRA-10563 upgrade dtests to bootstrap soon after upgrading, to test bootstrap streaming of legacy sstables.

*TODO*

* Cleanup of leftover buffer files on startup.
* Improve documentation of {{CachedInputStream}}, {{MemoryCachedInputStream}} and {{FileCachedInputStream}}
* Make max memory buffer size a system property and change it on dtests
* {{LegacySSTableTest}} passes when executed individually but fails when executed as part of a suite, probably due to leftovers from a previous test that need to be cleaned up.
* Add la sstables to {{sstable_generation_loading_test}}
* Fix {{upgrade_8099_test.py:TestBootstrapAfterUpgrade.upgrade_with_wide_partition_test}}

||3.0||dtest||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:10990]|[branch|https://github.com/riptano/cassandra-dtest/compare/master...pauloricardomg:10990]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-10990-dtest/lastCompletedBuild/testReport/]|

[~philipthompson] when you have time, could you please set up a custom dtest run with the dtest branch above? Thanks!

> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables (CASSANDRA-5772).  In 3.0, because of the rewrite of the storage layer, this became no longer supported.  So currently, while 3.0 can read sstables in the 2.1/2.2 format, it cannot stream the older versioned sstables.  We should do some work to make this still possible to be consistent with what CASSANDRA-5772 provided.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)