Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2016/02/10 22:34:18 UTC

[jira] [Comment Edited] (CASSANDRA-10990) Support streaming of older version sstables in 3.0

    [ https://issues.apache.org/jira/browse/CASSANDRA-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15141660#comment-15141660 ] 

Paulo Motta edited comment on CASSANDRA-10990 at 2/10/16 9:33 PM:
------------------------------------------------------------------

Thanks for the comments [~yukim].

bq. What's the difference between MemoryCachedInputStream and BufferedInputStream? 

The main difference between {{MemoryCachedInputStream}} and {{BufferedInputStream}} is that the former has the ability to mark/reset a parent/source stream when it runs out of capacity without losing its mark state, allowing us to cascade a {{FileCachedInputStream}} with a {{MemoryCachedInputStream}} to provide a multi-tiered cached input stream.
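
For illustration only (this is not code from the patch), the "losing its mark state" behaviour of the stock JDK {{BufferedInputStream}} can be reproduced with a small buffer: once more than {{readlimit}} bytes are read past the mark and the buffer cannot hold them, {{reset()}} fails. The class and method names below are made up for this sketch.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows when a plain BufferedInputStream loses its mark.
class MarkLimitDemo {

    /** Returns true if reset() still succeeds after reading bytesToRead bytes past a mark. */
    static boolean resetSurvives(int bufferSize, int readLimit, int bytesToRead) {
        byte[] data = new byte[64];
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), bufferSize)) {
            in.mark(readLimit);
            for (int i = 0; i < bytesToRead; i++) {
                in.read(); // single-byte reads always go through the internal buffer
            }
            try {
                in.reset();
                return true;
            } catch (IOException invalidMark) {
                // The mark was dropped when the buffer had to be refilled past readLimit.
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resetSurvives(4, 4, 4));  // within the limit: mark is kept
        System.out.println(resetSurvives(4, 4, 10)); // past the limit: mark is lost
    }
}
```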

Another, less relevant difference is that {{BufferedInputStream}} always does buffered reads of up to the capacity of its buffer, while {{MemoryCachedInputStream}} only buffers reads while it is marked, and only the bytes that were consumed via its {{read}}/{{skip}} methods.
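
A minimal sketch of the two behaviours described above, under stated assumptions: it is not the {{MemoryCachedInputStream}} from the patch, and the class name, the fixed-size byte array and the {{capacity}} parameter are all hypothetical. Reads pass straight through until {{mark()}} is called; after that, consumed bytes are cached, and when the cache is full the source stream is marked instead, so {{reset()}} can replay the cache and then rewind the source. In the design discussed here, the source would play the role of the file-backed tier.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the patch's class: caches only while marked and
// falls back to marking the source stream when the in-memory cache is full.
class MarkCachingSketch extends InputStream {
    private final InputStream source;
    private final int capacity;
    private byte[] cache;         // bytes consumed since mark(); null until the first mark()
    private int cached;           // number of valid bytes in cache
    private int pos;              // replay position inside cache
    private boolean sourceMarked; // true once the cache overflowed into the source

    MarkCachingSketch(InputStream source, int capacity) {
        this.source = source;
        this.capacity = capacity;
    }

    @Override public boolean markSupported() { return true; }

    @Override public void mark(int readLimit) { // readLimit is ignored in this sketch
        byte[] fresh = new byte[capacity];
        int remaining = cache == null ? 0 : cached - pos;
        if (remaining > 0) {
            // Re-marking in the middle of a replay: keep the bytes not yet re-read.
            System.arraycopy(cache, pos, fresh, 0, remaining);
        } else {
            sourceMarked = false; // start caching afresh from the current position
        }
        cache = fresh;
        cached = remaining;
        pos = 0;
    }

    @Override public int read() throws IOException {
        if (cache == null) return source.read();      // not marked: no buffering at all
        if (pos < cached) return cache[pos++] & 0xFF; // replaying after reset()
        if (!sourceMarked && cached == capacity) {
            if (!source.markSupported()) throw new IOException("cache full and source cannot be marked");
            source.mark(Integer.MAX_VALUE);           // cascade: the next tier takes over
            sourceMarked = true;
        }
        int b = source.read();
        if (b >= 0 && !sourceMarked) {
            cache[cached++] = (byte) b;
            pos++;
        }
        return b;
    }

    @Override public void reset() throws IOException {
        if (cache == null) throw new IOException("stream was never marked");
        if (sourceMarked) source.reset(); // rewind the source to the overflow point
        pos = 0;                          // replay cached bytes first
    }

    @Override public void close() throws IOException { source.close(); }

    /** Reads "ab", marks, reads 4 more bytes, resets, and returns everything read after the reset. */
    static String replayAfterReset(int capacity) {
        byte[] data = "abcdefgh".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new MarkCachingSketch(new ByteArrayInputStream(data), capacity)) {
            in.read();
            in.read();
            in.mark(0);
            for (int i = 0; i < 4; i++) in.read();
            in.reset();
            StringBuilder sb = new StringBuilder();
            for (int b; (b = in.read()) >= 0; ) sb.append((char) b);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(replayAfterReset(8)); // everything fits in the memory cache
        System.out.println(replayAfterReset(2)); // cache overflows; the source is marked instead
    }
}
```

Both calls in {{main}} replay the same bytes from the mark onward; the only difference is whether the replay is served from memory alone or from memory followed by the re-wound source.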

bq. Why can't we use the latter? 

I tried extending {{BufferedInputStream}} to add the ability to mark a parent stream when it runs out of capacity, but that involved reimplementing or changing most of its methods: {{BufferedInputStream}} always reads from its internal buffer and refills it when necessary, and most of its methods rely on that logic. Reading from a parent stream when the buffer is full would break this assumption, which would require a significant refactor of most of its methods. I'm open to suggestions if you see an easy way of adapting {{BufferedInputStream}} to fulfil that requirement.

bq. {{MemoryCachedInputStream}} uses default {{ByteArrayOutputStream}} constructor which has only size of 32 bytes. Isn't this too small to use for cache?

Probably, I will try to find a better value for this. Do you happen to remember whether there is a way to retrieve the average partition size for a given table? I remember seeing something along those lines, but I'm not sure where it is.
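
As a hedged illustration of the sizing point: the no-argument {{ByteArrayOutputStream}} constructor starts with a 32-byte buffer, while {{ByteArrayOutputStream(int size)}} lets the caller pick the initial capacity. The sketch below assumes some size estimate is available and clamps it to a range; the class name, the helper and the bounds are hypothetical, and where the estimate would come from is left open.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: exposes the protected buffer length so the initial
// capacity chosen at construction time can be inspected.
class SizedBuffer extends ByteArrayOutputStream {
    SizedBuffer() { super(); }            // JDK default: 32 bytes
    SizedBuffer(int size) { super(size); }

    int capacity() { return buf.length; }

    /** Clamps a size estimate (assumed to be supplied by the caller) to [min, max]. */
    static int initialCapacity(long estimatedBytes, int min, int max) {
        return (int) Math.max(min, Math.min(max, estimatedBytes));
    }

    public static void main(String[] args) {
        System.out.println(new SizedBuffer().capacity());
        int size = initialCapacity(10_000L, 1024, 64 * 1024); // bounds are arbitrary
        System.out.println(new SizedBuffer(size).capacity());
    }
}
```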

I will start work on the remaining TODO points and review comments. Please let me know if you have something to add.


was (Author: pauloricardomg):
Thanks for the comments.

bq. What's the difference between MemoryCachedInputStream and BufferedInputStream? Why can't we use the latter? 

The main difference between {{MemoryCachedInputStream}} and {{BufferedInputStream}} is that the former has the ability to mark/reset a parent/source stream when it runs out of capacity without losing its mark state, allowing us to cascade a {{FileCachedInputStream}} with a {{MemoryCachedInputStream}} to provide a multi-tiered cached input stream. 


> Support streaming of older version sstables in 3.0
> --------------------------------------------------
>
>                 Key: CASSANDRA-10990
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10990
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> In 2.0 we introduced support for streaming older versioned sstables (CASSANDRA-5772).  In 3.0, because of the rewrite of the storage layer, this became no longer supported.  So currently, while 3.0 can read sstables in the 2.1/2.2 format, it cannot stream the older versioned sstables.  We should do some work to make this still possible to be consistent with what CASSANDRA-5772 provided.


