Posted to commits@cassandra.apache.org by "Benedict (JIRA)" <ji...@apache.org> on 2015/02/09 10:20:34 UTC
[jira] [Updated] (CASSANDRA-8747) Make SSTableWriter.openEarly behaviour more robust

     [ https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benedict updated CASSANDRA-8747:
--------------------------------
    Description: 
Currently openEarly does some fairly ugly looping back over the summary data we've collected, looking for an entry that we think is fully covered by both the Index and Data files, and that has a safe boundary between it and the end of an IndexSummary entry, so that scanning across it cannot accidentally read an incomplete key. The approach taken is difficult to reason about and to be confident in, and I now realise it is also very subtly broken. Since we're cleaning up the behaviour around this code, it seemed worthwhile to improve its clarity and make its behaviour easier to reason about. The current behaviour can be characterised as:

# Take the current Index file length
# Find the IndexSummary boundary key (first key in an interval) that starts past this position
# Take the IndexSummary boundary key (first key) for the preceding interval as our initial boundary
# Construct a reader with this boundary
# Look up our last key in the reader, and if its end position is past the end of the data file, take the prior summary boundary. Repeat until we find one starting before the end.
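
The steps above can be modelled roughly as follows. This is a simplified Python sketch for reasoning about the walk-back, not the actual Cassandra code: `SummaryEntry`, `walk_back_boundary` and the offset fields are hypothetical names, and "constructing a reader" is reduced to reading the candidate's end position in the Data file.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class SummaryEntry(NamedTuple):
    """Hypothetical stand-in for one IndexSummary boundary key (first key of an interval)."""
    key: str
    index_start: int  # offset in the Index file where this key's entry begins
    data_end: int     # offset in the Data file where this key's partition ends


def walk_back_boundary(entries: Sequence[SummaryEntry],
                       index_length: int,
                       data_length: int) -> Optional[SummaryEntry]:
    """Model of the walk-back: entries must be sorted by index_start."""
    # Steps 1-2: find the first boundary that starts past the current Index file length.
    starts = [e.index_start for e in entries]
    first_past = bisect_right(starts, index_length)
    # Step 3: the boundary of the preceding interval is the initial candidate.
    candidate = first_past - 1
    # Steps 4-5: step back one boundary at a time while the candidate's
    # data would extend past the end of the Data file.
    while candidate >= 0 and entries[candidate].data_end > data_length:
        candidate -= 1
    return entries[candidate] if candidate >= 0 else None
```

Note that the model only ever checks boundary keys themselves; it says nothing about the records that follow a boundary within its interval, which is where the reasoning becomes difficult.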

The bug may well be very hard, or even impossible, to exhibit. It is this: if we have a single very large partition followed by 127 very tiny partitions (or however many the IndexSummary interval is configured as), our IndexSummary interval buffer may not guarantee that the record we have selected as our end is fully readable.

The new approach is to track in the IndexSummary the safe and optimal boundary point (i.e. the last record in each summary interval) and its bounds in the index and data files. On flushing either file, we notify the summary builder of the new flush points, and it consults its map of these and selects the last such boundary that can safely be read in both. This is much easier to understand, and has no such subtle risk.
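
A minimal sketch of this bookkeeping, in Python for brevity, might look like the following. It is an illustration under stated assumptions rather than the patch itself: `SummaryBuilderSketch`, `Boundary` and all method names are hypothetical, and a plain list in write order stands in for the map mentioned above.

```python
from typing import List, NamedTuple, Optional


class Boundary(NamedTuple):
    """Hypothetical record for the last key of one summary interval."""
    last_key: str
    index_end: int  # offset just past this key's entry in the Index file
    data_end: int   # offset just past this key's partition in the Data file


class SummaryBuilderSketch:
    def __init__(self) -> None:
        self._boundaries: List[Boundary] = []  # appended in write order
        self._index_flushed = 0                # bytes of the Index file known to be flushed
        self._data_flushed = 0                 # bytes of the Data file known to be flushed

    def record_boundary(self, last_key: str, index_end: int, data_end: int) -> None:
        """Called when the last record of a summary interval has been written."""
        self._boundaries.append(Boundary(last_key, index_end, data_end))

    def mark_index_flushed(self, offset: int) -> None:
        self._index_flushed = offset

    def mark_data_flushed(self, offset: int) -> None:
        self._data_flushed = offset

    def last_readable_boundary(self) -> Optional[Boundary]:
        """Return the latest boundary that is fully flushed in both files, if any."""
        for boundary in reversed(self._boundaries):
            if (boundary.index_end <= self._index_flushed
                    and boundary.data_end <= self._data_flushed):
                return boundary
        return None
```

Because each boundary carries its own end offsets in both files, the check is a direct comparison against the flush points and does not depend on how partition sizes are distributed within an interval.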

  was:
Currently openEarly does some fairly ugly looping back over the summary data we've collected looking for one we think should be fully covered in the Index and Data files, and that should have a safe boundary between it and the end of an IndexSummary entry so that when scanning across it we should not accidentally read an incomplete key. The approach taken is a little difficult to reason about though, and be confident of. Since we're cleaning up the behaviour around this code, it seemed worthwhile to improve its clarity and make its behaviour easier to reason about. The current behaviour can be characterised as:

Find the first summary record

       Priority: Major  (was: Minor)
     Issue Type: Bug  (was: Improvement)
        Summary: Make SSTableWriter.openEarly behaviour more robust  (was: Make SSTableWriter.openEarly behaviour more obvious)

> Make SSTableWriter.openEarly behaviour more robust
> --------------------------------------------------
>
>                 Key: CASSANDRA-8747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8747
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)