Posted to notifications@accumulo.apache.org by "David Medinets (JIRA)" <ji...@apache.org> on 2013/06/22 02:05:20 UTC

[jira] [Comment Edited] (ACCUMULO-1528) Scans should deterministically return entries with identical timestamps

    [ https://issues.apache.org/jira/browse/ACCUMULO-1528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690918#comment-13690918 ] 

David Medinets edited comment on ACCUMULO-1528 at 6/22/13 12:03 AM:
--------------------------------------------------------------------

I am restating the issue to ensure I understand it. Here is the data:


{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value2 <-- 'problem entry'
R2 CF:CQ CV T=2 Value3 <-- 'problem entry'
R2 CF:CQ CV T=2 Value4 <-- 'problem entry'
R2 CF:CQ CV T=2 Value5 <-- 'problem entry'
R3 CF:CQ CV T=1 Value6
{code}

If the VersioningIterator is configured to keep one version, the following might be returned:


{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value2  <-- this is the non-deterministic entry (i.e., any one of 4 values could be here.)
R3 CF:CQ CV T=1 Value6
{code}

But it is also possible that the set might be:

{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value3  <-- this is the non-deterministic entry (i.e., any one of 4 values could be here.)
R3 CF:CQ CV T=1 Value6
{code}

If the VersioningIterator is configured to keep two versions, any two of the four values could be returned. Because the client sees only two of the four values from R2 and then sees R3, the client can't tell that data is missing.
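
To make the restatement concrete, here is a small Python simulation. It does not use Accumulo's API: the tuple layout, `sort_key`, `scan`, and `possible_r2_values` are names invented for this sketch, and it assumes that entries with equal keys come out in whatever order their sources happened to deliver them.

```python
from itertools import permutations

# Each entry is (row, column, visibility, timestamp, value); the first four
# fields form the key, mirroring the listing above.
ENTRIES = [
    ("R1", "CF:CQ", "CV", 1, "Value1"),
    ("R2", "CF:CQ", "CV", 2, "Value2"),
    ("R2", "CF:CQ", "CV", 2, "Value3"),
    ("R2", "CF:CQ", "CV", 2, "Value4"),
    ("R2", "CF:CQ", "CV", 2, "Value5"),
    ("R3", "CF:CQ", "CV", 1, "Value6"),
]


def sort_key(entry):
    row, column, visibility, timestamp, _value = entry
    # Newest timestamp first within a column; the value is NOT part of the key.
    return (row, column, visibility, -timestamp)


def scan(entries, max_versions):
    """Keep at most max_versions entries per (row, column, visibility)."""
    result = []
    kept = {}
    # sorted() is stable, so entries with equal keys keep their arrival order.
    for entry in sorted(entries, key=sort_key):
        cell = entry[:3]
        if kept.get(cell, 0) < max_versions:
            kept[cell] = kept.get(cell, 0) + 1
            result.append(entry)
    return result


def possible_r2_values(max_versions):
    """Collect every set of R2 values that some arrival order can produce."""
    outcomes = set()
    for order in permutations(ENTRIES):
        values = frozenset(
            e[4] for e in scan(order, max_versions) if e[0] == "R2"
        )
        outcomes.add(values)
    return outcomes
```

In this model, `possible_r2_values(1)` contains four different outcomes and `possible_r2_values(2)` contains six (every pair drawn from the four values), while each individual scan still looks complete to the client.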
                
      was (Author: medined):
    I am restating the issue to ensure I understand it. Here is the data:


{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value2 <-- 'problem entry'
R2 CF:CQ CV T=2 Value3 <-- 'problem entry'
R2 CF:CQ CV T=2 Value4 <-- 'problem entry'
R2 CF:CQ CV T=2 Value5 <-- 'problem entry'

R3 CF:CQ CV T=1 Value6
{code}

If the VersioningIterator is configured to keep one version, the following might be returned:


{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value2  <-- this is the non-deterministic entry (i.e., any one of 4 values could be here.)

R3 CF:CQ CV T=1 Value6
{code}

But it is also possible that the set might be:

{code}
R1 CF:CQ CV T=1 Value1
R2 CF:CQ CV T=2 Value3  <-- this is the non-deterministic entry (i.e., any one of 4 values could be here.)

R3 CF:CQ CV T=1 Value6
{code}

If the VersioningIterator is configured to keep two versions, any two of the four values could be returned. Because the client sees only two of the four values from R2 and then sees R3, the client can't tell that data is missing.
                  
> Scans should deterministically return entries with identical timestamps
> -----------------------------------------------------------------------
>
>                 Key: ACCUMULO-1528
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1528
>             Project: Accumulo
>          Issue Type: Sub-task
>          Components: tserver
>            Reporter: Christopher Tubbs
>            Priority: Minor
>
> Scans will return multiple versions of the same key (down to identical timestamps, but possibly with different values), non-deterministically. A source identity (e.g., filename/timestamp) could be used to order these consistently.
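
The tie-breaking idea in the description can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Accumulo's implementation: the `source` field, the `file-a` through `file-d` names, and the helper functions are hypothetical, and the sketch handles a single cell only.

```python
from itertools import permutations

# (row, column, visibility, timestamp, value, source). The source names are
# made up for this example and stand in for a file name or similar identity.
ENTRIES = [
    ("R2", "CF:CQ", "CV", 2, "Value2", "file-a"),
    ("R2", "CF:CQ", "CV", 2, "Value3", "file-b"),
    ("R2", "CF:CQ", "CV", 2, "Value4", "file-c"),
    ("R2", "CF:CQ", "CV", 2, "Value5", "file-d"),
]


def ordered(entries):
    # Key fields first (newest timestamp first), then the source identity as
    # the final tie-breaker, so equal keys always sort the same way.
    return sorted(entries, key=lambda e: (e[0], e[1], e[2], -e[3], e[5]))


def newest_versions(entries, max_versions):
    """Values of the first max_versions entries of this single cell."""
    return [e[4] for e in ordered(entries)[:max_versions]]


def all_orders_agree(max_versions):
    """True if every arrival order of ENTRIES yields the same result."""
    results = {
        tuple(newest_versions(order, max_versions))
        for order in permutations(ENTRIES)
    }
    return len(results) == 1
```

With the tie-breaker in place, `newest_versions(ENTRIES, 2)` is `["Value2", "Value3"]` no matter how the input is shuffled. Which values win is still arbitrary, but the choice is now repeatable from scan to scan.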

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira