Posted to notifications@accumulo.apache.org by "Eric Newton (JIRA)" <ji...@apache.org> on 2015/03/11 18:48:38 UTC

[jira] [Commented] (ACCUMULO-3646) Duplicate entries when iterator emits entries past seek() range

    [ https://issues.apache.org/jira/browse/ACCUMULO-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357274#comment-14357274 ] 

Eric Newton commented on ACCUMULO-3646:
---------------------------------------

It is intended.

Fixing this would require serializing the parallel scans performed by the {{BatchScanner}}, so that each scan request sent to the servers could include the last key received from the previously requested range.  That would negate the benefits of the {{BatchScanner}}.
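
A minimal sketch of the effect, using plain Java collections rather than the Accumulo API. The class {{SeekRangeDemo}}, its methods, and the simplified split point "f" are hypothetical and only mirror the rows seen in the logs quoted below. Each per-tablet range is scanned independently, so an iterator that ignores the end of its seek() range emits "m1" once per range; clipping to the range end inside the iterator avoids the duplicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class SeekRangeDemo {
    // Hypothetical stand-in for the rows an injecting iterator emits.
    static final TreeSet<String> INJECTED =
            new TreeSet<>(Arrays.asList("a1", "c1", "m1"));

    // Emits every injected row >= start (null start means -inf).
    // When clip is true, rows >= end are dropped (null end means +inf);
    // when clip is false, the end of the range is ignored entirely.
    static List<String> scan(String start, String end, boolean clip) {
        List<String> out = new ArrayList<>();
        for (String row : INJECTED) {
            if (start != null && row.compareTo(start) < 0) {
                continue; // before the seek range
            }
            if (clip && end != null && row.compareTo(end) >= 0) {
                break; // past the seek range: stop emitting
            }
            out.add(row);
        }
        return out;
    }

    // Concatenates the results of two independent ranges split at "f",
    // roughly as a client sees results from two tablets.
    static List<String> batchScan(boolean clip) {
        List<String> all = new ArrayList<>();
        all.addAll(scan(null, "f", clip));
        all.addAll(scan("f", null, clip));
        return all;
    }

    public static void main(String[] args) {
        System.out.println(batchScan(false)); // [a1, c1, m1, m1]
        System.out.println(batchScan(true));  // [a1, c1, m1]
    }
}
```

The sketch only models the range handling; it says nothing about how a real iterator should decide which keys to emit.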


> Duplicate entries when iterator emits entries past seek() range
> ---------------------------------------------------------------
>
>                 Key: ACCUMULO-3646
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-3646
>             Project: Accumulo
>          Issue Type: Bug
>          Components: client, mini, tserver
>    Affects Versions: 1.6.1
>         Environment: Ubuntu 14.04, Accumulo 1.6.1, Hadoop 2.6.0, Zookeeper 3.4.6
>            Reporter: Dylan Hutchison
>            Priority: Minor
>
> The SortedKeyValueIterator's seek() method documents that an iterator may return keys past the range passed to seek().  However, an iterator set at scan-time that returns values past the range passed to seek() will return those keys multiple times if the client uses a BatchScanner.  This does not occur when the client uses a Scanner. This has nothing to do with the VersioningIterator. This has nothing to do with the entries actually in the table. Also affects MiniAccumulo.
> If this is intended, we should update the SortedKeyValueIterator seek() documentation with a warning that returning keys past the seek() range may result in a client seeing duplicate keys. If this is not intended, then it is a bug.
> Test code: See [InjectTest|https://github.com/Accla/d4m_api_java/blob/master/src/test/java/edu/mit/ll/graphulo/InjectTest.java]
> * method {{testInjectOnScan_Empty}} fails because it uses a BatchScanner
> * method {{testInjectOnScan_Empty_Reg}} passes because it uses a Scanner
> In these methods, the [InjectIterator|https://github.com/Accla/d4m_api_java/blob/master/src/main/java/edu/mit/ll/graphulo/InjectIterator.java] emits entries that go beyond the seek() range.  We confirm what is going on by placing a [DebugIterator|https://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/DebugIterator.html] right after.
> Logs when using the BatchScanner (notice that the "m1" row is returned twice):
> {noformat}
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,768 [graphulo.BranchIterator] INFO : class edu.mit.ll.graphulo.InjectIterator: init on scope scan
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@e9fe846, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@b99fd03)
> 2015-03-05 06:05:34,771 [iterators.DebugIterator] DEBUG: 0x516E9F1F seek((-inf,f%00; : [] 9223372036854775807 false), [], false)
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> a1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> c1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> true
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F getTopValue() --> 1
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F next()
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,772 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x516E9F1F hasTop() --> false
> 2015-03-05 06:05:34,770 [iterators.DebugIterator] DEBUG: init(edu.mit.ll.graphulo.InjectIterator@2528a1f1, {}, org.apache.accumulo.tserver.TabletIteratorEnvironment@244a532a)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA seek([f%00; : [] 9223372036854775807 false,+inf), [], false)
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> true
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopKey() --> m1 colF3:colQ3 [] 1425553534769 false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA getTopValue() --> 1
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA next()
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> 2015-03-05 06:05:34,773 [iterators.DebugIterator] DEBUG: 0x5DBB88BA hasTop() --> false
> {noformat}


