Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2010/01/27 01:50:34 UTC
[jira] Commented: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column

    [ https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805304#action_12805304 ] 

stack commented on HBASE-1485:
------------------------------

From the list:

{code}
On Tue, Jan 26, 2010 at 9:36 AM, Rod Cope <rod.cope at openlogic dot com> wrote:
> Hi,
>
> I'm seeing behavior on 0.20.2 and 0.20.3 that doesn't seem quite right and
> would like to know if this is by design, a bug, or something I'm doing
> wrong.
>
> Background:
>
> When I do a put that includes a timestamp like this (conceptually; I know
> this is not the actual API), it works just fine.
>  put "table", "family", "column", "bbb", 12345
>
> Then, if I do another put in the same client code using the same timestamp
> like this...
>  put "table", "family", "column", "aaa", 12345
>
> ...and I create a scanner, grab a Result, and iterate over all values using
> list(), I get this...
>  "table", "family", "column", "aaa", 12345
>
> So far, so good.  Now, if I truncate the table from the shell and run a new
> program that does a flush() on the table between the two puts, but does it
> in the same client program back-to-back, I also get the same results from
> list().
>
> -----
>
> Problem:
>
> Here's where the trouble starts.  I truncate the table and run a new program
> that puts "bbb", flushes the table, and quits.  Here's what I get from
> list():
>  "table", "family", "column", "bbb", 12345
>
> Then I run another program that puts "aaa", flushes, and quits.  Here's what
> I get from list():
>  "table", "family", "column", "aaa", 12345
>  "table", "family", "column", "bbb", 12345
>
> And if I then run a third program that puts "ccc", flushes, and quits, I get
> this from list():
>  "table", "family", "column", "ccc", 12345
>  "table", "family", "column", "bbb", 12345
>  "table", "family", "column", "aaa", 12345
>
> I'm getting three different values for identical
> table/family/qualifier/timestamp tuples.  Does this seem right?  There also
> doesn't seem to be a defined sort order, probably because the timestamps are
> identical.
>
> Also, if instead of using list(), I use getMap(), then I always only get a
> single result.  The single result is always the last item in the lists above
> (i.e., "bbb" then "bbb" then "aaa").  I get identical results from using
> getNoVersionMap().
>
> I suspect that this same behavior could occur when HBase decides to flush on
> its own, but I could be wrong.  As you can imagine, this can cause problems
> because clients can't know from the results of calling list() which value is
> "right" or "newest".  They also can't rely on getMap() or getNoVersionMap()
> because the single result that gets returned is not necessarily "right" or
> "newest".
>
> I've reproduced everything above in a stand-alone installation and also with
> a 7 regionserver cluster with the final 0.20.3.  I started down this
> debugging path originally because I ran into this problem on the 7
> regionserver cluster with one table of 100+ regions.  I was flushing
> programmatically at the end of some large imports because I'm doing
> setWriteToWAL(false) for load performance.
>
> Am I doing something wrong?  Did I miss an HBase assumption about flushing
> and/or identical timestamps?
>
> Any help would be much appreciated.
{code}
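
The getMap() observation in the mail above (the single value returned is always the last item that list() shows) is what you would expect if the cells are folded into a map keyed by family/qualifier/timestamp, with each later cell overwriting the earlier one. The following is only a toy sketch of that reasoning in plain Java. The Cell class and the collapse() and cells() helpers are hypothetical stand-ins made up for illustration; they are not the HBase KeyValue or Result classes, and this does not claim to show how Result.getMap() is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateVersionDemo {

    /** Hypothetical stand-in for one returned cell; NOT the HBase KeyValue class. */
    static final class Cell {
        final String family;
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String family, String qualifier, long timestamp, String value) {
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** Cells with equal coordinates are "duplicate versions" in the sense of this issue. */
        String coordinates() {
            return family + "/" + qualifier + "/" + timestamp;
        }
    }

    /**
     * Assumption: a getMap()-like view is built by inserting cells in list() order
     * into a map keyed by coordinates, so a later duplicate replaces an earlier one.
     */
    static Map<String, String> collapse(List<Cell> cellsInListOrder) {
        Map<String, String> byCoordinates = new LinkedHashMap<>();
        for (Cell c : cellsInListOrder) {
            byCoordinates.put(c.coordinates(), c.value);
        }
        return byCoordinates;
    }

    /** Builds cells that all share family/column/12345, as in the mail above. */
    static List<Cell> cells(String... values) {
        List<Cell> out = new ArrayList<>();
        for (String v : values) {
            out.add(new Cell("family", "column", 12345L, v));
        }
        return out;
    }

    public static void main(String[] args) {
        // The three list() results reported in the mail, in the reported order.
        System.out.println(collapse(cells("bbb")));
        System.out.println(collapse(cells("aaa", "bbb")));
        System.out.println(collapse(cells("ccc", "bbb", "aaa")));
    }
}
```

Under that assumption the three reported list() results collapse to "bbb", "bbb" and "aaa", which matches what Rod sees from getMap(). So the map-based accessors do not pick a "newest" value; they just inherit whatever order the duplicates come back in.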

> Wrong or indeterminate behavior when there are duplicate versions of a column
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-1485
>                 URL: https://issues.apache.org/jira/browse/HBASE-1485
>             Project: Hadoop HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.20.0
>            Reporter: Jonathan Gray
>             Fix For: 0.21.0
>
>
> As of now, both gets and scanners will end up returning all duplicate versions of a column.  The ordering of them is indeterminate.
> We need to decide what the desired/expected behavior should be and make it happen.
> Note:  It's nearly impossible for this to work with Gets as they are now implemented in 1304, so this is really a Scanner issue.  To implement this correctly with Gets, we would have to undo basically all the optimizations that Gets do, making them far slower than a Scanner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.