Posted to dev@hbase.apache.org by Ryan Rawson <ry...@gmail.com> on 2009/07/09 20:52:25 UTC

Re: [jira] Commented: (HBASE-1485) Wrong or indeterminate behavior when there are duplicate versions of a column

The sequence ID stored in each file tells you which is newest (largest =
newest). If the heap used that as a tie-breaker, we might be sitting pretty.

I think we want to avoid using the timestamp for the filename; I'm not sure
what assumptions might break.

On Jul 9, 2009 11:42 AM, "Jonathan Gray (JIRA)" <ji...@apache.org> wrote:

[
https://issues.apache.org/jira/browse/HBASE-1485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12729379#action_12729379]

Jonathan Gray commented on HBASE-1485:
--------------------------------------

I've had at least three people with a use case for this.

Might create a couple sub-tasks here so we can at least head in the right
direction.

First, we need to make scanners ignore duplicate versions of the same
column. The trickiest part is deciding which one to keep. We always want the
one from the latest storefile, but I believe storefile IDs are still random
and not timestamps? We might need to change that to fix this. It would also
require a modification to the KVHeap to take this into account, all other
things being equal.
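
To make the tie-break idea concrete, here is a minimal, self-contained sketch
in Java. It is not HBase code: DedupHeapSketch, Cell, Source, and merge are
hypothetical names, and it assumes each source (a stand-in for a storefile
scanner) carries a sequence ID where larger means newer. The heap orders by
row, column, then newest timestamp, and only when all of those are equal does
it prefer the source with the larger sequence ID; later duplicates are dropped.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class DedupHeapSketch {

    /** Simplified stand-in for a KeyValue (hypothetical, not the HBase class). */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }

        /** True when both cells are the same version of the same column. */
        boolean sameCoordinates(Cell other) {
            return row.equals(other.row)
                && column.equals(other.column)
                && timestamp == other.timestamp;
        }
    }

    /** One sorted input tagged with an assumed sequence ID (larger = newer). */
    static final class Source {
        final long sequenceId;
        private final Iterator<Cell> cells;
        Cell head;

        Source(long sequenceId, List<Cell> sortedCells) {
            this.sequenceId = sequenceId;
            this.cells = sortedCells.iterator();
            advance();
        }

        void advance() {
            head = cells.hasNext() ? cells.next() : null;
        }
    }

    /** Orders sources by their current cell; sequence ID only breaks exact ties. */
    static int compare(Source a, Source b) {
        int c = a.head.row.compareTo(b.head.row);
        if (c != 0) return c;
        c = a.head.column.compareTo(b.head.column);
        if (c != 0) return c;
        c = Long.compare(b.head.timestamp, a.head.timestamp); // newer timestamp first
        if (c != 0) return c;
        return Long.compare(b.sequenceId, a.sequenceId);      // newer file first
    }

    /** Merges all sources, keeping one cell per (row, column, timestamp). */
    static List<Cell> merge(List<Source> sources) {
        PriorityQueue<Source> heap = new PriorityQueue<>(DedupHeapSketch::compare);
        for (Source s : sources) {
            if (s.head != null) heap.add(s);
        }
        List<Cell> out = new ArrayList<>();
        Cell last = null;
        while (!heap.isEmpty()) {
            Source s = heap.poll();
            Cell current = s.head;
            // The first cell seen for a given version came from the newest file.
            if (last == null || !last.sameCoordinates(current)) {
                out.add(current);
                last = current;
            }
            s.advance();
            if (s.head != null) heap.add(s);
        }
        return out;
    }
}
```

Because the sequence ID is compared last, the sketch leaves ordinary scan
order untouched and only changes which duplicate wins.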

Once we have scanners working, that will mean the proper thing is enforced
on major (and if we want, minor) compactions.

Gets will only work once we re-implement Gets as an optimized scan (taking
advantage of bloom filters, mostly).

I remember why I punted this to 0.20.1: the tricky part at the beginning is
pretty tough and touches a good bit of core read-path code.

Revisiting now, we'll see. Anyone else interested in this / want to work on
it?

> Wrong or indeterminate behavior when there are duplicate versions of a column
> -----------------...