Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2009/02/17 05:12:59 UTC

[jira] Created: (HBASE-1202) getRow does not always work when specifying number of versions

getRow does not always work when specifying number of versions
--------------------------------------------------------------

                 Key: HBASE-1202
                 URL: https://issues.apache.org/jira/browse/HBASE-1202
             Project: Hadoop HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.19.0, 0.19.1, 0.20.0
            Reporter: Jim Kellerman
            Priority: Blocker
             Fix For: 0.19.1, 0.20.0


When an existing cell is updated, getRow with a number of versions specified does not return the updated value.
It returns the original value at that timestamp instead.

Note that this only applies when more than one version is specified. getRow with (implied) timestamp = latest does work.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697895#action_12697895 ] 

Jim Kellerman commented on HBASE-1202:
--------------------------------------

I don't think numVersions being a count of all results is the issue (but it is a problem if it hasn't been fixed).

What the test does is:
1. Store the value "value1" in column contents:contents at a specified timestamp.
2. Shut down and restart the cluster to force data to disk.
3. Store the value "value2" in column contents:contents at the *same* timestamp as the first value.
4. Call getRow(row), which returns "value2" as expected.
5. Call getRow(row, HConstants.ALL_VERSIONS), which returns "value1" and not "value2".

But now I understand what is going on.

Cell contains a SortedMap<Long, byte[]> (where Long is the timestamp). "value2" is fetched out of the memcache first. Then "value1" is fetched from disk and, because the timestamps are the same, it overwrites the entry containing "value2".
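
The overwrite described above can be reproduced with a small, self-contained sketch. This is not HBase code: the class SameTimestampOverwrite, the method collectAllVersions, and the two maps standing in for the memcache and the on-disk store are all hypothetical names chosen for illustration. The only thing it relies on is the documented behavior of SortedMap.put, which replaces the value stored under an existing key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only; none of these names come from HBase.
class SameTimestampOverwrite {

    // Merge versions in the order the comment describes: memcache first, then disk.
    static SortedMap<Long, byte[]> collectAllVersions(
            SortedMap<Long, byte[]> memcache, SortedMap<Long, byte[]> onDisk) {
        SortedMap<Long, byte[]> versions = new TreeMap<>();
        for (Map.Entry<Long, byte[]> e : memcache.entrySet()) {
            versions.put(e.getKey(), e.getValue());
        }
        for (Map.Entry<Long, byte[]> e : onDisk.entrySet()) {
            // put() on an existing key replaces the value already stored there,
            // so an older on-disk value silently wins over the newer one.
            versions.put(e.getKey(), e.getValue());
        }
        return versions;
    }

    static String demo() {
        long timestamp = 1000L; // arbitrary timestamp shared by both writes
        SortedMap<Long, byte[]> onDisk = new TreeMap<>();
        SortedMap<Long, byte[]> memcache = new TreeMap<>();
        // Steps 1 and 2 of the test: "value1" is written and forced to disk.
        onDisk.put(timestamp, "value1".getBytes(StandardCharsets.UTF_8));
        // Step 3: "value2" is written at the same timestamp and stays in memory.
        memcache.put(timestamp, "value2".getBytes(StandardCharsets.UTF_8));
        SortedMap<Long, byte[]> versions = collectAllVersions(memcache, onDisk);
        return new String(versions.get(timestamp), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "value1"
    }
}
```

Because the map is keyed by timestamp alone, it can hold only one value per timestamp, and the last writer wins regardless of which store held the newer data.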

I think when we are looking for multiple versions, we need to check if we already have a match for row/column/timestamp and not insert a second value if we already have one at that timestamp.
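
A minimal sketch of that check, under the same toy model of two timestamp-keyed maps, might look like the following. The names KeepFirstVersionPerTimestamp and collectAllVersions are hypothetical, and this is not the change that was committed to HBase. It only illustrates the idea of skipping a value when one is already present for that timestamp.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model only; none of these names come from HBase.
class KeepFirstVersionPerTimestamp {

    // Sources are visited newest first (memcache, then disk). A value is added
    // only if no value has been recorded for its timestamp yet.
    static SortedMap<Long, byte[]> collectAllVersions(
            SortedMap<Long, byte[]> memcache, SortedMap<Long, byte[]> onDisk) {
        SortedMap<Long, byte[]> versions = new TreeMap<>();
        addIfMissing(versions, memcache);
        addIfMissing(versions, onDisk);
        return versions;
    }

    private static void addIfMissing(
            SortedMap<Long, byte[]> versions, SortedMap<Long, byte[]> source) {
        for (Map.Entry<Long, byte[]> e : source.entrySet()) {
            if (!versions.containsKey(e.getKey())) {
                versions.put(e.getKey(), e.getValue());
            }
        }
    }

    static String demo() {
        long timestamp = 1000L; // arbitrary timestamp shared by both writes
        SortedMap<Long, byte[]> onDisk = new TreeMap<>();
        SortedMap<Long, byte[]> memcache = new TreeMap<>();
        onDisk.put(timestamp, "value1".getBytes(StandardCharsets.UTF_8));
        memcache.put(timestamp, "value2".getBytes(StandardCharsets.UTF_8));
        SortedMap<Long, byte[]> versions = collectAllVersions(memcache, onDisk);
        return new String(versions.get(timestamp), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "value2"
    }
}
```

The sketch depends on the newest source being read first, which matches the memcache-then-disk order described in this comment.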





[jira] Commented: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697907#action_12697907 ] 

Jim Kellerman commented on HBASE-1202:
--------------------------------------

It turns out that HStore has the same problem as Memcache, i.e., keeping a count of versions on a per-column basis instead of per-cell.




[jira] Updated: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1202:
---------------------------------

    Attachment: TestGetRowVersions.java

This is a test program that demonstrates the problem.




[jira] Commented: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697941#action_12697941 ] 

Jim Kellerman commented on HBASE-1202:
--------------------------------------

I was wrong about Memcache and HStore. After reading more closely, they do count numVersions on a per-cell basis.




[jira] Commented: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682379#action_12682379 ] 

stack commented on HBASE-1202:
------------------------------

I just noticed in memcache that numVersions is not per column but a count of all results found so far.  Maybe related?




[jira] Resolved: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman resolved HBASE-1202.
----------------------------------

    Resolution: Fixed

Added new test case TestGetRowVersions.
Committed to branch and trunk.




[jira] Updated: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman updated HBASE-1202:
---------------------------------

         Priority: Major  (was: Blocker)
    Fix Version/s:     (was: 0.19.1)
                   0.19.2

Downgrading to major and moving to 0.19.2. It can be fixed when we do scanners with multiple versions.




[jira] Assigned: (HBASE-1202) getRow does not always work when specifying number of versions

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HBASE-1202:
------------------------------------

    Assignee: Jim Kellerman

