Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2009/09/15 22:55:57 UTC

[jira] Created: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup
-----------------------------------------------------------------------------------------------

                 Key: HBASE-1841
                 URL: https://issues.apache.org/jira/browse/HBASE-1841
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


See HBASE-818 for description by Schubert Zhang -- discovered by him doing a code review of hfile.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756912#action_12756912 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

The previous jira should be HBASE-1818.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776535#action_12776535 ] 

stack commented on HBASE-1841:
------------------------------

Fix 2. sounds good Schubert.  We've been toying with making such a change but shied from it near 0.20 release as too big a change.  It would require a rewrite of the .META. table and of the indices in all store files.  Don't worry about this part.  We can help w/ that.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776671#action_12776671 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

I think the range boundary scheme would change with this new patch.

old: [row10, row20) [row20, row30) [row30, ...)

changed to:

new:  (..., row20] (row20, row30] (row30, row40]
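
To make the two boundary schemes concrete, here is a minimal Java sketch. It is not HBase code: the class `RangeSchemes` and its methods are hypothetical, and a `TreeMap` stands in for whatever sorted index holds the boundaries. Under the old scheme a range is indexed by its start key and found with a floor lookup; under the new scheme it is indexed by its end key and found with a ceiling lookup.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the two boundary schemes; not HBase code.
final class RangeSchemes {

    // Old scheme: ranges are [start, end), so each range is indexed by its start key.
    static TreeMap<String, String> oldScheme() {
        TreeMap<String, String> byStart = new TreeMap<>();
        byStart.put("row10", "[row10, row20)");
        byStart.put("row20", "[row20, row30)");
        byStart.put("row30", "[row30, ...)");
        return byStart;
    }

    // New scheme: ranges are (start, end], so each range is indexed by its end key.
    static TreeMap<String, String> newScheme() {
        TreeMap<String, String> byEnd = new TreeMap<>();
        byEnd.put("row20", "(..., row20]");
        byEnd.put("row30", "(row20, row30]");
        byEnd.put("row40", "(row30, row40]");
        return byEnd;
    }

    // Start-key index: the owning range has the greatest start key <= row.
    static String lookupByStartKey(TreeMap<String, String> byStart, String row) {
        Map.Entry<String, String> e = byStart.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    // End-key index: the owning range has the smallest end key >= row.
    static String lookupByEndKey(TreeMap<String, String> byEnd, String row) {
        Map.Entry<String, String> e = byEnd.ceilingEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        // A boundary row belongs to the later range in the old scheme
        // and to the earlier range in the new scheme.
        System.out.println(lookupByStartKey(oldScheme(), "row20"));
        System.out.println(lookupByEndKey(newScheme(), "row20"));
    }
}
```

The only row whose owner changes is a row that sits exactly on a boundary, such as row20 here.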




[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment:     (was: HBASE-1841-step2.patch)



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778055#action_12778055 ] 

ryan rawson commented on HBASE-1841:
------------------------------------

seekBefore() exists to implement "getClosestRowBefore" which is how the .META. lookup works.  This is how clients find which regionserver hosts which region.  Needless to say we want this to keep on working.

It might be possible to change how META works, to use first key of the region, but that requires a huge patch touching nearly every part of hbase in one go, including all the tests. In fact you might have to augment the tests substantially, since what we have works now, and the change might do who knows what.

Bigtable paper aside, what we have now works, and doesn't present any specific problems other than the ideological purity of adhering to exactly what Google did.



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment: HBASE-1841-step2.patch

This is the step2 patch.

@stack, 
Could you please review this patch?
I have only done a simple verification of it; I don't know whether it is OK for the .META. usage.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776699#action_12776699 ] 

ryan rawson commented on HBASE-1841:
------------------------------------

My review of step1:

- Don't throw that exception, for the reasons stated in my previous comment. We would be opening ourselves up to an hbase cluster that is unfixable.
- Using a member variable for keyComp isn't really ideal: the state doesn't last the lifetime of the object (which is what a member implies), and it is very temporary. A boolean passed to checkBlockBoundary(boolean sameKey) would be better.  Or why even call checkBlockBoundary if the key is the same?
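
The two review points can be sketched as follows. This is a simplified, hypothetical model, not the actual HFile.Writer: the class name `BlockWriterSketch` is invented, keys are plain strings, and block size is counted in entries rather than bytes. The same-key flag is a local value passed into `checkBlockBoundary`, and a block is simply allowed to grow past its target size while the key repeats instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a block writer; not the HFile.Writer code.
final class BlockWriterSketch {
    private final int blockSize; // target block size, counted in entries here (assumption)
    private final List<List<String>> blocks = new ArrayList<>();
    private List<String> current = new ArrayList<>();
    private String lastKey = null;

    BlockWriterSketch(int blockSize) {
        this.blockSize = blockSize;
    }

    void append(String key) {
        // The comparison result is a local, passed as a parameter, not a member.
        boolean sameKey = key.equals(lastKey);
        checkBlockBoundary(sameKey);
        current.add(key);
        lastKey = key;
    }

    // Start a new block only when the current one is full AND the key changes.
    // While the key repeats, the block keeps growing; no exception is thrown.
    private void checkBlockBoundary(boolean sameKey) {
        if (current.size() >= blockSize && !sameKey) {
            blocks.add(current);
            current = new ArrayList<>();
        }
    }

    List<List<String>> close() {
        if (!current.isEmpty()) {
            blocks.add(current);
            current = new ArrayList<>();
        }
        return blocks;
    }

    // Convenience helper: write all keys and return the resulting blocks.
    static List<List<String>> write(int blockSize, String... keys) {
        BlockWriterSketch w = new BlockWriterSketch(blockSize);
        for (String k : keys) {
            w.append(k);
        }
        return w.close();
    }

    public static void main(String[] args) {
        // prints [[a, b, b, b], [c]]: the run of "b" stays in one block
        System.out.println(write(2, "a", "b", "b", "b", "c"));
    }
}
```

The trade-off is that a long run of one key produces an oversized block, which costs memory on both the write and read paths.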





[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777440#action_12777440 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

@ stack

Thanks stack. There is a new issue, HBASE-1978, for the future patch.

@ ryan and stack

I think "seekBefore" should not be a reason to change lastKey indexing to firstKey indexing.
In my opinion, our major use cases are "seek to a point, and scan next....", and these use cases need lastKey indexing, i.e. the (startKey, endKey] range/block scheme.
The other scheme, i.e. [startKey, endKey) range/block, is more suited to "seek to a point, and scan prev....".

I think the scheme described in the Bigtable paper is a good choice.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776697#action_12776697 ] 

ryan rawson commented on HBASE-1841:
------------------------------------

I don't think we want to be throwing exceptions like that during writing; we'd be preventing flushing and compactions, the former potentially causing data loss in pre-HDFS-265, and the latter wedging up an hbase install.

As for the key order in the index, there were some calls that depended on the choice of the first key in the block to work, specifically 'seekBefore'. By changing the block index, this call won't work anymore in all cases.  I should know, I started out with a 'last key' block index, but had to switch to 'first key' to make seekBefore work.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776828#action_12776828 ] 

stack commented on HBASE-1841:
------------------------------

+1 on patch.  Ryan, do you want to review?  Otherwise, I'll commit.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776679#action_12776679 ] 

stack commented on HBASE-1841:
------------------------------

Thanks for working on this Schubert.

On the first patch, step 1: if the key is the same as the last one, do not make a new block, and if we keep getting the same key and the current block reaches 2*blocksize, throw an exception?

If so, I think you should just let the block run rather than throw an exception... let it grow.

Also, does keyComp have to be a data member?  Can it be scoped within the key append, or perhaps passed into checkBlockBoundary?

Review of second patch coming up....



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-1841:
-------------------------

       Resolution: Fixed
    Fix Version/s: 0.21.0
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed step1 to TRUNK.  Please open new issue for step2 Schubert (Thanks for the step1 patch).



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment: HBASE-1841-step2-v2.patch

new patch HBASE-1841-step2-v2.patch for step2.

Consider seekTo and seekBefore.



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment: HBASE-1841-step1-v2.patch

For step1: 

The patch is updated according to the comments.
I know there is a potential risk of running out of memory when there are too many duplicates of a key: that scenario will generate a very large block, which needs a lot of memory. But I know it is rare in HBase.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777189#action_12777189 ] 

stack commented on HBASE-1841:
------------------------------

Schubert:  I'd suggest that you open a new issue for step2 since step1 patch addresses the immediate issue.  Good on you.



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777819#action_12777819 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

Oh, I may be wrong; scan prev and scan next are not the reason.



[jira] Assigned: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack reassigned HBASE-1841:
----------------------------

    Assignee: Schubert Zhang



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

      Tags: HFile
    Status: Patch Available  (was: Open)

This patch is for the step-1 fix.
It allows a small quantity of duplication (not bigger than one block size).



[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776530#action_12776530 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

I am considering following two steps to fix this issue.

Step-1: Light fix

Modify the HFile.Writer to prevent it from generating a problematic HFile in which duplicate keys straddle a block boundary.
This light fix avoids the missed-read issue when there is only a small quantity of duplication.
This should be a temporary fix.
I will provide this patch soon.
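
The missed read that this step guards against can be reproduced in a small model. The following Java sketch is hypothetical and heavily simplified (string keys, in-memory arrays, an invented class `FirstKeyIndexBug`); it is not the HFile reader. It assumes a block index that stores the first key of each block and a lookup that picks the last block whose first key is less than or equal to the target.

```java
// Hypothetical, simplified model of a first-key block index; not HBase code.
final class FirstKeyIndexBug {

    // Index lookup: the last block whose first key is <= target, or -1 if none.
    static int blockFor(String[] firstKeys, String target) {
        int lo = 0;
        int hi = firstKeys.length - 1;
        int found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (firstKeys[mid].compareTo(target) <= 0) {
                found = mid;   // candidate; keep looking to the right
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found;
    }

    // The block that really holds the earliest occurrence of target, or -1 if absent.
    static int firstBlockContaining(String[][] blocks, String target) {
        for (int b = 0; b < blocks.length; b++) {
            for (String key : blocks[b]) {
                if (key.equals(target)) {
                    return b;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The duplicate key "b" straddles the boundary between block 0 and block 1.
        String[][] blocks = { { "a", "b" }, { "b", "c" } };
        String[] firstKeys = { "a", "b" };
        System.out.println("index sends the reader to block " + blockFor(firstKeys, "b"));
        System.out.println("earliest \"b\" is in block " + firstBlockContaining(blocks, "b"));
    }
}
```

In this model the index sends the reader to block 1 while the earliest "b" sits in block 0, so a forward scan from the seek position never sees it. Keeping all duplicates of a key inside one block removes the straddle.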


Step-2: Complete fix

(1) Modify the block index to point to the last key. 
(2) Modify the binary search to return the first item when there are duplicates.
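
A minimal sketch of points (1) and (2) together, under the assumption of plain string keys and an in-memory array of per-block last keys (the class `LastKeyIndex` is hypothetical and is not taken from the attached patches). The search is a lower-bound binary search: among equal index entries it returns the first one.

```java
// Hypothetical sketch of a last-key block index with a lower-bound search;
// not taken from the HBASE-1841 patches.
final class LastKeyIndex {

    // Returns the index of the FIRST block whose last key is >= target,
    // or lastKeys.length if target is greater than every key in the file.
    static int blockFor(String[] lastKeys, String target) {
        int lo = 0;
        int hi = lastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (lastKeys[mid].compareTo(target) < 0) {
                lo = mid + 1;  // every key in this block is smaller than target
            } else {
                hi = mid;      // this block may hold target; keep looking left
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Blocks: [a, b] [b, b] [b, c]; the key "b" spans three blocks.
        String[] lastKeys = { "b", "b", "c" };
        // prints 0: the block that holds the earliest "b"
        System.out.println(blockFor(lastKeys, "b"));
    }
}
```

Because every key in the blocks before the returned one is strictly smaller than the target, a forward scan from the returned block sees every duplicate, however many blocks they span.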

In fact, we can refer to the section 5.1 of the Google Bigtable paper.

"The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row."

The principle behind Bigtable's METADATA is the same as that of the BlockIndex in an SSTable, so we should use the EndKey in HFile's BlockIndex.

In my experience with Hypertable (I researched the METADATA structure of Hypertable in detail in 2008), the METADATA key is also "tableID:endRow".

This fix would be complete but requires many code changes; I will try to provide a patch if I have time.



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment: HBASE-1841-step1.patch



[jira] Updated: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

Posted by "Schubert Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Schubert Zhang updated HBASE-1841:
----------------------------------

    Attachment:     (was: HBASE-1841-step1.patch)
