Posted to dev@hbase.apache.org by "Schubert Zhang (JIRA)" <ji...@apache.org> on 2009/11/11 18:35:39 UTC

[jira] Commented: (HBASE-1841) If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup

    [ https://issues.apache.org/jira/browse/HBASE-1841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776530#action_12776530 ] 

Schubert Zhang commented on HBASE-1841:
---------------------------------------

I am considering the following two steps to fix this issue.

Step-1: Light fix

Modify HFile.Writer so that it never generates a problematic HFile, i.e. one in which duplicate keys straddle a block boundary.
This light fix avoids the missed-read issue only when there is a small amount of duplication.
It should be treated as a temporary fix.
I will provide this patch soon.
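
To illustrate the idea behind the light fix, here is a minimal sketch and not the actual HFile.Writer code. The class BlockBoundarySketch, the method splitIntoBlocks, and the use of a key count (softLimit) in place of a byte-size threshold are all assumptions made for illustration. A block is closed only when it is full and the incoming key differs from the last key written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not part of HBase. Keys are plain strings and the
// block limit is a key count rather than a size in bytes.
final class BlockBoundarySketch {

    // Splits already-sorted keys into blocks. A block may exceed softLimit
    // so that a run of identical keys is never cut at a block boundary.
    static List<List<String>> splitIntoBlocks(List<String> sortedKeys, int softLimit) {
        List<List<String>> blocks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            boolean full = current.size() >= softLimit;
            boolean sameAsLast = !current.isEmpty()
                    && current.get(current.size() - 1).equals(key);
            // Close the block only if the new key starts a new run.
            if (full && !sameAsLast) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }
}
```

In this sketch a long run of duplicates makes one block grow without bound, which is why the approach only suits a small amount of duplication.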


Step-2: Complete fix

(1) Modify the block index so that each entry points to the last key of its block.
(2) Modify the binary search to return the first matching entry when there are duplicates.
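
The two changes above can be sketched together as follows. This is an illustration under assumptions, not the HFile BlockIndex implementation: the class EndKeyIndexSketch, the method findBlock, and the use of String keys are hypothetical. The index holds the last key of each block, and the search returns the first block whose last key is greater than or equal to the lookup key.

```java
// Hypothetical sketch: not part of HBase. blockLastKeys[i] is the last
// key stored in block i, and the array is sorted in ascending order.
final class EndKeyIndexSketch {

    // Returns the index of the first block whose last key is >= key,
    // or -1 if key sorts after every block. With duplicates in the
    // index, this lower-bound search always picks the earliest block.
    static int findBlock(String[] blockLastKeys, String key) {
        int lo = 0;
        int hi = blockLastKeys.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (blockLastKeys[mid].compareTo(key) < 0) {
                lo = mid + 1;   // block ends before key; look right
            } else {
                hi = mid;       // block may hold key; keep looking left
            }
        }
        return lo < blockLastKeys.length ? lo : -1;
    }
}
```

For example, with blocks [a,b], [b,b], [b,c] the last-key index is {b, b, c}, and a lookup of "b" resolves to block 0, where the earliest "b" lives. A reader would then scan forward from that block to see the remaining duplicates.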

In fact, we can refer to Section 5.1 of the Google Bigtable paper:

"The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet's table identifier and its end row."

Bigtable's METADATA table works on the same principle as the block index of an SSTable, so we should use the end key in HFile's BlockIndex.

In my experience with Hypertable (I studied its METADATA structure in detail in 2008), the METADATA row key is also "tableID:endRow".

This fix would be complete but requires many code changes. I will try to provide a patch if I have time.

> If multiple of same key in an hfile and they span blocks, may miss the earlier keys on a lookup
> -----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-1841
>                 URL: https://issues.apache.org/jira/browse/HBASE-1841
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> See HBASE-818 for description by Schubert Zhang -- discovered by him doing a code review of hfile.
