You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@hbase.apache.org by "Appy (JIRA)" <ji...@apache.org> on 2016/02/09 00:51:39 UTC

[jira] [Created] (HBASE-15235) Inconsistent cell reads over multiple bulk-loaded HFiles

Appy created HBASE-15235:
----------------------------

             Summary: Inconsistent cell reads over multiple bulk-loaded HFiles
                 Key: HBASE-15235
                 URL: https://issues.apache.org/jira/browse/HBASE-15235
             Project: HBase
          Issue Type: Bug
            Reporter: Appy
            Assignee: Appy


If there are two bulkloaded hfiles in a region with same seqID and duplicate keys*, get and scan may return different values for a key.
More details:
- one of the rows had 200k+ columns. say row is 'r', column family is 'cf' and column qualifiers are 1 to 1000.
- hfiles were split somewhere along that row, but there were a range of columns in both hfiles. For eg, something like - hfile1: ["",  r:cf:70) and hfile2: [r:cf:40, ....).
- Between columns 40 to 70, some (not all) columns were in both the files with different values. Whereas other were only in one of the files.

In such a case, depending on file size (because we take it into account when sorting hfiles internally), we may get different values for the same cell (say "r", "cf:50") depending on what we call: get "r" "cf:50" or get "r" "cf:".

I have been able to replicate this issue, will post the instructions shortly.

* not sure how this would happen. These files are small ~50M, nor could i find any setting for max file size that could lead to splits. Need to investigate more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)