Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2013/08/15 23:55:59 UTC

[jira] [Commented] (HBASE-8521) Cells cannot be overwritten with bulk loaded HFiles

    [ https://issues.apache.org/jira/browse/HBASE-8521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741540#comment-13741540 ] 

Jean-Daniel Cryans commented on HBASE-8521:
-------------------------------------------

This jira will need a title that accurately describes what's being done here.

On the patch:

 - There are 11 occurrences of "== Durability.USE_DEFAULT". Can we just have a method somewhere (like in Mutation) that does the check and is named "isDefaultDurability"?
 - LoadIncrementalHFiles.assignSeqIds should be final.
 - Since bulk loaded files can have sequence ids, we should print them out. StoreFile.toStringDetailed is a candidate for that change; there might be more.
 - What's up with the commented out code in HRegion?
 - Is it passing all the unit tests? A trunk version would help here to get some Hadoop QA love.
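
A minimal sketch of the helper suggested in the first point, assuming it would live on the mutation object. The enum and the MutationSketch class below are hypothetical stand-ins so the snippet compiles on its own; they are not the actual HBase classes, and the patch may place the method elsewhere.

```java
// Hypothetical stand-in for HBase's Durability enum, declared locally
// so that this sketch is self-contained.
enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

// Hypothetical, trimmed-down mutation class (assumption: the real
// Mutation carries far more state than a durability field).
class MutationSketch {
    private Durability durability = Durability.USE_DEFAULT;

    public MutationSketch setDurability(Durability d) {
        this.durability = d;
        return this;
    }

    public Durability getDurability() {
        return durability;
    }

    // One home for the comparison that is otherwise repeated at each call site.
    public boolean isDefaultDurability() {
        return durability == Durability.USE_DEFAULT;
    }
}
```

Call sites would then read "if (mutation.isDefaultDurability())" instead of repeating the enum comparison.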

I tested the patch to see if Jonathan's original use case is covered and it looks like it is. I also did some mixed workloads of normal Puts and bulk loaded files.
                
> Cells cannot be overwritten with bulk loaded HFiles
> ---------------------------------------------------
>
>                 Key: HBASE-8521
>                 URL: https://issues.apache.org/jira/browse/HBASE-8521
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.1
>            Reporter: Jonathan Natkins
>         Attachments: HBASE-8521.diff, HBASE-8521-v0-0.94.patch, HBASE-8521-v1-0.94.patch, HBASE-8521-v2-0.94.patch, hfileDirs.tar.gz
>
>
> Let's say you have a pre-built HFile that contains a cell:
> ('rowkey1', 'family1', 'qual1', 1234L, 'value1')
> We bulk load this first HFile. Now, let's create a second HFile that contains a cell that overwrites the first:
> ('rowkey1', 'family1', 'qual1', 1234L, 'value2')
> That gets bulk loaded into the table, but the value that HBase bubbles up is still 'value1'.
> It seems that there's no way to overwrite a cell for a particular timestamp without an explicit put operation. This seems to be the case even after minor and major compactions happen.
> My guess is that this is pretty closely related to the sequence number work being done on the compaction algorithm via HBASE-7842, but I'm not sure if one would fix the other.
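
To illustrate why sequence ids matter for the case described above, here is a toy model, not HBase's actual KeyValue comparator or scanner logic. It assumes every cell carries the sequence id of the file it came from, and that a read picks the highest timestamp first and the highest sequence id on a tie. CellSketch, ReadSketch and visibleValue are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical simplified cell; real HBase cells carry more fields.
final class CellSketch {
    final String value;
    final long timestamp;
    final long seqId; // assumed: sequence id of the file the cell was loaded from

    CellSketch(String value, long timestamp, long seqId) {
        this.value = value;
        this.timestamp = timestamp;
        this.seqId = seqId;
    }
}

final class ReadSketch {
    // For cells sharing row, family and qualifier: newest timestamp first,
    // then highest sequence id first as the tie-breaker.
    static final Comparator<CellSketch> NEWEST_FIRST = (a, b) -> {
        int byTimestamp = Long.compare(b.timestamp, a.timestamp);
        return byTimestamp != 0 ? byTimestamp : Long.compare(b.seqId, a.seqId);
    };

    // Returns the value a read would surface in this toy model.
    // Assumes a non-empty list of versions for one (row, family, qualifier).
    static String visibleValue(List<CellSketch> versions) {
        List<CellSketch> sorted = new ArrayList<>(versions);
        sorted.sort(NEWEST_FIRST);
        return sorted.get(0).value;
    }
}
```

With the two cells from the description at timestamp 1234L, the second bulk load wins in this model only if its file has the higher sequence id; if both files carry the same id, nothing distinguishes the two cells.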
