Posted to issues@hbase.apache.org by "Jingcheng Du (JIRA)" <ji...@apache.org> on 2014/06/18 12:19:12 UTC

[jira] [Commented] (HBASE-11339) HBase MOB

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14035550#comment-14035550 ] 

Jingcheng Du commented on HBASE-11339:
--------------------------------------

Reposting to correct a typo.
bq. I'm not convinced. The idea I'm suggesting is having a special lob log file that is written once at write time that is essentially the lob store files in the doc, and put a reference to it (file name, and offset) in the normal wal. This allows the lob to only be written once. I don't see how this would be less efficient than an approach that must write the values out at least twice.
With this approach, we would save the LOB files as SequenceFiles, and write the file name and offset back into the Put before adding the KV to the MemStore, right?
1. If so, we would not use the MemStore to hold the LOB data, right? Then how do we read LOB data that has not been synced yet (that is, data still in the writer buffer)?
2. We would need to add preSync and preAppend hooks to the HLog so that the LOB files can be synced before the HLogs are synced.
3. To get the correct offset, we would have to synchronize prePut in the coprocessor. Or could we use a separate LOB file for each thread?
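
To make points 1 and 3 concrete, here is a minimal sketch of the write-once idea under stated assumptions. It is not HBase code: LobLogWriter and LobRef are hypothetical names, and an in-memory buffer stands in for the LOB SequenceFile. The append is synchronized so that the offset is read and the bytes are written as one atomic step, and the not-yet-synced bytes are readable only because the writer keeps them in its own buffer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LobLogSketch {

    // Hypothetical reference that would be stored in the Put / WAL in place of the value.
    static final class LobRef {
        final String fileName;
        final long offset;
        final int length;

        LobRef(String fileName, long offset, int length) {
            this.fileName = fileName;
            this.offset = offset;
            this.length = length;
        }
    }

    // Hypothetical write-once LOB log. The buffer is a stand-in for a real file.
    static final class LobLogWriter {
        private final String fileName;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        LobLogWriter(String fileName) {
            this.fileName = fileName;
        }

        // Synchronized: without it, two threads could observe the same offset.
        synchronized LobRef append(byte[] value) {
            long offset = buffer.size();
            buffer.write(value, 0, value.length);
            return new LobRef(fileName, offset, value.length);
        }

        // Serves reads from the writer buffer, i.e. data that is not synced yet.
        synchronized byte[] read(LobRef ref) {
            byte[] all = buffer.toByteArray();
            int start = (int) ref.offset;
            return Arrays.copyOfRange(all, start, start + ref.length);
        }
    }

    public static void main(String[] args) {
        LobLogWriter writer = new LobLogWriter("lob-0001");
        LobRef first = writer.append("abc".getBytes(StandardCharsets.UTF_8));
        LobRef second = writer.append("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(first.fileName + "@" + first.offset);
        System.out.println(second.fileName + "@" + second.offset);
        System.out.println(new String(writer.read(second), StandardCharsets.UTF_8));
    }
}
```

The alternative raised in point 3, one LOB file per thread, would remove the lock on append at the cost of more open files. The sketch does not cover the sync ordering from point 2.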


> HBase MOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It is quite useful to store massive binary data such as images and documents in Apache HBase. Unfortunately, saving binary LOBs (large objects) directly in HBase degrades performance because of frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way that maintains high write/read performance and guarantees data consistency in Apache HBase.



--
This message was sent by Atlassian JIRA
(v6.2#6252)