Posted to issues@hbase.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2014/06/17 20:15:19 UTC

[jira] [Comment Edited] (HBASE-11339) HBase LOB

    [ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030285#comment-14030285 ] 

Jonathan Hsieh edited comment on HBASE-11339 at 6/17/14 6:13 PM:
-----------------------------------------------------------------

Nice doc.  I did a quick read and have some design-level questions and concerns:

The core problem we are trying to avoid is write amplification (writing the data in the hlog, then in flush and then over and over again with compactions).

Does the proposed design write out LOBs to both the HLog and then later LOB files?  As designed, it must write them to the log so that we preserve durability and consistency properties of a row.

(+) good that this should just work with replication
(-) in the best case, the data is written at least twice -- once before the ack is sent to the client and then again on flush.  Can we limit this to once?
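
To make the write-amplification concern concrete, here is a rough back-of-the-envelope sketch. It assumes a simple model in which each payload byte is written once to the log, once on flush, and once more per compaction that rewrites it. The function names and the compaction count of 3 are hypothetical, chosen only for illustration; they are not taken from the design doc or from any measurement.

```python
MB = 1024 * 1024


def write_amplification(wal_writes: int = 1,
                        flush_writes: int = 1,
                        compaction_rewrites: int = 0) -> int:
    """Times each payload byte reaches disk under this simplified model."""
    return wal_writes + flush_writes + compaction_rewrites


def bytes_written(cell_size_bytes: int, cells: int, amplification: int) -> int:
    """Total bytes written to disk for `cells` cells of the given size."""
    return cell_size_bytes * cells * amplification


# Best case from the comment above: log write plus flush, no compaction rewrite.
best_case = write_amplification()

# Hypothetical case: the same data is also rewritten by 3 compactions.
with_compactions = write_amplification(compaction_rewrites=3)

for label, factor in (("best case", best_case),
                      ("with 3 compactions", with_compactions)):
    total = bytes_written(5 * MB, 1000, factor)
    print(f"{label}: {factor}x, {total / MB:.0f} MB written for 5000 MB of data")
```

Under this model the best case is a factor of 2, and every compaction that rewrites the data adds one more full copy.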

We could avoid extra writes by just writing to a separate LOB log/file.  Was this considered?

Is there any consideration of locality and performance?

5MB cells are large but aren't really that big.  Maybe this should just be "blobs" (binary large objects) or "mobs" (medium objects)?  The objects being immutable is important too.





> HBase LOB
> ---------
>
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase LOB Design.pdf
>
>
>   It's quite useful to save massive binary data such as images and documents into Apache HBase. Unfortunately, directly saving binary LOBs (large objects) to HBase leads to worse performance because of frequent splits and compactions.
>   In this design, the LOB data are stored in a more efficient way, which keeps high write/read performance and guarantees data consistency in Apache HBase.


