Posted to user@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2013/10/23 08:56:42 UTC

Reply: How can I insert large image or video into HBase?

Do you care about low latency? If so, then HBase may not be a good choice for storing big files, especially files of a few GB in size; that will definitely cause GC pain :)

Best,
Liang
________________________________________
From: Roman Nikitchenko [roman@nikitchenko.dp.ua]
Sent: 23 October 2013 14:50
To: user@hbase.apache.org
Subject: Re: How can I insert large image or video into HBase?

We needed to store a LOT of files (mostly not so big, but they could be up
to a few GB) and we decided to do it on top of HBase. We store files in
column blocks.

In short, the solution is:
1. Two column families: one for metadata and one for content.
2. A class abstraction that gives the client a stream to write a file or
read it.
3. An internal write buffer and an internal buffer of prepared Puts, so write
speed is really good: up to 2 times better than HDFS on files below 128K.
4. If the client uses buffered writes, buffers are mapped 1:1 to columns
(segmentation control).
5. Seek is implemented with 2 client filters (one to limit the column range
and one to get only qualifiers). So on skip() we check which block we should
buffer and set the current position.
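
A minimal sketch of points 4 and 5, assuming fixed-size blocks and
qualifiers that encode the block index. The class and method names here are
hypothetical and only for illustration; the HBase Put/Get calls themselves
are left out, so this shows only the segmentation and seek arithmetic.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: cuts a stream into blocks, one block per column. */
final class ColumnBlocks {
    private ColumnBlocks() {}

    /**
     * Qualifier for a block: the block index as 8 big-endian bytes.
     * Assumption: indexes are non-negative, so byte-wise sorted qualifiers
     * follow block order.
     */
    static byte[] qualifier(long blockIndex) {
        return ByteBuffer.allocate(Long.BYTES).putLong(blockIndex).array();
    }

    /** Reads the whole stream and returns blocks of at most blockSize bytes. */
    static List<byte[]> split(InputStream in, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<byte[]> blocks = new ArrayList<>();
        byte[] buf = new byte[blockSize];
        int filled = 0;
        try {
            while (true) {
                int n = in.read(buf, filled, blockSize - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
                if (filled == blockSize) {
                    blocks.add(buf.clone()); // full block, becomes one column
                    filled = 0;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (filled > 0) {
            blocks.add(Arrays.copyOf(buf, filled)); // short final block
        }
        return blocks;
    }

    /** Block that holds the given absolute file position (used on seek/skip). */
    static long blockIndex(long position, int blockSize) {
        return position / blockSize;
    }

    /** Offset of the given absolute file position inside its block. */
    static int offsetInBlock(long position, int blockSize) {
        return (int) (position % blockSize);
    }
}
```

With a block size of 4, position 9 falls in block 2 at offset 1, so a reader
would fetch only the column whose qualifier is qualifier(2) and start reading
from its second byte.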

Advantages of this solution are:
- The small files problem is solved (need I say more here?).
- Thread safe without headaches.
- Compression can be used transparently.
- Metadata is really flexible (OK, HDFS can have it too, but again only by
using HBase; otherwise the small files problem is yours).
- Locality control through regions (not possible with HDFS).
- Very effective MR processing thanks to the previous point.
- Ability to use 'lightweight MR'.

Disadvantages:
- A somewhat more complex client. We have simply encapsulated this and don't
care any more. Today I plan to add support for SHA1 hashes.
- On large files (10M and more) we are notably slower than HDFS. It can
probably be improved with MemStore configuration and so on, but I just don't
care; for our needs it is enough.
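
A minimal sketch of how a SHA1 hash could be accumulated block by block as
the file is written, assuming the digest is updated with each block in file
order and stored with the metadata at the end. The class name and methods are
hypothetical; this is not the planned implementation, only an illustration
using the standard java.security.MessageDigest API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: hashes a file incrementally, one block at a time. */
final class BlockHasher {
    private final MessageDigest sha1 = newSha1();

    private static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm name on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Feeds the next block of file content, in file order. */
    void update(byte[] block) {
        sha1.update(block);
    }

    /** Finishes the digest and returns it as lowercase hex. */
    String finishHex() {
        StringBuilder hex = new StringBuilder();
        for (byte b : sha1.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the digest is incremental, the result does not depend on how the
content is cut into blocks, only on the byte order.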

These ideas should be enough to understand the approach.
BTW, I am considering publishing this solution on GitHub to get some kind of
'community review'.

Best regards,
Roman.


On 23 October 2013 07:12, Jean-Marc Spaggiari <je...@spaggiari.org> wrote:

> Put your file into HDFS and store only the name in HBase. HBase is not
> made to store large files.
>
> JM
>
>
> 2013/10/23 Jack Chan <cd...@gmail.com>
>
> > Hi All:
> >
> > This could be a stupid question, but here it goes...
> > We know that we can use "put" to insert small files by converting them
> > to bytes first.
> > But for a large file, I think we had better stream it.
> > So, how can we insert a large file into HBase through Java code in a
> > streaming way?
> >
> > Thanks and regards
> >
> >
> >
> > Jack Chan.
>

Re: Reply: How can I insert large image or video into HBase?

Posted by Roman Nikitchenko <ro...@nikitchenko.dp.ua>.
Since the file is split into blocks during any operation, I think the GC
load is not much higher than during plain HDFS operations (actually I think
it is lower). The more serious load here is on the MemStore and WAL, but
reliability is what I want and yes, nothing is free.

Best regards,
Roman.

On 23 October 2013 09:56, 谢良 <xi...@xiaomi.com> wrote:

> Do you care about low latency? If so, then HBase may not be a good choice
> for storing big files, especially files of a few GB in size; that will
> definitely cause GC pain :)
>
> Best,
> Liang
>
>