Posted to user@hbase.apache.org by kavishahuja <ka...@yahoo.com> on 2013/01/05 11:11:24 UTC

Storing images in Hbase

Hello everybody,
First of all, a happy new year to everyone!
I need a little help with pushing images into Apache HBase. I know it is
about converting objects into bytes and then saving those bytes into
HBase rows, but I still can't do it.
Kindly help!
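
For reference, the "convert to bytes, then save into a row" step can be sketched roughly as below. This is only a minimal sketch under assumptions: `InMemoryTable` is a hypothetical stand-in for a real table, modelled on the `put(row, data)` / `row(row)` style of Python Thrift clients such as happybase, and the column name `img:data` is made up for illustration.

```python
from pathlib import Path


class InMemoryTable:
    """Hypothetical stand-in for an HBase table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row, data):
        # Merge the given columns into the row, as a put would.
        self.rows.setdefault(row, {}).update(data)

    def row(self, row):
        # Return a copy of all columns stored for this row key.
        return dict(self.rows.get(row, {}))


def store_image(table, row_key, image_path):
    """Read the image file as raw bytes and write it into one cell."""
    image_bytes = Path(image_path).read_bytes()
    # "img:data" (family "img", qualifier "data") is an assumed column name.
    table.put(row_key.encode("utf-8"), {b"img:data": image_bytes})
    return len(image_bytes)


def load_image(table, row_key):
    """Fetch the stored bytes back; they can be written straight to a file."""
    return table.row(row_key.encode("utf-8"))[b"img:data"]
```

No image decoding is needed: the file is already a byte sequence, so it is stored and returned unchanged.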

Regards,
Kavish



--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Storing-images-in-Hbase-tp4036184.html
Sent from the HBase User mailing list archive at Nabble.com.

Re: Re: Storing images in Hbase

Posted by 谢良 <xi...@xiaomi.com>.
HBase is not the best choice for blob (photo, image, ...) storage (file sizes are often smaller than tens of MB).

Here are several blob storage systems:
Google Blobstore: https://developers.google.com/appengine/docs/java/blobstore/overview
Facebook Haystack: http://www.facebook.com/note.php?note_id=76191543919
Twitter Blobstore: http://engineering.twitter.com/2012/12/blobstore-twitters-in-house-photo.html
Taobao TFS: http://code.taobao.org/p/tfs/src/trunk/src/ (https://github.com/taobao/tfs)

Thanks,
________________________________________
From: Mohit Anchlia [mohitanchlia@gmail.com]
Sent: January 6, 2013 13:45
To: user@hbase.apache.org
Cc: user@hbase.apache.org
Subject: Re: Re: Storing images in Hbase

IMHO, use DFS for blobs and use HBase for metadata.

Sent from my iPhone


Re: Re: Storing images in Hbase

Posted by Mohit Anchlia <mo...@gmail.com>.
IMHO, use DFS for blobs and use HBase for metadata.
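
A rough sketch of that split, for illustration only: the bytes go to a file and only a pointer plus metadata goes into the table. Everything here is an assumption made for the example. A local directory stands in for the DFS, a plain dict stands in for the HBase metadata table, and the field names are made up.

```python
import hashlib
from pathlib import Path


def save_blob(blob_dir, metadata, image_id, data, content_type):
    """Write the bytes to a file and record only a pointer plus metadata."""
    # Name the file after its content hash so identical images share a file.
    digest = hashlib.sha256(data).hexdigest()
    path = Path(blob_dir) / digest
    path.write_bytes(data)
    metadata[image_id] = {
        "path": str(path),
        "size": len(data),
        "content_type": content_type,
    }
    return metadata[image_id]


def load_blob(metadata, image_id):
    """Look up the pointer in the metadata store, then read the file."""
    entry = metadata[image_id]
    return Path(entry["path"]).read_bytes()
```

The metadata row stays small no matter how large the image is, which is the point of keeping the blob out of the table.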

Sent from my iPhone

On Jan 5, 2013, at 7:58 PM, 谢良 <xi...@xiaomi.com> wrote:

> Just out of curiosity, why not consider a blob storage system?
> 
> Best Regards,
> Liang

Re: Storing images in Hbase

Posted by 谢良 <xi...@xiaomi.com>.
Just out of curiosity, why not consider a blob storage system?

Best Regards,
Liang
________________________________________
From: kavishahuja [kavishahuja@yahoo.com]
Sent: January 5, 2013 18:11
To: user@hbase.apache.org
Subject: Storing images in Hbase

Hello everybody,
First of all, a happy new year to everyone!
I need a little help with pushing images into Apache HBase. I know it is
about converting objects into bytes and then saving those bytes into
HBase rows, but I still can't do it.
Kindly help!

Regards,
Kavish




Re: Storing images in Hbase

Posted by Jack Levin <ma...@gmail.com>.
http://img338.imageshack.us/img338/6831/screenshot20130111at949.png

This shows how often we flush and how large the region files are.  We
do have Bloom filters turned on, so that we don't incur extra seeks
across multiple RS files.

-Jack

On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <ma...@gmail.com> wrote:
> We buffer all accesses to HBase with a Varnish SSD-based caching layer,
> so the impact for reads is negligible.  We have a 70-node cluster with
> 8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), and
> 10-12 TB of disk per server.  We insert 600,000 images per day.  We
> have relatively little compaction activity because we made our write
> cache much larger than the read cache, so we don't experience as much
> region file fragmentation.
>
> -Jack

Re: Storing images in Hbase

Posted by Mohit Anchlia <mo...@gmail.com>.
Thanks, Jack, for sharing this information. This definitely makes sense when
using that type of caching layer. You mentioned increasing the write
cache; I am assuming you had to increase the following parameters in
addition to increasing the memstore size:

hbase.hregion.max.filesize
hbase.hregion.memstore.flush.size

On Fri, Jan 11, 2013 at 9:47 AM, Jack Levin <ma...@gmail.com> wrote:

> We buffer all accesses to HBase with a Varnish SSD-based caching layer,
> so the impact for reads is negligible.  We have a 70-node cluster with
> 8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), and
> 10-12 TB of disk per server.  We insert 600,000 images per day.  We
> have relatively little compaction activity because we made our write
> cache much larger than the read cache, so we don't experience as much
> region file fragmentation.
>
> -Jack

Re: Storing images in Hbase

Posted by Jack Levin <ma...@gmail.com>.
We buffer all accesses to HBase with a Varnish SSD-based caching layer,
so the impact for reads is negligible.  We have a 70-node cluster with
8 GB of RAM per node, relatively weak nodes (Intel Core 2 Duo), and
10-12 TB of disk per server.  We insert 600,000 images per day.  We
have relatively little compaction activity because we made our write
cache much larger than the read cache, so we don't experience as much
region file fragmentation.
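
To illustrate the caching idea in general terms (this is not the Varnish setup described above, only a toy sketch under assumptions): a read-through cache answers repeated reads itself and goes to the backing store only on a miss. The class name, the `fetch` callback, and the LRU eviction policy are all choices made for this example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Tiny LRU read-through cache in front of a slower fetch function."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # called only on a cache miss
        self.capacity = capacity    # maximum number of cached entries
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.entries:
            # Mark the entry as most recently used and serve it from memory.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value
```

With a high hit rate, most reads never reach the store, so the store mainly has to keep up with writes.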

-Jack

On Fri, Jan 11, 2013 at 9:40 AM, Mohit Anchlia <mo...@gmail.com> wrote:
> I think it really depends on the volume of traffic, the data distribution
> per region, how and when file compaction occurs, and the number of nodes
> in the cluster. In my experience, when it comes to blob data where you are
> serving tens of thousands or more read and write requests per second, it's
> very difficult to manage HBase without heavy operations and maintenance
> work. Jack mentioned earlier that they have 1 billion images; it would be
> interesting to know what they see in terms of compaction and number of
> requests per second. I'd be surprised if, on a high-volume site, it can be
> done without a caching layer on top to alleviate the IO spikes that occur
> because of GC and compactions.

Re: Storing images in Hbase

Posted by Mohit Anchlia <mo...@gmail.com>.
I think it really depends on the volume of traffic, the data distribution
per region, how and when file compaction occurs, and the number of nodes
in the cluster. In my experience, when it comes to blob data where you are
serving tens of thousands or more read and write requests per second, it's
very difficult to manage HBase without heavy operations and maintenance
work. Jack mentioned earlier that they have 1 billion images; it would be
interesting to know what they see in terms of compaction and number of
requests per second. I'd be surprised if, on a high-volume site, it can be
done without a caching layer on top to alleviate the IO spikes that occur
because of GC and compactions.

On Fri, Jan 11, 2013 at 7:27 AM, Mohammad Tariq <do...@gmail.com> wrote:

> IMHO, if the image files are not too huge, HBase can efficiently serve the
> purpose. You can store some additional info along with the file, depending
> on your search criteria, to make the search faster. Say you want to
> fetch images by type: you can store the image in one column and its
> extension in another column (jpg, tiff, etc.).
>
> BTW, what exactly is the problem you are facing? You wrote
> "But I still can't do it".
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/

Re: Storing images in Hbase

Posted by Mohammad Tariq <do...@gmail.com>.
IMHO, if the image files are not too huge, HBase can efficiently serve the
purpose. You can store some additional info along with the file, depending
on your search criteria, to make the search faster. Say you want to
fetch images by type: you can store the image in one column and its
extension in another column (jpg, tiff, etc.).
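
As a rough illustration of that layout (a sketch under assumptions, not real client code): each row holds the image bytes in one column and the extension in another, and a lookup by type only has to compare the small extension column. A plain dict stands in for the table, and the column names `img:data` and `img:ext` are made up. Against a real table this would be a scan with a column filter rather than a loop over a dict.

```python
def put_image(rows, row_key, image_bytes, extension):
    """Store the image and its extension as two columns of the same row."""
    rows[row_key] = {
        b"img:data": image_bytes,
        b"img:ext": extension.encode("ascii"),
    }


def keys_with_extension(rows, extension):
    """Return the row keys whose extension column matches, in sorted order."""
    wanted = extension.encode("ascii")
    return sorted(
        key for key, columns in rows.items() if columns[b"img:ext"] == wanted
    )
```

Usage: after `put_image(rows, "a", data, "jpg")`, calling `keys_with_extension(rows, "jpg")` returns the matching row keys without touching the image bytes.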

BTW, what exactly is the problem you are facing? You wrote
"But I still can't do it".

Warm Regards,
Tariq
https://mtariq.jux.com/


On Fri, Jan 11, 2013 at 8:30 PM, Michael Segel <mi...@hotmail.com> wrote:

> That's a viable option.
> HDFS reads are faster than HBase, but it would require first hitting the
> index in HBase, which points to the file, and then fetching the file.
> It could be faster... we found storing binary data in a sequence file and
> indexing it in HBase to be faster than storing it in HBase itself.
> However, YMMV, and HBase has been improved since we did that project.

Re: Storing images in Hbase

Posted by Michael Segel <mi...@hotmail.com>.
That's a viable option.
HDFS reads are faster than HBase, but it would require first hitting the index in HBase, which points to the file, and then fetching the file.
It could be faster... we found storing binary data in a sequence file and indexing it in HBase to be faster than storing it in HBase itself. However, YMMV, and HBase has been improved since we did that project.


On Jan 10, 2013, at 10:56 PM, shashwat shriparv <dw...@gmail.com> wrote:

> Hi Kavish,
>
> I have a better idea for you: copy your image files into a single file on
> HDFS, and when a new image comes in, append it to that existing file, and
> keep and update the metadata and the offset in HBase. If you put bigger
> images in HBase, it will lead to some issues.
> 
> 
> 
> ∞
> Shashwat Shriparv

Re: Storing images in Hbase

Posted by shashwat shriparv <dw...@gmail.com>.
Hi Kavish,

I have a better idea for you: copy your image files into a single file on
HDFS, and when a new image comes in, append it to that existing file, and
keep and update the metadata and the offset in HBase. If you put bigger
images in HBase, it will lead to some issues.
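
The append-and-record-the-offset idea can be sketched roughly like this. It is only an illustration under assumptions: a local file stands in for the single HDFS file, a plain dict stands in for the HBase table that holds the offsets, and the function names are made up.

```python
import os


def append_image(pack_path, index, image_id, data):
    """Append the bytes to the pack file and record where they landed."""
    with open(pack_path, "ab") as pack:
        # Make sure we are at the end before asking for the position.
        pack.seek(0, os.SEEK_END)
        offset = pack.tell()
        pack.write(data)
    # The (offset, length) pair is what would be stored in the table.
    index[image_id] = (offset, len(data))
    return offset


def read_image(pack_path, index, image_id):
    """Look up the offset and length, then read only that byte range."""
    offset, length = index[image_id]
    with open(pack_path, "rb") as pack:
        pack.seek(offset)
        return pack.read(length)
```

The index entry is a few bytes per image, so the table stays small while the bulk data sits in one large sequentially written file.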



∞
Shashwat Shriparv



On Fri, Jan 11, 2013 at 9:21 AM, lars hofhansl <la...@apache.org> wrote:

> Interesting. That's close to a PB if my math is correct.
> Is there a write up about this somewhere? Something that we could link
> from the HBase homepage?
>
> -- Lars

Re: Storing images in Hbase

Posted by Marcos Ortiz <ml...@uci.cu>.
A blog post about this would be nice.

On 11/01/2013 0:51, lars hofhansl wrote:
> Interesting. That's close to a PB if my math is correct.
> Is there a write up about this somewhere? Something that we could link from the HBase homepage?
>
> -- Lars
>
>
> ----- Original Message -----
> From: Jack Levin <ma...@gmail.com>
> To: user@hbase.apache.org
> Cc: Andrew Purtell <ap...@apache.org>
> Sent: Thursday, January 10, 2013 9:24 AM
> Subject: Re: Storing images in Hbase
>
> We stored about 1 billion images into hbase with file size up to 10MB.
> Its been running for close to 2 years without issues and serves
> delivery of images for Yfrog and ImageShack.  If you have any
> questions about the setup, I would be glad to answer them.
>
> -Jack
>
> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>> I have done extensive testing and have found that blobs don't belong in the
>> databases but are rather best left out on the file system. Andrew outlined
>> issues that you'll face and not to mention IO issues when compaction occurs
>> over large files.
>>
>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org> wrote:
>>
>>> I meant this to say "a few really large values"
>>>
>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>>> wrote:
>>>
>>>> Consider if the split threshold is 2 GB but your one row contains 10 GB
>>> as
>>>> really large value.
>>>
>>>
>>>
>>>    --
>>> Best regards,
>>>
>>>      - Andy
>>>
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>>
>
> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>
> http://www.uci.cu
> http://www.facebook.com/universidad.uci
> http://www.flickr.com/photos/universidad_uci





Re: Storing images in Hbase

Posted by lars hofhansl <la...@apache.org>.
Interesting. That's close to a PB if my math is correct.
Is there a write up about this somewhere? Something that we could link from the HBase homepage?
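
As a rough check of that estimate (a sketch, not figures from Jack's deployment): 1 billion images averaging 1 MB each come to about 1 PB before HDFS replication, while 1 billion images all at the 10 MB cap would be 10 PB. The average image size is an assumption here, since the thread only gives the maximum, and the class and method names are hypothetical.

```java
// Rough capacity check for "1 billion images, up to 10 MB each".
// The average image size is NOT given in the thread; the values used below are assumptions.
class CapacityEstimate {
    static final long BYTES_PER_MB = 1_000_000L;
    static final long BYTES_PER_PB = 1_000_000_000_000_000L;

    /** Total raw petabytes for imageCount images of avgImageMb megabytes each (before replication). */
    static double totalPetabytes(long imageCount, double avgImageMb) {
        return imageCount * avgImageMb * BYTES_PER_MB / BYTES_PER_PB;
    }

    public static void main(String[] args) {
        long images = 1_000_000_000L;
        // Assumed 1 MB average per image.
        System.out.println(totalPetabytes(images, 1.0));
        // Upper bound: every image at the 10 MB maximum.
        System.out.println(totalPetabytes(images, 10.0));
    }
}
```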

-- Lars


----- Original Message -----
From: Jack Levin <ma...@gmail.com>
To: user@hbase.apache.org
Cc: Andrew Purtell <ap...@apache.org>
Sent: Thursday, January 10, 2013 9:24 AM
Subject: Re: Storing images in Hbase

We stored about 1 billion images into hbase with file size up to 10MB.
Its been running for close to 2 years without issues and serves
delivery of images for Yfrog and ImageShack.  If you have any
questions about the setup, I would be glad to answer them.

-Jack

On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com> wrote:
> I have done extensive testing and have found that blobs don't belong in the
> databases but are rather best left out on the file system. Andrew outlined
> issues that you'll face and not to mention IO issues when compaction occurs
> over large files.
>
> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org> wrote:
>
>> I meant this to say "a few really large values"
>>
>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>>
>> > Consider if the split threshold is 2 GB but your one row contains 10 GB
>> as
>> > really large value.
>>
>>
>>
>>
>>  --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>


Re: Storing images in Hbase

Posted by Mohammad Tariq <do...@gmail.com>.
Thanks Leonid.

Warm Regards,
Tariq
https://mtariq.jux.com/


On Fri, Jan 11, 2013 at 2:15 AM, Leonid Fedotov <lf...@hortonworks.com> wrote:

> I'm voting for continuing here as well…
> So, location is up to Jack. :)
>
> Thank you!
>
> Sincerely,
> Leonid Fedotov
>
> On Jan 10, 2013, at 11:24 AM, Mohammad Tariq wrote:
>
> > Jack, Leonid,
> >
> >    I request you guys to please continue the discussion
> > through the thread itself if possible for you both. I would
> > like to know about Jack's setup. I too find it quite interesting.
> >
> > Many thanks.
> >
> > Warm Regards,
> > Tariq
> > https://mtariq.jux.com/
> >
> >
> > On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
> > <lf...@hortonworks.com>wrote:
> >
> >> Jack,
> >> yes, this is very interesting to know your setup details.
> >> Could you please provide more information?
> >> Or we can take this off the list if you like…
> >>
> >> Thank you!
> >>
> >> Sincerely,
> >> Leonid Fedotov
> >>
> >> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
> >>
> >>> We stored about 1 billion images into hbase with file size up to 10MB.
> >>> Its been running for close to 2 years without issues and serves
> >>> delivery of images for Yfrog and ImageShack.  If you have any
> >>> questions about the setup, I would be glad to answer them.
> >>>
> >>> -Jack
> >>>
> >>> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
> >> wrote:
> >>>> I have done extensive testing and have found that blobs don't belong
> in
> >> the
> >>>> databases but are rather best left out on the file system. Andrew
> >> outlined
> >>>> issues that you'll face and not to mention IO issues when compaction
> >> occurs
> >>>> over large files.
> >>>>
> >>>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
> >> wrote:
> >>>>
> >>>>> I meant this to say "a few really large values"
> >>>>>
> >>>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <apurtell@apache.org
> >
> >>>>> wrote:
> >>>>>
> >>>>>> Consider if the split threshold is 2 GB but your one row contains 10
> >> GB
> >>>>> as
> >>>>>> really large value.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Best regards,
> >>>>>
> >>>>>  - Andy
> >>>>>
> >>>>> Problems worthy of attack prove their worth by hitting back. - Piet
> >> Hein
> >>>>> (via Tom White)
> >>>>>
> >>
> >>
>
>

Re: Storing images in Hbase

Posted by Leonid Fedotov <lf...@hortonworks.com>.
I'm voting for continuing here as well…
So, location is up to Jack. :)

Thank you!

Sincerely,
Leonid Fedotov

On Jan 10, 2013, at 11:24 AM, Mohammad Tariq wrote:

> Jack, Leonid,
> 
>    I request you guys to please continue the discussion
> through the thread itself if possible for you both. I would
> like to know about Jack's setup. I too find it quite interesting.
> 
> Many thanks.
> 
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> 
> 
> On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
> <lf...@hortonworks.com>wrote:
> 
>> Jack,
>> yes, this is very interesting to know your setup details.
>> Could you please provide more information?
>> Or we can take this off the list if you like…
>> 
>> Thank you!
>> 
>> Sincerely,
>> Leonid Fedotov
>> 
>> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>> 
>>> We stored about 1 billion images into hbase with file size up to 10MB.
>>> Its been running for close to 2 years without issues and serves
>>> delivery of images for Yfrog and ImageShack.  If you have any
>>> questions about the setup, I would be glad to answer them.
>>> 
>>> -Jack
>>> 
>>> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
>> wrote:
>>>> I have done extensive testing and have found that blobs don't belong in
>> the
>>>> databases but are rather best left out on the file system. Andrew
>> outlined
>>>> issues that you'll face and not to mention IO issues when compaction
>> occurs
>>>> over large files.
>>>> 
>>>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>>>> 
>>>>> I meant this to say "a few really large values"
>>>>> 
>>>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>>>>> wrote:
>>>>> 
>>>>>> Consider if the split threshold is 2 GB but your one row contains 10
>> GB
>>>>> as
>>>>>> really large value.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> 
>>>>>  - Andy
>>>>> 
>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>> Hein
>>>>> (via Tom White)
>>>>> 
>> 
>> 


Re: Storing images in Hbase

Posted by Michael Segel <mi...@hotmail.com>.
Been there, done that... kind of an interesting problem... 

Someone earlier said that HBase isn't good for images.  It works pretty well, again it depends on the use case.

Your schema is also going to play a role, and you're going to have to tune things a little differently: when you pull an image, you're pulling a larger chunk of data, and you also want to make sure you can fit a decent number of images within a region.


How are you planning on using the images? Are you going to run a M/R job and see if you can't spot landmarks and businesses in a photo? Language translations? 
Or just a repository? 


On Jan 10, 2013, at 12:23 PM, Marcos Ortiz <ml...@uci.cu> wrote:

> This is a very interesting setup to analyze. I'm working on a similar problem
> with HBase, so any help is welcome.
> 
> On 10/01/2013 16:39, Doug Meil wrote:
>> +1.
>> 
>> This question comes up enough on the dist-list it's worth getting some
>> pointers on record.
>> 
>> 
>> 
>> 
>> 
>> On 1/10/13 2:24 PM, "Mohammad Tariq" <do...@gmail.com> wrote:
>> 
>>> Jack, Leonid,
>>> 
>>>    I request you guys to please continue the discussion
>>> through the thread itself if possible for you both. I would
>>> like to know about Jack's setup. I too find it quite interesting.
>>> 
>>> Many thanks.
>>> 
>>> Warm Regards,
>>> Tariq
>>> https://mtariq.jux.com/
>>> 
>>> 
>>> On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
>>> <lf...@hortonworks.com>wrote:
>>> 
>>>> Jack,
>>>> yes, this is very interesting to know your setup details.
>>>> Could you please provide more information?
>>>> Or we can take this off the list if you like…
>>>> 
>>>> Thank you!
>>>> 
>>>> Sincerely,
>>>> Leonid Fedotov
>>>> 
>>>> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>>>> 
>>>>> We stored about 1 billion images into hbase with file size up to 10MB.
>>>>> Its been running for close to 2 years without issues and serves
>>>>> delivery of images for Yfrog and ImageShack.  If you have any
>>>>> questions about the setup, I would be glad to answer them.
>>>>> 
>>>>> -Jack
>>>>> 
>>>>> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
>>>> wrote:
>>>>>> I have done extensive testing and have found that blobs don't belong
>>>> in
>>>> the
>>>>>> databases but are rather best left out on the file system. Andrew
>>>> outlined
>>>>>> issues that you'll face and not to mention IO issues when compaction
>>>> occurs
>>>>>> over large files.
>>>>>> 
>>>>>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
>>>> wrote:
>>>>>>> I meant this to say "a few really large values"
>>>>>>> 
>>>>>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell
>>>> <ap...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Consider if the split threshold is 2 GB but your one row contains
>>>> 10
>>>> GB
>>>>>>> as
>>>>>>>> really large value.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Best regards,
>>>>>>> 
>>>>>>>   - Andy
>>>>>>> 
>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein
>>>>>>> (via Tom White)
>>>>>>> 
>>>> 
> 
> -- 
> 
> Marcos Ortíz Valmaseda
> Blog: http://marcosluis2186.posterous.com
> Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
> 
> 
> 
> 
> 


Re: Storing images in Hbase

Posted by Marcos Ortiz <ml...@uci.cu>.
This is a very interesting setup to analyze. I'm working on a similar
problem with HBase, so any help is welcome.

On 10/01/2013 16:39, Doug Meil wrote:
> +1.
>
> This question comes up enough on the dist-list it's worth getting some
> pointers on record.
>
>
>
>
>
> On 1/10/13 2:24 PM, "Mohammad Tariq" <do...@gmail.com> wrote:
>
>> Jack, Leonid,
>>
>>     I request you guys to please continue the discussion
>> through the thread itself if possible for you both. I would
>> like to know about Jack's setup. I too find it quite interesting.
>>
>> Many thanks.
>>
>> Warm Regards,
>> Tariq
>> https://mtariq.jux.com/
>>
>>
>> On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
>> <lf...@hortonworks.com>wrote:
>>
>>> Jack,
>>> yes, this is very interesting to know your setup details.
>>> Could you please provide more information?
>>> Or we can take this off the list if you like…
>>>
>>> Thank you!
>>>
>>> Sincerely,
>>> Leonid Fedotov
>>>
>>> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>>>
>>>> We stored about 1 billion images into hbase with file size up to 10MB.
>>>> Its been running for close to 2 years without issues and serves
>>>> delivery of images for Yfrog and ImageShack.  If you have any
>>>> questions about the setup, I would be glad to answer them.
>>>>
>>>> -Jack
>>>>
>>>> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
>>> wrote:
>>>>> I have done extensive testing and have found that blobs don't belong
>>> in
>>> the
>>>>> databases but are rather best left out on the file system. Andrew
>>> outlined
>>>>> issues that you'll face and not to mention IO issues when compaction
>>> occurs
>>>>> over large files.
>>>>>
>>>>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
>>> wrote:
>>>>>> I meant this to say "a few really large values"
>>>>>>
>>>>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell
>>> <ap...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Consider if the split threshold is 2 GB but your one row contains
>>> 10
>>> GB
>>>>>> as
>>>>>>> really large value.
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>>    - Andy
>>>>>>
>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>> Hein
>>>>>> (via Tom White)
>>>>>>
>>>

-- 

Marcos Ortíz Valmaseda
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>






Re: Storing images in Hbase

Posted by Doug Meil <do...@explorysmedical.com>.
+1.

This question comes up enough on the dist-list it's worth getting some
pointers on record.





On 1/10/13 2:24 PM, "Mohammad Tariq" <do...@gmail.com> wrote:

>Jack, Leonid,
>
>    I request you guys to please continue the discussion
>through the thread itself if possible for you both. I would
>like to know about Jack's setup. I too find it quite interesting.
>
>Many thanks.
>
>Warm Regards,
>Tariq
>https://mtariq.jux.com/
>
>
>On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
><lf...@hortonworks.com>wrote:
>
>> Jack,
>> yes, this is very interesting to know your setup details.
>> Could you please provide more information?
>> Or we can take this off the list if you like…
>>
>> Thank you!
>>
>> Sincerely,
>> Leonid Fedotov
>>
>> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>>
>> > We stored about 1 billion images into hbase with file size up to 10MB.
>> > Its been running for close to 2 years without issues and serves
>> > delivery of images for Yfrog and ImageShack.  If you have any
>> > questions about the setup, I would be glad to answer them.
>> >
>> > -Jack
>> >
>> > On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
>> wrote:
>> >> I have done extensive testing and have found that blobs don't belong
>>in
>> the
>> >> databases but are rather best left out on the file system. Andrew
>> outlined
>> >> issues that you'll face and not to mention IO issues when compaction
>> occurs
>> >> over large files.
>> >>
>> >> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>> >>
>> >>> I meant this to say "a few really large values"
>> >>>
>> >>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell
>><ap...@apache.org>
>> >>> wrote:
>> >>>
>> >>>> Consider if the split threshold is 2 GB but your one row contains
>>10
>> GB
>> >>> as
>> >>>> really large value.
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>>
>> >>>   - Andy
>> >>>
>> >>> Problems worthy of attack prove their worth by hitting back. - Piet
>> Hein
>> >>> (via Tom White)
>> >>>
>>
>>



Re: Storing images in Hbase

Posted by Mohammad Tariq <do...@gmail.com>.
Jack, Leonid,

    I'd ask you both to please continue the discussion on this
thread itself, if that is possible for you. I would
like to know about Jack's setup. I too find it quite interesting.

Many thanks.

Warm Regards,
Tariq
https://mtariq.jux.com/


On Fri, Jan 11, 2013 at 12:50 AM, Leonid Fedotov
<lf...@hortonworks.com> wrote:

> Jack,
> yes, this is very interesting to know your setup details.
> Could you please provide more information?
> Or we can take this off the list if you like…
>
> Thank you!
>
> Sincerely,
> Leonid Fedotov
>
> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>
> > We stored about 1 billion images into hbase with file size up to 10MB.
> > Its been running for close to 2 years without issues and serves
> > delivery of images for Yfrog and ImageShack.  If you have any
> > questions about the setup, I would be glad to answer them.
> >
> > -Jack
> >
> > On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
> wrote:
> >> I have done extensive testing and have found that blobs don't belong in
> the
> >> databases but are rather best left out on the file system. Andrew
> outlined
> >> issues that you'll face and not to mention IO issues when compaction
> occurs
> >> over large files.
> >>
> >> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
> wrote:
> >>
> >>> I meant this to say "a few really large values"
> >>>
> >>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
> >>> wrote:
> >>>
> >>>> Consider if the split threshold is 2 GB but your one row contains 10
> GB
> >>> as
> >>>> really large value.
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>>   - Andy
> >>>
> >>> Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> >>> (via Tom White)
> >>>
>
>

Re: Storing images in Hbase

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
It might be interesting to share that here, just in case someone else
is facing the same use case?

JM

2013/1/10, Leonid Fedotov <lf...@hortonworks.com>:
> Jack,
> yes, this is very interesting to know your setup details.
> Could you please provide more information?
> Or we can take this off the list if you like…
>
> Thank you!
>
> Sincerely,
> Leonid Fedotov
>
> On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:
>
>> We stored about 1 billion images into hbase with file size up to 10MB.
>> Its been running for close to 2 years without issues and serves
>> delivery of images for Yfrog and ImageShack.  If you have any
>> questions about the setup, I would be glad to answer them.
>>
>> -Jack
>>
>> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com>
>> wrote:
>>> I have done extensive testing and have found that blobs don't belong in
>>> the
>>> databases but are rather best left out on the file system. Andrew
>>> outlined
>>> issues that you'll face and not to mention IO issues when compaction
>>> occurs
>>> over large files.
>>>
>>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org>
>>> wrote:
>>>
>>>> I meant this to say "a few really large values"
>>>>
>>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>>>> wrote:
>>>>
>>>>> Consider if the split threshold is 2 GB but your one row contains 10
>>>>> GB
>>>> as
>>>>> really large value.
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>>   - Andy
>>>>
>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>> Hein
>>>> (via Tom White)
>>>>
>
>

Re: Storing images in Hbase

Posted by Leonid Fedotov <lf...@hortonworks.com>.
Jack,
yes, it would be very interesting to know your setup details.
Could you please provide more information?
Or we can take this off the list if you like…

Thank you!

Sincerely,
Leonid Fedotov

On Jan 10, 2013, at 9:24 AM, Jack Levin wrote:

> We stored about 1 billion images into hbase with file size up to 10MB.
> Its been running for close to 2 years without issues and serves
> delivery of images for Yfrog and ImageShack.  If you have any
> questions about the setup, I would be glad to answer them.
> 
> -Jack
> 
> On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com> wrote:
>> I have done extensive testing and have found that blobs don't belong in the
>> databases but are rather best left out on the file system. Andrew outlined
>> issues that you'll face and not to mention IO issues when compaction occurs
>> over large files.
>> 
>> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org> wrote:
>> 
>>> I meant this to say "a few really large values"
>>> 
>>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>>> wrote:
>>> 
>>>> Consider if the split threshold is 2 GB but your one row contains 10 GB
>>> as
>>>> really large value.
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> 
>>>   - Andy
>>> 
>>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>>> (via Tom White)
>>> 


Re: Storing images in Hbase

Posted by Jack Levin <ma...@gmail.com>.
We stored about 1 billion images in HBase, with file sizes up to 10 MB.
It's been running for close to 2 years without issues and serves
delivery of images for Yfrog and ImageShack.  If you have any
questions about the setup, I would be glad to answer them.

-Jack

On Sun, Jan 6, 2013 at 1:09 PM, Mohit Anchlia <mo...@gmail.com> wrote:
> I have done extensive testing and have found that blobs don't belong in the
> databases but are rather best left out on the file system. Andrew outlined
> issues that you'll face and not to mention IO issues when compaction occurs
> over large files.
>
> On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org> wrote:
>
>> I meant this to say "a few really large values"
>>
>> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
>> wrote:
>>
>> > Consider if the split threshold is 2 GB but your one row contains 10 GB
>> as
>> > really large value.
>>
>>
>>
>>
>>  --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>

Re: Storing images in Hbase

Posted by Mohit Anchlia <mo...@gmail.com>.
I have done extensive testing and have found that blobs don't belong in
databases but are best left on the file system. Andrew outlined the
issues that you'll face, not to mention the I/O issues when compaction
runs over large files.

On Sun, Jan 6, 2013 at 12:52 PM, Andrew Purtell <ap...@apache.org> wrote:

> I meant this to say "a few really large values"
>
> On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> > Consider if the split threshold is 2 GB but your one row contains 10 GB
> as
> > really large value.
>
>
>
>
>  --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: Storing images in Hbase

Posted by Andrew Purtell <ap...@apache.org>.
I meant this to say "a few really large values"

On Sun, Jan 6, 2013 at 12:49 PM, Andrew Purtell <ap...@apache.org> wrote:

> Consider if the split threshold is 2 GB but your one row contains 10 GB as
> really large value.




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Storing images in Hbase

Posted by Andrew Purtell <ap...@apache.org>.
What do you mean by "very large"?

One possible source of performance concern is HBase RPC does not do
positioned/chunked/partial reads, so both on the RegionServer and client
the entirety of value data will be in the heap. A lot of really large
objects brought in this way under high concurrency can cause excessive GC
from fragmentation or OOME conditions if the heap isn't adequately sized.
The recommendation of ~10 MB max is to mitigate these effects. There is
nothing scientific about that number though, it's a rule of thumb, I've
built HBase applications with a max value size of 100 MB and it performed
adequately. (Larger objects were split into 100 MB chunks and keyed as
$rowkey$chunk where $chunk was an integer serialized with Bytes.toBytes().)
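
The chunking scheme described above can be sketched roughly as follows. This is only an illustration using the JDK, not the code from that application: the class and method names are hypothetical, and the chunk index is written as a 4-byte big-endian int, which to my understanding matches the layout that HBase's Bytes.toBytes(int) produces.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the "$rowkey$chunk" scheme: split a large value into fixed-size chunks,
// each stored under baseKey + 4-byte chunk index. All names here are hypothetical.
class ChunkedValue {
    /** Row key for one chunk: the base key followed by the chunk index as a big-endian int. */
    static byte[] chunkKey(byte[] baseKey, int chunkIndex) {
        return ByteBuffer.allocate(baseKey.length + Integer.BYTES)
                .put(baseKey)
                .putInt(chunkIndex) // big-endian, so non-negative indexes sort in chunk order
                .array();
    }

    /** Splits value into pieces of at most chunkSize bytes, in order. */
    static List<byte[]> split(byte[] value, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < value.length; off += chunkSize) {
            chunks.add(Arrays.copyOfRange(value, off, Math.min(value.length, off + chunkSize)));
        }
        return chunks;
    }

    /** Reassembles chunks that were read back in key order. */
    static byte[] join(List<byte[]> chunks) {
        int total = 0;
        for (byte[] c : chunks) {
            total += c.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] c : chunks) {
            out.put(c);
        }
        return out.array();
    }
}
```

Because all chunks of one object share the same key prefix, a reader can fetch them with a short scan over that prefix and pass the values to join() in the order returned.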

Another is a consequence of the fact a row cannot be split. This means that
if the data in a single row grows significantly larger than the region
split threshold, you will have this one region sized differently from the
others, and this can lead to unexpected behavior. Consider if the split
threshold is 2 GB but your one row contains 10 GB as really large value.
This is undesirable because HBase expects housekeeping on a given region to
be more or less equal to others: compaction, etc.

From the application POV, if you have a few really big value size outliers,
then these could be like land mines if the app is short scanning over table
data. Gets or Scans including such values will have widely varying latency
from others. But this would be an application design problem.



On Sun, Jan 6, 2013 at 12:28 PM, Asaf Mesika <as...@gmail.com> wrote:

> What's the penalty performance wise of saving a very large value in a
> KeyValue in hbase? Splits, scans, etc.
>
> Sent from my iPad
>
> On 6 Jan 2013, at 22:12, Andrew Purtell <ap...@apache.org> wrote:
>
> > Also YFrog / ImageShack serves all of its assets out of HBase too, so for
> > reasonably sized images some are having success. See
> > http://www.slideshare.net/jacque74/hug-hbase-presentation
> >
> >
> > On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap <ap...@gmail.com> wrote:
> >
> >> there are a lot great discussions on Quora on this topic.
> >>
> >>
> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
> >> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
> >>
> >>
> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> >   - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Storing images in Hbase

Posted by Asaf Mesika <as...@gmail.com>.
What's the performance penalty of saving a very large value in a
KeyValue in HBase? For splits, scans, etc.

Sent from my iPad

On 6 Jan 2013, at 22:12, Andrew Purtell <ap...@apache.org> wrote:

> Also YFrog / ImageShack serves all of its assets out of HBase too, so for
> reasonably sized images some are having success. See
> http://www.slideshare.net/jacque74/hug-hbase-presentation
>
>
> On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap <ap...@gmail.com> wrote:
>
>> there are a lot great discussions on Quora on this topic.
>>
>> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
>> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
>>
>> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
>>
>
>
>
> --
> Best regards,
>
>   - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: Storing images in Hbase

Posted by Amandeep Khurana <am...@gmail.com>.
To add to Andy's point - storing images in HBase is fine as long as
the size of each image isn't huge. A couple of MBs per row in HBase
does just fine. But once you start getting into 10s of MBs, there are
more optimal solutions you can explore and HBase might not be the best
bet.
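
One way to act on that rule of thumb is a size-based routing check: small images go straight into an HBase cell, while larger ones go to an external store with only a pointer kept in HBase. The sketch below is a minimal illustration; the 10 MB cutoff, the class name, and the target names are assumptions for the example, not limits enforced by HBase.

```java
// Sketch of a size-based routing rule. The threshold and target names are
// illustrative assumptions, not limits enforced by HBase.
class BlobRouter {
    enum Target { HBASE_CELL, EXTERNAL_STORE_WITH_HBASE_POINTER }

    // Assumed cutoff of roughly 10 MB; tune it for your own workload.
    static final long DEFAULT_INLINE_LIMIT_BYTES = 10L * 1024 * 1024;

    /** Decides where a blob of the given size should be stored. */
    static Target route(long blobSizeBytes, long inlineLimitBytes) {
        if (blobSizeBytes < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        return blobSizeBytes <= inlineLimitBytes
                ? Target.HBASE_CELL
                : Target.EXTERNAL_STORE_WITH_HBASE_POINTER;
    }
}
```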

Amandeep

On Jan 6, 2013, at 12:12 PM, Andrew Purtell <ap...@apache.org> wrote:

> Also YFrog / ImageShack serves all of its assets out of HBase too, so for
> reasonably sized images some are having success. See
> http://www.slideshare.net/jacque74/hug-hbase-presentation
>
>
> On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap <ap...@gmail.com> wrote:
>
>> there are a lot great discussions on Quora on this topic.
>>
>> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
>> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
>>
>> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
>
>
>
> --
> Best regards,
>
>   - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

Re: Storing images in Hbase

Posted by Andrew Purtell <ap...@apache.org>.
Also YFrog / ImageShack serves all of its assets out of HBase too, so for
reasonably sized images some are having success. See
http://www.slideshare.net/jacque74/hug-hbase-presentation


On Sun, Jan 6, 2013 at 3:58 AM, Yusup Ashrap <ap...@gmail.com> wrote:

> there are a lot great discussions on Quora on this topic.
>
> http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
> http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
>
> http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Storing images in Hbase

Posted by Yusup Ashrap <ap...@gmail.com>.
There are a lot of great discussions on Quora on this topic.
http://www.quora.com/Apache-Hadoop/Is-HBase-appropriate-for-indexed-blob-storage-in-HDFS
http://www.quora.com/Is-it-possible-to-use-HDFS-HBase-to-serve-images
http://www.quora.com/What-is-a-good-choice-for-storing-blob-like-files-in-a-distributed-environment

Re: Storing images in Hbase

Posted by Damien Hardy <dh...@viadeoteam.com>.
Hi there,
Thank you, and happy new year.
I had the same problem and wrote a Python module [0] for thumbor [1].
I use the Thrift interface for HBase to store image blobs.
As already said, you have to keep image blobs quite small, around 100 KB
(for web latency you have to keep them small anyway), so HBase should
keep performing well.
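
For the original question about turning an image into bytes, a minimal sketch of the client-side step might look like this. It uses only the JDK; the class name, the file-name row key, and the 100 KB guard are assumptions for illustration, and the actual write to HBase (through Thrift or the Java client) is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: turn an image file into the byte[] value that would go into an HBase cell.
// Class and method names are hypothetical; the size limit is an assumed rule of thumb.
class ImageBytes {
    static final long MAX_IMAGE_BYTES = 100L * 1024;

    /** Reads the file fully, rejecting anything larger than maxBytes. */
    static byte[] load(Path image, long maxBytes) throws IOException {
        long size = Files.size(image);
        if (size > maxBytes) {
            throw new IllegalArgumentException(
                    image + " is " + size + " bytes, over the limit of " + maxBytes);
        }
        return Files.readAllBytes(image);
    }

    /** Row key derived from the file name (an assumption; choose a key that suits your reads). */
    static byte[] rowKey(Path image) {
        return image.getFileName().toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

The returned array is what would be written as the cell value, with rowKey(image) as the row; reading the image back is the reverse, taking the cell's bytes and writing them to a file or an HTTP response.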

BTW StumbleUpon stores all its assets in HBase:
http://bb10.com/java-hadoop-hbase-user/2012-03/msg00054.html

[0] https://github.com/dhardy92/thumbor_hbase
[1] https://github.com/globocom/thumbor/wiki

Cheers,

-- 
Damien

On 6 Jan 2013 04:46, "kavishahuja" <ka...@yahoo.com> wrote:

> *Hello EVERYBODY
> first of all, a happy new year to everyone !!
> I need a small help regarding pushing images into apache HBase(DB)...i know
> its about converting objects into bytes and then saving those bytes into
> hbase rows. But still i cant do it.
> Kindly help !! *
>
> Regards,
> Kavish
>
>