Posted to user@hbase.apache.org by Ascot Moss <as...@gmail.com> on 2016/04/17 03:21:01 UTC

To Store Large Number of Video and Image files

Hi,

I have a project that needs to store a large number of image and video files.
File sizes vary from 10MB to 10GB; the initial number of files will be
0.1 billion and would grow beyond 1 billion. What are the practical
recommendations for storing and viewing these files?



#1 One cluster: store the HDFS URL in HBase and the actual file in
HDFS? (block_size of 128MB, replication factor of 3)


#2 One cluster: store small files in HBase directly and use #1 for large
files? (block_size of 128MB, replication factor of 3)


#3 Multiple Hadoop/HBase clusters, each with different block_size settings?


     e.g. cluster 1 (small): block_size of 128MB, replication factor of
3; store files in HBase if they are smaller than 128MB

            cluster 2 (large): a bigger block_size, say 4GB, replication
factor of 3; store the HDFS URL in HBase and the actual file in HDFS



#4 Use HDFS Federation to handle the large number of files?
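The small/large split in option #2 can be sketched as a pure routing rule. This is a minimal illustration only: the 10MB threshold, the `f:data` column, and the `hdfs:///media/` path layout are hypothetical placeholders, not real client calls.

```python
# Hypothetical sketch of option #2: route files by size.
# Small files would go inline into an HBase cell; large files would go
# to HDFS, with only the HDFS URL kept in HBase.

SMALL_FILE_LIMIT = 10 * 1024 * 1024  # e.g. 10MB; tune to your block size


def route(file_name, size_bytes):
    """Return ('hbase', payload-column) for small files, or
    ('hdfs', hdfs-url) for large ones."""
    if size_bytes < SMALL_FILE_LIMIT:
        return ("hbase", "f:data")                      # bytes stored inline
    return ("hdfs", f"hdfs:///media/{file_name}")       # URL stored, file on HDFS


print(route("a.jpg", 2 * 1024 * 1024))   # small -> inline in HBase
print(route("b.mp4", 4 * 1024 ** 3))     # large -> HDFS URL only
```

In a real system the two branches would become an HBase `Put` of the bytes versus an HDFS write plus a `Put` of the URL; the decision logic itself stays this simple.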


About fault tolerance, four types of failures need to be considered:
drive, host, rack, and datacenter failures.


Regards

Re: To Store Large Number of Video and Image files

Posted by Ascot Moss <as...@gmail.com>.
Sorry to ask the following question again.


About HBase-11339,
"The size of the MOB data could not be very large, it better to keep the
MOB size within 100KB and 10MB. Since MOB cells are written into the
memstore before flushing, large MOB cells stress the memory in region
servers."

Can this be resolved by providing more RAM in the region servers? For
instance, each server in the cluster has 768GB RAM + 14 x 6TB HDD.

Regards
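A rough, assumption-laden calculation suggests RAM is not the whole story: the flush threshold is per-region (`hbase.hregion.memstore.flush.size`, 128MB by default), so extra RAM raises the global memstore headroom but does not change the per-cell arithmetic.

```python
# Back-of-envelope: how many MOB cells fit in one region's memstore
# before a flush is forced. Assumes the default 128MB flush threshold;
# real behaviour also depends on the global memstore limit and the
# number of regions per server.

FLUSH_SIZE_MB = 128  # default hbase.hregion.memstore.flush.size

for cell_mb in (0.1, 10, 100):
    cells_per_flush = FLUSH_SIZE_MB / cell_mb
    print(f"{cell_mb:>5}MB cells -> ~{cells_per_flush:.0f} writes per flush")
```

At 10MB per cell a region flushes after roughly a dozen writes, and a 10GB object would exceed any practical flush threshold outright, which is presumably why the MOB documentation caps the recommended size at ~10MB regardless of server RAM.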


Re: To Store Large Number of Video and Image files

Posted by Ascot Moss <as...@gmail.com>.
Hi,

Any idea about the implementation of Facebook f4? Does it use HBase as the
indexer?

Regards


Re: To Store Large Number of Video and Image files

Posted by Ascot Moss <as...@gmail.com>.
Hi,

Yes, the files are immutable.

Regards



Re: To Store Large Number of Video and Image files

Posted by Vladimir Rodionov <vl...@gmail.com>.
Are the files immutable?
Write small files (less than 1 HDFS block) into large blobs (combine
them into a single file) and store large files directly in HDFS. Keep
the path index in HBase.
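The pack-into-a-blob idea can be sketched with in-memory stand-ins (similar in spirit to Facebook's Haystack needle files): a plain dict plays the HBase path index and a `bytearray` plays the HDFS blob file. All names here are hypothetical.

```python
# Toy sketch: combine small files into one blob, keeping a
# (offset, length) index per file. In a real system pack() would
# append to a large HDFS file and each index entry would be an
# HBase row keyed by file name.

def pack(files):
    """files: dict name -> bytes. Returns (blob, index), where index
    maps name -> (offset, length) into the blob."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read(blob, index, name):
    """Random read of one small file via its index entry."""
    off, length = index[name]
    return blob[off:off + length]


blob, idx = pack({"a.jpg": b"AAAA", "b.png": b"BB"})
print(read(blob, idx, "b.png"))  # b'BB'
```

The point of the pattern is that one blob holds thousands of small files, so the NameNode tracks one file's blocks instead of thousands, while HBase still gives constant-time lookup by name.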

If you need to delete files, mark them as deleted in HBase and
periodically run a GC job to do the real cleaning.
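The mark-then-clean scheme can be sketched the same way; the `deleted` set below stands in for a tombstone marker that would live in HBase, and everything here is an illustrative toy, not a real cleaner.

```python
# Toy sketch of tombstone deletion: a delete only marks the entry;
# a periodic GC pass rewrites the blob without the dead entries and
# emits a fresh index, reclaiming the space.

def gc(blob, index, deleted):
    """Compact `blob`, dropping entries named in `deleted`.
    index maps name -> (offset, length)."""
    new_blob = bytearray()
    new_index = {}
    for name, (off, length) in index.items():
        if name in deleted:
            continue  # reclaim this file's bytes
        new_index[name] = (len(new_blob), length)
        new_blob.extend(blob[off:off + length])
    return bytes(new_blob), new_index


blob = b"AAAABB"
index = {"a.jpg": (0, 4), "b.png": (4, 2)}
new_blob, new_index = gc(blob, index, deleted={"a.jpg"})
print(new_blob, new_index)  # b'BB' {'b.png': (0, 2)}
```

Since the files are immutable, the GC pass never races with updates; it only has to swap in the new blob and index atomically once the rewrite finishes.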

-Vlad


Re: To Store Large Number of Video and Image files

Posted by Ted Yu <yu...@gmail.com>.
There was HBASE-15370 for backport but it was decided not to backport the
feature.

FYI


Re: To Store Large Number of Video and Image files

Posted by Ascot Moss <as...@gmail.com>.
Hi,

About HBase-11339,
"The size of the MOB data could not be very large, it better to keep the
MOB size within 100KB and 10MB. Since MOB cells are written into the
memstore before flushing, large MOB cells stress the memory in region
servers."

Can this be resolved by providing more RAM in the region servers? For
instance, each server in the cluster has 768GB RAM + 14 x 6TB HDD.

Regards




Re: To Store Large Number of Video and Image files

Posted by Ascot Moss <as...@gmail.com>.
Thanks Ted!

Just visited HBASE-11339; its status is "resolved", but it is for "Fix
Version: 2.0.0".
How can it be patched into the current HBase stable version (v1.1.4)?

About fault tolerance at the datacenter level, I am thinking of using
HBase Replication to replicate the HBase tables to another (backup)
cluster. Is there any real-world reference on replication performance,
for instance when the bandwidth is 100MB/s?
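One can at least bound the answer from below with the link bandwidth alone. This is rough arithmetic, not a benchmark: it assumes the 100MB/s link is fully saturated and ignores WAL shipping overhead, compression, and latency, so real catch-up times will be longer.

```python
# Back-of-envelope lower bound: time to ship a replication backlog
# over a fixed-bandwidth link.

def replication_hours(data_tb, bandwidth_mb_s=100):
    """Hours to move `data_tb` terabytes at `bandwidth_mb_s` MB/s."""
    mb = data_tb * 1024 * 1024        # TB -> MB
    return mb / bandwidth_mb_s / 3600


for tb in (1, 10, 100):
    print(f"{tb:>4} TB -> {replication_hours(tb):,.1f} h minimum")
```

At ~2.9 hours per TB, an initial bulk sync of a multi-PB media store over a single 100MB/s link is measured in months, which argues for bulk-seeding the backup cluster (e.g. snapshot export) and using replication only for the ongoing delta.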

Regards

Re: To Store Large Number of Video and Image files

Posted by Ted Yu <yu...@gmail.com>.
Have you taken a look at HBASE-11339 (HBase MOB) ?

Note: this feature does not handle 10GB objects well. Consider storing
GB-sized images on HDFS.

Cheers
