Posted to user@hbase.apache.org by Upendra Yadav <up...@gmail.com> on 2014/02/25 08:25:01 UTC

Is HBase is feasible for storing 4-5 MB of data as cell value

I have to use HBase, and my data is a mix of sizes.

Some values are 1-4 KB (mail headers, ...) and others are larger than
5 MB (attachments, ...).

We also need only random access, to any of the data.

Is HBase feasible for storing this type of data?

What should my schema design be?
Will I have to go with 2 different tables, the 1st for the 1-4 KB values
and the 2nd for the big files
(because a memstore flush will also flush the other CF, and because of the heavy random access)?

Or is there another way?

Thanks

RE: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Wei Tan <wt...@us.ibm.com>.
Image :)
Best regards,
Wei

---------------------------------
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan



From:   Vladimir Rodionov <vr...@carrieriq.com>
To:     "user@hbase.apache.org" <us...@hbase.apache.org>, 
Date:   02/27/2014 01:22 AM
Subject:        RE: Is HBase is feasible for storing 4-5 MB of data as 
cell value




What type of analytics are you going to run on medium-sized objects (~1 MB)?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Confidentiality Notice:  The information contained in this message, 
including any attachments hereto, may be confidential and is intended to 
be read only by the individual or entity to whom this message is 
addressed. If the reader of this message is not the intended recipient or 
an agent or designee of the intended recipient, please note that any 
review, use, disclosure or distribution of this message or its 
attachments, in any form, is strictly prohibited.  If you have received 
this message in error, please immediately notify the sender and/or 
Notifications@carrieriq.com and delete or destroy any copy of this message 
and its attachments.



RE: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
What type of analytics are you going to run on medium-sized objects (~1 MB)?

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Wei Tan [wtan@us.ibm.com]
Sent: Wednesday, February 26, 2014 9:48 PM
To: user@hbase.apache.org
Subject: Re: Is HBase is feasible for storing 4-5 MB of data as cell value

I am thinking of storing medium-sized objects (~1 MB) in HBase. The
advantages of using HBase alone rather than HBase (storing pointers) + HDFS, in
my mind, are:
1. Data locality. When I want to run analytics, I will access these objects
using an HBase scan, and HBase stores KVs sequentially. If I use
HDFS, there is no guarantee that the files for row 1 and row 2 are adjacent to
each other.
2. Storing small files in HDFS is not efficient. Facebook's Haystack sort of
stitches small files together, and HBase achieves the same effect.


Am I missing any disadvantages? I am also thinking of using a larger block size given
the object size.

Thanks,
Wei




Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Wei Tan <wt...@us.ibm.com>.
I am thinking of storing medium-sized objects (~1 MB) in HBase. The
advantages of using HBase alone rather than HBase (storing pointers) + HDFS, in
my mind, are:
1. Data locality. When I want to run analytics, I will access these objects
using an HBase scan, and HBase stores KVs sequentially. If I use
HDFS, there is no guarantee that the files for row 1 and row 2 are adjacent to
each other.
2. Storing small files in HDFS is not efficient. Facebook's Haystack sort of
stitches small files together, and HBase achieves the same effect.


Am I missing any disadvantages? I am also thinking of using a larger block size given
the object size.

Thanks,
Wei



From:   Upendra Yadav <up...@gmail.com>
To:     user@hbase.apache.org, 
Date:   02/25/2014 03:31 PM
Subject:        Re: Is HBase is feasible for storing 4-5 MB of data as 
cell value



I have come to the same conclusion as your suggestion: keep them in separate files in
HDFS and store only references in HBase.

I will try packing several attachments into a single file.

And thanks a lot.



Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Upendra Yadav <up...@gmail.com>.
I have come to the same conclusion as your suggestion: keep them in separate files in
HDFS and store only references in HBase.

I will try packing several attachments into a single file.

And thanks a lot.


On Wed, Feb 26, 2014 at 1:45 AM, Vladimir Rodionov
<vr...@carrieriq.com>wrote:

> Usually, it is not advisable to store such large values in HBase (to
> avoid excessive I/O during compaction).
> Keep them in separate files in HDFS and store only references in HBase.
> To work around the NameNode's (NN) inherent limit on the number of files,
> you can pack several values into a single file (you will need a separate
> process, such as an M/R job, to garbage collect expired or deleted items).
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>

RE: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
Usually, it is not advisable to store such large values in HBase (to avoid excessive I/O during compaction).
Keep them in separate files in HDFS and store only references in HBase. To work around the NameNode's (NN) inherent limit on the number of files,
you can pack several values into a single file (you will need a separate process, such as an M/R job, to garbage collect expired or deleted items).

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com
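
The packing idea above can be sketched roughly as follows. This is only an illustration under loud assumptions: a local file stands in for an HDFS file, and BlobPack, BlobRef, append and read are hypothetical names made up here, not HBase or HDFS APIs. The (file, offset, length) triple returned by append is what would be stored in the HBase cell in place of the large value. Garbage collection of expired or deleted items is not shown.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: packs many blobs into one container file. */
final class BlobPack {

    /** Reference to one blob; this is what an HBase cell would hold. */
    static final class BlobRef {
        final String file;
        final long offset;
        final int length;

        BlobRef(String file, long offset, int length) {
            this.file = file;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Appends a blob to the end of the container and returns its location. */
    static BlobRef append(Path container, byte[] blob) throws IOException {
        // Assumption: a local file stands in for the shared HDFS file.
        try (RandomAccessFile raf = new RandomAccessFile(container.toFile(), "rw")) {
            long offset = raf.length();
            raf.seek(offset);
            raf.write(blob);
            return new BlobRef(container.toString(), offset, blob.length);
        }
    }

    /** Reads back exactly the bytes a reference points at. */
    static byte[] read(BlobRef ref) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(ref.file, "r")) {
            byte[] buf = new byte[ref.length];
            raf.seek(ref.offset);
            raf.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path container = Files.createTempFile("attachments", ".pack");
        BlobRef first = append(container, "first attachment".getBytes(StandardCharsets.UTF_8));
        BlobRef second = append(container, "second attachment".getBytes(StandardCharsets.UTF_8));
        // Both blobs share one file, yet each reference resolves to its own bytes.
        System.out.println(new String(read(first), StandardCharsets.UTF_8));
        System.out.println(new String(read(second), StandardCharsets.UTF_8));
        Files.delete(container);
    }
}
```

Packing this way keeps the file count low for the NameNode, at the price of the separate clean-up job mentioned above, since a deleted blob leaves a hole in its container.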

________________________________________
From: Ted Yu [yuzhihong@gmail.com]
Sent: Tuesday, February 25, 2014 12:02 PM
To: user@hbase.apache.org
Subject: Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Minor:
Value 0 also means no cap - see HTable#validatePut()

    if (maxKeyValueSize > 0) {
      ...
          if (kv.getLength() > maxKeyValueSize) {
            throw new IllegalArgumentException("KeyValue size too large");
          }



Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Ted Yu <yu...@gmail.com>.
Minor:
Value 0 also means no cap - see HTable#validatePut()

    if (maxKeyValueSize > 0) {
      ...
          if (kv.getLength() > maxKeyValueSize) {
            throw new IllegalArgumentException("KeyValue size too large");
          }
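
To make the cap semantics concrete, here is a stand-alone sketch that mirrors the condition quoted above. It does not call HBase: KeyValueCap and validate are hypothetical names, and the sizes are arbitrary example values. It only shows that the check is skipped for any cap that is not greater than zero, so both 0 and -1 mean "no cap".

```java
/** Hypothetical stand-alone mirror of the client-side size check. */
final class KeyValueCap {

    /** Throws only if the cap is enabled (greater than zero) and exceeded. */
    static void validate(long kvLength, long maxKeyValueSize) {
        if (maxKeyValueSize > 0 && kvLength > maxKeyValueSize) {
            throw new IllegalArgumentException("KeyValue size too large");
        }
    }

    public static void main(String[] args) {
        long fiveMb = 5L * 1024 * 1024;
        validate(fiveMb, 10L * 1024 * 1024); // under a 10 MB cap: accepted
        validate(fiveMb, 0);                 // 0 disables the check
        validate(fiveMb, -1);                // so does a negative value
        try {
            validate(fiveMb, 1024 * 1024);   // 1 MB cap: rejected
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```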


On Tue, Feb 25, 2014 at 11:52 AM, Ameya Kanitkar <am...@groupon.com> wrote:

> The only other thing I'd add is that, by default, HBase caps the size of the data per
> cell (KeyValue) at 10 MB (I think). You can change that with this setting:
>
> hbase.client.keyvalue.maxsize
> in hbase-site.xml
>
> -1 means no cap. You can set another number as the appropriate cap for your use
> case.
>
> Ameya
>
>

Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by Ameya Kanitkar <am...@groupon.com>.
The only other thing I'd add is that, by default, HBase caps the size of the data per
cell (KeyValue) at 10 MB (I think). You can change that with this setting:

hbase.client.keyvalue.maxsize
in hbase-site.xml

-1 means no cap. You can set another number as the appropriate cap for your use
case.

Ameya


On Tue, Feb 25, 2014 at 12:12 AM, shashwat shriparv <
dwivedishashwat@gmail.com> wrote:

> Yes, for sure you can use HBase for this. You can:
> 1. keep the different fields of the mail in different columns of a column family, with
> the attachment as a binary array in another column.
> 2. keep the whole message in HBase columns and, if the attachments are
> large enough, put them on HDFS and keep a reference to them in the HBase table.
> 3. decide the schema yourself; you can draw up a matrix of how you store the values
> and decide from that.
>
>

Re: Is HBase is feasible for storing 4-5 MB of data as cell value

Posted by shashwat shriparv <dw...@gmail.com>.
Yes, for sure you can use HBase for this. You can:
1. keep the different fields of the mail in different columns of a column family, with
the attachment as a binary array in another column.
2. keep the whole message in HBase columns and, if the attachments are
large enough, put them on HDFS and keep a reference to them in the HBase table.
3. decide the schema yourself; you can draw up a matrix of how you store the values
and decide from that.


*Warm Regards_**∞_*
* Shashwat Shriparv*
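
Option 2 above amounts to routing each value by its size. A minimal sketch of that decision, assuming a size threshold chosen by the application: ValuePlacement, Target and choose are hypothetical names, and the 1 MB threshold is an arbitrary example value, not an HBase setting.

```java
/** Hypothetical router: small values go inline, large ones to external files. */
final class ValuePlacement {

    enum Target { INLINE_CELL, EXTERNAL_FILE }

    /** thresholdBytes is an assumed application-level knob, not an HBase setting. */
    static Target choose(long sizeBytes, long thresholdBytes) {
        return sizeBytes <= thresholdBytes ? Target.INLINE_CELL : Target.EXTERNAL_FILE;
    }

    public static void main(String[] args) {
        long threshold = 1024 * 1024; // 1 MB, an arbitrary example value
        System.out.println(choose(4 * 1024, threshold));         // a 4 KB mail header
        System.out.println(choose(5L * 1024 * 1024, threshold)); // a 5 MB attachment
    }
}
```

With such a rule, mail headers stay in HBase cells while attachments are written elsewhere and only a reference to them is stored in the table.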



On Tue, Feb 25, 2014 at 12:55 PM, Upendra Yadav <up...@gmail.com>wrote:

> I have to use HBase, and my data is a mix of sizes.
>
> Some values are 1-4 KB (mail headers, ...) and others are larger than
> 5 MB (attachments, ...).
>
> We also need only random access, to any of the data.
>
> Is HBase feasible for storing this type of data?
>
> What should my schema design be?
> Will I have to go with 2 different tables, the 1st for the 1-4 KB values
> and the 2nd for the big files
> (because a memstore flush will also flush the other CF, and because of the heavy random access)?
>
> Or is there another way?
>
> Thanks
>