Posted to user@hbase.apache.org by Marko Dinic <ha...@gmail.com> on 2015/12/01 12:48:57 UTC

Am I crazy or that's not that much?

Hi everyone,

I'm new to HBase and I have a simple question: is 800,000 columns a lot to
store in a single column family?

This data will mostly be processed by MR jobs.

My guess is that it is not, since all the values are stored in a single
Region, so there shouldn't be a problem.

Is there any limit to the number of columns in a column family?

-- 
Marko Dinic

Re: Am I crazy or that's not that much?

Posted by Ted Yu <yu...@gmail.com>.
bq. current MR implementation may OOME if there are too many columns

This is related:
HBASE-14696 Support setting allowPartialResults in mapreduce Mappers

but it is not in any HBase release yet.

FYI


Re: Am I crazy or that's not that much?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
I cannot say whether you are crazy or not. Only you know ;)

Now, regarding the number of columns... it depends...
If you want to store 800,000 1MB columns, that's almost 800GB in one row, and
therefore in one region. Forget that! HBase will not split within a row, so you
will kill your RS with a region that big. But if you want to store 800,000
8-byte columns, it's only about 6MB per row, which is totally doable in recent
HBase versions.
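
For reference, the arithmetic behind those two figures can be sketched as below. It counts value bytes only and ignores the per-cell key data HBase stores next to every value (row key, family, qualifier, timestamp), so real rows would be somewhat larger. The function name is purely illustrative, and decimal units (1MB = 1,000,000 bytes) are an assumption.

```python
def row_value_bytes(num_columns: int, bytes_per_value: int) -> int:
    """Raw value bytes held by one row (per-cell key overhead not counted)."""
    return num_columns * bytes_per_value


COLUMNS = 800_000

big = row_value_bytes(COLUMNS, 1_000_000)  # 1MB values, decimal MB assumed
small = row_value_bytes(COLUMNS, 8)        # 8-byte values

print(f"1MB values:    {big / 1e9:.0f} GB in a single row")
print(f"8-byte values: {small / 1e6:.1f} MB in a single row")
```

Because a row is never split across regions, the first figure is also a lower bound on the size of the region hosting that row.
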
But think about:
- If you have no consistency constraint across those columns, make the CQ
(Column Qualifier) part of the row key so that the data can be split.
- Group values together if they are accessed together. If you always
read 10K at a time, just put those 10K together in a single cell.
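
A minimal sketch of both suggestions, in plain Python with no HBase client involved. The key layout (row id, a zero-byte separator, then the former qualifier), the sample names `sensor42` and `c000000`, and the helper functions are all hypothetical and only illustrate the idea. Values are assumed to be signed 64-bit integers, and "10K" is read here as 10,000 values, which is also an assumption.

```python
import struct

SEP = b"\x00"  # assumed separator; must never occur inside row_id


def tall_key(row_id: bytes, qualifier: bytes) -> bytes:
    """Move the former column qualifier into the row key.

    Each (row_id, qualifier) pair becomes its own short row, so the table
    can split between them. The price is that the columns no longer share
    one row, so row-level atomicity across them is gone.
    """
    if SEP in row_id:
        raise ValueError("row_id must not contain the separator byte")
    return row_id + SEP + qualifier


def pack_values(values):
    """Pack 8-byte integers that are always read together into one cell value."""
    return struct.pack(f">{len(values)}q", *values)


def unpack_values(blob):
    """Inverse of pack_values."""
    return list(struct.unpack(f">{len(blob) // 8}q", blob))


# Wide layout: one row "sensor42" holding qualifiers c000000, c000001, ...
# Tall layout: one row per former qualifier.
keys = [tall_key(b"sensor42", f"c{i:06d}".encode()) for i in range(3)]
print(keys)

# Grouping: 10,000 values stored as one cell instead of 10,000 cells.
blob = pack_values(list(range(10_000)))
print(len(blob), "bytes in a single cell")
```

Big-endian packing is used so that the bytes compare in the same order as the non-negative numbers they encode; any fixed, documented encoding would do just as well.
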

Also, keep in mind that the current MR implementation may OOME if there are too
many columns... A fix is coming, but it is not ready yet.

Now, regarding column families, use them only if you need them. A very
different access pattern or data format (JPG vs plain text, etc.) can
justify another column family, but most of the time you can do everything you
need with a single one...

HTH,

JMS
