Posted to user@hbase.apache.org by Wayne <wa...@gmail.com> on 2012/01/20 20:43:19 UTC

0.92 Max Row Size

Does 0.92 support a significant increase in row size over 0.90.x? With
0.90.4 we have seen writes start choking at 30 million cols/row and reads
start choking at 10 million cols/row. Can we assume these numbers will go
up with 0.92, and if so, by how much?

Thanks

Re: 0.92 Max Row Size

Posted by Stack <st...@duboce.net>.
On Sat, Jan 21, 2012 at 5:34 AM, Wayne <wa...@gmail.com> wrote:

> Sorry but it would be too hard for us to be able to provide enough info in
> a Jira to accurately reproduce. Our read problem is through thrift and has
> everything to do with the row just being too big to bring back in its
> entirety (13 million col row times out 1/3 of the time). Filters in .92 and
> thrift should help us there. I just closed
> https://issues.apache.org/jira/browse/HBASE-4187 as filters now support
> offset, limit patterns for the get. Of course we would all prefer a
> streaming model to avoid any of these issues and having to build our
> own pseudo streaming model. Is Thrift still the best option for high
> performance python based reads? From Hadoop World it seems some people are
> pushing thrift and others are pushing Avro. Does .92 bundle/work with
> Thrift .8 and are the memory leaks fixed in .8?
>
>
For what you are doing, a Python client, I'd say yes.



> As far as the write bottleneck it has a lot to do with memory, and other
> low level config issues. I would hope that the automated tests of hbase can
> eventually include patterns for large col counts. In order for hbase to
> truly be a col based storage system it needs to scale cols into the 100s
> millions and beyond. This is the pattern we have the hardest time modeling
> in hbase because there is an unknown "limit" here we have to watch out for.
> There is a known limit that a row must be stored within 1 and only one
> region, but that should not be a problem. One single large region storing
> one large row should still "work".
>
>
We don't have such a test in our suite currently.  It would be a good idea
to add it.  I filed HBASE-5244.

St.Ack
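
For reference, a rough sketch of what such a wide-row test pattern might
look like with the Java client: write tens of millions of columns into a
single row, flushing in batches so the client never holds the whole row in
memory. The table name, family, and counts below are illustrative
assumptions, not details from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowWriteSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "wide_row_test");   // illustrative table
        byte[] row = Bytes.toBytes("single-wide-row");      // illustrative row key
        byte[] family = Bytes.toBytes("d");                 // illustrative family
        long totalCols = 30000000L;                         // 30 million columns
        int batchCols = 10000;                              // columns per flush

        Put put = new Put(row);
        int pending = 0;
        for (long i = 0; i < totalCols; i++) {
          put.add(family, Bytes.toBytes(i), Bytes.toBytes(i));
          pending++;
          if (pending == batchCols) {
            table.put(put);              // flush one slice of the row
            put = new Put(row);
            pending = 0;
          }
        }
        if (pending > 0) {
          table.put(put);                // flush the final partial slice
        }
        table.close();
      }
    }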

Re: 0.92 Max Row Size

Posted by Wayne <wa...@gmail.com>.
Our memory problems might be as simple as not closing a scanner every time
one is opened, but I know we had to implement Nagios-based restarts of
Thrift because our Thrift server's 4 GB of memory gets eaten up and it
eventually freezes and stops responding to requests after less than a week
of running. We are running the Thrift server that is bundled with 0.90.4,
so hopefully a lot of this is fixed now...
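
For reference, the discipline on the Java client side is to close every
ResultScanner in a finally block; the Thrift interface has an equivalent
scannerClose(id) call that has to be issued for every scanner the client
opens. A minimal sketch, with an illustrative table name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ScannerCloseSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "some_table");   // illustrative name
        ResultScanner scanner = table.getScanner(new Scan());
        try {
          for (Result r : scanner) {
            // process r; keep the loop body small so results are not retained
          }
        } finally {
          // an unclosed scanner ties up server-side resources until its lease
          // expires, and the Thrift gateway keeps its scanner-id mapping alive
          // until scannerClose is called
          scanner.close();
          table.close();
        }
      }
    }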

Thanks.


On Sat, Jan 21, 2012 at 9:29 AM, <yu...@gmail.com> wrote:

> Thrift has been upgraded to 0.8 in trunk. 0.92 still uses 0.7
>
> Can you provide Jira number which deals with memory leak ?
>
> Thanks
>
>
>
> On Jan 21, 2012, at 5:34 AM, Wayne <wa...@gmail.com> wrote:
>
> > Sorry but it would be too hard for us to be able to provide enough info
> in
> > a Jira to accurately reproduce. Our read problem is through thrift and
> has
> > everything to do with the row just being too big to bring back in its
> > entirety (13 million col row times out 1/3 of the time). Filters in .92
> and
> > thrift should help us there. I just closed
> > https://issues.apache.org/jira/browse/HBASE-4187 as filters now support
> > offset, limit patterns for the get. Of course we would all prefer a
> > streaming model to avoid any of these issues and having to build our
> > own pseudo streaming model. Is Thrift still the best option for high
> > performance python based reads? From Hadoop World it seems some people
> are
> > pushing thrift and others are pushing Avro. Does .92 bundle/work with
> > Thrift .8 and are the memory leaks fixed in .8?
> >
> > As far as the write bottleneck it has a lot to do with memory, and other
> > low level config issues. I would hope that the automated tests of hbase
> can
> > eventually include patterns for large col counts. In order for hbase to
> > truly be a col based storage system it needs to scale cols into the 100s
> > millions and beyond. This is the pattern we have the hardest time
> modeling
> > in hbase because there is an unknown "limit" here we have to watch out
> for.
> > There is a known limit that a row must be stored within 1 and only one
> > region, but that should not be a problem. One single large region storing
> > one large row should still "work".
> >
> > Thanks.
> >
> > On Fri, Jan 20, 2012 at 3:45 PM, Stack <st...@duboce.net> wrote:
> >
> >> On Fri, Jan 20, 2012 at 11:43 AM, Wayne <wa...@gmail.com> wrote:
> >>
> >>> Does 0.92 support a significant increase in row size over 0.90.x? With
> >>> 0.90.4 we have seen writes start choking at 30 million cols/row and
> reads
> >>> start choking at 10 million cols/row. Can we assume these numbers will
> go
> >>> up with .92 and if yes how much?
> >>>
> >>>
> >> Any chance of a JIRA on issues you see Wayne when writes/read choke?
> >> Thanks,
> >>
> >> St.Ack
> >> P.S. I don't know of any comparison.  We have new fileformat in 0.92.0
> and
> >> both read/write paths have been amended so it could be different; not
> sure
> >> if better or worse.
> >>
>

Re: 0.92 Max Row Size

Posted by yu...@gmail.com.
Thrift has been upgraded to 0.8 in trunk; 0.92 still uses 0.7.

Can you provide the Jira number which deals with the memory leak?

Thanks



On Jan 21, 2012, at 5:34 AM, Wayne <wa...@gmail.com> wrote:

> Sorry but it would be too hard for us to be able to provide enough info in
> a Jira to accurately reproduce. Our read problem is through thrift and has
> everything to do with the row just being too big to bring back in its
> entirety (13 million col row times out 1/3 of the time). Filters in .92 and
> thrift should help us there. I just closed
> https://issues.apache.org/jira/browse/HBASE-4187 as filters now support
> offset, limit patterns for the get. Of course we would all prefer a
> streaming model to avoid any of these issues and having to build our
> own pseudo streaming model. Is Thrift still the best option for high
> performance python based reads? From Hadoop World it seems some people are
> pushing thrift and others are pushing Avro. Does .92 bundle/work with
> Thrift .8 and are the memory leaks fixed in .8?
> 
> As far as the write bottleneck it has a lot to do with memory, and other
> low level config issues. I would hope that the automated tests of hbase can
> eventually include patterns for large col counts. In order for hbase to
> truly be a col based storage system it needs to scale cols into the 100s
> millions and beyond. This is the pattern we have the hardest time modeling
> in hbase because there is an unknown "limit" here we have to watch out for.
> There is a known limit that a row must be stored within 1 and only one
> region, but that should not be a problem. One single large region storing
> one large row should still "work".
> 
> Thanks.
> 
> On Fri, Jan 20, 2012 at 3:45 PM, Stack <st...@duboce.net> wrote:
> 
>> On Fri, Jan 20, 2012 at 11:43 AM, Wayne <wa...@gmail.com> wrote:
>> 
>>> Does 0.92 support a significant increase in row size over 0.90.x? With
>>> 0.90.4 we have seen writes start choking at 30 million cols/row and reads
>>> start choking at 10 million cols/row. Can we assume these numbers will go
>>> up with .92 and if yes how much?
>>> 
>>> 
>> Any chance of a JIRA on issues you see Wayne when writes/read choke?
>> Thanks,
>> 
>> St.Ack
>> P.S. I don't know of any comparison.  We have new fileformat in 0.92.0 and
>> both read/write paths have been amended so it could be different; not sure
>> if better or worse.
>> 

Re: 0.92 Max Row Size

Posted by Wayne <wa...@gmail.com>.
Sorry, but it would be too hard for us to provide enough info in a Jira to
accurately reproduce the problem. Our read problem is through Thrift and has
everything to do with the row just being too big to bring back in its
entirety (a 13 million column row times out 1/3 of the time). Filters in
0.92 and Thrift should help us there. I just closed
https://issues.apache.org/jira/browse/HBASE-4187 as filters now support
offset/limit patterns for gets. Of course we would all prefer a streaming
model, to avoid these issues and the need to build our own pseudo-streaming
model. Is Thrift still the best option for high-performance Python-based
reads? From Hadoop World it seems some people are pushing Thrift and others
are pushing Avro. Does 0.92 bundle/work with Thrift 0.8, and are the memory
leaks fixed in 0.8?
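
A minimal sketch of that offset/limit pattern with the Java client's
ColumnPaginationFilter, paging through one wide row a slice of columns at a
time; the table name, row key, and page size are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PagedWideRowRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "wide_row_test");   // illustrative
        byte[] row = Bytes.toBytes("single-wide-row");      // illustrative
        int pageSize = 10000;                               // columns per Get
        int offset = 0;

        while (true) {
          Get get = new Get(row);
          // ColumnPaginationFilter(limit, offset): return at most `limit`
          // columns, skipping the first `offset` columns of the row.
          get.setFilter(new ColumnPaginationFilter(pageSize, offset));
          Result page = table.get(get);
          if (page.isEmpty()) {
            break;                       // no columns left in this row
          }
          // process page.raw() here rather than holding the whole row
          offset += page.size();
        }
        table.close();
      }
    }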

As far as the write bottleneck goes, it has a lot to do with memory and
other low-level config issues. I would hope that the automated tests of
HBase can eventually include patterns for large column counts. In order for
HBase to truly be a column-based storage system it needs to scale columns
into the hundreds of millions and beyond. This is the pattern we have the
hardest time modeling in HBase, because there is an unknown "limit" here we
have to watch out for. There is a known limit that a row must be stored
within one and only one region, but that should not be a problem. One single
large region storing one large row should still "work".
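
Short of true streaming, the closest approximation with the Java client is a
Scan with a small batch size, which returns one huge row as a sequence of
partial Results instead of a single giant one. A sketch, again with
illustrative names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchedWideRowScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "wide_row_test");   // illustrative
        byte[] rowKey = Bytes.toBytes("single-wide-row");   // illustrative

        Scan scan = new Scan(rowKey);    // start scanning at the wide row
        scan.setBatch(10000);            // at most 10,000 columns per Result
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result slice : scanner) {
            if (!Bytes.equals(slice.getRow(), rowKey)) {
              break;                     // moved past the wide row; stop
            }
            // each slice carries one 10,000-column chunk of the row
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }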

Thanks.

On Fri, Jan 20, 2012 at 3:45 PM, Stack <st...@duboce.net> wrote:

> On Fri, Jan 20, 2012 at 11:43 AM, Wayne <wa...@gmail.com> wrote:
>
> > Does 0.92 support a significant increase in row size over 0.90.x? With
> > 0.90.4 we have seen writes start choking at 30 million cols/row and reads
> > start choking at 10 million cols/row. Can we assume these numbers will go
> > up with .92 and if yes how much?
> >
> >
> Any chance of a JIRA on issues you see Wayne when writes/read choke?
> Thanks,
>
> St.Ack
> P.S. I don't know of any comparison.  We have new fileformat in 0.92.0 and
> both read/write paths have been amended so it could be different; not sure
> if better or worse.
>

Re: 0.92 Max Row Size

Posted by Stack <st...@duboce.net>.
On Fri, Jan 20, 2012 at 11:43 AM, Wayne <wa...@gmail.com> wrote:

> Does 0.92 support a significant increase in row size over 0.90.x? With
> 0.90.4 we have seen writes start choking at 30 million cols/row and reads
> start choking at 10 million cols/row. Can we assume these numbers will go
> up with .92 and if yes how much?
>
>
Any chance of a JIRA on the issues you see, Wayne, when writes/reads choke?
Thanks,

St.Ack
P.S. I don't know of any comparison.  We have a new file format in 0.92.0
and both read and write paths have been amended, so it could be different;
not sure if better or worse.