Posted to user@pig.apache.org by 某因幡 <te...@gmail.com> on 2012/09/04 12:39:15 UTC

Re: Extremely slow when loading small amount of data from HBase

After merging ~8000 regions down to ~4000 on an 8-node cluster, things
are getting better.
Should I continue merging?


2012/8/29 Dmitriy Ryaboy <dv...@gmail.com>:
> Can you try the same scans with a regular hbase mapreduce job? If you see the same problem, it's an hbase issue. Otherwise, we need to see the script and some facts about your table (how many regions, how many rows, how big a cluster, is the small range all on one region server, etc)
>
> On Aug 27, 2012, at 11:49 PM, 某因幡 <te...@gmail.com> wrote:
>
>> When I load a range of data from HBase using just a row key range in
>> HBaseStorageHandler, the speed is acceptable when I load tens of
>> millions of rows or more, but the single map task ends in a timeout
>> when the range covers only a few thousand rows.
>> What is going wrong here? I tried both Pig-0.9.2 and Pig-0.10.0.
>>
>>
>> --
>> language: Chinese, Japanese, English



-- 
language: Chinese, Japanese, English

Re: Extremely slow when loading small amount of data from HBase

Posted by 某因幡 <te...@gmail.com>.

Yes, hbase.hregion.max.filesize was left at the default of 256MB, which was too low.
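
To see why a low hbase.hregion.max.filesize leads to so many regions, here is
a rough Python sketch. The table size is hypothetical (the real size is not
given in this thread), and the helper name is made up for illustration. It
treats every region as having grown all the way to the split threshold, so it
is closer to a lower bound: freshly split regions are smaller than the
threshold, and real counts tend to be higher.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

def estimated_regions(table_bytes: int, max_filesize_bytes: int) -> int:
    """Rough region count if every region had grown to the split threshold."""
    return math.ceil(table_bytes / max_filesize_bytes)

# Hypothetical 1 TiB table; an assumption, not a figure from this thread.
table_size = 1024 * GB

# Compare the 256MB setting mentioned above with two larger, illustrative values.
for limit in (256 * MB, 1 * GB, 4 * GB):
    print(limit // MB, "MB ->", estimated_regions(table_size, limit), "regions")
```

Under these assumptions, raising the threshold shrinks the region count
proportionally, which is why merging by hand without changing the setting
would only help for a while.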


2012/9/5 Jean-Marc Spaggiari <je...@spaggiari.org>:
> But I think you should also look at why you have so many regions...
> Because even if you merge them manually now, you might face the same
> issue soon.

Re: Extremely slow when loading small amount of data from HBase

Posted by Doug Meil <do...@explorysmedical.com>.

You have 4000 regions on an 8-node cluster?  I think you need to bring
that *way* down…

re:  "something like 40 regions"


Yep… around there.  See…


http://hbase.apache.org/book.html#bigger.regions
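
For scale, the arithmetic behind that advice can be sketched in a few lines of
Python. The region and node counts are the ones reported in this thread; the
figure of ~40 regions per node is the rule of thumb quoted above, not a hard
limit, and the function name is only illustrative.

```python
def regions_per_node(total_regions: int, nodes: int) -> float:
    """Average number of regions each region server has to host."""
    return total_regions / nodes

NODES = 8  # cluster size reported in this thread

# Region counts reported in this thread: before and after the manual merge.
before = regions_per_node(8000, NODES)
after = regions_per_node(4000, NODES)

# Rule of thumb quoted in the thread (per node, per table); treat as an assumption.
SUGGESTED_PER_NODE = 40
suggested_total = SUGGESTED_PER_NODE * NODES

print("per node before merge:", before)
print("per node after merge:", after)
print("suggested total for the table:", suggested_total)
```

Even after the merge, each node carries more than ten times the suggested
number of regions, so further merging looks reasonable if the rule of thumb
applies to this workload.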




Re: Extremely slow when loading small amount of data from HBase

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.

But I think you should also look at why you have so many regions...
Because even if you merge them manually now, you might face the same
issue soon.


Re: Extremely slow when loading small amount of data from HBase

Posted by n keywal <nk...@gmail.com>.

Hi,

With 8 regionservers, yes, you can. Target a few hundred by default, imho.

N.


Fwd: Extremely slow when loading small amount of data from HBase

Posted by 某因幡 <te...@gmail.com>.

+HBase users.


---------- Forwarded message ----------
From: Dmitriy Ryaboy <dv...@gmail.com>
Date: 2012/9/4
Subject: Re: Extremely slow when loading small amount of data from HBase
To: "user@pig.apache.org" <us...@pig.apache.org>


I think the hbase folks recommend something like 40 regions per node
per table, but I might be misremembering something. Have you tried
emailing the hbase users list?


Re: Extremely slow when loading small amount of data from HBase

Posted by Dmitriy Ryaboy <dv...@gmail.com>.

I think the hbase folks recommend something like 40 regions per node per table, but I might be misremembering something. Have you tried emailing the hbase users list?
