Posted to user@hbase.apache.org by Brian Jeltema <br...@digitalenvoy.net> on 2014/11/10 20:10:32 UTC

what can cause RegionTooBusyException?

I’m running a map/reduce job that performs a large number of writes against a table (probably updating every row).
The job is failing with the exception below. This is a solid failure; it dies at the same point in the application,
and at the same row in the table. So I doubt it’s a conflict with compaction (and the UI shows no compaction in progress),
or that there is a load-related cause.

‘hbase hbck’ does not report any inconsistencies. The ‘waitForAllPreviousOpsAndReset’ leads me to suspect that
there is an operation in progress that is hung and blocking the update. I don’t see anything suspicious in the HBase logs.
The data at the point of failure is not unusual, and is identical to many preceding rows.
Does anybody have any ideas of what I should look for to find the cause of this RegionTooBusyException?

This is Hadoop 2.4 and HBase 0.98.

14/11/10 13:46:13 INFO mapreduce.Job: Task Id : attempt_1415210751318_0010_m_000314_1, Status : FAILED
Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1744 actions: RegionTooBusyException: 1744 times, 
 	at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
 	at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
 	at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
 	at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
 	at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
 	at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)

Brian
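
One way to narrow this down is to unpack the client exception itself:
RetriesExhaustedWithDetailsException records the cause, the row and the target
server for every failed action, which tells you exactly which region is pushing
back. A minimal sketch against the 0.98 client API (the table, row and column
names below are placeholders):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException;
import org.apache.hadoop.hbase.util.Bytes;

public class DiagnosePuts {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");             // placeholder table name
    try {
      Put put = new Put(Bytes.toBytes("some-row"));          // placeholder row
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(put);
      table.flushCommits();
    } catch (RetriesExhaustedWithDetailsException e) {
      // Each failed action carries its cause, row and server:port.
      for (int i = 0; i < e.getNumExceptions(); i++) {
        System.err.println("row=" + Bytes.toStringBinary(e.getRow(i).getRow())
            + " server=" + e.getHostnamePort(i)
            + " cause=" + e.getCause(i));
      }
    } finally {
      table.close();
    }
  }
}

Logging the per-action detail from the failing map task should at least confirm
whether all 1744 failed actions target the same region.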

Re: what can cause RegionTooBusyException?

Posted by Qiang Tian <ti...@gmail.com>.
Or you may see this warning instead:
          LOG.warn("Region " + region.getRegionNameAsString() + " has too many " +
            "store files; delaying flush up to " + this.blockingWaitTime + "ms");

something like:

WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
occurrence,\x17\xF1o\x9C,1340981109494.ecb85155563c6614e5448c7d700b909e.
has too many store files; delaying flush up to 90000ms
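
If that warning shows up, the settings behind it are server-side: as far as I
recall the 0.98 code, a flush is delayed while a store has more than
hbase.hstore.blockingStoreFiles files, for at most hbase.hstore.blockingWaitTime
ms, and checkResources() blocks writes once the memstore passes the flush size
times hbase.hregion.memstore.block.multiplier. A small sketch just to dump the
values your configuration actually carries (assuming the cluster's hbase-site.xml
is on the classpath):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DumpBlockingSettings {
  public static void main(String[] args) {
    // Loads hbase-default.xml plus whatever hbase-site.xml is on the classpath.
    Configuration conf = HBaseConfiguration.create();
    String[] keys = {
        "hbase.hstore.blockingStoreFiles",         // store-file count above which flushes are delayed
        "hbase.hstore.blockingWaitTime",           // the "delaying flush up to Nms" in the warning
        "hbase.hregion.memstore.block.multiplier"  // memstore limit used by checkResources()
    };
    for (String key : keys) {
      System.out.println(key + " = " + conf.get(key));
    }
  }
}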




Re: what can cause RegionTooBusyException?

Posted by Qiang Tian <ti...@gmail.com>.
the checkResources() Ted mentioned is a good suspect; see the online HBase book,
section "9.7.7.7.1.1. Being Stuck".
Did you see the message below in your RS log?
        LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
          "ms on a compaction to clean up 'too many store files'; waited " +
          "long enough... proceeding with flush of " +
          region.getRegionNameAsString());


I did a quick test setting "hbase.hregion.memstore.block.multiplier" = 0;
issuing a put in the hbase shell triggered a flush and threw RegionTooBusyException
to the client, and the retry mechanism completed the put in the next multi RPC call.
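
Since the server just pushes back and expects the client to retry with backoff,
one mitigation on the map/reduce side is to give the job's HBase client a larger
retry budget, so a temporarily blocked region is less likely to exhaust it. A
rough sketch; the numbers are purely illustrative, not recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class PatientClientConf {
  // Returns a configuration with a larger retry budget and a longer base backoff,
  // intended to be passed to the job (e.g. via TableMapReduceUtil) instead of the defaults.
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.client.retries.number", 30);  // illustrative value
    conf.setLong("hbase.client.pause", 500);         // base backoff in ms, illustrative value
    conf.setInt("hbase.rpc.timeout", 120000);        // per-RPC timeout in ms, illustrative value
    return conf;
  }
}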




Re: what can cause RegionTooBusyException?

Posted by Brian Jeltema <br...@digitalenvoy.net>.
Thanks. I appear to have resolved this problem by restarting the HBase Master and the RegionServers
that were reporting the failure.

Brian


Re: what can cause RegionTooBusyException?

Posted by Ted Yu <yu...@gmail.com>.
For your first question: the region server web UI (rs-status#regionRequestStats)
shows the Write Request Count for each region.

You can monitor the value for the underlying region to see if it receives
above-normal writes.

Cheers
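
The same counter can also be pulled programmatically, which makes it easier to
watch one region over time. A sketch against the 0.98 admin API (prints the
write request count for every region on every server):

import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class WriteRequestCounts {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        ServerLoad load = status.getLoad(sn);
        for (Map.Entry<byte[], RegionLoad> entry : load.getRegionsLoad().entrySet()) {
          RegionLoad rl = entry.getValue();
          // The same "Write Request Count" shown on rs-status#regionRequestStats.
          System.out.println(sn + " " + rl.getNameAsString()
              + " writeRequests=" + rl.getWriteRequestsCount());
        }
      }
    } finally {
      admin.close();
    }
  }
}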


Re: what can cause RegionTooBusyException?

Posted by Brian Jeltema <bd...@gmail.com>.
> Was the region containing this row hot around the time of failure ?

How do I measure that?

> 
> Can you check region server log (along with monitoring tool) what memstore pressure was ?

I didn't see anything in the region server logs to indicate a problem. And given the 
reproducibility of the behavior, it's hard to see how dynamic parameters such as
memory pressure could be at the root of the problem.

Brian


Re: what can cause RegionTooBusyException?

Posted by Ted Yu <yu...@gmail.com>.
Was the region containing this row hot around the time of failure ?

Can you check region server log (along with monitoring tool) what memstore pressure was ?

Thanks


Re: what can cause RegionTooBusyException?

Posted by Brian Jeltema <br...@digitalenvoy.net>.
> How many tasks may write to this row concurrently ?

only 1 mapper should be writing to this row. Is there a way to check which
locks are being held?

> Which 0.98 release are you using ?

0.98.0.2.1.2.1-471-hadoop2

Thanks
Brian


Re: what can cause RegionTooBusyException?

Posted by Ted Yu <yu...@gmail.com>.
There could be more than one reason why RegionTooBusyException is thrown.
Below are two (from HRegion):

   * We throw RegionTooBusyException if above memstore limit
   * and expect client to retry using some kind of backoff
  */
  private void checkResources()

   * Try to acquire a lock.  Throw RegionTooBusyException
   * if failed to get the lock in time. Throw InterruptedIOException
   * if interrupted while waiting for the lock.
   */
  private void lock(final Lock lock, final int multiplier)

How many tasks may write to this row concurrently ?

Which 0.98 release are you using ?

Cheers
