You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2008/09/03 23:39:44 UTC

[jira] Created: (HBASE-866) Blocking for ten minutes at a time

Blocking for ten minutes at a time
----------------------------------

                 Key: HBASE-866
                 URL: https://issues.apache.org/jira/browse/HBASE-866
             Project: Hadoop HBase
          Issue Type: Bug
            Reporter: stack


I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:

{code}
...
2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
...
{code}

Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-866) Blocking for ten minutes at a time

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-866:
------------------------

    Priority: Minor  (was: Major)

Ok. This issue is not as bad as I thought.   I had hbase.hregion.memcache.block.multiplier set to 1 -- and optional flushes to every ten minutes -- so was seeing loads of blocking.  Downgrading to minor.  The possibility for our blocking forever is still there so needs to be fixed; its just not too likely.  Have been able to load up a 4 node cluster w/ 1/4B rows using default heap sizes with about 300 regions per server without my MR loader failing a single task.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641107#action_12641107 ] 

stack commented on HBASE-866:
-----------------------------

Pirroh from #IRC had this issue w/ recent hbase.  He had set the flushsize down to 16MB.  Try doing this to see if it makes the issue come back.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672708#action_12672708 ] 

Andrew Purtell commented on HBASE-866:
--------------------------------------

The information in the TOE report does not indicate what specific region is being contacted for the store, only the URL that the TOE is processing. For example we store a record for the URL, a record for the content (key is MD5 hash of the content), some other metadata records in other tables, etc. 

My cluster is constantly freezing up on the write path now and I am needing to restart it every couple of hours. I have taken a snapshot of all thread/stack traces just before the last restart, if you would like to take a look. 

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672528#action_12672528 ] 

stack commented on HBASE-866:
-----------------------------

Were you able to look at logs on the node they were all hung on Andrew?  Did it have logging of 'Blocking updates' with no corresponding 'unblocking' messages?  Or could you get a thread dump out of the HRS to see why its hung up?  Thanks A.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672469#action_12672469 ] 

Andrew Purtell commented on HBASE-866:
--------------------------------------

Here is a Hertitrix hbase-writer thread stuck in RPC for over an hour:

[ToeThread #96: http://esinidc.onlinedown.net/down/NXSetup.zip
 CrawlURI http://esinidc.onlinedown.net/down/NXSetup.zip LX http://www.onlinedown.net/soft/2919.htm    0 attempts
    in processor: Archiver
    ACTIVE for 1h21m41s176ms
    step: ABOUT_TO_BEGIN_PROCESSOR for 1h21m2s77ms
Java Thread State: WAITING
Blocked/Waiting On: NONE
    java.lang.Object.wait(Native Method)
    java.lang.Object.wait(Object.java:485)
    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:695)
    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:321)
    $Proxy12.batchUpdates(Unknown Source)
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:953)
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.call(HConnectionManager.java:951)
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:845)
    org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:950)
    org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:1372)
    org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1316)
    org.apache.hadoop.hbase.client.HTable.commit(HTable.java:1296)
    net.iridiant.heritrix.writer.HBaseWriter.write(Unknown Source)
    net.iridiant.heritrix.writer.HBaseWriterProcessor.write(Unknown Source)
    net.iridiant.heritrix.writer.HBaseWriterProcessor.innerProcessResult(Unknown Source)
    org.archive.modules.Processor.process(Processor.java:123)
    org.archive.crawler.framework.ToeThread.processCrawlUri(ToeThread.java:310)
    org.archive.crawler.framework.ToeThread.run(ToeThread.java:157)
]

I have a pool of 100 TOE threads all stuck in Archiver, a few in RPC, but only on one node. There are two other nodes still writing at full speed into HBase.

Intermittent synchronization/wake problem on the write path? Needs investigation.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672719#action_12672719 ] 

stack commented on HBASE-866:
-----------------------------

Those OOME's in the IPC handlers are bad for us.  If the request made it into the HRS, we'd have a crack at it letting go of our memory resevoir and then shutting down.  Looks like we have another reason to mod our version of hadoop's RPC.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628208#action_12628208 ] 

stack commented on HBASE-866:
-----------------------------

Here is another example:

{code}
2008-09-03 23:35:54,899 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0134223832,1220484926984
2008-09-03 23:35:54,899 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0134223832,1220484926984. Current region memcache size 64.0m
2008-09-03 23:35:54,899 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:35:54,901 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 10 on 60020'
2008-09-03 23:36:05,287 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,287 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 8 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,287 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 0 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,287 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 10 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,288 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0134223832,1220484926984: Memcache size 64.0m is >= than blocking 64.0m size
2008-09-03 23:36:05,650 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0134223832,1220484926984 in 10751ms, sequence id=12850777, compaction requested=false
2008-09-03 23:46:00,731 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0134223832,1220484926984. Current region memcache size 64.0m
2008-09-03 23:46:00,731 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 9 on 60020'
2008-09-03 23:46:00,732 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 10 on 60020'
2008-09-03 23:46:00,732 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 12 on 60020'
2008-09-03 23:46:00,732 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 0 on 60020'
2008-09-03 23:46:00,741 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 8 on 60020'
2008-09-03 23:46:00,741 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 1 on 60020'
2008-09-03 23:46:00,741 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 7 on 60020'
2008-09-03 23:46:00,742 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0134223832,1220484926984 'IPC Server handler 2 on 60020'
2008-09-03 23:46:04,123 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0134223832,1220484926984 in 3392ms, sequence id=12916127, compaction requested=true
{code}

Running the above, I'd moved notification of blocking threads to just after clear of memcache rather than later down in the flush but didn't seem to matter:
{code}
Index: src/java/org/apache/hadoop/hbase/regionserver/HRegion.java
===================================================================
--- src/java/org/apache/hadoop/hbase/regionserver/HRegion.java  (revision 691805)
+++ src/java/org/apache/hadoop/hbase/regionserver/HRegion.java  (working copy)
@@ -1042,6 +1042,10 @@
     } finally {
       this.updatesLock.writeLock().unlock();
     }
+    // Notify anyone waiting on memcache to clear: e.g. checkResources().
+    synchronized (this) {
+      notifyAll();
+    }
 
     // Any failure from here on out will be catastrophic requiring server
     // restart so hlog content can be replayed and put back into the memcache.
@@ -1081,13 +1085,6 @@
     //     log-sequence-ids can be safely ignored.
     this.log.completeCacheFlush(getRegionName(),
         regionInfo.getTableDesc().getName(), completeSequenceId);
-
-    // C. Finally notify anyone waiting on memcache to clear:
-    // e.g. checkResources().
-    synchronized (this) {
-      notifyAll();
-    }
-    
     if (LOG.isDebugEnabled()) {
       String timeTaken = StringUtils.formatTimeDiff(System.currentTimeMillis(), 
           startTime);
{code}


> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HBASE-866) Blocking for ten minutes at a time

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672717#action_12672717 ] 

Andrew Purtell commented on HBASE-866:
--------------------------------------

Ok, I have figured out what is going on, at least in my case.

OOME in IPC server handler causes the IPC handler to abort, but the client never learns about this, so it waits and waits and waits... I have seen Heritrix writer threads that have been waiting for 7+ hours. And, the OOME does not take down the HRS, so it stays up in some degraded state. See HBASE-1196. Sorry to pollute this issue. 



> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HBASE-866) Blocking for ten minutes at a time

Posted by "ryan rawson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ryan rawson updated HBASE-866:
------------------------------

    Attachment: thread_dump.txt

I got the stall as well... I was running a 9-in parallel load, and my cluster just stalled. Included is the logfile from 1 of the 2 (out of 3 total) stalled servers... the thread dump is embedded in the logfile, as I requested it via the web UI.

> Blocking for ten minutes at a time
> ----------------------------------
>
>                 Key: HBASE-866
>                 URL: https://issues.apache.org/jira/browse/HBASE-866
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Minor
>         Attachments: thread_dump.txt
>
>
> I've been testing running biggish MR jobs uploading into hbase.  My jobs consistently fail with child task timing out its ten minute period.  Adding logging, was able to see that we're actual stuck in a commit.  Following the thread of the row we're committing, I see this in the log:
> {code}
> ...
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush requested on TestTable,0029377106,1220466998108
> 2008-09-03 18:37:03,446 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:37:03,446 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:13,450 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:37:16,089 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 16 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 4 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 6 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 2 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 12 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,090 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 9 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:16,091 INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 7 on 60020' on region TestTable,0029377106,1220466998108: Memcache size 64.0m is >= than blocking 64.0m size
> 2008-09-03 18:37:21,984 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 18538ms, sequence id=2852547, compaction requested=false
> 2008-09-03 18:47:06,241 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memcache flush for region TestTable,0029377106,1220466998108. Current region memcache size 64.0m
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished memcache flush for region TestTable,0029377106,1220466998108 in 3790ms, sequence id=2919208, compaction requested=true
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 9 on 60020'
> 2008-09-03 18:47:10,031 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested for region: TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,031 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 12 on 60020'
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: starting compaction on region TestTable,0029377106,1220466998108
> 2008-09-03 18:47:10,032 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 7 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 6 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 4 on 60020'
> 2008-09-03 18:47:10,035 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 2 on 60020'
> 2008-09-03 18:47:10,037 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 16 on 60020'
> 2008-09-03 18:47:10,043 INFO org.apache.hadoop.hbase.regionserver.HRegion: Unblocking updates for region TestTable,0029377106,1220466998108 'IPC Server handler 1 on 60020'
> 2008-09-03 18:47:18,403 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region TestTable,0029377106,1220466998108 in 8sec
> ...
> {code}
> Notice how we're blocked for ten minutes until new flush runs.  My guess is that the flush that is going on concurrent with the blocking is clearing the flag 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.