Posted to issues@hbase.apache.org by "Prakash Khemani (JIRA)" <ji...@apache.org> on 2012/04/23 18:57:38 UTC

[jira] [Created] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Prakash Khemani created HBASE-5860:
--------------------------------------

             Summary: splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
                 Key: HBASE-5860
                 URL: https://issues.apache.org/jira/browse/HBASE-5860
             Project: HBase
          Issue Type: Improvement
            Reporter: Prakash Khemani
            Assignee: Prakash Khemani


(This doesn't really impact the run time or correctness of log splitting.)

Say the master has lost its connection to ZK. The splitlogmanager's timeout manager will notice that all the tasks that were submitted are still unassigned, and it will resubmit those tasks (i.e. create dummy znodes).

The splitlogmanager should instead recognize that the tasks are unassigned only because their znodes have not been created yet.
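A minimal sketch of the behavior this issue asks for, assuming the timeout manager can ask whether any znode create is still outstanding. `ResubmitPolicySketch`, `shouldResubmit` and the `BooleanSupplier` hook are hypothetical names chosen for illustration; they are not taken from the patch or from the HBase API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: decide on each timeout tick whether to resubmit unassigned tasks. */
class ResubmitPolicySketch {
    // Hypothetical hook reporting whether any asynchronous znode create is still in flight.
    private final BooleanSupplier anyCreatePending;

    ResubmitPolicySketch(BooleanSupplier anyCreatePending) {
        this.anyCreatePending = anyCreatePending;
    }

    /** Returns true if the unassigned tasks should be resubmitted on this tick. */
    boolean shouldResubmit(int unassigned) {
        if (unassigned == 0) {
            return false; // every task has been picked up by a worker
        }
        // With ZK unreachable the creates are still outstanding, so the tasks look
        // unassigned only because their znodes do not exist yet: skip the resubmit.
        return !anyCreatePending.getAsBoolean();
    }

    public static void main(String[] args) {
        boolean[] createsPending = {true};
        ResubmitPolicySketch policy = new ResubmitPolicySketch(() -> createsPending[0]);
        System.out.println(policy.shouldResubmit(4)); // false: creates still pending
        createsPending[0] = false;
        System.out.println(policy.shouldResubmit(4)); // true: znodes exist, but no worker took them
    }
}
```

Once the outstanding creates either complete or give up, the hook returns false and the usual resubmission after timeout resumes.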


2012-04-20 13:11:20,516 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog worker msgstore295.snc4.facebook.com,60020,1334948757026
2012-04-20 13:11:20,517 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: Scheduling batch of logs to split
2012-04-20 13:11:20,517 INFO org.apache.hadoop.hbase.master.SplitLogManager: started splitting logs in [hdfs://msgstore215.snc4.facebook.com:9000/MSGSTORE215-SNC4-HBASE/.logs/msgstore295.snc4.facebook.com,60020,1334948757026-splitting]
2012-04-20 13:11:20,565 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server msgstore235.snc4.facebook.com/10.30.222.186:2181
2012-04-20 13:11:20,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to msgstore235.snc4.facebook.com/10.30.222.186:2181, initiating session
2012-04-20 13:11:20,575 INFO org.apache.hadoop.hbase.master.SplitLogManager: total tasks = 4 unassigned = 4
2012-04-20 13:11:20,576 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
2012-04-20 13:11:21,577 DEBUG org.apache.hadoop.hbase.master.SplitLogManager: resubmitting unassigned task(s) after timeout
2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x36ccb0f8010002, likely server has closed socket, closing socket connection and attempting reconnect
2012-04-20 13:11:21,683 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x136ccb0f4890000, likely server has closed socket, closing socket connection and attempting reconnect
2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951586677 retry=3
2012-04-20 13:11:21,786 WARN org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback: create rc =CONNECTIONLOSS for /hbase/splitlog/hdfs%3A%2F%2Fmsgstore215.snc4.facebook.com%3A9000%2FMSGSTORE215-SNC4-HBASE%2F.logs%2Fmsgstore295.snc4.facebook.com%2C60020%2C1334948757026-splitting%2F10.30.251.186%253A60020.1334951920332 retry=3

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262318#comment-13262318 ] 

Zhihong Yu commented on HBASE-5860:
-----------------------------------

Patch makes sense.
{code}
+  static boolean isAnyCreateZNodePending() {
{code}
This method can be made private, right?
Would isAnyZNodeCreationPending be a better name?
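A sketch of how such a private helper could be written, assuming outstanding creates are tracked with two monotonically increasing counters, one bumped when a create is issued and one bumped in the create callback. The class and counter names are hypothetical; only the method name follows the suggestion above. Every create, RESCAN nodes included, has to be counted by `onCreateIssued`, otherwise the check under-reports.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: track outstanding asynchronous znode creates with two counters. */
class ZNodeCreationTracker {
    private final AtomicLong createsIssued = new AtomicLong();
    private final AtomicLong createResults = new AtomicLong();

    void onCreateIssued() { // call before sending each async create (task or RESCAN)
        createsIssued.incrementAndGet();
    }

    void onCreateResult() { // call from the create callback
        createResults.incrementAndGet();
    }

    // Only used inside this class, so it can be private.
    private boolean isAnyZNodeCreationPending() {
        // Reading results before issued errs on the side of reporting "pending",
        // since both counters only ever grow.
        long results = createResults.get();
        return createsIssued.get() > results;
    }

    boolean mayResubmitUnassignedTasks() {
        return !isAnyZNodeCreationPending();
    }

    public static void main(String[] args) {
        ZNodeCreationTracker t = new ZNodeCreationTracker();
        for (int i = 0; i < 4; i++) t.onCreateIssued();     // 4 tasks, as in the log above
        System.out.println(t.mayResubmitUnassignedTasks()); // false: no callbacks yet
        for (int i = 0; i < 4; i++) t.onCreateResult();
        System.out.println(t.mayResubmitUnassignedTasks()); // true
    }
}
```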
                

[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Khemani updated HBASE-5860:
-----------------------------------

    Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

Nicolas's feedback applied.

Also reduced the RESCAN retries to 0.
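A sketch of what a retry budget means for a failed create, assuming each failed attempt is retried with the budget decremented (as in the `createNode(path, retry_count - 1)` snippet elsewhere in this thread). The real code issues asynchronous ZooKeeper creates; here the create is a synchronous stub, and the class name and znode paths are illustrative only. With ZK unreachable, a task znode with `retry=3` (as in the log) costs 4 attempts, while a RESCAN znode with 0 retries costs 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of create-with-retry; "create" is a synchronous stub, not ZooKeeper. */
class RetryBudgetSketch {
    enum Rc { OK, NODEEXISTS, CONNECTIONLOSS }

    private final Function<String, Rc> create; // stub standing in for a ZK create call
    final List<String> attempts = new ArrayList<>();

    RetryBudgetSketch(Function<String, Rc> create) {
        this.create = create;
    }

    void createNode(String path, int retryCount) {
        attempts.add(path);
        processResult(create.apply(path), path, retryCount);
    }

    private void processResult(Rc rc, String path, int retryCount) {
        if (rc == Rc.OK || rc == Rc.NODEEXISTS) {
            return; // the znode exists
        }
        if (retryCount > 0) {
            createNode(path, retryCount - 1); // retry with one fewer attempt left
        }
        // otherwise give up; for a RESCAN node a lost create is harmless
    }

    public static void main(String[] args) {
        // ZK unreachable: every create fails with CONNECTIONLOSS.
        RetryBudgetSketch zkDown = new RetryBudgetSketch(p -> Rc.CONNECTIONLOSS);
        zkDown.createNode("/hbase/splitlog/task-a", 3);   // task znode, retry=3
        zkDown.createNode("/hbase/splitlog/rescan-a", 0); // RESCAN znode, no retries
        System.out.println(zkDown.attempts.size());       // 5 = 4 task attempts + 1 RESCAN attempt
    }
}
```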
                

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265544#comment-13265544 ] 

Hadoop QA commented on HBASE-5860:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12525131/0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop23.  The patch compiles against the hadoop 0.23.x profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 4 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

     -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.client.TestShell

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1699//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1699//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1699//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265461#comment-13265461 ] 

Prakash Khemani commented on HBASE-5860:
----------------------------------------

I had missed the fact that isAnyCreateZKNodePending() does not account for the creation of RESCAN nodes. Will provide a fix.

I was aware of the race condition where isAnyCreateZKNodePending() returns false even though a create-znode is about to be retried. It's not worth fixing, for the reason you outlined: creating an extra RESCAN node doesn't hurt. (The code change you outlined would need some more changes to make it work.)
                

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265162#comment-13265162 ] 

Nicolas Spiegelberg commented on HBASE-5860:
--------------------------------------------

Also, it looks like there is a race condition in CreateAsyncCallback.processResult.  The code is roughly:
{code}
tot_mgr_node_create_result.incrementAndGet();
if (rc != KeeperException.Code.NODEEXISTS.intValue()) {
  if (retry_count > 0) {
    tot_mgr_node_create_retry.incrementAndGet();
    createNode(path, retry_count - 1);
  }
}
{code}
So, we should change this to:
{code}
try {
  if (rc != KeeperException.Code.NODEEXISTS.intValue()) {
    if (retry_count > 0) {
      tot_mgr_node_create_retry.incrementAndGet();
      createNode(path, retry_count - 1);
    }
  }
} finally {
  tot_mgr_node_create_result.incrementAndGet();
}
{code}
so that we don't record a result for the znode create until we have decided whether it failed and needs to be re-enqueued. Or maybe the repercussions of creating an extra RESCAN node are too minor to be worth finding and fixing all these subtle race conditions?
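A single-threaded sketch of the ordering issue described above, assuming "a create is pending" means "creates issued != results counted". The `issued` counter, the class name and the boolean `failed` flag (standing in for the `rc` check) are hypothetical; `result` and `retry` play the roles of `tot_mgr_node_create_result` and `tot_mgr_node_create_retry`. The returned value is what a concurrent timeout check would observe at the marked point. It illustrates only the ordering idea; as noted in the reply on this issue, the real change would need more work.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of result-counter ordering in a create callback. */
class CallbackOrderingSketch {
    final AtomicLong issued = new AtomicLong(); // illustrative: creates sent
    final AtomicLong result = new AtomicLong(); // plays tot_mgr_node_create_result
    final AtomicLong retry = new AtomicLong();  // plays tot_mgr_node_create_retry

    boolean anyCreatePending() {
        return issued.get() != result.get();
    }

    void createNode(int retryCount) {
        // The async ZK call (which would carry retryCount as context) is elided.
        issued.incrementAndGet();
    }

    /** Original order: count the result first, then maybe retry. */
    boolean processResultOriginal(boolean failed, int retryCount) {
        result.incrementAndGet();
        boolean seen = anyCreatePending(); // <- a timeout check could run here
        if (failed && retryCount > 0) {
            retry.incrementAndGet();
            createNode(retryCount - 1);
        }
        return seen;
    }

    /** Suggested order: queue any retry first, count the result in finally. */
    boolean processResultFinally(boolean failed, int retryCount) {
        boolean seen;
        try {
            if (failed && retryCount > 0) {
                retry.incrementAndGet();
                createNode(retryCount - 1);
            }
            seen = anyCreatePending(); // <- a timeout check could run here
        } finally {
            result.incrementAndGet();
        }
        return seen;
    }

    public static void main(String[] args) {
        CallbackOrderingSketch before = new CallbackOrderingSketch();
        before.createNode(3);
        System.out.println(before.processResultOriginal(true, 3)); // false: retry not yet visible
        CallbackOrderingSketch after = new CallbackOrderingSketch();
        after.createNode(3);
        System.out.println(after.processResultFinally(true, 3));   // true: still pending
    }
}
```

In the original order the only window where the check sees "nothing pending" is between the result increment and the retry being queued; moving the increment into `finally` after the retry closes that window.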
                

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263851#comment-13263851 ] 

Nicolas Spiegelberg commented on HBASE-5860:
--------------------------------------------

+1
                

[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Zhihong Yu (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5860:
------------------------------

    Hadoop Flags: Reviewed
          Status: Patch Available  (was: Open)
    
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch, 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265150#comment-13265150 ] 

Nicolas Spiegelberg commented on HBASE-5860:
--------------------------------------------

@Prakash: this code wouldn't pick up that the RESCAN znode was created, because that path uses createRescanNode() instead of createNode(). Should we not also increment tot_mgr_node_create_queued in createRescanNode() and increment tot_mgr_node_create_result in CreateRescanAsyncCallback.processResult?
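To illustrate the concern, here is a hypothetical, ZooKeeper-free model in which both create paths update the same pair of counters, as the comment suggests. A queue of callbacks stands in for ZooKeeper's asynchronous create; the class and method names are assumptions for illustration only, not HBase code:

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the two asynchronous create paths. A queue of
// callbacks stands in for ZooKeeper; nothing here is actual HBase code.
class CreateCounters {
    final AtomicLong queued = new AtomicLong();   // cf. tot_mgr_node_create_queued
    final AtomicLong result = new AtomicLong();   // cf. tot_mgr_node_create_result
    private final Queue<Runnable> pendingCallbacks = new ArrayDeque<>();

    // Task znode path (createNode + CreateAsyncCallback in the discussion).
    void createNode() {
        queued.incrementAndGet();
        pendingCallbacks.add(() -> result.incrementAndGet());
    }

    // RESCAN path (createRescanNode + CreateRescanAsyncCallback), counted the
    // same way so a pending-create check also sees RESCAN creates.
    void createRescanNode() {
        queued.incrementAndGet();
        pendingCallbacks.add(() -> result.incrementAndGet());
    }

    // Simulates ZooKeeper delivering every outstanding callback.
    void deliverCallbacks() {
        Runnable cb;
        while ((cb = pendingCallbacks.poll()) != null) {
            cb.run();
        }
    }

    boolean hasPendingCreates() {
        return queued.get() != result.get();
    }
}
```

If createRescanNode() skipped the queued increment, an outstanding RESCAN create would be invisible to hasPendingCreates(), which is the gap the comment points out.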
                
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Jimmy Xiang (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13263871#comment-13263871 ] 

Jimmy Xiang commented on HBASE-5860:
------------------------------------

Looks good to me.
                
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

[jira] [Updated] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Prakash Khemani (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prakash Khemani updated HBASE-5860:
-----------------------------------

    Attachment: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

Avoid resubmitting tasks to ZK when there are pending znode creates.
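A minimal sketch of the idea in the patch description, assuming the manager tracks how many asynchronous znode creates it has queued versus how many have reached a final outcome. The counter names echo tot_mgr_node_create_queued and tot_mgr_node_create_result from the discussion on this issue; the classes and methods below are hypothetical and are not the patch itself:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical tracker; the fields mirror the counters named in the JIRA
// discussion but are not the actual SplitLogManager fields.
class PendingCreateTracker {
    final AtomicLong createQueued = new AtomicLong();
    final AtomicLong createResult = new AtomicLong();

    // Call just before issuing an asynchronous znode create.
    void onCreateQueued() {
        createQueued.incrementAndGet();
    }

    // Call once the create reaches a final outcome (success or retries exhausted).
    void onCreateResult() {
        createResult.incrementAndGet();
    }

    // True while at least one create has not reported back yet.
    boolean hasPendingCreates() {
        return createQueued.get() != createResult.get();
    }
}

// Hypothetical stand-in for the timeout monitor's resubmit decision.
class TimeoutCheck {
    static boolean shouldResubmitUnassigned(int total, int unassigned,
                                            PendingCreateTracker tracker) {
        // Only the "every submitted task is still unassigned" case is considered.
        if (total == 0 || unassigned < total) {
            return false;
        }
        // Creates still in flight (e.g. retrying after CONNECTIONLOSS) mean the
        // task znodes may simply not exist yet; resubmitting would only queue
        // more creates against an unreachable ZooKeeper.
        return !tracker.hasPendingCreates();
    }
}
```

With four creates queued and none answered, the check declines to resubmit; once all four have settled, it allows resubmission as before.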
                
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch

[jira] [Commented] (HBASE-5860) splitlogmanager should not unnecessarily resubmit tasks when zk unavailable

Posted by "Nicolas Spiegelberg (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-5860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13265491#comment-13265491 ] 

Nicolas Spiegelberg commented on HBASE-5860:
--------------------------------------------

I guess changing the retries to 0 should also fix the HBASE-5890 problem? We shouldn't get a NODEEXISTS return code for the RESCAN node, because we create it as EPHEMERAL_SEQUENTIAL.
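Why NODEEXISTS should not occur follows from how ZooKeeper names sequential znodes: the server appends a 10-digit, zero-padded counter maintained by the parent znode, so repeated creates with the same prefix always get distinct paths. The in-memory model below is a hypothetical illustration of that naming rule, not ZooKeeper's implementation:

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical in-memory model of a parent znode that hands out
// *_SEQUENTIAL children. Real ZooKeeper keeps this counter on the parent.
class SequentialParent {
    private final Set<String> children = new HashSet<>();
    private int nextSequence = 0;

    // Appends a monotonically increasing, 10-digit zero-padded suffix, so the
    // generated name is new on every call and NODEEXISTS cannot arise.
    String createSequential(String prefix) {
        String name = prefix + String.format(Locale.ROOT, "%010d", nextSequence++);
        if (!children.add(name)) {
            throw new IllegalStateException("NODEEXISTS: " + name); // never reached
        }
        return name;
    }

    int childCount() {
        return children.size();
    }
}
```

The flip side is that blindly retrying after CONNECTIONLOSS, when the first create may in fact have been applied on the server, can leave an extra RESCAN child rather than failing; with retries set to 0 the manager makes a single attempt.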
                
> splitlogmanager should not unnecessarily resubmit tasks when zk unavailable
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5860
>                 URL: https://issues.apache.org/jira/browse/HBASE-5860
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Prakash Khemani
>            Assignee: Prakash Khemani
>         Attachments: 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch, 0001-HBASE-5860-splitlogmanager-should-not-unnecessarily-.patch
