Posted to dev@giraph.apache.org by "Avery Ching (JIRA)" <ji...@apache.org> on 2012/10/05 02:43:47 UTC

[jira] [Created] (GIRAPH-356) Help debug ZooKeeper issues

Avery Ching created GIRAPH-356:
----------------------------------

             Summary: Help debug ZooKeeper issues
                 Key: GIRAPH-356
                 URL: https://issues.apache.org/jira/browse/GIRAPH-356
             Project: Giraph
          Issue Type: Improvement
            Reporter: Avery Ching


Currently, if the ZooKeeper process fails, we have little information on why and what happened.  This patch addresses this by keeping the last 100 lines of the ZooKeeper process output and dumping them when the map task fails with a RuntimeException.

Here is an example of a master task failure when an invalid JVM argument is passed to ZooKeeper.  The error is much more obvious now.

2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Unrecognized option: -BadOpt
2012-10-04 15:05:28,916 WARN org.apache.giraph.zk.ZooKeeperManager$StreamCollector: Could not create the Java virtual machine.
2012-10-04 15:05:28,919 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-10-04 15:05:28,959 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.IllegalStateException: run: Caught an unrecoverable exception onlineZooKeeperServers: Failed to connect in 5 tries!
       at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:591)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
       at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:396)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
       at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: onlineZooKeeperServers: Failed to connect in 5 tries!
       at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:721)
       at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:328)
       at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:573)
       ... 7 more
2012-10-04 15:05:28,963 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
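
For readers who want to see the idea of "keep the last 100 lines and dump them on failure" in code, here is a minimal sketch.  It is not the code from the patch: the class name TailCollector, its constructor arguments, and the snapshot() method are all hypothetical, and the only assumption is that the child process output is available as an InputStream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: reads a stream and retains only its last N lines. */
final class TailCollector extends Thread {
  private final BufferedReader reader;
  private final int maxLines;
  private final Deque<String> lastLines = new ArrayDeque<String>();

  TailCollector(InputStream in, int maxLines) {
    if (maxLines <= 0) {
      throw new IllegalArgumentException("maxLines must be positive");
    }
    this.reader =
        new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    this.maxLines = maxLines;
    // Do not keep the JVM alive just to drain the child's output.
    setDaemon(true);
  }

  @Override
  public void run() {
    try {
      String line;
      while ((line = reader.readLine()) != null) {
        add(line);
      }
    } catch (IOException e) {
      // Record the read failure so it shows up in the dump as well.
      add("TailCollector: stream read failed: " + e);
    }
  }

  /** Appends a line, evicting the oldest one once the buffer is full. */
  private synchronized void add(String line) {
    if (lastLines.size() == maxLines) {
      lastLines.removeFirst();
    }
    lastLines.addLast(line);
  }

  /** Returns a copy of the retained lines, oldest first. */
  public synchronized List<String> snapshot() {
    return new ArrayList<String>(lastLines);
  }
}
```

One way to use such a collector would be to start the child with ProcessBuilder.redirectErrorStream(true), pass Process.getInputStream() to the collector, call start(), and log each line of snapshot() at WARN level from the failure path.  A bounded deque keeps memory use constant no matter how much the child process prints.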



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-356) Improve ZooKeeper issues

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470716#comment-13470716 ] 

Hudson commented on GIRAPH-356:
-------------------------------

Integrated in Giraph-trunk-Commit #226 (See [https://builds.apache.org/job/Giraph-trunk-Commit/226/])
    GIRAPH-356: Improve ZooKeeper issues. (aching) (Revision 1394826)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1394826
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/pom.xml
* /giraph/trunk/src/main/java/org/apache/giraph/GiraphConfiguration.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperManager.java

                
> Improve ZooKeeper issues
> ------------------------
>
>                 Key: GIRAPH-356
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-356
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-356.2.patch, GIRAPH-356.patch

[jira] [Commented] (GIRAPH-356) Improve ZooKeeper issues

Posted by "Alessandro Presta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470666#comment-13470666 ] 

Alessandro Presta commented on GIRAPH-356:
------------------------------------------

Looks good, +1.
                

[jira] [Updated] (GIRAPH-356) Help debug ZooKeeper issues

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-356:
-------------------------------

    Attachment: GIRAPH-356.2.patch

Updated patch to address all the ZooKeeper issues I could find at scale.

- Configurable ZooKeeper connection attempts, min/max session timeout, force sync (off for perf), skip ACLs (no for perf)
- Do not kill the job on a disconnect event; the client can still connect again. Only a session-expired event is bad
- Dump the failed workers on the master when a superstep does not start due to missing ZooKeeper health
- Dump the last 100 lines of ZooKeeper process stdout/stderr when there is a failure that could be related to ZooKeeper
- Small change for a more descriptive message when the last good checkpoint cannot be found
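
The distinction between a disconnect and an expired session can be sketched roughly as below.  This is an illustration only, not the code in the patch: SessionEventPolicy, SessionEvent, and Action are hypothetical names, and SessionEvent is a stand-in for the ZooKeeper client's Watcher.Event.KeeperState values (Disconnected, SyncConnected, Expired) so that the example has no external dependencies.

```java
/** Hypothetical sketch of how session events could map to job-level actions. */
final class SessionEventPolicy {
  /** Stand-in for a few ZooKeeper session states (assumption, not the real enum). */
  enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

  /** What the task does in response to a session event. */
  enum Action { CONTINUE, WAIT_FOR_RECONNECT, FAIL_JOB }

  static Action onEvent(SessionEvent event) {
    switch (event) {
      case DISCONNECTED:
        // The client library keeps trying to reconnect within the session
        // timeout, so a disconnect alone is not a reason to kill the job.
        return Action.WAIT_FOR_RECONNECT;
      case EXPIRED:
        // The server has discarded the session; it cannot be resumed.
        return Action.FAIL_JOB;
      default:
        return Action.CONTINUE;
    }
  }
}
```

In a real watcher the same switch would sit inside the process(WatchedEvent) callback and branch on event.getState().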
                

[jira] [Commented] (GIRAPH-356) Improve ZooKeeper issues

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470637#comment-13470637 ] 

Avery Ching commented on GIRAPH-356:
------------------------------------

This should be ready to go, passes 'mvn clean verify' and works on a real cluster with a big app.
                

[jira] [Updated] (GIRAPH-356) Help debug ZooKeeper issues

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-356:
-------------------------------

    Attachment: GIRAPH-356.patch

Here is a patch that produces this new output.
                

[jira] [Updated] (GIRAPH-356) Improve ZooKeeper issues

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-356:
-------------------------------

    Summary: Improve ZooKeeper issues  (was: Help debug ZooKeeper issues)
    

[jira] [Commented] (GIRAPH-356) Improve ZooKeeper issues

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470691#comment-13470691 ] 

Avery Ching commented on GIRAPH-356:
------------------------------------

Thanks for the quick review.  Committing.
                