Posted to dev@giraph.apache.org by "Alessio Arleo (JIRA)" <ji...@apache.org> on 2015/03/25 09:42:53 UTC

[jira] [Commented] (GIRAPH-970) Missing chosen workers on superstep -1

    [ https://issues.apache.org/jira/browse/GIRAPH-970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379506#comment-14379506 ] 

Alessio Arleo commented on GIRAPH-970:
--------------------------------------

In my case the error was related to the GIRAPH-904 bug (https://issues.apache.org/jira/browse/GIRAPH-904). The first error line* reports that there are missing workers. My hostname contained both lowercase and uppercase letters, while the system reports the hostname in lowercase only. I worked around the issue by using a hostname made up of only lowercase characters. This is a workaround rather than a fix, and the root cause still needs to be addressed. I do not know whether this has been fixed in 1.2.0, but it is definitely still open in Giraph 1.1.0. I'll investigate further and try to fix it, but help is appreciated.

*"Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1"
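
To illustrate the kind of mismatch described above, here is a minimal Java sketch. It is not Giraph's code and makes no claim about where the comparison happens inside Giraph; the class HostnameMatch and its methods normalize and sameHost are hypothetical names used only for this example. The two hostname strings are taken from the log output quoted below (the tracking URL shows the mixed-case form, the worker entry the lowercase form).

```java
import java.util.Locale;

// Hypothetical helper, not part of Giraph: shows why a case-sensitive
// comparison of the two hostname spellings fails.
final class HostnameMatch {

    // Normalize a hostname to lowercase. Locale.ROOT avoids
    // locale-specific case rules (e.g. the Turkish dotless i).
    static String normalize(String hostname) {
        return hostname.toLowerCase(Locale.ROOT);
    }

    // Compare two hostnames after normalizing both sides.
    static boolean sameHost(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        String configured = "VirtualMINT-H023"; // as configured on the machine
        String reported = "virtualmint-h023";   // as it appears in the worker log

        // Plain equals() is case-sensitive, so the two names do not match.
        System.out.println(configured.equals(reported));    // prints "false"

        // After normalization the names match.
        System.out.println(sameHost(configured, reported)); // prints "true"
    }
}
```

If worker names are compared or used as lookup keys without this kind of normalization on both sides, a mixed-case hostname can make a healthy worker look missing, which is consistent with the lowercase-only hostname workaround.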

> Missing chosen workers on superstep -1
> --------------------------------------
>
>                 Key: GIRAPH-970
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-970
>             Project: Giraph
>          Issue Type: Bug
>          Components: bsp
>    Affects Versions: 1.1.0
>         Environment: Linux version 3.13.0-37-generic (buildd@kapok) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) 64 bit
> Hadoop 1.2.1
>            Reporter: Alessio Arleo
>
> I found a problem with Giraph 1.1.0 while trying to execute the ShortestPathComputation example. 
> This is the command given:
> $HADOOP_HOME/bin/hadoop jar  ~/git/giraph_patched/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner  org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /users/hadoop/input/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /users/hadoop/output/shortestpath -w 1
> And here is the output:
> #################################
> Warning: $HADOOP_HOME is deprecated.
> 14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
> 14/12/15 12:07:36 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
> 14/12/15 12:07:36 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
> 14/12/15 12:07:38 INFO job.GiraphJob: Tracking URL: http://VirtualMINT-H023:50030/jobdetails.jsp?jobid=job_201412151205_0001
> 14/12/15 12:07:38 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
> 14/12/15 12:08:51 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer virtualmint-h023:22181 --zkNode /_hadoopBsp/job_201412151205_0001/_haltComputation'
> 14/12/15 12:08:51 INFO mapred.JobClient: Running job: job_201412151205_0001
> 14/12/15 12:08:52 INFO mapred.JobClient:  map 100% reduce 0%
> ################################
> The computation hangs here until the timeout is reached. Here is what I found while reading the first worker log.
> 2014-12-15 12:12:16,303 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Starting to write input split data to zookeeper with 1 threads
> 2014-12-15 12:12:16,314 INFO org.apache.giraph.master.BspServiceMaster: createVertexInputSplits: Done writing input split data to zookeeper
> 2014-12-15 12:12:16,332 INFO org.apache.giraph.comm.netty.NettyClient: Using Netty without authentication.
> 2014-12-15 12:12:16,341 INFO org.apache.giraph.comm.netty.NettyClient: connectAllAddresses: Successfully added 1 connections, (1 total connected) 0 failed, 0 failures total.
> 2014-12-15 12:12:16,344 INFO org.apache.giraph.partition.PartitionUtils: computePartitionCount: Creating 1, default would have been 1 partitions.
> 2014-12-15 12:12:16,373 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: 0 out of 1 workers finished on superstep -1 on path /_hadoopBsp/job_201412151211_0001/_vertexInputSplitDoneDir
> 2014-12-15 12:12:16,375 INFO org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Waiting on [virtualmint-h023_1]
> 2014-12-15 12:12:16,393 INFO org.apache.giraph.comm.netty.NettyServer: start: Using Netty without authentication.
> 2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.BspServiceMaster: barrierOnWorkerList: Missing chosen workers [Worker(hostname=virtualmint-h023, MRtaskID=1, port=30001)] on superstep -1
> 2014-12-15 12:12:16,464 ERROR org.apache.giraph.master.MasterThread: masterThread: Master algorithm failed with IllegalStateException
> java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
> 	at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
> 	at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
> 	at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
> 2014-12-15 12:12:16,464 FATAL org.apache.giraph.graph.GraphTaskManager: uncaughtException: OverrideExceptionHandler on thread org.apache.giraph.master.MasterThread, msg = java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported), exiting...
> java.lang.IllegalStateException: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
> 	at org.apache.giraph.master.MasterThread.run(MasterThread.java:194)
> Caused by: java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported)
> 	at org.apache.giraph.master.BspServiceMaster.coordinateInputSplits(BspServiceMaster.java:1489)
> 	at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1656)
> 	at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
> 2014-12-15 12:12:16,464 WARN org.apache.giraph.zk.ZooKeeperManager: logZooKeeperOutput: Dumping up to last 100 lines of the ZooKeeper process STDOUT and STDERR.
> ################################
> The computation does not even reach the first superstep: Giraph cannot find the worker. The GIRAPH-904 patch was applied to BspServiceMaster.
> I am running Hadoop 1.2.1 on a single machine with the configuration suggested in the Giraph Quick Start guide. Hadoop itself works fine (tested with the wordcount example).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)