Posted to user@giraph.apache.org by Cheng Wang <su...@gmail.com> on 2015/04/29 23:28:28 UTC

problems running my simple page rank example

Hi,

I am new to Giraph. I have recently been trying to write a very simple
PageRank program with Giraph, shown below:

package org.apache.giraph.examples;

import org.apache.giraph.graph.BasicComputation;
import org.apache.giraph.conf.LongConfOption;
import org.apache.giraph.edge.Edge;
import org.apache.giraph.graph.Vertex;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.log4j.Logger;

import java.io.IOException;

/**
 * My simplified Google page rank example.
 */
@Algorithm(
    name = "Page Rank",
    description = "My simplified page rank"
)

public class MyPageRankComputation extends BasicComputation<
    LongWritable, DoubleWritable, FloatWritable, DoubleWritable> {

  public static final int MAX_SUPERSTEPS = 2;

  @Override
  public void compute(
      Vertex<LongWritable, DoubleWritable, FloatWritable> vertex,
      Iterable<DoubleWritable> messages) throws IOException {

    if (getSuperstep() >= 1) {
      double sum = 0;
      for (DoubleWritable message : messages) {
        sum += message.get();
      }
      vertex.setValue(new DoubleWritable(sum));
    }

    if (getSuperstep() < MAX_SUPERSTEPS) {
      int numEdges = vertex.getNumEdges();
      DoubleWritable message =
          new DoubleWritable(vertex.getValue().get() / numEdges);
      sendMessageToAllEdges(vertex, message);
    } else {
      vertex.voteToHalt();
    }
  }
}
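
To make the intent of compute() concrete, here is a minimal plain-Java sketch that replays the same superstep logic without Giraph. It is an illustration only: the class name PageRankSketch and the method run are hypothetical, vertices are renumbered from 0, and the adjacency is an assumption about the graph the tiny input further down is meant to describe (1->2,4; 2->3,5; 3->4; 4->5; 5->1,2,3).

```java
public class PageRankSketch {
  static final int MAX_SUPERSTEPS = 2;

  /**
   * Replays the superstep logic of the compute() method above.
   * edges[v] lists the out-neighbours of vertex v (0-based, assumed layout).
   */
  static double[] run(int[][] edges, double[] initial) {
    double[] value = initial.clone();
    double[] inbox = new double[value.length];
    for (int superstep = 0; superstep <= MAX_SUPERSTEPS; superstep++) {
      double[] outbox = new double[value.length];
      for (int v = 0; v < value.length; v++) {
        if (superstep >= 1) {
          // The new value is the sum of the messages received.
          value[v] = inbox[v];
        }
        if (superstep < MAX_SUPERSTEPS) {
          // Split the current value evenly over all outgoing edges.
          double share = value[v] / edges[v].length;
          for (int target : edges[v]) {
            outbox[target] += share;
          }
        }
        // In the last superstep the real vertex would call voteToHalt().
      }
      inbox = outbox;
    }
    return value;
  }

  public static void main(String[] args) {
    // Assumed adjacency of the five-vertex input, shifted to 0-based ids.
    int[][] edges = {{1, 3}, {2, 4}, {3}, {4}, {0, 1, 2}};
    double[] start = {0.2, 0.2, 0.2, 0.2, 0.2};
    double[] result = run(edges, start);
    for (int v = 0; v < result.length; v++) {
      System.out.println((v + 1) + "\t" + result[v]);
    }
  }
}
```

Because every vertex in this assumed graph has at least one outgoing edge, the values keep summing to 1 after each superstep, which is a quick sanity check on any output the real job produces.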

I did not use an Aggregator, to keep the program simple.
I put the program under the source path of the Giraph examples:
/home/hduser/my-giraph/giraph-examples/src/main/java/org/apache/giraph/examples

Here my-giraph is a separate folder into which I copied the giraph-examples
folder from the Giraph repo.

Compilation succeeds. I also set HADOOP_CLASSPATH and LIBJARS as follows:

export HADOOP_CLASSPATH=/home/hduser/my-giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar:$HADOOP_PATH

export LIBJARS=/home/hduser/my-giraph/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar:/usr/local/giraph/giraph-core.jar


To run the program, I use a command line modeled on the "Running a Giraph
Job" section of the Giraph Quick Start Guide,
http://giraph.apache.org/quick_start.html

$HADOOP_HOME/bin/hadoop jar \
  $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner \
  org.apache.giraph.examples.MyPageRankComputation \
  -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
  -vip /user/hduser/page_rank/input/tiny_input.txt \
  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
  -op /user/hduser/page_rank/output \
  -w 1

The input is very similar to the one used for the SSSP example:

[1,0.2,[[2,0],[4,0]]]
[2,0.2,[3,0],[5,0]]
[3,0.2,[4,0]]
[4,0.2,[5,0]]
[5,0.2,[1,0],[2,0],[3,0]]
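
As a hedged aside, the Giraph Quick Start Guide shows JsonLongDoubleFloatDoubleVertexInputFormat reading lines of the shape [id, value, [[target, weight], ...]], that is, the edge list is itself an array of two-element arrays. The sketch below checks only that bracket structure, using no JSON library. The class name VertexLineCheck and both method names are hypothetical, and it does not validate the numbers themselves.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexLineCheck {

  /**
   * Returns the comma-separated items directly inside "[...]",
   * or null if the text is not a single bracket-balanced array.
   */
  static List<String> topLevelItems(String text) {
    String s = text.trim();
    if (s.length() < 2 || s.charAt(0) != '[' || s.charAt(s.length() - 1) != ']') {
      return null;
    }
    String body = s.substring(1, s.length() - 1);
    List<String> items = new ArrayList<>();
    int depth = 0;
    int start = 0;
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '[') {
        depth++;
      } else if (c == ']') {
        depth--;
        if (depth < 0) {
          return null;
        }
      } else if (c == ',' && depth == 0) {
        items.add(body.substring(start, i).trim());
        start = i + 1;
      }
    }
    if (depth != 0) {
      return null;
    }
    if (!body.trim().isEmpty()) {
      items.add(body.substring(start).trim());
    }
    return items;
  }

  /** True if the line looks like [id, value, [[target, weight], ...]]. */
  static boolean hasNestedShape(String line) {
    List<String> vertex = topLevelItems(line);
    if (vertex == null || vertex.size() != 3) {
      return false;
    }
    List<String> edges = topLevelItems(vertex.get(2));
    if (edges == null) {
      return false;
    }
    for (String edge : edges) {
      List<String> pair = topLevelItems(edge);
      if (pair == null || pair.size() != 2) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    String[] posted = {
      "[1,0.2,[[2,0],[4,0]]]",
      "[2,0.2,[3,0],[5,0]]",
      "[3,0.2,[4,0]]",
      "[4,0.2,[5,0]]",
      "[5,0.2,[1,0],[2,0],[3,0]]"
    };
    for (String line : posted) {
      System.out.println(hasNestedShape(line) + "  " + line);
    }
  }
}
```

Under this check only the first of the lines as posted has the nested shape. Whether that reflects the actual file or only how the lines were typed into this message is not known.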

So far, so good!

---------------
The problem is that when I run the job, it hangs at "map 100% reduce 0%",
as the output below shows:
////////////////////////////////////////////////
hduser@cwang ~/my-giraph/giraph-examples/target $ $HADOOP_HOME/bin/hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.MyPageRankComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /user/hduser/page_rank/input/tiny_input.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /user/hduser/page_rank/output -w 1
15/04/29 16:14:59 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
15/04/29 16:14:59 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
15/04/29 16:15:00 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
15/04/29 16:15:02 INFO job.GiraphJob: Tracking URL: http://hdnode01:50030/jobdetails.jsp?jobid=job_201504291528_0005
15/04/29 16:15:02 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
15/04/29 16:15:39 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer cwang:22181 --zkNode /_hadoopBsp/job_201504291528_0005/_haltComputation'
15/04/29 16:15:39 INFO mapred.JobClient: Running job: job_201504291528_0005
15/04/29 16:15:40 INFO mapred.JobClient:  map 100% reduce 0%
15/04/29 16:20:28 INFO mapred.JobClient: Job complete: job_201504291528_0005
15/04/29 16:20:28 INFO mapred.JobClient: Counters: 5
15/04/29 16:20:28 INFO mapred.JobClient:   Job Counters
15/04/29 16:20:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=625803
15/04/29 16:20:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/04/29 16:20:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/04/29 16:20:28 INFO mapred.JobClient:     Launched map tasks=2
15/04/29 16:20:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
//////////////////////////////////////////////////

No output is generated.

Can someone tell me where the problem is?


Thanks
Cheng

Re: problems running my simple page rank example

Posted by Cheng Wang <su...@gmail.com>.
Superb!!
It simply works if I use an "undirected" graph, i.e., if there is an edge
from vertex a to b, there must also be an edge from b to a.
I also tested the Shortest Paths example that ships with Giraph: after
removing one edge from the input file, it reproduced the same problem as
mine, with the job hanging at 100% map.
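
The "undirected" workaround described above can be sketched in plain Java as follows. This is an illustration of the idea only, not a claim about why the job hangs: the class name Symmetrize and both methods are hypothetical, and the edge weight 0 simply mirrors the weights in the posted input.

```java
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeMap;
import java.util.TreeSet;

public class Symmetrize {

  /** Returns a copy of the graph in which every edge a->b also has b->a. */
  static Map<Long, Set<Long>> symmetrize(Map<Long, Set<Long>> graph) {
    Map<Long, Set<Long>> result = new TreeMap<>();
    for (Map.Entry<Long, Set<Long>> entry : graph.entrySet()) {
      Long source = entry.getKey();
      result.computeIfAbsent(source, k -> new TreeSet<>()).addAll(entry.getValue());
      for (Long target : entry.getValue()) {
        // Add the reverse edge, creating the target vertex if it is missing.
        result.computeIfAbsent(target, k -> new TreeSet<>()).add(source);
      }
    }
    return result;
  }

  /** Formats one vertex as [id,value,[[target,0],...]] with weight 0 per edge. */
  static String toJsonLine(long id, double value, Set<Long> targets) {
    StringJoiner edges = new StringJoiner(",", "[", "]");
    for (Long target : targets) {
      edges.add("[" + target + ",0]");
    }
    return "[" + id + "," + value + "," + edges + "]";
  }

  public static void main(String[] args) {
    // Assumed directed adjacency of the five-vertex example.
    Map<Long, Set<Long>> graph = new TreeMap<>();
    graph.put(1L, new TreeSet<>(Set.of(2L, 4L)));
    graph.put(2L, new TreeSet<>(Set.of(3L, 5L)));
    graph.put(3L, new TreeSet<>(Set.of(4L)));
    graph.put(4L, new TreeSet<>(Set.of(5L)));
    graph.put(5L, new TreeSet<>(Set.of(1L, 2L, 3L)));
    for (Map.Entry<Long, Set<Long>> e : symmetrize(graph).entrySet()) {
      System.out.println(toJsonLine(e.getKey(), 0.2, e.getValue()));
    }
  }
}
```

Note that symmetrizing changes the graph, and therefore the ranks the program computes, so it is a way to narrow down the problem rather than a substitute for running on the directed input.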

I did read the blog, but it is still not clear to me where the problem is.
Can we say it is a bug in Giraph?

Cheng

On Thu, Apr 30, 2015 at 2:59 PM, Ritesh Kumar Singh <
riteshoneinamillion@gmail.com> wrote:

> Try this blog once :
>
> marsty5.com/2013/04/29/run-example-in-giraph-shortest-paths/
>
> Helped a lot of us :)
>

Re: problems running my simple page rank example

Posted by Ritesh Kumar Singh <ri...@gmail.com>.
Try this blog once :

marsty5.com/2013/04/29/run-example-in-giraph-shortest-paths/

Helped a lot of us :)

Re: problems running my simple page rank example

Posted by Cheng Wang <su...@gmail.com>.
Hello,

Could someone respond to my question?

Thanks
