Posted to user@giraph.apache.org by Jeffrey Yunes <je...@berkeley.edu> on 2012/02/17 03:12:39 UTC

maven, hadoop, zookeeper, and giraph!

Hi Giraph community,
I think I followed all of the directions (for Giraph on a pseudo-cluster), and it looks like

> mvn clean test -Dprop.mapred.job.tracker=localhost:9001

runs fine. However, I'm new to the Hadoop infrastructure, and have a couple of questions about getting started with Giraph.

1)
> hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 3

gives me the error "java.lang.NullPointerException at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)". Is it some kind of configuration error?

2) How do I enable log4j? Should I use an appender that writes to HDFS? How else could I grep all my logs for errors and the like?

3) With regard to Giraph and Maven, none of the directions suggested doing "local overrides." So why should I expect my Giraph installation to refer to libraries and configuration in "~/Applications/hadoop or zookeeper" rather than those in "~/.m2/repo"?

4) Why doesn't running maven for Giraph install hadoop along the way (or does it)?

I'd appreciate it if you'd help improve my understanding!

Thanks!
-Jeff




Re: maven, hadoop, zookeeper, and giraph!

Posted by Avery Ching <ac...@apache.org>.
Hi Jeffrey,

Best attempt at answers inline.

On 2/16/12 6:12 PM, Jeffrey Yunes wrote:
> Hi Giraph community,
> I think I followed all of the directions (for Giraph on a pseudo-cluster), and it looks like
>
>> mvn clean test -Dprop.mapred.job.tracker=localhost:9001
> runs fine. However, I'm new to the Hadoop infrastructure, and have a couple of questions about getting started with Giraph.
>
> 1)
>> hadoop jar target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 50 -w 3
> gives me the error "java.lang.NullPointerException at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)". Is it some kind of configuration error?

This is a bug. I have a quick fix for it. Sorry about that. I opened an issue for it: https://issues.apache.org/jira/browse/GIRAPH-150

diff --git a/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java b/
index 0e76122..4d08929 100644
--- a/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
+++ b/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
@@ -124,7 +124,8 @@ public class PageRankBenchmark extends EdgeListVertex<
     } else {
       job.setVertexClass(PageRankBenchmark.class);
     }
-    LOG.info("Using class " + BspUtils.getVertexClass(getConf()).getName());
+    LOG.info("Using class " +
+        BspUtils.getVertexClass(job.getConfiguration()).getName());
     job.setVertexInputFormatClass(PseudoRandomVertexInputFormat.class);
     job.setWorkerConfiguration(workers, workers, 100.0f);

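One plausible reading of why this one-line fix works, assuming (as with Hadoop's Job class, which GiraphJob presumably extends) that the job keeps its own copy of the Configuration passed to its constructor: setVertexClass() writes into the job's copy, so getConf() never sees the vertex class, BspUtils.getVertexClass() returns null, and calling getName() on that null throws. The sketch below models only that copy behavior with hypothetical stand-in classes (Conf, Job, DemoVertex and the key "giraph.vertexClass" are all invented for illustration); it does not use the real Hadoop or Giraph APIs.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for Hadoop's Configuration and Giraph's job class;
// NOT the real APIs. They only model "the job keeps a copy of the conf".
public class ConfCopyDemo {
  static final String KEY = "giraph.vertexClass"; // hypothetical key name

  static class Conf {
    private final Map<String, Class<?>> classes = new HashMap<>();

    Conf() {}

    // Copy constructor: starts with the same entries, then evolves independently.
    Conf(Conf other) {
      classes.putAll(other.classes);
    }

    void setClassValue(String key, Class<?> value) {
      classes.put(key, value);
    }

    // Returns null when the key was never set (like a lookup with a null default).
    Class<?> getClassValue(String key) {
      return classes.get(key);
    }
  }

  static class Job {
    private final Conf conf;

    Job(Conf conf) {
      this.conf = new Conf(conf); // copies, rather than shares, the caller's conf
    }

    Conf getConfiguration() {
      return conf;
    }

    void setVertexClass(Class<?> vertexClass) {
      conf.setClassValue(KEY, vertexClass);
    }
  }

  static class DemoVertex {} // stands in for PageRankBenchmark

  public static void main(String[] args) {
    Conf toolConf = new Conf(); // plays the role of getConf()
    Job job = new Job(toolConf);
    job.setVertexClass(DemoVertex.class);

    // Before the fix: the tool's conf never received the vertex class, so
    // toolConf.getClassValue(KEY).getName() would throw a NullPointerException.
    System.out.println("getConf(): " + toolConf.getClassValue(KEY));
    // After the fix: reading the job's own configuration finds the class.
    System.out.println("job.getConfiguration(): "
        + job.getConfiguration().getClassValue(KEY).getName());
  }
}
```

The point of the design is the copy in the Job constructor: anything set on the job after construction is invisible to the object you passed in, which is why the patch reads from job.getConfiguration() instead.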
> 2) How do I enable log4j? Should I use an appender that writes to HDFS? How else could I grep all my logs for errors and the like?
log4j is used by the task trackers to write the job logs. If you click on your running job in the JobTracker web page, you can then click into each task and look at its logs under 'Task Logs'. You can configure the task tracker's log4j.properties to set the log level, but I believe the default is INFO.
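If you would rather grep than click through the web page, the task logs are ordinary files on each task tracker's local disk (typically under a userlogs directory inside the Hadoop log directory; the exact path depends on your setup). Below is a minimal Java sketch, not part of Giraph or Hadoop, that walks such a directory and prints lines that look like problems. The class name TaskLogGrep and the match rule (a whole-word ERROR or FATAL level, or any line mentioning an Exception) are assumptions; adjust the pattern to whatever log4j layout your task trackers use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: a tiny "grep" for task logs on the local disk.
public class TaskLogGrep {
  // Assumed rule: ERROR/FATAL as a whole word, or any line mentioning an exception.
  static final Pattern PROBLEM = Pattern.compile("\\b(ERROR|FATAL)\\b|Exception");

  // Returns the lines that match PROBLEM, preserving their order.
  static List<String> problemLines(List<String> lines) {
    List<String> out = new ArrayList<>();
    for (String line : lines) {
      if (PROBLEM.matcher(line).find()) {
        out.add(line);
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: java TaskLogGrep <log-directory>");
      System.exit(1);
    }
    try (Stream<Path> files = Files.walk(Paths.get(args[0]))) {
      files.filter(Files::isRegularFile).forEach(file -> {
        try {
          // ISO-8859-1 decodes any byte sequence, so odd bytes in a log never abort the scan.
          List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
          for (String line : problemLines(lines)) {
            System.out.println(file + ": " + line);
          }
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
    }
  }
}
```

On a Java 11 or newer JDK this can be run straight from source, for example java TaskLogGrep.java followed by the path to your task log directory.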
> 3) With regard to Giraph and Maven, none of the directions suggested doing "local overrides." So why should I expect my Giraph installation to refer to libraries and configuration in "~/Applications/hadoop or zookeeper" rather than those in "~/.m2/repo"?
Giraph builds a massive jar that bundles all the classes and jars required to launch ZooKeeper and interact with Hadoop. This makes for easy deployment to a running cluster.
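To check which classes actually ended up in that massive jar, you can list its entries. A minimal sketch using only java.util.jar; the class JarContains is hypothetical, and the default jar path is the one from the hadoop jar command above, taken relative to the Giraph source directory.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper: reports whether given classes are bundled in a jar.
public class JarContains {
  // org.apache.zookeeper.ZooKeeper -> org/apache/zookeeper/ZooKeeper.class
  static String entryName(String className) {
    return className.replace('.', '/') + ".class";
  }

  // True if the jar at jarPath has a .class entry for className.
  static boolean containsClass(String jarPath, String className) throws IOException {
    try (JarFile jar = new JarFile(jarPath)) {
      return jar.getJarEntry(entryName(className)) != null;
    }
  }

  public static void main(String[] args) throws IOException {
    String jarPath = args.length > 0 ? args[0]
        : "target/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar";
    String[] classes = {
        "org.apache.zookeeper.ZooKeeper",
        "org.apache.giraph.benchmark.PageRankBenchmark",
        "org.apache.hadoop.mapreduce.Job"
    };
    for (String c : classes) {
      System.out.println(c + " -> "
          + (containsClass(jarPath, c) ? "bundled" : "not bundled"));
    }
  }
}
```

Nested classes can be checked too by passing their binary name (for example Outer$Inner).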

> 4) Why doesn't running maven for Giraph install hadoop along the way (or does it)?
Because there are so many versions of Hadoop, and if you are launching jobs through Hadoop, the Hadoop jar should already be on your classpath.
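Relating this back to question 3: one way to see which copy of a class a JVM actually picked up (a jar under ~/.m2, one under ~/Applications/hadoop, or the Giraph jar itself) is to ask the class for its code source. A minimal sketch with a hypothetical WhichJar class; run it with the classpath you want to inspect. The default class names are just one Hadoop and one ZooKeeper class.

```java
import java.security.CodeSource;

// Hypothetical helper: reports where (jar or directory) a class is loaded from.
public class WhichJar {
  static String locationOf(String className) {
    try {
      // initialize=false: load the class without running its static initializers.
      Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
      CodeSource src = c.getProtectionDomain().getCodeSource();
      // Classes from the JDK's bootstrap loader typically have no code source.
      if (src == null || src.getLocation() == null) {
        return "(no code source, likely a JDK class)";
      }
      return src.getLocation().toString();
    } catch (ClassNotFoundException | LinkageError e) {
      // Class missing, or present but with missing dependencies.
      return "(not loadable: " + e + ")";
    }
  }

  public static void main(String[] args) {
    String[] names = args.length > 0 ? args : new String[] {
        "org.apache.hadoop.mapreduce.Job", "org.apache.zookeeper.ZooKeeper"};
    for (String n : names) {
      System.out.println(n + " -> " + locationOf(n));
    }
  }
}
```

If the same class appears in several jars on the classpath, the location printed is the one the class loader found first, which is the copy your code will actually run.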

> I'd appreciate it if you'd help improve my understanding!
No problem.  Welcome to Giraph!

> Thanks!
> -Jeff