Posted to user@giraph.apache.org by RainShine79 <ra...@googlemail.com> on 2014/10/16 13:57:56 UTC
HiveGiraphRunner - SanityCheck fails due to missing partition of output table?
Hello fellow coders,
I am currently trying to run a standard Giraph job (PageRankVertex) on a set of vertices and edges read from Hive tables. For this I use the class HiveGiraphRunner, as recommended in several tutorials.
As required, I started a Thrift server and implemented the abstract classes that convert table rows to vertices or edges and vice versa. I also created the tables test1 (containing all vertices), test2 (containing all edges) and test3 (an empty two-column table: a String column for the vertex IDs and a Double column for the PageRank values).
The final command I used to start the PageRank job looks like this (apart from the line breaks, which I added for readability):
hadoop jar giraph-hive-1.0.0.jar org.apache.giraph.hive.HiveGiraphRunner \
-libjars ~/giraph/giraph-examples-1.0.0.jar \
-vertexClass org.apache.giraph.examples.PageRankVertex \
-hiveToVertexClass org.apache.giraph.hive.input.vertex.MyHiveToVertexImpl \
-hiveToEdgeClass org.apache.giraph.hive.input.edge.MyHiveToEdgeImpl \
-vertexToHiveClass org.apache.giraph.hive.output.MyVertexToHiveImpl \
-w 5 -vi test1 -ei test2 -o test3 \
-hiveconf hive.metastore.uris=thrift://localhost:10000 \
-hiveconf "javax.jdo.option.ConnectionURL=jdbc:derby:;databaseName=metastore_db;create=true"
The output I get looks like this:
14/10/16 08:22:49 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:10000
14/10/16 08:22:49 INFO hive.metastore: Waiting 1 seconds before next connection attempt.
14/10/16 08:22:50 INFO hive.metastore: Connected to metastore.
Exception in thread "main" java.lang.NullPointerException
	at com.facebook.giraph.hive.output.HiveOutputDescription.numPartitionValues(HiveOutputDescription.java:106)
	at com.facebook.giraph.hive.output.HiveApiOutputFormat.sanityCheck(HiveApiOutputFormat.java:185)
	at com.facebook.giraph.hive.output.HiveApiOutputFormat.initProfile(HiveApiOutputFormat.java:142)
	at org.apache.giraph.hive.HiveGiraphRunner.setupHiveOutput(HiveGiraphRunner.java:282)
	at org.apache.giraph.hive.HiveGiraphRunner.run(HiveGiraphRunner.java:236)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.giraph.hive.HiveGiraphRunner.main(HiveGiraphRunner.java:212)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
The exception suggests that the output table test3 needs to be partitioned, but I do not understand why, or in what way. Using a partition only makes inserting data more complex, since it requires additional information.
Does the output table need a special kind of partition? And how will Giraph choose the partition into which it inserts the output data?
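To make the failure mode in the trace concrete, here is a minimal, hypothetical Java model of the check. It is not the hive-io or giraph-hive source: the class OutputSanityCheckSketch, the helper check() and the partition column "ds" are invented for illustration, and the assumption that the output description leaves its partition-value map null when no output partition is supplied is inferred only from the NPE at numPartitionValues. Under that assumption, the exception fires before the table's partition columns are ever compared, which would point to missing partition values in the job setup rather than by itself proving the table must be partitioned.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of the output sanity check seen in the
 * stack trace. NOT the real hive-io/giraph-hive code; names only mirror
 * the frames in the trace for readability.
 */
public class OutputSanityCheckSketch {

    /** Stand-in for an output description: table name plus partition values. */
    static class OutputDescription {
        final String tableName;
        // Assumption: stays null when no output partition is given.
        Map<String, String> partitionValues;

        OutputDescription(String tableName) {
            this.tableName = tableName;
        }

        int numPartitionValues() {
            // Calling size() on a null map throws NullPointerException,
            // matching the top frame of the reported trace.
            return partitionValues.size();
        }
    }

    /** Compares the table's partition columns with the values supplied for them. */
    static void sanityCheck(List<String> tablePartitionKeys, OutputDescription desc) {
        if (tablePartitionKeys.size() != desc.numPartitionValues()) {
            throw new IllegalStateException("Table " + desc.tableName + " has "
                + tablePartitionKeys.size() + " partition column(s) but "
                + desc.numPartitionValues() + " partition value(s) were given");
        }
    }

    /** Runs the check against table "test3" and reports the outcome as a short string. */
    static String check(List<String> partitionKeys, Map<String, String> values) {
        OutputDescription desc = new OutputDescription("test3");
        desc.partitionValues = values;
        try {
            sanityCheck(partitionKeys, desc);
            return "ok";
        } catch (NullPointerException e) {
            return "NPE";
        } catch (IllegalStateException e) {
            return "mismatch";
        }
    }

    public static void main(String[] args) {
        // Unpartitioned table, no partition values at all: NPE, as in the trace.
        System.out.println(check(Collections.<String>emptyList(), null));
        // Table partitioned by hypothetical column "ds", value supplied: passes.
        Map<String, String> ds = new LinkedHashMap<>();
        ds.put("ds", "2014-10-16");
        System.out.println(check(Arrays.asList("ds"), ds));
        // Partitioned table but no value supplied: the counts differ.
        System.out.println(check(Arrays.asList("ds"), Collections.<String, String>emptyMap()));
    }
}
```

Running main prints NPE, ok and mismatch, one per line. Whether the real library accepts an unpartitioned output table at all is exactly the open question above; the sketch only shows where a null partition-value map would surface.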
I have also posted this question on Stack Overflow, where the text is better formatted:
http://stackoverflow.com/questions/26401418/hivegiraphrunner-sanitycheck-fails-due-to-missing-partition-of-output-table
Thanks for your help in advance!
R.