Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2012/01/31 09:57:15 UTC
[Hama Wiki] Update of "PageRank" by thomasjungblut
The "PageRank" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=8&rev2=9
* Introduces partitioning and collective communication
* Lets the user submit his/her own TextFile to calculate the sites' Pagerank!
-
- == Implementation ==
-
- For detailed questions in terms of implementation have a look at my blog.
- It describes the algorithm and focuses on the main ideas showing implementation things.
-
- http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
-
== Usage ==
{{{
- hama/bin/hama jar ../hama-0.x.0-examples.jar page <damping factor> <epsilon error> <optional: output path> <optional: input path>
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
}}}
-
- Change "x" to the version you are using!
-
- '''The output path should never be the root path!''' It is default on "pagerank/out".
-
- The default parameters are:
+ The default parameters for pagerank are:
{{{
0.85 0.001
@@ -52, +39 @@
Make sure that every site listed as an outlink also appears somewhere in the file as a key site. Otherwise the job will fail with NullPointerExceptions.
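The key-site requirement can be checked before submitting the job. The following is only a minimal sketch: the input format is not shown in this part of the page, so it assumes one tab-separated line per site, with the key site in the first column and its outlinks in the remaining columns. The function name `find_dangling_outlinks` and the sample sites are hypothetical and not part of Hama.

```python
def find_dangling_outlinks(lines, sep="\t"):
    """Return outlinks that never appear as a key site (first column).

    Assumed format (not confirmed by the wiki page): one line per site,
    key site first, then its outlinks, separated by `sep`.
    """
    keys = set()
    outlinks = set()
    for line in lines:
        fields = [f for f in line.rstrip("\n").split(sep) if f]
        if not fields:
            continue  # skip blank lines
        keys.add(fields[0])
        outlinks.update(fields[1:])
    return sorted(outlinks - keys)


# Hypothetical sample: c.com is linked to but has no line of its own.
sample = [
    "a.com\tb.com\tc.com\n",
    "b.com\ta.com\n",
]
print(find_dangling_outlinks(sample))
```

An empty result means every outlink has its own line. To check a real file, pass an open file object instead of the sample list.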
- A call could look like this:
+ Now you need to transform the text file using:
+ {{{
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank-text2seq /tmp/input.txt /tmp/out/
+ }}}
+
+ Then you can run pagerank on it with:
{{{
- 0.85 0.001 pagerank/out pagerank/in
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank /tmp/out /tmp/pagerank-output
}}}
- '''Make sure that if you provide an in-path, you're setting an outpath, too!'''
+ Note that based on what you have configured, the paths may be in HDFS or on local disk.
== Output ==
@@ -69, +61 @@
== Sample Adjacencylist File ==
- You can download an adjacency list text file containing 2,442,507 vertices and 32,282,149 edges, including a random web graph based on the URLs of dmoz.org (http://rdf.dmoz.org/). It is around 680 MB in size and can be downloaded here:
+ You can create a large pagerank input file by using the PagerankTeragen file from here: http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/src/de/jungblut/hama/util/PagerankTeragen.java
+ It is based on MapReduce and requires a running Hadoop cluster. You can create a file using
- http://hama-shortest-paths.googlecode.com/svn/trunk/hama-gsoc/files/pagerank/input/pagerankAdjacencylist.txt
-
- You can run it with
{{{
- hama/bin/hama jar ../hama-0.x.0-examples.jar page 0.85 0.001 pagerank/output PATH_TO_YOUR_TXT_FILE
+ hadoop/bin/hadoop jar <jar containing the pagerank teragen> <number of vertices> <number of reducers / output files> <number of edges per vertex> <output path>
}}}
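For a quick local test without a Hadoop cluster, a small input file can also be generated by hand. The sketch below is not PagerankTeragen and does not reproduce its output. It only mirrors the "number of vertices" and "number of edges per vertex" parameters, and it assumes a tab-separated adjacency list with the key site in the first column. The function name and the `siteN.example` names are made up for illustration.

```python
import random


def generate_adjacency_list(num_vertices, edges_per_vertex, seed=0, sep="\t"):
    """Build a small random adjacency list (assumed tab-separated format).

    This is a local stand-in for testing, not the MapReduce-based generator.
    """
    rng = random.Random(seed)
    sites = [f"site{i}.example" for i in range(num_vertices)]
    lines = []
    for site in sites:
        # Outlinks are drawn only from the key sites, so every outlink
        # also appears as a key somewhere in the output.
        candidates = [s for s in sites if s != site]
        count = min(edges_per_vertex, len(candidates))
        outlinks = rng.sample(candidates, count)
        lines.append(sep.join([site] + outlinks))
    return lines


# Write a tiny graph to a hypothetical local path.
with open("pagerank-input.txt", "w") as handle:
    handle.write("\n".join(generate_adjacency_list(5, 2)) + "\n")
```

Because the outlinks are sampled from the list of key sites, the generated file satisfies the requirement that every outlink also appears as a key site.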
-
- Obviously you have to replace "PATH_TO_YOUR_TXT_FILE" to the path where your downloaded file lies.
Have fun! If you are facing problems, feel free to ask questions on the official mailing list.
+
+ == Implementation ==
+
+ For implementation details, have a look at my blog.
+ It describes the algorithm and walks through the main ideas behind the implementation.
+
+ http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
+
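To illustrate what the damping factor and epsilon error parameters control, here is a single-machine sketch of iterative PageRank. It is not the Hama BSP implementation. The stopping rule (total absolute rank change below epsilon), the handling of sites without outlinks, and the `max_iterations` cap are assumptions made for this example.

```python
def pagerank(graph, damping=0.85, epsilon=0.001, max_iterations=100):
    """Iterative PageRank on a dict mapping each site to its outlinks.

    Assumes every outlink is also a key of `graph`.
    """
    n = len(graph)
    ranks = {site: 1.0 / n for site in graph}
    for _ in range(max_iterations):
        # Rank held by sites without outlinks is spread evenly over all sites.
        dangling = sum(ranks[s] for s, outs in graph.items() if not outs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {site: base for site in graph}
        for site, outs in graph.items():
            for target in outs:
                # Each site passes an equal share of its rank to every outlink.
                new_ranks[target] += damping * ranks[site] / len(outs)
        # Assumed convergence test: total absolute change below epsilon.
        error = sum(abs(new_ranks[s] - ranks[s]) for s in graph)
        ranks = new_ranks
        if error < epsilon:
            break
    return ranks


# Hypothetical three-site graph.
example_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
for site, rank in sorted(pagerank(example_graph).items()):
    print(site, round(rank, 4))
```

A smaller epsilon makes the loop run for more iterations before stopping. The damping factor sets how much of a site's rank is passed along its outlinks rather than redistributed evenly.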