Posted to commits@hama.apache.org by Apache Wiki <wi...@apache.org> on 2012/01/31 09:57:15 UTC

[Hama Wiki] Update of "PageRank" by thomasjungblut

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.

The "PageRank" page has been changed by thomasjungblut:
http://wiki.apache.org/hama/PageRank?action=diff&rev1=8&rev2=9

   * Introduces partitioning and collective communication
   * Lets the user submit his/her own TextFile to calculate the sites' Pagerank!
  
- 
- == Implementation ==
- 
- For detailed questions in terms of implementation have a look at my blog.
- It describes the algorithm and focuses on the main ideas showing implementation things.
- 
- http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
- 
  == Usage ==
  
  {{{
- hama/bin/hama jar ../hama-0.x.0-examples.jar page <damping factor> <epsilon error> <optional: output path> <optional: input path>
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank <input path> <output path> [damping factor] [epsilon error] [tasks]
  }}}
  
- 
- Change "x" to the version you are using!
- 
- '''The output path should never be the root path!''' It is default on "pagerank/out".
- 
- The default parameters are:
+ The default parameters for pagerank are:
  
  {{{
  0.85 0.001
@@ -52, +39 @@

  
  Make sure that every outlink of a site also appears somewhere in the file as a key site. Otherwise the job will fail with NullPointerExceptions.
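
The key-site requirement above can be checked before submitting a job. The following is a minimal sketch, not part of Hama: it assumes a tab-separated adjacency list in which the first field of each line is the key site and the remaining fields are its outlinks (the exact file format is not shown in this diff), and the function name `find_dangling_outlinks` is hypothetical.

```python
from typing import Dict, Iterable, List, Set


def find_dangling_outlinks(lines: Iterable[str], sep: str = "\t") -> Dict[str, List[str]]:
    """Map each key site to those of its outlinks that never appear as a key site.

    Assumption: one line per site, fields separated by `sep`,
    first field is the key site, the rest are its outlinks.
    """
    adjacency: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = [field for field in line.split(sep) if field]
        adjacency[fields[0]] = fields[1:]

    keys: Set[str] = set(adjacency)
    problems: Dict[str, List[str]] = {}
    for site, outlinks in adjacency.items():
        missing = [target for target in outlinks if target not in keys]
        if missing:
            problems[site] = missing
    return problems
```

Calling it with the lines of the input text file (for example `find_dangling_outlinks(open("/tmp/input.txt"))`) returns an empty dict when every outlink is also a key site.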
  
- A call could look like this:
+ Now you need to transform the text file using:
+ {{{
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank-text2seq /tmp/input.txt /tmp/out/
+ }}}
+ 
+ Then you can run pagerank on it with:
  
  {{{
- 0.85 0.001 pagerank/out pagerank/in
+ bin/hama jar ../hama-0.4.0-examples.jar pagerank /tmp/out /tmp/pagerank-output
  }}}
  
- '''Make sure that if you provide an in-path, you're setting an outpath, too!'''
+ Note that, depending on your configuration, these paths may refer to HDFS or to the local disk.
  
  == Output ==
  
@@ -69, +61 @@

  
  == Sample Adjacencylist File ==
  
- You can download an adjacency list text file containing 2,442,507 vertices and 32,282,149 edges, including a random web graph based on the URLs of dmoz.org (http://rdf.dmoz.org/). It is around 680 MB in size and can be downloaded here:
+ You can create a large pagerank input file by using the PagerankTeragen file from here: http://code.google.com/p/hama-shortest-paths/source/browse/trunk/hama-gsoc/src/de/jungblut/hama/util/PagerankTeragen.java
  
+ It is based on MapReduce and requires a running Hadoop cluster. You can create a file using:
- http://hama-shortest-paths.googlecode.com/svn/trunk/hama-gsoc/files/pagerank/input/pagerankAdjacencylist.txt
- 
- You can run it with
  
  {{{
- hama/bin/hama jar ../hama-0.x.0-examples.jar page 0.85 0.001 pagerank/output PATH_TO_YOUR_TXT_FILE
+ hadoop/bin/hadoop jar <jar containing the pagerank teragen> <number of vertices> <number of reducers / output files> <number of edges per vertex> <output path>
  }}}
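
For small local tests without a Hadoop cluster, a tiny input file can also be generated by hand. The sketch below is not PagerankTeragen and does not reproduce its output format. It assumes a tab-separated adjacency list with the key site first, and the function name and the `siteN` naming scheme are made up for illustration.

```python
import random
from typing import List


def generate_adjacency_lines(num_vertices: int,
                             edges_per_vertex: int,
                             seed: int = 0,
                             sep: str = "\t") -> List[str]:
    """Build one adjacency-list line per vertex with random outlinks.

    Every outlink is drawn from the set of key sites, so each target
    also appears as a key site somewhere in the result.
    """
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    names = [f"site{i}" for i in range(num_vertices)]
    lines = []
    for name in names:
        candidates = [other for other in names if other != name]  # no self-links
        count = min(edges_per_vertex, len(candidates))
        outlinks = rng.sample(candidates, count)
        lines.append(sep.join([name] + outlinks))
    return lines
```

Writing `"\n".join(generate_adjacency_lines(1000, 10))` to a text file gives a small graph to experiment with. The per-vertex candidate list makes this quadratic in the number of vertices, so it is only meant for small inputs.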
- 
- Obviously you have to replace "PATH_TO_YOUR_TXT_FILE" to the path where your downloaded file lies.
  
  Have fun! If you are facing problems, feel free to ask questions on the official mailing list.
  
+ 
+ == Implementation ==
+ 
+ For details on the implementation, have a look at my blog post.
+ It describes the algorithm and walks through the main ideas behind the implementation.
+ 
+ http://codingwiththomas.blogspot.com/2011/04/pagerank-with-apache-hama.html
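
To illustrate what the damping factor and epsilon parameters control, here is a minimal single-machine sketch of iterative PageRank. It is not the Hama BSP implementation described in the blog post. It assumes that epsilon is a threshold on the summed absolute change in rank between two iterations (Hama's exact convergence criterion may differ), and the function name and the `max_iterations` cap are illustrative additions.

```python
from typing import Dict, List


def pagerank(adjacency: Dict[str, List[str]],
             damping: float = 0.85,
             epsilon: float = 0.001,
             max_iterations: int = 100) -> Dict[str, float]:
    """Iterate PageRank until the total rank change drops below epsilon.

    `adjacency` maps each key site to its outlinks. Every outlink must
    itself be a key, otherwise the update below raises a KeyError.
    """
    n = len(adjacency)
    if n == 0:
        return {}
    ranks = {site: 1.0 / n for site in adjacency}
    for _ in range(max_iterations):
        # Rank held by sites without outlinks is spread evenly over all sites.
        dangling = sum(ranks[site] for site, outs in adjacency.items() if not outs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {site: base for site in adjacency}
        for site, outlinks in adjacency.items():
            if outlinks:
                share = damping * ranks[site] / len(outlinks)
                for target in outlinks:
                    new_ranks[target] += share
        error = sum(abs(new_ranks[site] - ranks[site]) for site in adjacency)
        ranks = new_ranks
        if error < epsilon:
            break
    return ranks
```

With the defaults 0.85 and 0.001, a call such as `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` returns ranks that sum to 1. A smaller epsilon means more iterations and a tighter result.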
+