Posted to dev@giraph.apache.org by Julien Nioche <jn...@apache.org> on 2013/10/18 13:17:27 UTC

Re: Review Request 13492: LinkRank implementation with Giraph

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13492/#review27183
-----------------------------------------------------------


I think this should not be a Nutch module within Giraph but should instead be part of Nutch, mimicking what Nutch 1.x does in the nutch.scoring.webgraph package. The patch should be applied to the Nutch-2.x branch, and the package names should reflect this, e.g. org.apache.nutch.linkrank. A new set of Nutch commands should also be added to the script in src/bin/nutch.

- Julien Nioche


On Aug. 30, 2013, 7:59 p.m., Ahmet Emre Aladag wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13492/
> -----------------------------------------------------------
> 
> (Updated Aug. 30, 2013, 7:59 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Bugs: GIRAPH-729
>     https://issues.apache.org/jira/browse/GIRAPH-729
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> Currently, Nutch 2.x lacks LinkRank (a variant of PageRank). Adding a module for Nutch that includes LinkRank and possibly other ranking algorithms would be useful for the Apache community. This module can be used by Nutch 1.x and other applications as well.
> 
> Attached you can find my patch. It includes:
> 
> * I/O formats (URL Text-URL Text edges, URL Text nodes) for reading from HDFS and HBase
> * Self-link and duplicate-link elimination
> * LinkRank computation (10 iterations by default)
> * Cumulative distribution normalization
> 
> 
> Diffs
> -----
> 
>   giraph-nutch/pom.xml PRE-CREATION 
>   giraph-nutch/src/main/assembly/compile.xml PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/LinkRankComputation.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/LinkRankVertexMasterCompute.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/filters/HostRankVertexFilter.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/filters/LinkRankEdgeFilter.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/filters/LinkRankVertexFilter.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/filters/package-info.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/LinkRankEdgeInputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/LinkRankVertexInputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/LinkRankVertexOutputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/LinkRankVertexUniformInputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/Nutch2HostInputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/Nutch2HostOutputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/Nutch2WebpageInputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/Nutch2WebpageOutputFormat.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/formats/package-info.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/io/package-info.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/LinkRank/package-info.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/package-info.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/utils/NutchUtil.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/utils/StringDoublePair.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/utils/StringFloatPair.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/utils/StringStringPair.java PRE-CREATION 
>   giraph-nutch/src/main/java/org/apache/giraph/nutch/utils/package-info.java PRE-CREATION 
>   giraph-nutch/src/test/java/org/apache/giraph/nutch/HostRankHBaseTest.java PRE-CREATION 
>   giraph-nutch/src/test/java/org/apache/giraph/nutch/LinkRankComputationTest.java PRE-CREATION 
>   giraph-nutch/src/test/java/org/apache/giraph/nutch/LinkRankHBaseTest.java PRE-CREATION 
>   giraph-nutch/src/test/java/org/apache/giraph/nutch/package-info.java PRE-CREATION 
>   pom.xml 41b6bb1 
> 
> Diff: https://reviews.apache.org/r/13492/diff/
> 
> 
> Testing
> -------
> 
> * Unit tests for the computation on HDFS and HBase.
> 
> 
> Thanks,
> 
> Ahmet Emre Aladag
> 
>
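For readers without access to the diff, the steps named in the patch description (self-link and duplicate-link elimination, followed by a fixed number of rank iterations) can be sketched roughly as below. This is not the code from the patch: it is a plain single-machine PageRank-style iteration, not a Giraph computation. The class name LinkRankSketch, the damping factor of 0.85, and the even redistribution of score from pages without out-links are all assumptions made for illustration. The cumulative distribution normalization step is not covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of the reviewed patch.
public class LinkRankSketch {
    // Assumed damping factor; the patch description does not state one.
    static final double DAMPING = 0.85;

    /** Builds an adjacency map, dropping self-links and duplicate links. */
    static Map<String, Set<String>> buildGraph(List<String[]> edges) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (String[] edge : edges) {
            String source = edge[0];
            String target = edge[1];
            // Register both endpoints so every URL becomes a vertex.
            graph.computeIfAbsent(source, k -> new LinkedHashSet<>());
            graph.computeIfAbsent(target, k -> new LinkedHashSet<>());
            if (!source.equals(target)) {
                // The Set silently ignores a link that is already present.
                graph.get(source).add(target);
            }
        }
        return graph;
    }

    /** Runs a fixed number of PageRank-style iterations over the graph. */
    static Map<String, Double> rank(Map<String, Set<String>> graph, int iterations) {
        int n = graph.size();
        Map<String, Double> scores = new LinkedHashMap<>();
        for (String url : graph.keySet()) {
            scores.put(url, 1.0 / n); // uniform initial score
        }
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> incoming = new LinkedHashMap<>();
            double dangling = 0.0; // score held by pages without out-links
            for (Map.Entry<String, Set<String>> entry : graph.entrySet()) {
                double score = scores.get(entry.getKey());
                Set<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    dangling += score;
                    continue;
                }
                double share = score / targets.size();
                for (String target : targets) {
                    incoming.merge(target, share, Double::sum);
                }
            }
            Map<String, Double> next = new LinkedHashMap<>();
            for (String url : graph.keySet()) {
                // Assumption: dangling score is spread evenly over all vertices.
                double received = incoming.getOrDefault(url, 0.0) + dangling / n;
                next.put(url, (1.0 - DAMPING) / n + DAMPING * received);
            }
            scores = next;
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String[]> edges = new ArrayList<>();
        edges.add(new String[] {"http://a.example/", "http://b.example/"});
        edges.add(new String[] {"http://a.example/", "http://b.example/"}); // duplicate
        edges.add(new String[] {"http://a.example/", "http://a.example/"}); // self-link
        edges.add(new String[] {"http://b.example/", "http://c.example/"});
        // 10 iterations mirrors the default mentioned in the description.
        System.out.println(rank(buildGraph(edges), 10));
    }
}
```

Because self-links and duplicates are filtered while the graph is built, the ranking loop never has to special-case them, and the scores keep summing to 1 after every iteration.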


Re: Review Request 13492: LinkRank implementation with Giraph

Posted by Claudio Martella <cl...@gmail.com>.

> On Oct. 18, 2013, 11:17 a.m., Julien Nioche wrote:
> > I think this should not be a Nutch module within Giraph but should instead be part of Nutch, mimicking what Nutch 1.x does in the nutch.scoring.webgraph package. The patch should be applied to the Nutch-2.x branch, and the package names should reflect this, e.g. org.apache.nutch.linkrank. A new set of Nutch commands should also be added to the script in src/bin/nutch.

I fully agree with you, Julien. In fact, we also suggested that this should go into Nutch. On the Giraph side, we reviewed the code to make sure the implementation "makes sense".


- Claudio

