You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Sebastian Schelter <ss...@apache.org> on 2011/12/24 10:32:15 UTC
Review Request: Port of the HCC algorithm for identifying all connected
components of a graph
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
-----------------------------------------------------------
Review request for giraph.
Summary
-------
Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115
Diffs
-----
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
Diff: https://reviews.apache.org/r/3313/diff
Testing
-------
Thanks,
Sebastian
Re: Review Request: Port of the HCC algorithm for identifying all connected
components of a graph
Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4117
-----------------------------------------------------------
Ship it!
Looks great. I'll commit on your behalf.
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java
<https://reviews.apache.org/r/3313/#comment9277>
I'll fix this one for you. =) Remove */.
- Avery
On 2011-12-25 09:36:39, Sebastian Schelter wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3313/
> -----------------------------------------------------------
>
> (Updated 2011-12-25 09:36:39)
>
>
> Review request for giraph.
>
>
> Summary
> -------
>
> Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
>
> I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
>
> I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
>
> Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
>
>
> This addresses bug GIRAPH-115.
> https://issues.apache.org/jira/browse/GIRAPH-115
>
>
> Diffs
> -----
>
> /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
> /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
> /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
> /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
> /trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1222837
>
> Diff: https://reviews.apache.org/r/3313/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sebastian
>
>
Re: Review Request: Port of the HCC algorithm for identifying all connected
components of a graph
Posted by Sebastian Schelter <ss...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/
-----------------------------------------------------------
(Updated 2011-12-25 09:36:39.177630)
Review request for giraph.
Changes
-------
Reworked code to reflect Avery's comments regarding the code conventions.
Summary
-------
Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
This addresses bug GIRAPH-115.
https://issues.apache.org/jira/browse/GIRAPH-115
Diffs (updated)
-----
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
/trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1222837
Diff: https://reviews.apache.org/r/3313/diff
Testing
-------
Thanks,
Sebastian
Re: Review Request: Port of the HCC algorithm for identifying all connected
components of a graph
Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3313/#review4114
-----------------------------------------------------------
Sebastian, this is really awesome work, thanks for sharing it! While I didn't read the paper, your code looks good and the compute() code is pretty straightforward. IntIntNullIntVertex.java is a good example of how to make a very compact vertex.
I only have a few minor formatting requests.
In the CODE_CONVENTIONS, comments should be:
- All classes, members, and member methods should have Javadoc in the following
style. C-style comments for javadoc and // comments for non-javadoc. Also,
the comment block should have a line break that separates the comment
section and the @ section.
While not in the CODE_CONVENTIONS, but should be, Giraph follows spaces in the <>, i.e. <I, V, E, M>. Can you please add spaces i.e. <IntWritable, IntWritable, NullWritable, IntWritable>?
I marked as many as I could find, please correct any others I have missed.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9254>
CODE_CONVENTIONS comment suggestion.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9253>
You could shorten this a tad with the foreach pattern instead.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9255>
CODE_CONVENTIONS comment suggestion.
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9256>
Could be foreach (again).
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9257>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java
<https://reviews.apache.org/r/3313/#comment9258>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9259>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9260>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9261>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9262>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9263>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java
<https://reviews.apache.org/r/3313/#comment9264>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java
<https://reviews.apache.org/r/3313/#comment9265>
CODE_CONVENTIONS
/trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java
<https://reviews.apache.org/r/3313/#comment9266>
Capital 'text'
/trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java
<https://reviews.apache.org/r/3313/#comment9267>
NullWritable,IntWritable
should be
NullWritable, IntWritable
/trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java
<https://reviews.apache.org/r/3313/#comment9268>
String,String -> String, String
- Avery
On 2011-12-24 09:32:15, Sebastian Schelter wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/3313/
> -----------------------------------------------------------
>
> (Updated 2011-12-24 09:32:15)
>
>
> Review request for giraph.
>
>
> Summary
> -------
>
> Port of the HCC algorithm to Giraph. Each vertex needs to find the smallest vertex id in its component.
>
> I created a very memory-efficient abstract vertex in org.apache.giraph.graph.IntIntNullIntVertex and had org.apache.giraph.examples.ConnectedComponentsVertex extend that. org.apache.giraph.examples.ConnectedComponentsVertexTest contains an "integration" test on toy data.
>
> I had to patch org.apache.giraph.utils.InternalVertexRunner to allow the use of combiners and to shutdown() the local zookeeper instance in the tests.
>
> Local and pseudo-distributed unit tests were passed. I also tested the algorithm on a 6-machine hadoop cluster using the wikipedia pagelink graph (5.7M vertices, 130M edges).
>
>
> This addresses bug GIRAPH-115.
> https://issues.apache.org/jira/browse/GIRAPH-115
>
>
> Diffs
> -----
>
> /trunk/src/main/java/org/apache/giraph/examples/ConnectedComponentsVertex.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/IntIntNullIntTextInputFormat.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/examples/VertexWithComponentTextOutputFormat.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/graph/IntIntNullIntVertex.java PRE-CREATION
> /trunk/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1222837
> /trunk/src/main/java/org/apache/giraph/utils/UnmodifiableIntArrayIterator.java PRE-CREATION
> /trunk/src/test/java/org/apache/giraph/examples/ConnectedComponentsVertexTest.java PRE-CREATION
> /trunk/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java PRE-CREATION
>
> Diff: https://reviews.apache.org/r/3313/diff
>
>
> Testing
> -------
>
>
> Thanks,
>
> Sebastian
>
>