You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Alessandro Presta <al...@fb.com> on 2012/11/02 05:06:45 UTC

Re: Review Request: Allow creation of graph by adding edges that span multiple workers


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > This is really cool stuff Alessandro. Logic and design wise it looks sweet to me.
> > The main comments I have are around too much copy/pasting. There are many places where I think the copy/paste can be cleaned up fairly easily. See detailed comments for examples.
> >

About to update the diff with all the suggested cleanup to minimize code duplication. Had to introduce a factory :/


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexInputSplitsCallable.java, line 1
> > <https://reviews.apache.org/r/7784/diff/1/?file=183025#file183025line1>
> >
> >     I'm assuming this is same as InputSplitsCallable as before right?

Yes


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java, line 295
> > <https://reviews.apache.org/r/7784/diff/1/?file=183014#file183014line295>
> >
> >     Let's not be lazy. MakeĀ a common function with generateVertexInputSplits() and call it here. Copy/paste is bad form and has bitten us many times before.

Fixed by adding a common interface to VertexInputFormat and EdgeInputFormat that defines getSplits().


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/BspService.java, line 219
> > <https://reviews.apache.org/r/7784/diff/1/?file=183013#file183013line219>
> >
> >     I see a lot of duplication here. If these four and the previous four paths for vertexes are going to be kept around for the long term (not just some intermediate thing while we're developing quickly), how about we stick them in a class? Then have one instance for vertexes and one for edges? Something like zkInputSplitPaths?

Good idea, thanks!


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeReader.java, line 29
> > <https://reviews.apache.org/r/7784/diff/1/?file=183019#file183019line29>
> >
> >     s/vertices/edges ?

/g


> On Oct. 31, 2012, 8:43 p.m., Nitay Joffe wrote:
> > /trunk/giraph/src/main/java/org/apache/giraph/io/IntIntTextVertexValueInputFormat.java, line 60
> > <https://reviews.apache.org/r/7784/diff/1/?file=183032#file183032line60>
> >
> >     Guava's stuff might actually be faster / better for this? Splitter TAB_SPLITTER = Splitter.on("\t") up top, then
> >     TAB_SPLITTER.split(line.toString()) here?

Do you have evidence that it's significantly faster in this case?
Splitter produces an Iterable<String>, which makes it a bit annoying to access the first two elements.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7784/#review12969
-----------------------------------------------------------


On Oct. 30, 2012, 11:44 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7784/
> -----------------------------------------------------------
> 
> (Updated Oct. 30, 2012, 11:44 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> See https://issues.apache.org/jira/browse/GIRAPH-155
> 
> 
> This addresses bug GIRAPH-155.
>     https://issues.apache.org/jira/browse/GIRAPH-155
> 
> 
> Diffs
> -----
> 
>   /trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerServer.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendPartitionMutationsRequest.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BasicVertexValueReader.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspService.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeInputSplitsCallable.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeReader.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/EdgeWithSource.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/InputSplitsCallable.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/MasterThread.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexInputSplitsCallable.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexReader.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexResolver.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexValueInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/graph/VertexValueReader.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/GiraphFileInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/GiraphTextInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntIntTextVertexValueInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/IntNullTextEdgeInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextEdgeInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextVertexInputFormat.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/io/TextVertexValueInputFormat.java PRE-CREATION 
>   /trunk/giraph/src/main/java/org/apache/giraph/utils/FileUtils.java 1403451 
>   /trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1403451 
>   /trunk/giraph/src/test/java/org/apache/giraph/BspCase.java 1403451 
>   /trunk/giraph/src/test/java/org/apache/giraph/TestEdgeInput.java PRE-CREATION 
>   /trunk/giraph/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1403451 
> 
> Diff: https://reviews.apache.org/r/7784/diff/
> 
> 
> Testing
> -------
> 
> 1) mvn verify
> 2) pseudo-distributed tests
> 3) PageRank on cluster with real text input (will post perf results on the issue)
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>