Posted to dev@giraph.apache.org by Pavan Kumar Athivarapu <pa...@outlook.com> on 2014/06/09 03:32:04 UTC

Re: Review Request 19405: GIRAPH-874 - Specialized byte array partitions


> On April 2, 2014, 9:11 a.m., Lukas Nalezenec wrote:
> > giraph-core/src/main/java/org/apache/giraph/partition/primitives/LongByteArrayPartition.java, line 232
> > <https://reviews.apache.org/r/19405/diff/2/?file=531353#file531353line232>
> >
> >     Small note:
> >     
> >     When the partition is configured to use UnsafeByteArrayOutputStream, the ensureSize method
> >     grows the buffer to twice (buf.length + size). That may make sense for graphs that are heavily mutated during computation,
> >     but for a lot of applications the buffer size never changes after the first iteration starts, or changes only slightly.
> >     
> >     LOG:
> >     Current buffer size is 153 bytes, Current buffer position is 150 (bytes), I need 10 more bytes
> >     Allocating new buffer with size 326 bytes
> >     
> >       private void ensureSize(int size) {
> >         if (pos + size > buf.length) {
> >           byte[] newBuf = new byte[(buf.length + size) << 1];
> >           System.arraycopy(buf, 0, newBuf, 0, pos);
> >           buf = newBuf;
> >         }
> >       }
> >     
> >

Sure, I think we should make the growth strategy configurable, with the current doubling behavior as the default.
Would you like to write the code for that? Also, please take a look at GIRAPH-892 before doing so, since it defines a few more DataOutputs.
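
A rough sketch of what a configurable growth strategy could look like, offered only as a starting point. GrowableBuffer, GrowthPolicy, DOUBLING and EXACT are hypothetical names and are not part of Giraph or of the patch under review. DOUBLING reproduces the arithmetic of the quoted ensureSize, and EXACT allocates only what is needed.

```java
import java.util.Arrays;

/** Hypothetical sketch; not Giraph's UnsafeByteArrayOutputStream. */
final class GrowableBuffer {
  /** Decides the new array length when pos + size no longer fits. */
  interface GrowthPolicy {
    int newCapacity(int length, int pos, int size);
  }

  /** Same arithmetic as the quoted ensureSize: (buf.length + size) << 1. */
  static final GrowthPolicy DOUBLING = (length, pos, size) -> (length + size) << 1;

  /** Allocates exactly what is needed; suits buffers that rarely grow. */
  static final GrowthPolicy EXACT = (length, pos, size) -> pos + size;

  private final GrowthPolicy policy;
  private byte[] buf;
  private int pos;

  GrowableBuffer(int initialCapacity, GrowthPolicy policy) {
    this.buf = new byte[initialCapacity];
    this.policy = policy;
  }

  /** Grows the buffer through the configured policy if size more bytes do not fit. */
  private void ensureSize(int size) {
    if (pos + size > buf.length) {
      buf = Arrays.copyOf(buf, policy.newCapacity(buf.length, pos, size));
    }
  }

  /** Appends all of bytes at the current position. */
  void write(byte[] bytes) {
    ensureSize(bytes.length);
    System.arraycopy(bytes, 0, buf, pos, bytes.length);
    pos += bytes.length;
  }

  int capacity() {
    return buf.length;
  }

  int position() {
    return pos;
  }
}
```

With the numbers from the log above (153-byte buffer, position 150, 10 more bytes), DOUBLING ends up at 326 bytes and EXACT at 160. Integer overflow for very large buffers is not handled in this sketch.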


> On April 2, 2014, 9:11 a.m., Lukas Nalezenec wrote:
> > giraph-core/src/main/java/org/apache/giraph/partition/primitives/LongByteArrayPartition.java, line 293
> > <https://reviews.apache.org/r/19405/diff/2/?file=531353#file531353line293>
> >
> >     Small note:
> >     Some algorithms may benefit from iterating over vertices ordered by key. We can't use a sorted iterator by default, since partitions could be big, but there could be a configuration option to turn it on.
> >

Do you have an example of such an algorithm?
Since anything done on or by a vertex in a superstep is not visible until the next superstep, I cannot see how this would be helpful.
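
For reference, an opt-in sorted iteration could be as simple as the sketch below. It uses a plain java.util.Map instead of the fastutil maps from the patch, and SortedKeyIteration and sortedKeys are hypothetical names. It also shows the cost behind the "partitions could be big" concern: one extra long[] the size of the partition, plus an O(n log n) sort.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical sketch of opt-in key-ordered iteration; not part of the patch. */
final class SortedKeyIteration {
  /** Copies the vertex ids into a primitive array and sorts them ascending. */
  static long[] sortedKeys(Map<Long, byte[]> vertices) {
    long[] keys = new long[vertices.size()];
    int i = 0;
    for (Long key : vertices.keySet()) {
      keys[i++] = key;
    }
    Arrays.sort(keys);
    return keys;
  }
}
```

A caller would then look up each vertex by key in that order, instead of walking the map's own iterator.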


- Pavan Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19405/#review39247
-----------------------------------------------------------


On March 21, 2014, 4:22 p.m., Craig Muchinsky wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/19405/
> -----------------------------------------------------------
> 
> (Updated March 21, 2014, 4:22 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> This patch adds 2 new byte array partition variations that are optimized for int/long ids. They leverage fastutil primitive maps and allow for vertex object reuse during iteration because they don't keep a reference to the vertexId object in the primitive map.
> 
> Additional unit tests were added to TestPartitionStores which cover the new IntByteArrayPartition class, which is functionally identical to LongByteArrayPartition.
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/partition/primitives/IntByteArrayPartition.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/partition/primitives/LongByteArrayPartition.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/partition/primitives/package-info.java PRE-CREATION 
>   giraph-core/src/test/java/org/apache/giraph/partition/TestPartitionStores.java 08f4544 
> 
> Diff: https://reviews.apache.org/r/19405/diff/
> 
> 
> Testing
> -------
> 
> Successful "mvn clean verify" with hadoop_2 profile, and 4B vertex 5B edge graph tested on 18 node 432 core cluster.
> 
> 
> Thanks,
> 
> Craig Muchinsky
> 
>