You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/02/05 06:42:07 UTC

Review Request: GIRAPH-498: We should check input splits status from zookeeeper once per worker, not once per split thread

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/
-----------------------------------------------------------

Review request for giraph.


Description
-------

When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.


This addresses bug GIRAPH-498.
    https://issues.apache.org/jira/browse/GIRAPH-498


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc 
  giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c 
  giraph-hbase/.graph.csv.crc PRE-CREATION 
  giraph-hbase/graph.csv PRE-CREATION 

Diff: https://reviews.apache.org/r/9301/diff/


Testing
-------

mvn clean verify

Real application, using 200 workers and 20 input threads:
- trunk - about 560s for input split threads to finish, 720s for input superstep
- with this patch - about 310s for input split threads to finish, 500s for input superstep


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-498: We should check input splits status from zookeeeper once per worker, not once per split thread

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/#review16087
-----------------------------------------------------------

Ship it!


I really like the refactoring, taking input split reservation logic out of the callables. Also a good cleanup of InputSplitPathOrganizer.
This was long overdue and the benefits are evident.
+1


giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
<https://reviews.apache.org/r/9301/#comment34405>

    I find the "should" form a bit exotic. I think "useInputSplitLocality" would be consistent with the codebase.



giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java
<https://reviews.apache.org/r/9301/#comment34406>

    I also found extending Iterable a bit too implicit here. Good refactor.


- Alessandro Presta


On Feb. 5, 2013, 5:42 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9301/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2013, 5:42 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
> 
> 
> This addresses bug GIRAPH-498.
>     https://issues.apache.org/jira/browse/GIRAPH-498
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc 
>   giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c 
>   giraph-hbase/.graph.csv.crc PRE-CREATION 
>   giraph-hbase/graph.csv PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9301/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> 
> Real application, using 200 workers and 20 input threads:
> - trunk - about 560s for input split threads to finish, 720s for input superstep
> - with this patch - about 310s for input split threads to finish, 500s for input superstep
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: GIRAPH-498: We should check input splits status from zookeeeper once per worker, not once per split thread

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/
-----------------------------------------------------------

(Updated Feb. 5, 2013, 6:34 p.m.)


Review request for giraph.


Changes
-------

useInputSplitLocality


Description
-------

When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.


This addresses bug GIRAPH-498.
    https://issues.apache.org/jira/browse/GIRAPH-498


Diffs (updated)
-----

  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java a48c5ea 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b 
  giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1 
  giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc 
  giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c 

Diff: https://reviews.apache.org/r/9301/diff/


Testing
-------

mvn clean verify

Real application, using 200 workers and 20 input threads:
- trunk - about 560s for input split threads to finish, 720s for input superstep
- with this patch - about 310s for input split threads to finish, 500s for input superstep


Thanks,

Maja Kabiljo


Re: Review Request: GIRAPH-498: We should check input splits status from zookeeeper once per worker, not once per split thread

Posted by Eli Reisman <in...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/#review16119
-----------------------------------------------------------

Ship it!


I'm +1 on this, it is long overdue and I was a bit troubled by things like this since we made the move to multithreading. I think there are more places where work like this would be beneficial. Nice job!

- Eli Reisman


On Feb. 5, 2013, 5:42 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9301/
> -----------------------------------------------------------
> 
> (Updated Feb. 5, 2013, 5:42 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
> 
> 
> This addresses bug GIRAPH-498.
>     https://issues.apache.org/jira/browse/GIRAPH-498
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b 
>   giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc 
>   giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c 
>   giraph-hbase/.graph.csv.crc PRE-CREATION 
>   giraph-hbase/graph.csv PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/9301/diff/
> 
> 
> Testing
> -------
> 
> mvn clean verify
> 
> Real application, using 200 workers and 20 input threads:
> - trunk - about 560s for input split threads to finish, 720s for input superstep
> - with this patch - about 310s for input split threads to finish, 500s for input superstep
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>