You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2013/02/05 06:42:07 UTC
Review Request: GIRAPH-498: We should check input splits status from
zookeeeper once per worker, not once per split thread
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/
-----------------------------------------------------------
Review request for giraph.
Description
-------
When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
This addresses bug GIRAPH-498.
https://issues.apache.org/jira/browse/GIRAPH-498
Diffs
-----
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344
giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb
giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1
giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc
giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c
giraph-hbase/.graph.csv.crc PRE-CREATION
giraph-hbase/graph.csv PRE-CREATION
Diff: https://reviews.apache.org/r/9301/diff/
Testing
-------
mvn clean verify
Real application, using 200 workers and 20 input threads:
- trunk - about 560s for input split threads to finish, 720s for input superstep
- with this patch - about 310s for input split threads to finish, 500s for input superstep
Thanks,
Maja Kabiljo
Re: Review Request: GIRAPH-498: We should check input splits status from
zookeeeper once per worker, not once per split thread
Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/#review16087
-----------------------------------------------------------
Ship it!
I really like the refactoring, taking input split reservation logic out of the callables. Also a good cleanup of InputSplitPathOrganizer.
This was long overdue and the benefits are evident.
+1
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java
<https://reviews.apache.org/r/9301/#comment34405>
I find the "should" form a bit exotic. I think "useInputSplitLocality" would be consistent with the codebase.
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java
<https://reviews.apache.org/r/9301/#comment34406>
I also found extending Iterable a bit too implicit here. Good refactor.
- Alessandro Presta
On Feb. 5, 2013, 5:42 a.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9301/
> -----------------------------------------------------------
>
> (Updated Feb. 5, 2013, 5:42 a.m.)
>
>
> Review request for giraph.
>
>
> Description
> -------
>
> When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
>
>
> This addresses bug GIRAPH-498.
> https://issues.apache.org/jira/browse/GIRAPH-498
>
>
> Diffs
> -----
>
> giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d
> giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344
> giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb
> giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION
> giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1
> giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc
> giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c
> giraph-hbase/.graph.csv.crc PRE-CREATION
> giraph-hbase/graph.csv PRE-CREATION
>
> Diff: https://reviews.apache.org/r/9301/diff/
>
>
> Testing
> -------
>
> mvn clean verify
>
> Real application, using 200 workers and 20 input threads:
> - trunk - about 560s for input split threads to finish, 720s for input superstep
> - with this patch - about 310s for input split threads to finish, 500s for input superstep
>
>
> Thanks,
>
> Maja Kabiljo
>
>
Re: Review Request: GIRAPH-498: We should check input splits status from
zookeeeper once per worker, not once per split thread
Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/
-----------------------------------------------------------
(Updated Feb. 5, 2013, 6:34 p.m.)
Review request for giraph.
Changes
-------
useInputSplitLocality
Description
-------
When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
This addresses bug GIRAPH-498.
https://issues.apache.org/jira/browse/GIRAPH-498
Diffs (updated)
-----
giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d
giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java a48c5ea
giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb
giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b
giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION
giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1
giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc
giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c
Diff: https://reviews.apache.org/r/9301/diff/
Testing
-------
mvn clean verify
Real application, using 200 workers and 20 input threads:
- trunk - about 560s for input split threads to finish, 720s for input superstep
- with this patch - about 310s for input split threads to finish, 500s for input superstep
Thanks,
Maja Kabiljo
Re: Review Request: GIRAPH-498: We should check input splits status from
zookeeeper once per worker, not once per split thread
Posted by Eli Reisman <in...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9301/#review16119
-----------------------------------------------------------
Ship it!
I'm +1 on this, it is long overdue and I was a bit troubled by things like this since we made the move to multithreading. I think there are more places where work like this would be beneficial. Nice job!
- Eli Reisman
On Feb. 5, 2013, 5:42 a.m., Maja Kabiljo wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9301/
> -----------------------------------------------------------
>
> (Updated Feb. 5, 2013, 5:42 a.m.)
>
>
> Review request for giraph.
>
>
> Description
> -------
>
> When using a lot of workers and a lot of input split threads, checking that all input splits are finished after the reading is done takes a long time, since we check every input split once per thread.
>
>
> This addresses bug GIRAPH-498.
> https://issues.apache.org/jira/browse/GIRAPH-498
>
>
> Diffs
> -----
>
> giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 796047d
> giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java f542344
> giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 7d40dfb
> giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallableFactory.java 1adcd73
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitPathOrganizer.java bfaefd2
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsCallable.java d09ca2b
> giraph-core/src/main/java/org/apache/giraph/worker/InputSplitsHandler.java PRE-CREATION
> giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java a4f98e1
> giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallableFactory.java 0d617dc
> giraph-core/src/test/java/org/apache/giraph/TestBspBasic.java 987f51c
> giraph-hbase/.graph.csv.crc PRE-CREATION
> giraph-hbase/graph.csv PRE-CREATION
>
> Diff: https://reviews.apache.org/r/9301/diff/
>
>
> Testing
> -------
>
> mvn clean verify
>
> Real application, using 200 workers and 20 input threads:
> - trunk - about 560s for input split threads to finish, 720s for input superstep
> - with this patch - about 310s for input split threads to finish, 500s for input superstep
>
>
> Thanks,
>
> Maja Kabiljo
>
>