Posted to dev@giraph.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2012/10/05 05:57:47 UTC

[jira] [Updated] (GIRAPH-307) InputSplit list can be long with many workers (and locality info) and should not be re-created every time a worker calls reserveInputSplit()

     [ https://issues.apache.org/jira/browse/GIRAPH-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated GIRAPH-307:
-------------------------------

    Attachment: GIRAPH-307-3.patch

Thanks again Maja. I rebased this and fixed the test name, and it passes mvn verify again.

It should reduce ZK traffic during the input superstep, but in the brief testing I did it did not trim much time off the input superstep. It's just a small fix, I think. If I recall correctly, it prevents the repeated calls to ZK and the rebuild of the path list on every pass over the list by every worker, when the list itself never changes.

Thanks again for the review!
                
> InputSplit list can be long with many workers (and locality info) and should not be re-created every time a worker calls reserveInputSplit()
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-307
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-307
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp, graph
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-307-1.patch, GIRAPH-307-2.patch, GIRAPH-307-3.patch
>
>
> While instrumenting the INPUT_SUPERSTEP and watching various runs, I see that the input split list generated every time a worker calls reserveInputSplit() is, for all intents and purposes, immutable per job. Therefore, we can save a fair amount of memory by not re-creating the list and re-querying ZooKeeper on each pass to claim another split. Only the reserved and finished children lists are ever mutated during the input phase of the job.
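
For illustration only, the idea in the description above (fetch the split path list once, then reuse it on every later pass) might be sketched as below. This is not the code in the attached patch. The class name InputSplitPathCache, the getPaths() method, and the fetcher argument are all hypothetical; the fetcher stands in for whatever ZooKeeper call actually lists the split paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: caches a list of input split paths that does not
 * change for the lifetime of a job, so it is fetched at most once.
 */
class InputSplitPathCache {
  /** Stand-in for the expensive lookup (for example, a ZooKeeper query). */
  private final Supplier<List<String>> fetcher;
  /** Populated on first use, then reused on every later call. */
  private List<String> cachedPaths;

  InputSplitPathCache(Supplier<List<String>> fetcher) {
    this.fetcher = fetcher;
  }

  /** Returns the split paths, calling the fetcher only on the first call. */
  synchronized List<String> getPaths() {
    if (cachedPaths == null) {
      // Copy and wrap so callers cannot mutate the shared list.
      cachedPaths =
          Collections.unmodifiableList(new ArrayList<>(fetcher.get()));
    }
    return cachedPaths;
  }

  public static void main(String[] args) {
    AtomicInteger fetches = new AtomicInteger();
    InputSplitPathCache cache = new InputSplitPathCache(() -> {
      fetches.incrementAndGet(); // count how often the "remote" call happens
      return Arrays.asList("/splits/0", "/splits/1", "/splits/2");
    });
    // Simulate a worker making several passes to claim splits.
    for (int pass = 0; pass < 5; pass++) {
      cache.getPaths();
    }
    System.out.println("fetches = " + fetches.get()); // prints "fetches = 1"
  }
}
```

Only the immutable path list is cached in this sketch. The reserved and finished children, which do change during the input phase, would still have to be checked on each pass.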
