Posted to dev@giraph.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2013/01/10 21:02:13 UTC

[jira] [Commented] (GIRAPH-477) Fetching locality info in InputSplitPathOrganizer causes jobs to hang

    [ https://issues.apache.org/jira/browse/GIRAPH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550346#comment-13550346 ] 

Eli Reisman commented on GIRAPH-477:
------------------------------------

+1. Sorry about the headaches. I know this never did much for you on the FB clusters, but it sure helped at LI, and I expect it helps others with that sort of cluster config as well. That's my hope anyway ;)

1. The only clusters I have used this feature with provided the locality info we needed. If some of yours don't, perhaps we could make this intelligently default to "no locality ordering" when it's clear there's nothing to work with, based on the locality string you get (null, empty?).

2. Things have changed in this neck of the woods since I was last in the codebase, but there used to be a hardcoded default of "5 entries or less" for the locality string. You could use a default of "0" to trigger the effect you want more easily. That may not be applicable any more and may not matter; this patch looks great to me.

3. Anyone else need to chime in or should I commit this?
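
Points 1 and 2 could be combined roughly as follows. This is a minimal Java sketch under stated assumptions, not the GIRAPH-477 patch: the class LocalityOrdering, the fetchLocality callback (standing in for the per-split ZooKeeper read), the comma-separated host format, and the maxHosts cap (where 0 turns locality ordering off) are all hypothetical. Probing only the first split to decide whether locality info exists is also an assumed heuristic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch (not Giraph's InputSplitPathOrganizer): puts splits
 * local to this worker first, but keeps the original order when locality
 * info is missing or the host cap is 0.
 */
final class LocalityOrdering {
  private LocalityOrdering() {}

  /**
   * Parses an assumed comma-separated locality string into at most maxHosts
   * hosts. Returns an empty list for null/blank strings or maxHosts <= 0.
   */
  static List<String> limitHosts(String hosts, int maxHosts) {
    List<String> result = new ArrayList<>();
    if (maxHosts <= 0 || hosts == null) {
      return result;
    }
    for (String h : hosts.split(",")) {
      String trimmed = h.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      result.add(trimmed);
      if (result.size() == maxHosts) {
        break;
      }
    }
    return result;
  }

  /**
   * fetchLocality stands in for the per-split ZooKeeper read. If the first
   * split yields no hosts, assume the cluster provides no usable locality
   * and return the original order without reading the remaining splits.
   */
  static List<String> orderSplits(List<String> splitPaths,
                                  Function<String, String> fetchLocality,
                                  String localHost, int maxHosts) {
    if (splitPaths.isEmpty() || maxHosts <= 0) {
      return new ArrayList<>(splitPaths);  // locality ordering disabled
    }
    List<String> firstHosts =
        limitHosts(fetchLocality.apply(splitPaths.get(0)), maxHosts);
    if (firstHosts.isEmpty()) {
      return new ArrayList<>(splitPaths);  // nothing to work with
    }
    List<String> local = new ArrayList<>();
    List<String> remote = new ArrayList<>();
    for (int i = 0; i < splitPaths.size(); i++) {
      String path = splitPaths.get(i);
      // Reuse the first lookup instead of reading it twice.
      List<String> hosts = (i == 0)
          ? firstHosts
          : limitHosts(fetchLocality.apply(path), maxHosts);
      (hosts.contains(localHost) ? local : remote).add(path);
    }
    local.addAll(remote);  // local splits first, relative order preserved
    return local;
  }
}
```

With an empty locality string the sketch performs a single lookup and leaves the order alone, and a cap of 0 performs none, which is the cheap "no locality ordering" default suggested above.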

                
> Fetching locality info in InputSplitPathOrganizer causes jobs to hang
> ---------------------------------------------------------------------
>
>                 Key: GIRAPH-477
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-477
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-477.patch
>
>
> In the presence of many input splits (>6000 in our case) and input split threads (3000), the loop that fetches locality info for all splits from ZooKeeper becomes a bottleneck. A few workers aren't even able to iterate over the list once, run into increased GC pauses, and eventually time out.
> Furthermore, depending on the cluster configuration, it's not always possible/useful to exploit locality.
> We should add a flag so that the feature can be optionally disabled.
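
The opt-out flag proposed in the description could look roughly like the minimal Java sketch below. It is illustrative only: the property key, the SplitOrganizer class, and the isLocal predicate (standing in for the per-split ZooKeeper locality read) are hypothetical, and the flag name actually used in GIRAPH-477.patch is not shown here. The sketch assumes locality ordering stays on unless the flag is explicitly set to false.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Predicate;

/** Hypothetical sketch of an opt-out flag for locality-based ordering. */
final class SplitOrganizer {
  /** Illustrative key only; not a real Giraph configuration option. */
  static final String USE_LOCALITY_KEY = "example.split.useLocality";

  private SplitOrganizer() {}

  /** Locality ordering stays enabled unless explicitly set to "false". */
  static boolean useLocality(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(USE_LOCALITY_KEY, "true"));
  }

  /**
   * isLocal stands in for the per-split locality read. When the flag is
   * off, it is never called, so no locality info is fetched at all.
   */
  static List<String> organize(List<String> splitPaths, Properties conf,
                               Predicate<String> isLocal) {
    if (!useLocality(conf)) {
      return new ArrayList<>(splitPaths);  // skip the fetch loop entirely
    }
    List<String> local = new ArrayList<>();
    List<String> remote = new ArrayList<>();
    for (String path : splitPaths) {
      (isLocal.test(path) ? local : remote).add(path);
    }
    local.addAll(remote);  // local splits first, relative order preserved
    return local;
  }
}
```

Gating the whole loop behind the flag, rather than making each read cheaper, is what removes the bottleneck described above when there are thousands of splits and threads.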

--
This message is automatically generated by JIRA.