Posted to dev@pig.apache.org by "Yan Zhou (JIRA)" <ji...@apache.org> on 2010/08/04 19:51:16 UTC

[jira] Created: (PIG-1535) Combined input splits need to consider rack-locality for the underlying splits of rack info.

Combined input splits need to consider rack-locality for the underlying splits of rack info.
--------------------------------------------------------------------------------------------

                 Key: PIG-1535
                 URL: https://issues.apache.org/jira/browse/PIG-1535
             Project: Pig
          Issue Type: Improvement
            Reporter: Yan Zhou


PIG-1518 will add support for combining multiple small splits into fewer, larger splits. In doing so, the node-locality of the underlying generic input splits is consulted to maximize data node-locality for the "big" splits. Rack-locality info is unavailable because generic input splits do not currently carry it. MAPREDUCE-1698 has been filed to address the lack of rack info in InputSplit. On the other hand, many specific types of input splits do have rack info available; FileSplit is an example. Future Howl input splits will also contain rack-locality info.

In summary, until MAPREDUCE-1698 is resolved (if ever), small splits of certain specific types could still be combined with rack-locality awareness, probably using the same or similar algorithms as CombineFileInputFormat.
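
To make the idea concrete, here is a minimal sketch of rack-aware combining, loosely following the node-first, rack-second ordering that CombineFileInputFormat uses. It is not the PIG-1518 implementation and does not use any Hadoop API: the class RackAwareCombiner, the SmallSplit type, and the targetSize parameter are all hypothetical names made up for illustration, and each split is assumed to report a single host and rack (real splits usually report several).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Pig or Hadoop.
class RackAwareCombiner {

    /** Stand-in for an input split that knows its length, one host, and that host's rack. */
    static final class SmallSplit {
        final String name;
        final long length;
        final String host;
        final String rack;

        SmallSplit(String name, long length, String host, String rack) {
            this.name = name;
            this.length = length;
            this.host = host;
            this.rack = rack;
        }
    }

    /** Combines small splits: node-local first, then rack-local, then whatever remains. */
    static List<List<SmallSplit>> combine(List<SmallSplit> splits, long targetSize) {
        List<List<SmallSplit>> combined = new ArrayList<>();

        // Pass 1: pack splits that share a host.
        List<SmallSplit> leftovers = pack(groupBy(splits, true), targetSize, combined);

        // Pass 2: pack the leftovers that share a rack.
        leftovers = pack(groupBy(leftovers, false), targetSize, combined);

        // Pass 3: pack the remainder with no locality constraint.
        Map<String, List<SmallSplit>> all = new LinkedHashMap<>();
        all.put("", leftovers);
        leftovers = pack(all, targetSize, combined);

        // Anything still below targetSize becomes one final combined split.
        if (!leftovers.isEmpty()) {
            combined.add(leftovers);
        }
        return combined;
    }

    /** Groups splits by host or by rack, preserving input order. */
    private static Map<String, List<SmallSplit>> groupBy(List<SmallSplit> splits, boolean byHost) {
        Map<String, List<SmallSplit>> groups = new LinkedHashMap<>();
        for (SmallSplit s : splits) {
            groups.computeIfAbsent(byHost ? s.host : s.rack, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    /** Emits a combined split each time a group reaches targetSize; returns the unpacked rest. */
    private static List<SmallSplit> pack(Map<String, List<SmallSplit>> groups,
                                         long targetSize,
                                         List<List<SmallSplit>> out) {
        List<SmallSplit> leftovers = new ArrayList<>();
        for (List<SmallSplit> group : groups.values()) {
            List<SmallSplit> current = new ArrayList<>();
            long size = 0;
            for (SmallSplit s : group) {
                current.add(s);
                size += s.length;
                if (size >= targetSize) {
                    out.add(current);
                    current = new ArrayList<>();
                    size = 0;
                }
            }
            leftovers.addAll(current);
        }
        return leftovers;
    }
}
```

With only node-locality available, pass 2 is skipped and splits on different hosts of the same rack fall straight through to the locality-blind pass 3. The rack pass is what lets those leftovers still end up in a combined split whose data sits within one rack.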

However, this would mean non-trivial extra work on top of PIG-1518 and may be out of reach for 0.8, hence a separate JIRA.
