Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2007/06/04 12:09:35 UTC

[jira] Created: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
-------------------------------------------------------------------------------------------------------------------

                 Key: HADOOP-1459
                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.14.0
            Reporter: Arun C Murthy
            Priority: Critical
             Fix For: 0.14.0


FileSystem.getFileCacheHints, via DFSClient.getHints (post HADOOP-894?), returns the IP addresses of the datanodes instead of their hostnames. This breaks the mapping from task-trackers to datanodes in map-reduce, i.e. the system cannot intelligently place maps on datanodes where blocks are present.

I have verified that this affects trunk only; branch-0.13.0 seems OK.
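
To illustrate the failure mode, here is a minimal Java sketch. It assumes the map-reduce side matches a task-tracker to block-location hints by comparing name strings; the class LocalitySketch, the method isDataLocal, and the host names and addresses are hypothetical and are not taken from the Hadoop source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Hadoop's scheduling code.
final class LocalitySketch {

    /** Returns true if any block-location hint equals the task-tracker's name. */
    static boolean isDataLocal(String trackerHost, List<String> hints) {
        Set<String> hintSet = new HashSet<>(hints);
        return hintSet.contains(trackerHost);
    }

    public static void main(String[] args) {
        String tracker = "node1.example.com"; // assumed task-tracker hostname

        // Hints expressed as hostnames can match the tracker's name.
        System.out.println(isDataLocal(tracker,
                Arrays.asList("node1.example.com", "node2.example.com")));

        // Hints expressed as IP addresses never equal a hostname string,
        // so no map would be recognised as data-local.
        System.out.println(isDataLocal(tracker,
                Arrays.asList("10.0.0.1", "10.0.0.2")));
    }
}
```

Under that assumption, the same blocks on the same machines stop counting as local as soon as the hints switch from hostnames to IP addresses.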

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502934 ] 

Raghu Angadi commented on HADOOP-1459:
--------------------------------------

> My proposal would be to go back to hostnames instead of IP addresses.
Please see HADOOP-985 for context on why we moved to using IP addresses. The problem is just that not all Hadoop components have moved to IPs.



> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506039 ] 

Hadoop QA commented on HADOOP-1459:
-----------------------------------

Integrated in Hadoop-Nightly #127 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/127/])

> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress2.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503702 ] 

Konstantin Shvachko commented on HADOOP-1459:
---------------------------------------------

The protocol version should be changed any time the protocol changes, and this patch does change the protocol.
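
As a hedged illustration of why the version should move together with the wire format, here is a minimal Java sketch of a version handshake. It is not Hadoop's RPC code; ProtocolVersionSketch, SERVER_VERSION, checkVersion and the version numbers are all hypothetical.

```java
// Hypothetical illustration only; the numbers are not real ClientProtocol versions.
final class ProtocolVersionSketch {

    // Assumed to be incremented whenever the serialized form of any message changes.
    static final long SERVER_VERSION = 2L;

    /** Rejects a peer that was built against a different wire format. */
    static void checkVersion(long clientVersion) {
        if (clientVersion != SERVER_VERSION) {
            throw new IllegalStateException("Protocol version mismatch: client="
                    + clientVersion + ", server=" + SERVER_VERSION);
        }
    }

    public static void main(String[] args) {
        checkVersion(2L); // same version: accepted
        try {
            checkVersion(1L); // older client: rejected up front
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a check of this kind, an old client fails with a clear message at connection time instead of misreading records whose layout has changed.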



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503560 ] 

Konstantin Shvachko commented on HADOOP-1459:
---------------------------------------------

This patch should change the protocol version.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503948 ] 

dhruba borthakur commented on HADOOP-1459:
------------------------------------------

Yes, I like the idea where DFS keeps fully qualified hostnames and sends
these to the map-reduce framework. 

Thanks,
dhruba





[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502699 ] 

dhruba borthakur commented on HADOOP-1459:
------------------------------------------

This allows the WebUI to display machine names as hostnames rather than IP addresses.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503107 ] 

Raghu Angadi commented on HADOOP-1459:
--------------------------------------

bq. Note that this increases the serialization cost of any DatanodeInfo transfer, which covers pretty much most RPCs. This will need a protocol version change, since it won't work with previous clients/datanodes.

Does this also affect the fsimage version?



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502930 ] 

Raghu Angadi commented on HADOOP-1459:
--------------------------------------

Note that this increases the serialization cost of any DatanodeInfo transfer, which covers pretty much most RPCs. This will need a protocol version change, since it won't work with previous clients/datanodes.
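
A rough Java sketch of both points follows. It is not Hadoop's DatanodeInfo: NodeRecordSketch, its fields, its methods and the sample values are hypothetical, and it assumes the patch adds a hostname string next to an identifier that was already on the wire.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a wire record; not Hadoop's DatanodeInfo.
final class NodeRecordSketch {
    final String name;     // identifier assumed to be serialized already
    final String hostName; // extra field assumed to be added by the patch

    NodeRecordSketch(String name, String hostName) {
        this.name = name;
        this.hostName = hostName;
    }

    /** Writes the record in the old layout (name only) or the new one (name + hostName). */
    byte[] serialize(boolean withHostName) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(name);
            if (withHostName) {
                out.writeUTF(hostName);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the new layout; fails if the hostName field is missing. */
    static NodeRecordSketch readNewFormat(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            String name = in.readUTF();
            String hostName = in.readUTF();
            return new NodeRecordSketch(name, hostName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NodeRecordSketch node = new NodeRecordSketch("10.0.0.1:50010", "node1.example.com");
        byte[] oldBytes = node.serialize(false);
        byte[] newBytes = node.serialize(true);
        System.out.println("old=" + oldBytes.length + " bytes, new=" + newBytes.length + " bytes");
        try {
            readNewFormat(oldBytes); // a new reader given an old-format record
        } catch (UncheckedIOException e) {
            System.out.println("old-format record rejected: " + e.getCause());
        }
    }
}
```

Every record grows by the length of the extra string, and a reader that expects the new layout runs off the end of an old-format record, which is the incompatibility a protocol version change is meant to surface.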




[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1459:
-------------------------------------

    Attachment: getHintsIpAddress.patch

The getBlockLocations patch (HADOOP-894) had the side effect that getHints started returning IP addresses instead of hostnames. This patch fixes that problem.



> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Assigned: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur reassigned HADOOP-1459:
----------------------------------------

    Assignee: dhruba borthakur



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502917 ] 

Konstantin Shvachko commented on HADOOP-1459:
---------------------------------------------

Could you please remove the redundant
import org.apache.hadoop.io.UTF8
if you change DatanodeInfo.

I think this is the right fix for the time being, although I'm not happy that
# we should keep 2 different identifications for the nodes and that
# we have different ways of node identification in different components of hadoop.

My proposal would be to go back to hostnames instead of IP addresses.
But this of course belongs in a different issue.



[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1459:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Dhruba!



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Konstantin Shvachko (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503945 ] 

Konstantin Shvachko commented on HADOOP-1459:
---------------------------------------------

+1
Should we file a separate issue for replacing IPs with fully qualified hostnames?



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502932 ] 

Raghu Angadi commented on HADOOP-1459:
--------------------------------------

{quote}
I think this is the right fix for the time being, although I'm not happy that
# we should keep 2 different identifications for the nodes and that
# we have different ways of node identification in different components of hadoop
{quote}
+1.
Until now, the Namenode kept 'hostName' only for informational purposes.




[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502937 ] 

Raghu Angadi commented on HADOOP-1459:
--------------------------------------

This particular issue was considered in HADOOP-985:
from https://issues.apache.org/jira/browse/HADOOP-985#action_12473585 :
{quote}
Thanks Hairong. I will include both in a new patch.

This changes what DFS returns for getDatanodeHints(), which is ultimately used by mapreduce. Two options for handling this:

a) We can modify getDatanodeHints() to return what it used to return before this patch, i.e. return descriptor.getHostName() instead of descriptor.getHost(). The advantage is that no changes are necessary in mapreduce. But it does not conform to the 'ip everywhere' policy.

b) Make Job and task tracker also deal in IPs. I am not sure yet how intrusive this change is.

My preference is (a). Comments?
{quote}



[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1459:
-------------------------------------

    Attachment:     (was: getHintsIpAddress.patch)



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503626 ] 

dhruba borthakur commented on HADOOP-1459:
------------------------------------------

The protocol version has already been bumped since the 0.13 release; hence I have not changed it in this patch.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503156 ] 

dhruba borthakur commented on HADOOP-1459:
------------------------------------------

This does not affect the fsimage version. DatanodeInfo is used to communicate over the wire, whereas DatanodeImage is used to persist info into the fsImage. I did not change the serialization of DatanodeImage.


> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1459:
-------------------------------------

    Attachment: getHintsIpAddress2.patch

Bumped up the ClientProtocol version too.

> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress.patch, getHintsIpAddress2.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Commented: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12504114 ] 

Owen O'Malley commented on HADOOP-1459:
---------------------------------------

In the medium term we should move all of Hadoop to use IP addresses instead of hostnames. I've filed the relevant bug in HADOOP-1487.

> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress2.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1459:
-------------------------------------

    Priority: Blocker  (was: Critical)

Marking this issue as a blocker for the 0.14 release because it breaks the locality of map-reduce with DFS.
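
To illustrate why IP-address hints defeat locality, here is a minimal sketch. It assumes the scheduler decides whether a map is local by comparing the task tracker's hostname against the block's hints with plain string equality. The class, method, hostnames and addresses are all hypothetical and are not taken from the Hadoop source.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not Hadoop's actual scheduling code.
final class LocalitySketch {
    // A block counts as local only if the tracker's host string appears in the hints.
    static boolean isLocal(String taskTrackerHost, List<String> blockHints) {
        return blockHints.contains(taskTrackerHost);
    }

    public static void main(String[] args) {
        String tracker = "node17.example.com"; // assumed: trackers are known by hostname
        List<String> hostnameHints = Arrays.asList("node17.example.com", "node42.example.com");
        List<String> ipHints = Arrays.asList("10.0.0.17", "10.0.0.42");

        // Hostname hints match the tracker; IP hints for the same machines do not.
        System.out.println(isLocal(tracker, hostnameHints));
        System.out.println(isLocal(tracker, ipHints));
    }
}
```

Under that assumption, every lookup against IP-based hints misses, so maps would be placed without regard to where the blocks are.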

> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.14.0
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.



[jira] Updated: (HADOOP-1459) FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1459:
-------------------------------------

    Status: Patch Available  (was: Open)

The namenode now serializes the hostName as part of DatanodeInfo. The namenode already had the hostName readily available.

An alternative would have been to make a getHostName() call on the DFSClient side. We did not adopt this approach because a getHostName() call for every replica of every block could be time-consuming.
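
The approach can be sketched roughly as follows. This is not the contents of the attached patch; the class name, field names and field order are assumptions for illustration only. The idea is that the hostName travels over the wire next to the existing ip:port name, so the client reads it back instead of resolving it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for DatanodeInfo; the layout is assumed, not the real one.
final class DatanodeInfoSketch {
    String name;     // "ip:port" identifier, as before
    String hostName; // newly serialized alongside the name

    DatanodeInfoSketch() {}

    DatanodeInfoSketch(String name, String hostName) {
        this.name = name;
        this.hostName = hostName;
    }

    // Sender side: write both fields in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeUTF(hostName);
    }

    // Receiver side: read the fields back in the same order, with no DNS lookup.
    void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        hostName = in.readUTF();
    }

    // Helper that serializes and deserializes in memory to mimic one trip over the wire.
    static DatanodeInfoSketch roundTrip(DatanodeInfoSketch info) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            info.write(new DataOutputStream(buf));
            DatanodeInfoSketch copy = new DatanodeInfoSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the serialized layout changes in a scheme like this, both ends have to agree on the new format.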

> FileSystem.getFileCacheHints returns IP addresses rather than hostnames, which breaks 'data-locality' in map-reduce
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1459
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1459
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Arun C Murthy
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.14.0
>
>         Attachments: getHintsIpAddress.patch
>
>
> FileSystem.getFileCacheHints via DFSClient.getHints (post HADOOP-894?) returns IP address of the datanodes instead of the hostnames which breaks mapping from task-tracker to datanodes in map-reduce i.e. the system cannot intelligently place maps on datanodes where blocks are present.
> I have verified that this affects trunk only, branch-0.13.0 seems ok.
