Posted to issues@hbase.apache.org by "Liyin Tang (Created) (JIRA)" <ji...@apache.org> on 2012/01/23 19:38:40 UTC

[jira] [Created] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
---------------------------------------------------------------------------

                 Key: HBASE-5259
                 URL: https://issues.apache.org/jira/browse/HBASE-5259
             Project: HBase
          Issue Type: Improvement
            Reporter: Liyin Tang
            Assignee: Liyin Tang


Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides the regions of a given table into a series of mapper tasks, so that each mapper task processes a region or part of a region. Ideally, a mapper task should run on the same machine as the region server that hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.

The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may be in a different format from the host name of the task tracker (which runs the mapper task). The task tracker always gets its host name by reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d, while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end).

So the solution is to set the RegionLocation by reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with that of the MapReduce framework.
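
To make the mismatch concrete, here is a minimal Java sketch, not the code from the patch. It assumes a hypothetical ReverseResolver interface standing in for the reverse DNS lookup, a made-up address and host name, and a hypothetical normalizedLocation helper that falls back to the configured host name when the lookup fails.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Illustrative sketch only; class and method names here are hypothetical.
class RegionLocationSketch {

    /** Stand-in for a reverse DNS lookup (IP address to host name). */
    interface ReverseResolver {
        String reverse(InetAddress address) throws Exception;
    }

    /** Builds an InetAddress from raw octets without touching DNS. */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Returns the reverse-DNS name, or the configured name if the lookup fails. */
    static String normalizedLocation(String configuredHost, InetAddress address,
                                     ReverseResolver resolver) {
        try {
            return resolver.reverse(address);
        } catch (Exception e) {
            return configuredHost;
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress address = ip(10, 0, 0, 1);
        // Canned answer with a trailing dot, as in the example above.
        ReverseResolver dns = addr -> "rs1.example.com.";
        String taskTrackerHost = dns.reverse(address);
        String configuredHost = "rs1.example.com";

        // Plain string comparison fails, so locality would be lost.
        System.out.println(configuredHost.equals(taskTrackerHost)); // false
        // Using the same lookup on both sides makes the names agree.
        System.out.println(normalizedLocation(configuredHost, address, dns)
            .equals(taskTrackerHost)); // true
    }
}
```

The point of the sketch is only that both sides must derive the name the same way; the actual format returned depends on the DNS setup.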







--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209155#comment-13209155 ] 

Phabricator commented on HBASE-5259:
------------------------------------

mbautin has committed the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

REVISION DETAIL
  https://reviews.facebook.net/D1413

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1239892

                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194473#comment-13194473 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Liyin has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 @tedyu, I prefer to name this variable reverseDNSCacheMap. Do you have any specific reason for changing it to reverseDNSCache?
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 @tedyu, This function is a wrapper around DNS.reverseDNS which also provides caching, as you suggested.
  However, I believe this function is supposed to keep the same behavior as DNS.reverseDNS, including throwing NamingException.
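
  The wrapper described in this comment could look roughly like the following Java sketch. It is an assumption-laden illustration, not the patch itself: the Lookup interface is a hypothetical stand-in for DNS.reverseDNS, and locationFor is a hypothetical caller that supplies the fallback. Only the field name reverseDNSCacheMap comes from the discussion.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

// Illustrative sketch only; this is not the code from the patch.
class ReverseDnsCacheSketch {

    /** Hypothetical stand-in for the underlying reverse DNS call. */
    interface Lookup {
        String reverse(InetAddress address) throws NamingException;
    }

    private final Map<InetAddress, String> reverseDNSCacheMap = new HashMap<>();
    private final Lookup lookup;
    int lookups = 0; // number of calls that reached the underlying lookup

    ReverseDnsCacheSketch(Lookup lookup) {
        this.lookup = lookup;
    }

    /** Builds an InetAddress from raw octets without touching DNS. */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Cached wrapper with the same contract as the lookup, including NamingException. */
    String reverseDNS(InetAddress address) throws NamingException {
        String hostName = reverseDNSCacheMap.get(address);
        if (hostName == null) {
            lookups++;
            hostName = lookup.reverse(address); // if this throws, nothing is cached
            reverseDNSCacheMap.put(address, hostName);
        }
        return hostName;
    }

    /** Caller-side defaulting to the region server's own host name. */
    String locationFor(InetAddress address, String regionServerHost) {
        try {
            return reverseDNS(address);
        } catch (NamingException e) {
            return regionServerHost;
        }
    }

    public static void main(String[] args) {
        ReverseDnsCacheSketch cache = new ReverseDnsCacheSketch(addr -> "rs1.example.com.");
        InetAddress address = ip(10, 0, 0, 1);
        System.out.println(cache.locationFor(address, "rs1.example.com")); // rs1.example.com.
        System.out.println(cache.locationFor(address, "rs1.example.com")); // rs1.example.com.
        System.out.println(cache.lookups); // 1
    }
}
```

  With this shape, a successful lookup is performed once per address, while a failing lookup is retried on every call because the exception leaves the cache untouched.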

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194951#comment-13194951 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  typo:
  <<this error case which isn't supposed to happen>>
  I meant to say:
  <<this error case which isn't supposed to happen normally>>

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197508#comment-13197508 ] 

Hudson commented on HBASE-5259:
-------------------------------

Integrated in HBase-TRUNK #2649 (See [https://builds.apache.org/job/HBase-TRUNK/2649/])
    HBASE-5259 Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup (Liyin Tang)

tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194927#comment-13194927 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 I am learning about the possibilities of reverse DNS failure:

  http://www.crucialp.com/resources/tutorials/web-hosting/how-reverse-dns-works-rdns.php

  I think we should be prepared for such an occasion, as I outlined @ 9:43pm.
  Just for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 bq. this error case which isn't supposed to happen
  If I understand the statement correctly, you didn't say 'definitely not possible'.

  My earlier analysis w.r.t. NamingException shows that we would incur extra delay if reverse DNS fails, since the assignment on line 169 doesn't put the fallback value into the cache.
  This can be regarded as a performance regression compared to the previous implementation, where reverse DNS was not taken into account.
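
  One possible way to avoid the repeated lookups described here is to cache the fallback value when the lookup fails. The Java sketch below illustrates that idea only; it is not taken from the patch or from the earlier comment referred to above, and the Lookup interface, the locationFor method, and the host names are all hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

// Illustrative sketch only; names are hypothetical.
class FallbackCachingSketch {

    /** Hypothetical stand-in for the underlying reverse DNS call. */
    interface Lookup {
        String reverse(InetAddress address) throws NamingException;
    }

    private final Map<InetAddress, String> cache = new HashMap<>();
    private final Lookup lookup;
    int lookups = 0; // number of calls that reached the underlying lookup

    FallbackCachingSketch(Lookup lookup) {
        this.lookup = lookup;
    }

    /** Builds an InetAddress from raw octets without touching DNS. */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(
                new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Resolves once per address; a failed lookup caches the fallback name. */
    String locationFor(InetAddress address, String regionServerHost) {
        String hostName = cache.get(address);
        if (hostName == null) {
            lookups++;
            try {
                hostName = lookup.reverse(address);
            } catch (NamingException e) {
                hostName = regionServerHost; // remember the fallback as well
            }
            cache.put(address, hostName);
        }
        return hostName;
    }

    public static void main(String[] args) {
        FallbackCachingSketch cache = new FallbackCachingSketch(addr -> {
            throw new NamingException("no PTR record");
        });
        InetAddress address = ip(10, 0, 0, 1);
        for (int i = 0; i < 3; i++) {
            System.out.println(cache.locationFor(address, "rs1.example.com")); // rs1.example.com
        }
        System.out.println(cache.lookups); // 1
    }
}
```

  The trade-off is that a transient DNS failure stays cached for the lifetime of the map, so whether this is worthwhile depends on how often the error case is expected.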

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194513#comment-13194513 ] 

Phabricator commented on HBASE-5259:
------------------------------------

gqchen has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I guess this is more like a personal preference. For non-trivial data structures, I personally find it helpful to have the type in the variable name, so that when you read the code where the variable is being used, you don't have to guess.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5259:
------------------------------

      Resolution: Fixed
    Hadoop Flags: Reviewed
          Status: Resolved  (was: Patch Available)
    

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194493#comment-13194493 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 Ted: If you are trying to optimize for the performance of this error case which isn't supposed to happen, I don't think it is really worth it. Furthermore, the defaulting logic of falling back to the old style hostname is in the caller.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194489#comment-13194489 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 Map is reflected in the type of this field.
  My suggestion was only for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 If NamingException is thrown out of line 201, line 202 would be skipped.
  Line 169 might be executed multiple times because regionServerAddress across multiple iterations may carry the same (unresolvable) value.

  Correct me if I am wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.2.patch

Liyin updated the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  Address Ted's comments.

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194491#comment-13194491 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 Map is reflected in the type of this field.
  My suggestion was only for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 If NamingException is thrown out of line 201, line 202 would be skipped.
  Line 169 might be executed multiple times because regionServerAddress across multiple iterations may carry the same (unresolvable) value.

  Correct me if I am wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192885#comment-13192885 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:153 I briefly went over DNS.reverseDns() and didn't see caching there.
  Can we introduce caching here to avoid looking up the same address again and again?

  The second parameter is supposed to be 'The host name of a reachable DNS server'.
  Should we allow the user to specify such a server?
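
  The caching suggested above could look roughly like the following. This is only a minimal sketch, not the code in D1413: the class name ReverseDnsCacheSketch, its methods, and the Function-based resolver are hypothetical stand-ins for the real DNS.reverseDns() call, so that the example runs without a DNS server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Memoizes reverse lookups so that each address is resolved at most once. */
final class ReverseDnsCacheSketch {
  // Address -> resolved host name. Hypothetical field, not taken from the patch.
  private final Map<String, String> cache = new HashMap<>();
  // Hypothetical stand-in for the real reverse DNS lookup.
  private final Function<String, String> resolver;
  private int lookups = 0;

  ReverseDnsCacheSketch(Function<String, String> resolver) {
    this.resolver = resolver;
  }

  /** Returns the cached host name, resolving it on the first request only. */
  String hostFor(String ipAddress) {
    String host = cache.get(ipAddress);
    if (host == null) {
      lookups++;
      host = resolver.apply(ipAddress);
      cache.put(ipAddress, host);
    }
    return host;
  }

  /** Number of times the underlying resolver was actually invoked. */
  int lookupCount() {
    return lookups;
  }

  public static void main(String[] args) {
    // Fake resolver that appends a trailing dot, as some DNS setups do.
    ReverseDnsCacheSketch cache = new ReverseDnsCacheSketch(ip -> "host-" + ip + ".");
    cache.hostFor("10.0.0.1");
    cache.hostFor("10.0.0.1"); // served from the cache
    cache.hostFor("10.0.0.2");
    System.out.println(cache.lookupCount()); // prints 2
  }
}
```

  Since many regions of a table are usually served by the same region server, a per-job map like this would keep the number of lookups proportional to the number of servers rather than the number of regions.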


REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.3.patch

Liyin updated the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  refactoring the code.

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194492#comment-13194492 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 Ted: If you are trying to optimize for the performance of this error case which isn't supposed to happen, I don't think it is really worth it. Furthermore, the defaulting logic of falling back to the old style hostname is in the caller.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194924#comment-13194924 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 I am learning about the possibilities of reverse DNS failure:

  http://www.crucialp.com/resources/tutorials/web-hosting/how-reverse-dns-works-rdns.php

  I think we should be prepared for such an occasion, as I outlined @ 9:43pm.
  Just for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 "this error case which isn't supposed to happen"
  If I understand the statement correctly, you didn't say 'definitely not possible'.

  My earlier analysis w.r.t. NamingException shows that we would incur extra delay when reverse DNS fails, since the assignment on line 169 doesn't put the fallback value into the cache.
  This can be regarded as a performance regression compared to the previous implementation, where reverse DNS is not taken into account.
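
  To make the concern concrete, here is a minimal sketch of a lookup that also caches the fallback value when reverse DNS fails. It is not the code under review: FallbackCachingSketch, its Resolver interface, and the method names are hypothetical, and the resolver is a stand-in for DNS.reverseDns() so that the example runs offline.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

/** Caches the fallback host name too, so a failing address is tried only once. */
final class FallbackCachingSketch {
  /** Hypothetical stand-in for a reverse lookup that may fail. */
  interface Resolver {
    String resolve(String ipAddress) throws NamingException;
  }

  private final Map<String, String> cache = new HashMap<>();
  private final Resolver resolver;
  private int attempts = 0;

  FallbackCachingSketch(Resolver resolver) {
    this.resolver = resolver;
  }

  /**
   * Returns the reverse-resolved name, or fallbackHost if the lookup throws.
   * Either result is stored, so later calls for the same address skip the lookup.
   */
  String hostFor(String ipAddress, String fallbackHost) {
    String host = cache.get(ipAddress);
    if (host != null) {
      return host;
    }
    attempts++;
    try {
      host = resolver.resolve(ipAddress);
    } catch (NamingException e) {
      // Fall back to the name the region server reported.
      host = fallbackHost;
    }
    cache.put(ipAddress, host);
    return host;
  }

  /** Number of times the underlying resolver was actually invoked. */
  int attemptCount() {
    return attempts;
  }

  public static void main(String[] args) {
    // Resolver that always fails, simulating a missing PTR record.
    FallbackCachingSketch sketch = new FallbackCachingSketch(ip -> {
      throw new NamingException("no PTR record for " + ip);
    });
    System.out.println(sketch.hostFor("10.0.0.1", "rs1.example.com")); // prints rs1.example.com
    System.out.println(sketch.hostFor("10.0.0.1", "rs1.example.com")); // prints rs1.example.com
    System.out.println(sketch.attemptCount()); // prints 1
  }
}
```

  With the fallback stored in the map, an unresolvable address costs one failed lookup per job instead of one per region.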

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193287#comment-13193287 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I think reverseDNSCache is a good enough name.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 Should NamingException be handled here?

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194511#comment-13194511 ] 

Phabricator commented on HBASE-5259:
------------------------------------

gqchen has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I guess this is more like a personal preference. For non-trivial data structures, I personally find it helpful to have the type in the variable name, so that when you read the code where the variable is being used, you don't have to guess.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        


        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.1.patch

Liyin requested code review of "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a table into a series of mapper tasks, so that each mapper task processes a region or part of a region. Ideally, a mapper task should run on the same machine as the region server that hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.

  The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d, while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end).

  So the solution is to set the RegionLocation by reverse DNS lookup as well. No matter what host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
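
  The mismatch can be illustrated with a small sketch. It assumes, as a simplification, that the framework matches a split location against task tracker host names by exact string comparison. LocalityMismatchSketch and isNodeLocal are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Shows how a trailing dot defeats an exact host name match. */
final class LocalityMismatchSketch {
  /** True if some task tracker is registered under exactly this name. */
  static boolean isNodeLocal(String splitLocation, Set<String> taskTrackerHosts) {
    return taskTrackerHosts.contains(splitLocation);
  }

  public static void main(String[] args) {
    // The task tracker registered itself with the name reverse DNS gave it.
    Set<String> trackers = new HashSet<>(Arrays.asList("a.b.c.d."));

    // Name as configured on the region server: no match, locality is lost.
    System.out.println(isNodeLocal("a.b.c.d", trackers)); // prints false

    // Name obtained through the same reverse lookup: match.
    System.out.println(isNodeLocal("a.b.c.d.", trackers)); // prints true
  }
}
```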

TEST PLAN
  running all the unit tests

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch
>
>
> Assuming the HBase and MapReduce running in the same cluster, the TableInputFormat is to override the split function which divides all the regions from one particular table into a series of mapper tasks. So each mapper task can process a region or one part of a region. Ideally, the mapper task should run on the same machine on which the region server hosts the corresponding region. That's the motivation that the TableInputFormat sets the RegionLocation so that the MapReduce framework can respect the node locality. 
> The code simply set the host name of the region server as the HRegionLocation. However, the host name of the region server may have different format with the host name of the task tracker (Mapper task). The task tracker always gets its hostname by the reverse DNS lookup. And the DNS service may return different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (With an additional doc in the end).
> So the solution is to set the RegionLocation by the reverse DNS lookup as well. No matter what host name format the DNS system is using, the TableInputFormat has the responsibility to keep the consistent host name format with the MapReduce framework.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.1.patch

Liyin requested code review of "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a table into a series of mapper tasks, so that each mapper task processes a region or part of a region. Ideally, a mapper task should run on the same machine as the region server that hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.

  The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may be in a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name from a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server may be correctly set as a.b.c.d while the reverse DNS lookup returns a.b.c.d. (with an additional dot at the end).

  So the solution is to set the RegionLocation by reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework's.
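  As a rough illustration of the mismatch described above (not the code from D1413), the sketch below compares a configured host name with the name a reverse lookup might return. The class name, the normalize helper, the IP address, and the fake lookup function are all hypothetical stand-ins; a real implementation would query the DNS service instead.

```java
import java.util.function.UnaryOperator;

/** Hypothetical illustration only; not the code from the patch. */
final class RegionLocationSketch {

    /** Returns the name a split should advertise, given a reverse-lookup function. */
    static String normalize(String ipAddress, UnaryOperator<String> reverseLookup) {
        return reverseLookup.apply(ipAddress);
    }

    public static void main(String[] args) {
        // Stand-in for a DNS service whose answers carry a trailing dot.
        UnaryOperator<String> fakeReverseDns = ip -> "a.b.c.d.";

        String configuredName = "a.b.c.d";    // what the region server reports
        String taskTrackerName = "a.b.c.d.";  // what the task tracker gets via reverse DNS

        // Plain string comparison fails, so the locality hint would be ignored.
        System.out.println(configuredName.equals(taskTrackerName)); // false

        // Using the same lookup on both sides yields matching names.
        System.out.println(normalize("10.0.0.1", fakeReverseDns).equals(taskTrackerName)); // true
    }
}
```

  The point of the sketch is only that both sides must derive the name the same way; which exact format the DNS service returns does not matter.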

TEST PLAN
  running all the unit tests

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194999#comment-13194999 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

bq. recovery from a transient DNS lookup failure
Correction: I was talking about a reverse DNS failure caused by loss of the PTR record. This aspect is new because the patch introduces the reverse DNS lookup.
                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197162#comment-13197162 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

Integrated to TRUNK.

Thanks for the patch, Liyin.

Thanks for the review, Kannan and Jerry.
                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch
>
>
> Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a table into a series of mapper tasks, so that each mapper task processes a region or part of a region. Ideally, a mapper task should run on the same machine as the region server that hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.
> The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may be in a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name from a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server may be correctly set as a.b.c.d while the reverse DNS lookup returns a.b.c.d. (with an additional dot at the end).
> So the solution is to set the RegionLocation by reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework's.


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209164#comment-13209164 ] 

Phabricator commented on HBASE-5259:
------------------------------------

mbautin has committed the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

REVISION DETAIL
  https://reviews.facebook.net/D1413

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1239892

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209175#comment-13209175 ] 

Phabricator commented on HBASE-5259:
------------------------------------

mbautin has committed the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

REVISION DETAIL
  https://reviews.facebook.net/D1413

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1239892

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193288#comment-13193288 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I think reverseDNSCache is a good enough name.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 Should NamingException be handled here?

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195345#comment-13195345 ] 

Liyin Tang commented on HBASE-5259:
-----------------------------------

Has that package been deprecated?
Having two similar packages looks confusing to me.
                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194475#comment-13194475 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Liyin has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 @tedyu, I prefer to name this variable reverseDNSCacheMap. Do you have any specific reason for changing it to reverseDNSCache?
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 @tedyu, This function is a wrapper around DNS.reverseDNS which also provides caching, as you suggested.
  However, I believe this function is supposed to keep the same behavior as DNS.reverseDNS, including throwing NamingException.
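
  The wrapper behavior described in this comment could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code under review: the Resolver interface is a hypothetical stand-in for the real lookup (the comment calls it DNS.reverseDNS), addresses are plain strings, and the lookups counter exists only to make the caching visible. Only the field name reverseDNSCacheMap is taken from the thread.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

/** Hypothetical sketch of a caching wrapper that keeps the lookup's exception behavior. */
final class CachingReverseDns {

    /** Stand-in for the underlying reverse lookup; an assumption, not a real API. */
    interface Resolver {
        String reverse(String ipAddress) throws NamingException;
    }

    private final Map<String, String> reverseDNSCacheMap = new HashMap<>();
    private final Resolver resolver;
    int lookups = 0; // counts calls that reached the resolver

    CachingReverseDns(Resolver resolver) {
        this.resolver = resolver;
    }

    /** Same contract as the wrapped lookup, including throwing NamingException. */
    String reverseDNS(String ipAddress) throws NamingException {
        String hostName = reverseDNSCacheMap.get(ipAddress);
        if (hostName == null) {
            lookups++;
            // If this throws, nothing is cached for ipAddress.
            hostName = resolver.reverse(ipAddress);
            reverseDNSCacheMap.put(ipAddress, hostName);
        }
        return hostName;
    }
}
```

  Because the exception propagates unchanged, callers decide how to react to an unresolvable address; the wrapper itself only avoids repeating successful lookups.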

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194512#comment-13194512 ] 

Phabricator commented on HBASE-5259:
------------------------------------

gqchen has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I guess this is more like a personal preference. For non-trivial data structures, I personally find it helpful to have the type in the variable name, so that when you read the code where the variable is being used, you don't have to guess.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194952#comment-13194952 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  typo:
  <<this error case which isn't supposed to happen>>
  I meant to say:
  <<this error case which isn't supposed to happen normally>>

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194483#comment-13194483 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has accepted the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  Liyin -- looks good to me. One minor suggestion inlined.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:171 Logging here is unnecessary because of the logging in line 191. The "split" (TableSplit's toString() method) already prints the regionLocation along with the start/stop keys for each map task.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194490#comment-13194490 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 Map is reflected in the type of this field.
  My suggestion was only for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 If NamingException is thrown out of line 201, line 202 would be skipped.
  Line 169 might be executed multiple times because regionServerAddress across multiple iterations may carry the same (unresolvable) value.

  Correct me if I am wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194992#comment-13194992 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Liyin has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 I don't think so. With your suggestion, you won't have a chance to recover from a transient DNS lookup failure, will you?

  Also, the reverseDNSCacheMap is supposed to cache DNS lookup results, not other fault-tolerance fallback results.

  In addition, if your DNS server has failures, your task tracker and job tracker will slow down as well.
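
  The point about transient failures can be illustrated with a minimal sketch. It is not the code from the patch: the ReverseResolver interface and the CachingReverseDns class are hypothetical names, the resolver stands in for a call such as DNS.reverseDns, and addresses are keyed as plain strings to keep the example self-contained. Only successful lookups are cached, so a lookup that fails once is attempted again on the next call.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

/** Hypothetical stand-in for a reverse DNS call such as DNS.reverseDns. */
interface ReverseResolver {
  String resolve(String ipAddress) throws NamingException;
}

/** Caches successful reverse lookups only; failures propagate to the caller. */
class CachingReverseDns {
  private final Map<String, String> cache = new HashMap<>();
  private final ReverseResolver delegate;

  CachingReverseDns(ReverseResolver delegate) {
    this.delegate = delegate;
  }

  String reverseDns(String ipAddress) throws NamingException {
    String hostName = cache.get(ipAddress);
    if (hostName == null) {
      // If resolve() throws, nothing is stored, so the next call retries.
      hostName = delegate.resolve(ipAddress);
      cache.put(ipAddress, hostName);
    }
    return hostName;
  }
}
```

  Caching a fallback value on failure would avoid repeated lookups for an unresolvable address, but it would also pin that fallback for the lifetime of the cache; the sketch above makes the opposite trade-off.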

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192883#comment-13192883 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:153 I briefly went over DNS.reverseDns() and didn't see caching there.
  Can we introduce caching here to avoid looking up the same address again and again?

  The second parameter is supposed to be 'The host name of a reachable DNS server'.
  Should we allow the user to specify such a server?


REVISION DETAIL
  https://reviews.facebook.net/D1413

                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195360#comment-13195360 ] 

Liyin Tang commented on HBASE-5259:
-----------------------------------

I see. I will generate another patch including /mapred.
Thanks Ted.
                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195153#comment-13195153 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

I agree that handling general DNS failure cases is outside the scope of this JIRA.
                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch

        


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194482#comment-13194482 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has accepted the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  Liyin -- looks good to me. One minor suggestion inlined.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:171 Logging here is unnecessary because of the logging at line 191. The "split" (TableSplit's toString() method) will already print the regionLocation along with the start/stop keys for each map task.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194514#comment-13194514 ] 

Phabricator commented on HBASE-5259:
------------------------------------

gqchen has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I guess this is more like a personal preference. For non-trivial data structures, I personally find it helpful to have the type in the variable name, so that when you read the code where the variable is being used, you don't have to guess.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.1.patch

Liyin requested code review of "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a table into a series of mapper tasks, so each mapper task processes a region or part of a region. Ideally, a mapper task should run on the same machine as the region server that hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.

  The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d, while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end).

  So the solution is to set the RegionLocation by reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
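
  A small sketch of the mismatch described above, under the assumption that split locations are matched against task tracker host names by plain string comparison. The class and method names are hypothetical, and the reverse lookup is passed in as a function instead of calling a real DNS server.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class LocalityNameSketch {
  /** Assumed matching rule: a split is node-local only if the names are identical strings. */
  static boolean isNodeLocal(String splitLocation, List<String> trackerHosts) {
    return trackerHosts.contains(splitLocation);
  }

  /** Derives the split location through the same reverse lookup the task tracker uses. */
  static String splitLocationFor(String regionServerIp, Function<String, String> reverseLookup) {
    return reverseLookup.apply(regionServerIp);
  }

  public static void main(String[] args) {
    List<String> trackers = Arrays.asList("a.b.c.d."); // name as returned by reverse DNS
    // The region server's own name lacks the trailing dot, so locality is lost.
    System.out.println(isNodeLocal("a.b.c.d", trackers));
    // Going through the same lookup yields the tracker's format.
    String location = splitLocationFor("10.0.0.1", ip -> "a.b.c.d.");
    System.out.println(isNodeLocal(location, trackers));
  }
}
```

  The sketch does not strip or add the trailing dot itself; it only shows that both sides agree once they obtain the name from the same source.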

TEST PLAN
  running all the unit tests

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/2937/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.

                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194474#comment-13194474 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Liyin has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 @tedyu, I prefer to name this variable reverseDNSCacheMap. Do you have any specific reason for changing it to reverseDNSCache?
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 @tedyu, This function is a wrapper around DNS.reverseDNS which also provides caching, as you suggested.
  However, I believe this function is supposed to keep the same behavior as DNS.reverseDNS, including throwing NamingException.
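
  For readers following along, the wrapper described in this comment might look roughly like the sketch below. It is not the code from the patch: ReverseResolver is a hypothetical stand-in for the real reverse DNS call (such as Hadoop's DNS.reverseDNS), and the class name ReverseDnsCache is an assumption made for illustration. Only the field name reverseDNSCacheMap comes from the discussion.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

/** Hypothetical stand-in for a reverse DNS call such as Hadoop's DNS.reverseDNS. */
interface ReverseResolver {
  String resolve(InetAddress address) throws NamingException;
}

/** Illustrative wrapper: caches successful lookups and lets failures propagate. */
class ReverseDnsCache {
  // Name taken from the review discussion; the map type is an assumption.
  private final Map<InetAddress, String> reverseDNSCacheMap = new HashMap<>();
  private final ReverseResolver resolver;

  ReverseDnsCache(ReverseResolver resolver) {
    this.resolver = resolver;
  }

  /** Same contract as the wrapped call: throws NamingException on failure. */
  String reverseDNS(InetAddress address) throws NamingException {
    String hostName = reverseDNSCacheMap.get(address);
    if (hostName == null) {
      // A failure here is not cached, so a later call retries the lookup.
      hostName = resolver.resolve(address);
      reverseDNSCacheMap.put(address, hostName);
    }
    return hostName;
  }
}
```

  Because only successful results are stored, a transient DNS failure is retried on the next call rather than being remembered.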

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194993#comment-13194993 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Liyin has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 I don't think so. With your suggestion, you won't have a chance to recover from a transient DNS lookup failure, will you?

  Also, the reverseDNSCacheMap is supposed to cache DNS lookup results, not fault-tolerance fallback values.

  In addition, if your DNS server has failures, your task tracker and job tracker will slow down as well.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194467#comment-13194467 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 I think NamingException handling @ line 166 should be moved here.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194928#comment-13194928 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 I am learning about the possibilities of reverse DNS failure:

  http://www.crucialp.com/resources/tutorials/web-hosting/how-reverse-dns-works-rdns.php

  I think we should be prepared for such occasion as I outlined @ 9:43pm.
  Just for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 bq. this error case which isn't supposed to happen
  If I understand the statement correctly, you didn't say 'definitely not possible'.

  My earlier analysis w.r.t. NamingException shows that we would incur extra delay whenever reverse DNS fails, since the assignment on line 169 doesn't put the fallback value into the cache.
  This can be regarded as a performance regression compared to the previous implementation, where reverse DNS was not taken into account.
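
  The trade-off under discussion can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: HostResolver, FallbackLookup and the cacheFallback switch are hypothetical names, and the fallback value is assumed to be the region server's own host name.

```java
import java.net.InetAddress;
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

/** Hypothetical stand-in for the reverse DNS call. */
interface HostResolver {
  String resolve(InetAddress address) throws NamingException;
}

/** Illustrates the two policies: cache the fallback value, or retry every time. */
class FallbackLookup {
  private final Map<InetAddress, String> cache = new HashMap<>();
  private final HostResolver resolver;
  private final boolean cacheFallback; // hypothetical switch between the two policies
  int lookups = 0;                     // counts actual reverse DNS attempts

  FallbackLookup(HostResolver resolver, boolean cacheFallback) {
    this.resolver = resolver;
    this.cacheFallback = cacheFallback;
  }

  String regionLocation(InetAddress address, String regionServerHostName) {
    String cached = cache.get(address);
    if (cached != null) {
      return cached;
    }
    String location;
    try {
      lookups++;
      location = resolver.resolve(address);
      cache.put(address, location);
    } catch (NamingException e) {
      // Fall back to the host name reported by the region server.
      location = regionServerHostName;
      if (cacheFallback) {
        // Avoids repeated failing lookups, but hides a later DNS recovery.
        cache.put(address, location);
      }
    }
    return location;
  }
}
```

  With cacheFallback set to false, every split for a host whose lookup fails pays for another DNS attempt; with it set to true, the fallback value sticks even after DNS recovers. These are the two positions argued in this thread.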

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209169#comment-13209169 ] 

Phabricator commented on HBASE-5259:
------------------------------------

mbautin has committed the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

REVISION DETAIL
  https://reviews.facebook.net/D1413

COMMIT
  https://reviews.facebook.net/rHBASEEIGHTNINEFBBRANCH1239892

                
> Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-5259
>                 URL: https://issues.apache.org/jira/browse/HBASE-5259
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Liyin Tang
>            Assignee: Liyin Tang
>             Fix For: 0.94.0
>
>         Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, D1413.3.patch, HBASE-5259.patch
>
>
> Assuming HBase and MapReduce run in the same cluster, TableInputFormat overrides the split function, which divides all the regions of a table into a series of mapper tasks, so each mapper task processes a region or part of a region. Ideally, a mapper task should run on the machine whose region server hosts the corresponding region. That is why TableInputFormat sets the RegionLocation: so that the MapReduce framework can respect node locality.
> The code simply sets the host name of the region server as the HRegionLocation. However, the host name of the region server may have a different format from the host name of the task tracker (mapper task). The task tracker always gets its host name by a reverse DNS lookup, and the DNS service may return a different host name format. For example, the host name of the region server is correctly set as a.b.c.d while the reverse DNS lookup may return a.b.c.d. (with an additional dot at the end).
> So the solution is to set the RegionLocation by reverse DNS lookup as well. Whatever host name format the DNS system uses, TableInputFormat is responsible for keeping its host name format consistent with the MapReduce framework.
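
The mismatch described in the issue can be illustrated with a tiny sketch. It assumes, for illustration only, that node locality is decided by an exact string match between the split location and the task tracker host names; LocalityMatch and isNodeLocal are hypothetical names, not part of Hadoop or HBase.

```java
import java.util.Set;

class LocalityMatch {
  /** Assumed rule: a split is node-local only on an exact host name match. */
  static boolean isNodeLocal(String splitLocation, Set<String> trackerHosts) {
    return trackerHosts.contains(splitLocation);
  }
}
```

Under that assumption, a split located at "a.b.c.d" is not node-local on a task tracker registered as "a.b.c.d.", even though both names refer to the same machine. Deriving both names from the same reverse DNS lookup removes the difference.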


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194958#comment-13194958 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

Since each review comment results in 4 copies on the JIRA and 4 copies in each of our email inboxes, allow me to continue here.

bq.  <<this error case which isn't supposed to happen normally>>
I understand.
But we should be prepared for abnormal cases, right?
                

        


        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194953#comment-13194953 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  typo:
  <<this error case which isn't supposed to happen>>
  I meant to say:
  <<this error case which isn't supposed to happen normally>>

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194480#comment-13194480 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has accepted the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  Liyin -- looks good to me. One minor suggestion inlined.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:171 Logging here is unnecessary because of the logging at line 191. The "split" (TableSplit's toString() method) will already print the regionLocation along with the start/stop keys for each map task.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195183#comment-13195183 ] 

Hadoop QA commented on HBASE-5259:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12512226/HBASE-5259.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -140 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 161 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                  org.apache.hadoop.hbase.replication.TestReplication
                  org.apache.hadoop.hbase.io.hfile.TestHFileBlock
                  org.apache.hadoop.hbase.mapreduce.TestImportTsv
                  org.apache.hadoop.hbase.mapred.TestTableMapReduce
                  org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/859//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/859//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/859//console

This message is automatically generated.
                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5259:
------------------------------

    Fix Version/s: 0.94.0

Changing TableInputFormatBase in mapreduce should be good enough.
                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------

    Attachment: HBASE-5259.patch
    

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194950#comment-13194950 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  typo:
  <<this error case which isn't supposed to happen>>
  I meant to say:
  <<this error case which isn't supposed to happen normally>>

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195311#comment-13195311 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

@Liyin:
There is also src/main/java/org/apache/hadoop/hbase/mapred/TableInputFormatBase.java

I guess a similar change is applicable there?
                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.2.patch

Liyin updated the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  Address Ted's comments.

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liyin Tang updated HBASE-5259:
------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193289#comment-13193289 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 I think reverseDNSCache is a good enough name.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 Should NamingException be handled here ?

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194469#comment-13194469 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 I think NamingException handling @ line 166 should be moved here.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194470#comment-13194470 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 I think NamingException handling @ line 166 should be moved here.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194487#comment-13194487 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 Map is reflected in the type of this field.
  My suggestion was only for your reference.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 If NamingException is thrown out of line 201, line 202 would be skipped.
  Line 169 might be executed multiple times because regionServerAddress across multiple iterations may carry the same (unresolvable) value.

  Correct me if I am wrong.

REVISION DETAIL
  https://reviews.facebook.net/D1413
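
The concern raised in the inline comments above, that an unresolvable regionServerAddress is looked up again on every iteration when the exception skips the cache update, can be illustrated with a small Java sketch. This is not the code from the patch: the class, the ReverseResolver interface, and the fallback to the original address are assumptions made for illustration; only the names reverseDNSCache and regionServerAddress are taken from the review.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

// Illustrative sketch only; not the code under review in D1413.
class ReverseDnsCacheSketch {

    /** Hypothetical lookup hook; a real caller would plug in an actual reverse DNS query. */
    interface ReverseResolver {
        String resolve(String address) throws NamingException;
    }

    private final Map<String, String> reverseDNSCache = new HashMap<>();
    private final ReverseResolver resolver;
    int lookups = 0; // counts real lookups, to show the cache is effective

    ReverseDnsCacheSketch(ReverseResolver resolver) {
        this.resolver = resolver;
    }

    /** Returns the host name to use as the split location for the given address. */
    String locationFor(String regionServerAddress) {
        String cached = reverseDNSCache.get(regionServerAddress);
        if (cached != null) {
            return cached;
        }
        String location;
        lookups++;
        try {
            location = resolver.resolve(regionServerAddress);
        } catch (NamingException e) {
            // Assumed fallback: keep the address as given, so the split
            // still gets a location even when the lookup fails.
            location = regionServerAddress;
        }
        // Handling the exception next to the cache update means the fallback
        // is cached too, so an unresolvable address is looked up only once.
        reverseDNSCache.put(regionServerAddress, location);
        return location;
    }
}
```

With a resolver that always throws NamingException, calling locationFor twice with the same address performs a single lookup and returns the address unchanged both times.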

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194481#comment-13194481 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has accepted the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

  Liyin -- looks good to me. One minor suggestion inlined.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:171 Logging here is unnecessary because of the logging on line 191: the "split" (via TableSplit's toString() method) already prints the regionLocation along with the start/stop keys for each map task.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192882#comment-13192882 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:153 I briefly went over DNS.reverseDns() and didn't see any caching there.
  Can we introduce caching here to avoid looking up the same address again and again?

  The second parameter is supposed to be 'The host name of a reachable DNS server'.
  Should we allow the user to specify such a server?


REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194495#comment-13194495 ] 

Phabricator commented on HBASE-5259:
------------------------------------

Kannan has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 Ted: If you are trying to optimize for the performance of this error case which isn't supposed to happen, I don't think it is really worth it. Furthermore, the defaulting logic of falling back to the old style hostname is in the caller.

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195355#comment-13195355 ] 

Zhihong Yu commented on HBASE-5259:
-----------------------------------

Even in hadoop, mapred has been un-deprecated.
I assume there're users who are dependent on mapred/TableInputFormatBase.java

Thanks
                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195061#comment-13195061 ] 

Liyin Tang commented on HBASE-5259:
-----------------------------------

Hi Ted, 
I totally understand your concern and appreciate your feedback.
It would be nice to tolerate all kinds of DNS server failures, which could be transient failures, the loss of a PTR record, or a DNS service crash. The tradeoff is to pick the most frequent failure case and tolerate it gracefully. From my perspective, for high-impact failures such as a DNS server crash, it is sometimes better to raise an alarm and fix the problem as soon as possible. For minor failures, it would be great to recover naturally. For the others, it is fine to pay some cost.

If you believe the loss of a PTR record is a normal failure case in your systems, I would encourage you to open a new JIRA to handle it properly across the code bases of HBase, DFS and MapReduce. I do believe we need a better fault-tolerance policy across all these dependent components.
                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194926#comment-13194926 ] 

Phabricator commented on HBASE-5259:
------------------------------------

tedyu has commented on the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 I am learning about the ways reverse DNS can fail:

  http://www.crucialp.com/resources/tutorials/web-hosting/how-reverse-dns-works-rdns.php

  I think we should be prepared for such an occasion, as I outlined @ 9:43pm.
  Just for your reference.

  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:198 Re "this error case which isn't supposed to happen":
  If I understand the statement correctly, you didn't say 'definitely not possible'.

  My earlier analysis w.r.t. NamingException shows that we would incur extra delay when reverse DNS fails, since the assignment on line 169 doesn't put the fallback value into the cache.
  This can be regarded as a performance regression compared to the previous implementation, where reverse DNS was not taken into account.
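
The caching behaviour under discussion can be sketched as follows. This is a minimal illustration and not the code in D1413: the class `ReverseDnsCache`, its `Resolver` interface, and the method names are hypothetical, and the resolver is pluggable so the sketch runs without a real DNS server. In HBase the resolver would presumably wrap a call such as Hadoop's `DNS.reverseDns(InetAddress, String)`. The sketch stores the fallback host name in the cache when the lookup fails, so an unresolvable address is looked up only once.

```java
import java.util.HashMap;
import java.util.Map;
import javax.naming.NamingException;

public class ReverseDnsCache {
  /** Hypothetical stand-in for a reverse lookup that may fail with NamingException. */
  public interface Resolver {
    String reverse(String ipAddress) throws NamingException;
  }

  private final Resolver resolver;
  private final Map<String, String> cache = new HashMap<>();
  private int lookups = 0;

  public ReverseDnsCache(Resolver resolver) {
    this.resolver = resolver;
  }

  /** Returns the cached location, resolving at most once per address. */
  public String locationFor(String ipAddress, String fallbackHostName) {
    String cached = cache.get(ipAddress);
    if (cached != null) {
      return cached;
    }
    String location;
    try {
      lookups++;
      location = resolver.reverse(ipAddress);
    } catch (NamingException e) {
      // Lookup failed: fall back to the old-style host name.
      location = fallbackHostName;
    }
    if (location == null) {
      location = fallbackHostName;
    }
    // Cache the result even when it is the fallback, so a failing
    // address does not trigger a new lookup on every iteration.
    cache.put(ipAddress, location);
    return location;
  }

  public int lookupCount() {
    return lookups;
  }

  public static void main(String[] args) {
    ReverseDnsCache cache = new ReverseDnsCache(ip -> {
      throw new NamingException("no PTR record for " + ip);
    });
    for (int i = 0; i < 3; i++) {
      System.out.println(cache.locationFor("10.0.0.1", "rs1.example.com"));
    }
    System.out.println("lookups: " + cache.lookupCount());
  }
}
```

Whether the extra map entry is worth it depends on how often the failure case occurs, which is the point debated in this thread.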

REVISION DETAIL
  https://reviews.facebook.net/D1413

                

        

[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Liyin Tang (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13196315#comment-13196315 ] 

Liyin Tang commented on HBASE-5259:
-----------------------------------

@Ted, the TableInputFormatBase in the mapred package is already marked as deprecated in the code, so there is no need to update the patch.

@Deprecated
public abstract class TableInputFormatBase {}
                

        

[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

Posted by "Phabricator (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HBASE-5259:
-------------------------------

    Attachment: D1413.3.patch

Liyin updated the revision "[jira][HBASE-5259] Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.".
Reviewers: Kannan, Karthik, mbautin

  refactoring the code.

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java

                