Posted to user@hbase.apache.org by Robert Dyer <ps...@gmail.com> on 2012/12/16 10:16:19 UTC

Wrong input split locations after enabling reverse DNS

I recently enabled reverse DNS on my test cluster.  Now when I run an MR
job, the HBase input split locations all have a period appended to the end.
For example:

/default-rack/foo-1.
/default-rack/foo-2.

Yet the machine locations are still correct:

/default-rack/foo-1
/default-rack/foo-2

Since those strings don't match, the scheduler isn't assigning the tasks
locally.  It treats 100% of the map tasks as rack-local and 0% as
data-local (although in reality, some still end up data-local by sheer
luck).
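To make the symptom concrete, here is a simplified Java sketch of how a trailing period can turn every data-local match into a rack-local one. It assumes, purely for illustration, that locality is decided by comparing topology paths such as `/default-rack/foo-1` as plain strings; the class, enum, and method names are hypothetical, and this is not Hadoop's actual scheduler code.

```java
final class LocalitySketch {
    private LocalitySketch() {}

    enum Locality { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    // Hypothetical classifier: both arguments are assumed to be topology
    // paths of the form "/<rack>/<host>", compared as plain strings.
    static Locality classify(String splitPath, String trackerPath) {
        if (splitPath.equals(trackerPath)) {
            return Locality.DATA_LOCAL;   // identical host strings
        }
        // Everything before the last '/' is the rack, e.g. "/default-rack".
        String splitRack = splitPath.substring(0, splitPath.lastIndexOf('/'));
        String trackerRack = trackerPath.substring(0, trackerPath.lastIndexOf('/'));
        return splitRack.equals(trackerRack) ? Locality.RACK_LOCAL : Locality.OFF_RACK;
    }

    public static void main(String[] args) {
        // The trailing period makes the host strings differ, so a task
        // running on foo-1 is only counted as rack-local.
        System.out.println(classify("/default-rack/foo-1.", "/default-rack/foo-1")); // RACK_LOCAL
        System.out.println(classify("/default-rack/foo-1", "/default-rack/foo-1"));  // DATA_LOCAL
    }
}
```

Because every host in this cluster shares `/default-rack`, the rack comparison still succeeds, which matches the "100% rack-local, 0% data-local" pattern described above.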

What is the issue here?  Note that I don't have this problem with the MR
tasks using SequenceFile as input, only with HBase's TableMapper.

Re: Wrong input split locations after enabling reverse DNS

Posted by Robert Dyer <rd...@iastate.edu>.
Just to follow up here, I did manage to test a patch on
TableInputFormatBase.java and it resolved my issue.

I filed https://issues.apache.org/jira/browse/HBASE-7693 and will attach
the patch as soon as my Git checkout finishes updating.


On Mon, Dec 17, 2012 at 8:45 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> New issue, the other one is too old.
>
> Thx!
>
> J-D



-- 

Robert Dyer
rdyer@iastate.edu

Re: Wrong input split locations after enabling reverse DNS

Posted by Jean-Daniel Cryans <jd...@apache.org>.
New issue, the other one is too old.

Thx!

J-D

On Mon, Dec 17, 2012 at 6:39 PM, Robert Dyer <rd...@iastate.edu> wrote:
> Seems plausible.  A simple grep reveals this:
>
> mapreduce/TableInputFormatBase.java:      hostName =
> DNS.reverseDns(ipAddress, this.nameServer);
>
> which doesn't do the trailing-period filtering that the HBASE-4109 fix does.
>
> Would this typically be filed as a new issue or brought up in comments on
> the closed issue?

Re: Wrong input split locations after enabling reverse DNS

Posted by Robert Dyer <rd...@iastate.edu>.
Seems plausible.  A simple grep reveals this:

mapreduce/TableInputFormatBase.java:      hostName = DNS.reverseDns(ipAddress, this.nameServer);

which doesn't do the trailing-period filtering that the HBASE-4109 fix does.
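A minimal Java sketch of the kind of filtering being discussed, assuming it amounts to dropping the single trailing period that a reverse-DNS (PTR) answer carries as a fully qualified name, so `foo-1.` becomes `foo-1`. The class and method names are hypothetical; this is not the HBASE-4109 code itself.

```java
final class PtrHostNames {
    private PtrHostNames() {}

    // Hypothetical helper: a PTR answer such as "foo-1." ends with the
    // root label's period; drop exactly that one period so the name
    // matches the plain hostname the task trackers report.
    static String stripTrailingPeriod(String ptrName) {
        if (ptrName == null) {
            return null;
        }
        return ptrName.endsWith(".")
                ? ptrName.substring(0, ptrName.length() - 1)
                : ptrName;
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingPeriod("foo-1.")); // foo-1
        System.out.println(stripTrailingPeriod("foo-2"));  // foo-2
    }
}
```

In `getSplits`, a helper like this would presumably wrap the `DNS.reverseDns(ipAddress, this.nameServer)` result before it is stored as the split's location; names that already lack the period pass through unchanged.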

Would this typically be filed as a new issue or brought up in comments on
the closed issue?


On Mon, Dec 17, 2012 at 8:21 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:

> Maybe TableInputFormatBase.getSplits is missing something similar to
> HBASE-4109?
>
> J-D



-- 

Robert Dyer
rdyer@iastate.edu

Re: Wrong input split locations after enabling reverse DNS

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Maybe TableInputFormatBase.getSplits is missing something similar to HBASE-4109?

J-D

On Mon, Dec 17, 2012 at 5:26 PM, Robert Dyer <ps...@gmail.com> wrote:
> That's what I thought too, except I am running 0.94.2 and that fix was
> released in 0.90.4.

Re: Wrong input split locations after enabling reverse DNS

Posted by Robert Dyer <ps...@gmail.com>.
That's what I thought too, except I am running 0.94.2 and that fix was
released in 0.90.4.


On Mon, Dec 17, 2012 at 5:11 PM, Stack <st...@duboce.net> wrote:

> Looks like https://issues.apache.org/jira/browse/HBASE-4109 ?
> St.Ack
>
>

Re: Wrong input split locations after enabling reverse DNS

Posted by Stack <st...@duboce.net>.
On Sun, Dec 16, 2012 at 1:16 AM, Robert Dyer <ps...@gmail.com> wrote:

> What is the issue here?  Note that I don't have this problem with the MR
> tasks using SequenceFile as input, only with HBase's TableMapper.
>


Looks like https://issues.apache.org/jira/browse/HBASE-4109 ?
St.Ack