Posted to user@accumulo.apache.org by Kristopher Kane <kk...@gmail.com> on 2012/09/23 16:50:10 UTC

Failed recovery after master/tservers IP changes

All,

I was doing some shuffling around at home and changed IPs on my master
and all tservers.  I thought this would be OK as I had configured
everything via hostnames but I've got some log entries that say
otherwise:


Unable to recover
192.168.122.222:11224/d41fc0de-f4bc-4c28-a4ae-a0114c5911d7(java.io.IOException:
org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection timed out)
	java.io.IOException: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection timed out



This is the master reporting on two of my four tservers.  Also
of note:  I had an unclean shutdown prior to the IP changes, the
monitor shows no tablets loaded for any table, and the recovery
directory in HDFS is empty.
I don't need the data and I can always init, but I was curious about
fixing this to learn more about the system.

Where is a good place to start?

Thanks,


-Kris Kane

Re: Failed recovery after master/tservers IP changes

Posted by John Vines <vi...@apache.org>.
No, Jim, it's a Thrift transport exception, and the DFS client doesn't use
Thrift. DFS is fairly well designed to avoid needing any sort of host
identity for persistence.

John
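
The reasoning above can be checked mechanically against the log text. The following is a rough sketch only: it splits the flattened exception message from the original post into its class names and asks whether any of them comes from the Thrift package. The function names `cause_chain` and `failed_in_thrift` are made up for illustration; this is a heuristic over log text, not an Accumulo or Thrift API.

```python
from typing import List


def cause_chain(message: str) -> List[str]:
    """Split a flattened 'A: B: C: detail' message into Java class names."""
    parts = [p.strip() for p in message.split(":")]
    # Heuristic (assumption): a fully qualified class name contains a dot
    # and no spaces; the trailing human-readable detail is dropped.
    return [p for p in parts if "." in p and " " not in p]


def failed_in_thrift(message: str) -> bool:
    """True if any exception in the chain comes from the Thrift package."""
    return any(name.startswith("org.apache.thrift.") for name in cause_chain(message))


if __name__ == "__main__":
    msg = ("java.io.IOException: org.apache.thrift.transport.TTransportException: "
           "java.net.ConnectException: Connection timed out")
    print(cause_chain(msg))
    print(failed_in_thrift(msg))
```

A chain that passes through `org.apache.thrift` points at an Accumulo RPC connection rather than at the HDFS client, which is the distinction being made here.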

Sent from my phone, so pardon the typos and brevity.
On Sep 23, 2012 1:00 PM, "Jim Klucar" <kl...@gmail.com> wrote:

> Hadoop has some weird DNS/Reverse DNS lookup requirements. My guess
> would be that Hadoop is bonking.

Re: Failed recovery after master/tservers IP changes

Posted by Jim Klucar <kl...@gmail.com>.
Hadoop has some weird DNS/Reverse DNS lookup requirements. My guess
would be that Hadoop is bonking.

Sent from my iPhone


Re: Failed recovery after master/tservers IP changes

Posted by Kristopher Kane <kk...@gmail.com>.
I left some parts out:

This is 1.4 and the GC process for file collection has been running
since the cluster turned on.  So, does that mean things are being held
up in the write-ahead logs?

-Kris


Re: Failed recovery after master/tservers IP changes

Posted by John Vines <vi...@apache.org>.
Accumulo registers write-ahead logs by logger IP, not hostname. So even
though you start up processes by hostname, there is still a dependency on
IP consistency for log recovery.
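
To make this concrete, here is a minimal sketch that pulls the `ip:port/uuid` reference out of the "Unable to recover" message from the original post and checks whether that IP is still one the cluster uses. The regular expression, the `LoggerRef` type, and the helper names are assumptions made for illustration, and the format is inferred only from the single log line in this thread; none of this is an Accumulo API.

```python
import re
from typing import Iterable, NamedTuple, Optional


class LoggerRef(NamedTuple):
    ip: str
    port: int
    log_id: str


# Assumed shape of the reference: IPv4 address, port, then a 36-character UUID.
_REF = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)/([0-9a-fA-F-]{36})")


def parse_logger_ref(text: str) -> Optional[LoggerRef]:
    """Return the first ip:port/uuid reference found in a log line, if any."""
    m = _REF.search(text)
    if m is None:
        return None
    return LoggerRef(m.group(1), int(m.group(2)), m.group(3))


def is_stale(ref: LoggerRef, current_ips: Iterable[str]) -> bool:
    """True if the recorded logger IP is not among the cluster's current IPs."""
    return ref.ip not in set(current_ips)


if __name__ == "__main__":
    line = ("Unable to recover 192.168.122.222:11224/"
            "d41fc0de-f4bc-4c28-a4ae-a0114c5911d7(java.io.IOException:")
    ref = parse_logger_ref(line)
    # 192.168.1.10 is a placeholder for a post-change address, not from the thread.
    print(ref, is_stale(ref, ["192.168.1.10"]))
```

If the recorded IP no longer belongs to any node, the master keeps trying to reach a logger at an address nobody answers on, which is consistent with the "Connection timed out" in the report.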

Sent from my phone, so pardon the typos and brevity.