Posted to user@accumulo.apache.org by Kristopher Kane <kk...@gmail.com> on 2012/09/23 16:50:10 UTC
Failed recovery after master/tservers IP changes
All,
I was doing some shuffling around at home and changed IPs on my master
and all tservers. I thought this would be OK as I had configured
everything via hostnames but I've got some log entries that say
otherwise:
Unable to recover
192.168.122.222:11224/d41fc0de-f4bc-4c28-a4ae-a0114c5911d7(java.io.IOException:
org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection timed out)
java.io.IOException: org.apache.thrift.transport.TTransportException:
java.net.ConnectException: Connection timed out
This is the master reporting on two of the four tservers I have. Also
to note: I did have an unclean shutdown prior to the IP changes. The
monitor shows no tablets loaded for any table, and the recovery
directory in HDFS is empty.
I don't need the data and I can always init, but I was curious about
fixing this to learn more about the system.
Where is a good place to start?
Thanks,
-Kris Kane
Re: Failed recovery after master/tservers IP changes
Posted by John Vines <vi...@apache.org>.
No, Jim, it's a Thrift transport exception, and the DFSClient doesn't use
Thrift. HDFS is designed fairly well to avoid depending on any sort of
host identity for persistence.
John
Sent from my phone, so pardon the typos and brevity.
On Sep 23, 2012 1:00 PM, "Jim Klucar" <kl...@gmail.com> wrote:
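John's point is that the failure happens in Accumulo's own Thrift connection, not in HDFS. A Thrift socket transport has to open a plain TCP connection before any RPC is sent, so a `java.net.ConnectException: Connection timed out` wrapped in a `TTransportException` means the master could not even reach the address it was trying. A minimal sketch of that first step, assuming only Python's standard library; `probe_thrift_endpoint` is a hypothetical helper, not part of Accumulo or Thrift:

```python
import socket
from typing import Optional, Tuple


def probe_thrift_endpoint(host: str, port: int,
                          timeout: float = 3.0) -> Tuple[bool, Optional[str]]:
    """Hypothetical helper: try the TCP connect a Thrift socket transport needs.

    Returns (reachable, error description). A timeout here corresponds to the
    'Connection timed out' cause seen inside the TTransportException.
    """
    try:
        # create_connection performs the same TCP handshake a socket
        # transport would do before any Thrift message is exchanged.
        with socket.create_connection((host, port), timeout=timeout):
            return True, None
    except OSError as exc:  # covers refused connections and timeouts
        return False, f"{type(exc).__name__}: {exc}"
```

For example, running `probe_thrift_endpoint("192.168.122.222", 11224)` from the master would be expected to fail in the same way once no machine holds that old address, while the same probe against a server's new address should succeed if the process is up. This shows the error is about reaching a network endpoint, not about HDFS.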
Re: Failed recovery after master/tservers IP changes
Posted by Jim Klucar <kl...@gmail.com>.
Hadoop has some weird DNS/reverse DNS lookup requirements. My guess
is that Hadoop is bonking.
Sent from my iPhone
On Sep 23, 2012, at 11:07 AM, Kristopher Kane <kk...@gmail.com> wrote:
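Jim's suggestion can be checked directly: after an IP change, a host's forward lookup (name to IP) and reverse lookup (IP to name) can disagree. A minimal sketch, assuming Python's standard `socket.gethostbyname` and `socket.gethostbyaddr`; `check_dns_roundtrip` is a hypothetical helper, and the resolvers are passed in as parameters so the logic can be exercised without a live DNS server:

```python
import socket
from typing import Callable, List, Tuple


def check_dns_roundtrip(
    hostname: str,
    forward: Callable[[str], str] = socket.gethostbyname,
    reverse: Callable[[str], Tuple[str, List[str], List[str]]] = socket.gethostbyaddr,
) -> Tuple[str, str, bool]:
    """Hypothetical helper: resolve hostname -> IP -> name.

    Returns (ip, reverse name, whether the round trip leads back to hostname).
    """
    ip = forward(hostname)
    name, aliases, _addresses = reverse(ip)
    # Accept the canonical name or any alias, compared case-insensitively.
    known = {n.lower() for n in [name, *aliases]}
    return ip, name, hostname.lower() in known
```

Running it for each hostname in the cluster's configuration, with the default resolvers, would flag any host whose reverse record still points somewhere stale. Note that a short-name versus fully qualified name mismatch also reports `False`, which is worth looking at in its own right.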
Re: Failed recovery after master/tservers IP changes
Posted by Kristopher Kane <kk...@gmail.com>.
I left some parts out:
This is Accumulo 1.4, and the GC process for file collection has been
running since the cluster was turned on. So does that mean things are
being held up in the write-ahead logs?
-Kris
On Sun, Sep 23, 2012 at 10:50 AM, Kristopher Kane <kk...@gmail.com> wrote:
Re: Failed recovery after master/tservers IP changes
Posted by John Vines <vi...@apache.org>.
Accumulo registers write-ahead logs by logger IP, not hostname. So even
though you start up processes by hostname, log recovery still depends on
the IP addresses staying the same.
Sent from my phone, so pardon the typos and brevity.
On Sep 23, 2012 10:50 AM, "Kristopher Kane" <kk...@gmail.com> wrote:
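John's explanation can be made concrete with the reference from Kris's error. Assuming the `ip:port/log-id` shape that `192.168.122.222:11224/d41fc0de-f4bc-4c28-a4ae-a0114c5911d7` appears to have, the sketch below splits it apart and asks which configured hostnames still resolve to the recorded IP. After an IP change the answer is none, so there is no server at the recorded address for recovery to reach. `WalRef`, `parse_wal_ref` and `owners_of` are hypothetical names for illustration, not Accumulo APIs, and the lookup defaults to Python's `socket.gethostbyname`:

```python
import socket
from typing import Callable, Iterable, List, NamedTuple


class WalRef(NamedTuple):
    """Hypothetical record for an 'ip:port/log-id' reference (not an Accumulo type)."""
    ip: str
    port: int
    log_id: str


def parse_wal_ref(ref: str) -> WalRef:
    """Split a reference shaped like the one in the master's error message."""
    address, _, log_id = ref.partition("/")
    ip, _, port = address.rpartition(":")
    if not (ip and port and log_id):
        raise ValueError(f"unrecognized log reference: {ref!r}")
    return WalRef(ip, int(port), log_id)


def owners_of(ref: WalRef, hostnames: Iterable[str],
              resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the hostnames that currently resolve to the IP recorded in ref.

    An empty list means nothing configured answers at the recorded address,
    even though every server was started by hostname.
    """
    return [h for h in hostnames if resolve(h) == ref.ip]
```

The design choice mirrors the point of the reply: the reference stores an address, not a name, so resolving names again afterwards cannot repair it; only a server reappearing at the old IP would match.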