Posted to user@accumulo.apache.org by "Terry P." <te...@gmail.com> on 2013/08/15 17:01:59 UTC

How to re-IP an Accumulo Cluster

Greetings everyone,
We had to re-IP our entire cluster recently to change subnetworks, and we
essentially lost everything (it was development, so no big deal).  However,
doing a re-IP operation may be required in actual operational cases, and
I'd like to know if it can be done or not so we can note it for the future
(as in document "what not to do" to avoid data loss).

The issue we had was that after shutting down the cluster, re-IPing all
servers, and starting everything back up, the tablets were still assigned
to tablet servers with the old IP addresses, even though all the hostnames
were the same. So the system showed 3 tablet servers, but no tablets, and
no entries in the tables where previously there were 400 million.

So:

A) Does ZooKeeper track tablet servers by IP address only, and not by hostname?

B) If A is true, is there a mechanism to change those entries in ZooKeeper
so that a re-IP operation could be performed?

Re: How to re-IP an Accumulo Cluster

Posted by Keith Turner <ke...@deenlo.com>.
Use zkCli.sh and look in /accumulo/<accumulo instance id>.

In 1.4, Accumulo started locking down its information in ZooKeeper, so you
may need to execute the following command in zkCli.sh:

  addauth digest accumulo:SECRET

Replace SECRET with the secret from your accumulo-site.xml file (the
instance.secret property).
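
For reference, ZooKeeper's digest scheme does not keep the password itself in a node's ACL; it keeps user:base64(sha1("user:password")). The Python sketch below (standard library only, and assuming the stock ZooKeeper digest scheme) computes that id, which can be compared against what zkCli.sh's getAcl command reports for a node under /accumulo/<accumulo instance id> to check that you have the right secret. The function name zk_digest_id is hypothetical and is not part of ZooKeeper or Accumulo.

```python
import base64
import hashlib


def zk_digest_id(user: str, password: str) -> str:
    """Return the ACL id ZooKeeper's digest scheme derives from user:password.

    Assumes the standard digest scheme and ASCII credentials: the id is
    user + ":" + base64(sha1("user:password")), so the plaintext secret
    never appears in the ACL itself.
    """
    raw = f"{user}:{password}".encode("utf-8")
    digest = hashlib.sha1(raw).digest()
    return f"{user}:{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # "SECRET" is a placeholder, as in the addauth command above.
    print(zk_digest_id("accumulo", "SECRET"))
```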

On Thu, Aug 15, 2013 at 12:05 PM, Terry P. <te...@gmail.com> wrote:

Re: How to re-IP an Accumulo Cluster

Posted by "Terry P." <te...@gmail.com>.
Hi Keith,
Many thanks for your detailed reply. I forgot to mention that yes, indeed,
this is on Accumulo 1.4.2, and it was the write-ahead logs that were the
issue, partly because two of the tablet servers were not properly shut down
before the re-IP operation, so recovery may have been needed on them.

My naivety about ZooKeeper certainly hampered the research as well. How does
one "look in ZooKeeper to see what is going on"? Any pointers would be
really helpful.

I wish we could go to 1.5 and take advantage of the walogs in HDFS, but no
can do at this point, unfortunately.


On Thu, Aug 15, 2013 at 10:24 AM, Keith Turner <ke...@deenlo.com> wrote:

Re: How to re-IP an Accumulo Cluster

Posted by Keith Turner <ke...@deenlo.com>.
On Thu, Aug 15, 2013 at 11:01 AM, Terry P. <te...@gmail.com> wrote:

> A) Does Zookeeper track Tabletservers by IP address only, and not hostname?
>

It does track by IP address, but not only by IP address. Each tablet server
has an ephemeral node in ZooKeeper under its IP address. This ephemeral
node should go away when the tserver process dies, and then the master will
assume that tserver is dead. The location of a tablet in the metadata
table is conceptually <ephemeral node id>+<IP address>, so once that
ephemeral node goes away, the location in the metadata table is assumed
invalid and the tablet is reassigned. If another tserver starts at the same
IP, the master can tell them apart because the ephemeral node is different.
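
Keith's description of a tablet location as <ephemeral node id>+<IP address> can be modelled in a few lines of Python. This is only a conceptual sketch: Location, location_is_valid and needs_reassignment are hypothetical names, the addresses and node ids are made up, and Accumulo's real master logic does more than this. It shows why a restart at the same IP and a change of IP both leave the old metadata location stale.

```python
from dataclasses import dataclass
from typing import Dict, Optional


# Conceptual model only; these are not Accumulo's actual classes.
@dataclass(frozen=True)
class Location:
    """A tablet location: tserver address plus the id of its ephemeral node."""
    address: str  # hypothetical, e.g. "10.0.0.5:9997"
    node_id: int  # id of the ephemeral node that tserver created


def location_is_valid(loc: Location, live: Dict[str, int]) -> bool:
    """Valid only if a tserver is live at that address AND it holds the
    same ephemeral node that was recorded in the metadata table."""
    return live.get(loc.address) == loc.node_id


def needs_reassignment(loc: Optional[Location], live: Dict[str, int]) -> bool:
    """Tablets with no location, or a stale one, get reassigned."""
    return loc is None or not location_is_valid(loc, live)


if __name__ == "__main__":
    loc = Location("10.0.0.5:9997", 26)
    print(needs_reassignment(loc, {"10.0.0.5:9997": 26}))  # False: same process
    print(needs_reassignment(loc, {"10.0.0.5:9997": 43}))  # True: restarted at same IP
    print(needs_reassignment(loc, {"10.1.0.5:9997": 43}))  # True: host was re-IPed
```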

You can look at the child nodes under a tserver IP in ZooKeeper. Look
at the data for the lowest-numbered ephemeral node to get info about
who holds the lock for that IP.
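
The "lowest numbered ephemeral node" rule is the usual ZooKeeper lock recipe: each contender creates an ephemeral sequential child, ZooKeeper appends a 10-digit sequence number to its name, and the child with the smallest number holds the lock. A small Python sketch of picking that child from the names zkCli.sh's ls returns under a tserver IP node follows; lock_holder is a hypothetical helper, and the zlock- prefix in the example is an assumption, so check the names your cluster actually shows. Running get on the winning child in zkCli.sh then shows the lock data.

```python
import re
from typing import Iterable, Optional

# ZooKeeper sequential nodes end in a zero-padded 10-digit counter.
# The prefix before it (e.g. "zlock-") is an assumption for illustration.
_SEQ = re.compile(r"(\d{10})$")


def lock_holder(children: Iterable[str]) -> Optional[str]:
    """Return the child with the lowest sequence number (the lock holder)."""
    candidates = [c for c in children if _SEQ.search(c)]
    if not candidates:
        return None
    # Compare numerically so differing prefixes cannot skew the ordering.
    return min(candidates, key=lambda c: int(_SEQ.search(c).group(1)))


if __name__ == "__main__":
    kids = ["zlock-0000000012", "zlock-0000000003", "zlock-0000000007"]
    print(lock_holder(kids))  # zlock-0000000003
```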

> B) If A is true, is there a mechanism to change those entries in Zookeeper
> so that a re-IP operation could be performed?
>

A first step would be to look in ZooKeeper and see what is going on with the
ephemeral nodes.

In Accumulo 1.3 and 1.4, one thing that typically causes problems when
changing many IP addresses is the write-ahead logs. Tablets point to their
write-ahead logs using the IP address of the logger, which can cause walog
recovery to fail. In 1.5, walogs are stored in HDFS, so this is not an issue.
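
The walog failure mode can be illustrated as a pre-flight check to run before changing addresses. In this hedged Python sketch, walog_refs is a hypothetical mapping from tablet to the walog references it still needs to recover, each assumed to look like "<logger ip>:<port>/<log id>" (an illustrative format, not Accumulo's actual metadata schema), and live_loggers is the set of logger addresses that will exist afterwards. Any tablet it returns would point at a logger address that no longer exists. Terry's experience suggests such tablets are most likely when tablet servers were not shut down cleanly before the re-IP.

```python
from typing import Dict, List, Set


def unrecoverable_tablets(walog_refs: Dict[str, List[str]],
                          live_loggers: Set[str]) -> Dict[str, List[str]]:
    """Return, per tablet, the walog references whose logger is not live.

    Hypothetical reference format: "<logger ip>:<port>/<log id>", mirroring
    how 1.3/1.4 tablets point at their write-ahead logs by logger IP.
    """
    missing: Dict[str, List[str]] = {}
    for tablet, refs in walog_refs.items():
        # The logger address is everything before the first "/".
        dead = [r for r in refs if r.split("/", 1)[0] not in live_loggers]
        if dead:
            missing[tablet] = dead
    return missing


if __name__ == "__main__":
    refs = {"t1": ["10.0.0.5:11224/abc"], "t2": []}
    # Logger moved from 10.0.0.5 to 10.1.0.5, so t1's walog can't be found.
    print(unrecoverable_tablets(refs, {"10.1.0.5:11224"}))
```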