Posted to dev@accumulo.apache.org by Vikram Srivastava <vi...@cloudera.com> on 2014/01/17 02:46:58 UTC

Accumulo rolling restart

Hi,

Is there a way to reassign a tablet from one TServer to another while both
are running, with minimal impact on clients?

My motivation is to use such a mechanism to do rolling restarts of
TServers.

Thanks,

Vikram

Re: Accumulo rolling restart

Posted by Eric Newton <er...@gmail.com>.
You can already shut down a tablet server with the admin command. In that
case, the master will move tablets off that server, using the normal
user-specified balancer, and eventually stop the tablet server. Or you can
just forcibly restart it and let the normal recovery process handle the
restart event. The former provides greater availability for any one
particular tablet, but the latter is often faster overall.

The client library will hide a tablet that is momentarily offline.

The Balancer API is what lets you micro-manage tablet server assignment.
The ability to automatically move tablets away from a server already exists,
and so does discovery of new servers.
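The draining behaviour described above can be pictured as a balancer that proposes migrations off any server marked for shutdown. The following is a minimal Java 16+ sketch under simplified, hypothetical types (tablets and servers as plain strings, current assignments as a Map). It is not Accumulo's balancer interface, and sending each tablet to the least-loaded remaining server is just one possible policy chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, simplified model of a draining balancer.
// This is NOT Accumulo's balancer interface; names and types are illustrative.
final class DrainingBalancer {

    // One proposed tablet move.
    record Migration(String tablet, String from, String to) {}

    // Proposes migrations that empty every server in `draining`, sending each
    // tablet to whichever remaining live server currently has the fewest tablets.
    static List<Migration> balance(Map<String, String> tabletToServer,
                                   Set<String> liveServers,
                                   Set<String> draining) {
        // Eligible targets: live servers that are not being drained (sorted for determinism).
        Map<String, Integer> load = new TreeMap<>();
        for (String s : liveServers) {
            if (!draining.contains(s)) load.put(s, 0);
        }
        if (load.isEmpty()) throw new IllegalStateException("no eligible server to receive tablets");
        // Count the tablets each eligible server already hosts.
        for (String server : tabletToServer.values()) {
            load.computeIfPresent(server, (s, n) -> n + 1);
        }
        List<Migration> out = new ArrayList<>();
        // Walk tablets in sorted order so the plan is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(tabletToServer).entrySet()) {
            if (!draining.contains(e.getValue())) continue; // tablet can stay where it is
            String target = null;
            for (Map.Entry<String, Integer> l : load.entrySet()) {
                if (target == null || l.getValue() < load.get(target)) target = l.getKey();
            }
            load.merge(target, 1, Integer::sum); // account for the incoming tablet
            out.add(new Migration(e.getKey(), e.getValue(), target));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> assignments = new HashMap<>();
        assignments.put("t1", "tsA");
        assignments.put("t2", "tsA");
        assignments.put("t3", "tsB");
        assignments.put("t4", "tsA");
        // Drain tsA: its tablets should land on tsB and tsC.
        balance(assignments, Set.of("tsA", "tsB", "tsC"), Set.of("tsA")).forEach(System.out::println);
    }
}
```

In this example t1, t2 and t4 move off tsA (to tsC, tsB and tsC respectively) while t3 stays on tsB. A real balancer would also have to account for migrations already in progress, which this sketch ignores.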

-Eric



On Thu, Jan 16, 2014 at 11:51 PM, Vikram Srivastava <vi...@cloudera.com> wrote:

> I don't want multiple TServers to host the same tablet. I'm looking for
> something like this:
>
> def moveTablet(Tablet tab, TServer tSrc, TServer tDest):
>    // Tells master to move tab from tSrc to tDest
>
>
> def restartTServer(TServer t1):
>
>   <Tell master not to assign any more tablets to t1>
>
>   Collection<Tablet> t1_tablets = t1.getTablets();
>   Map<Tablet, TServer> newLocations = new HashMap<>();
>
>   for tab in t1_tablets:
>      TServer tDest = <next TServer other than t1, chosen round-robin>
>      moveTablet(tab, t1, tDest)
>      newLocations.put(tab, tDest)
>
>   <Restart t1 process>
>
>   for <tab, tDest> in newLocations.entries():
>      moveTablet(tab, tDest, t1)
>
>   <Tell master t1 is eligible again for new tablets>
>
>
> This way a client faces at most two interruptions (assuming nothing else
> fails) during a rolling restart of all TServers.

Re: Accumulo rolling restart

Posted by Vikram Srivastava <vi...@cloudera.com>.
I don't want multiple TServers to host the same tablet. I'm looking for
something like this:

def moveTablet(Tablet tab, TServer tSrc, TServer tDest):
   // Tells master to move tab from tSrc to tDest


def restartTServer(TServer t1):

  <Tell master not to assign any more tablets to t1>

  Collection<Tablet> t1_tablets = t1.getTablets();
  Map<Tablet, TServer> newLocations = new HashMap<>();

  for tab in t1_tablets:
     TServer tDest = <next TServer other than t1, chosen round-robin>
     moveTablet(tab, t1, tDest)
     newLocations.put(tab, tDest)

  <Restart t1 process>

  for <tab, tDest> in newLocations.entries():
     moveTablet(tab, tDest, t1)

  <Tell master t1 is eligible again for new tablets>


This way a client faces at most two interruptions (assuming nothing else
fails) during a rolling restart of all TServers.
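The planning half of the proposal above can be made concrete. This is a minimal Java 16+ sketch with hypothetical names (RestartPlan, Move); it only computes the outbound and return moves, because actually sending a "move this tablet to that server" request to the master is precisely the operation being asked about and is not assumed to exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner for the "drain, restart, move back" scheme proposed above.
// It only computes moves; sending them to the master would need the
// move-tablet operation being asked about, which is not assumed to exist.
final class RestartPlan {

    // One tablet move from one tserver to another.
    record Move(String tablet, String from, String to) {}

    // Outbound moves: the restarting server's tablets are dealt out round-robin.
    // `others` must not contain `restarting`; the caller is responsible for that.
    static List<Move> drain(String restarting, List<String> tablets, List<String> others) {
        if (others.isEmpty()) throw new IllegalArgumentException("need at least one other tserver");
        List<Move> moves = new ArrayList<>();
        for (int i = 0; i < tablets.size(); i++) {
            moves.add(new Move(tablets.get(i), restarting, others.get(i % others.size())));
        }
        return moves;
    }

    // Return moves: every drained tablet goes back to the server it came from.
    static List<Move> restore(List<Move> drained) {
        List<Move> back = new ArrayList<>();
        for (Move m : drained) back.add(new Move(m.tablet(), m.to(), m.from()));
        return back;
    }

    public static void main(String[] args) {
        List<Move> out = drain("ts1", List.of("a", "b", "c"), List.of("ts2", "ts3"));
        System.out.println("before restart: " + out);
        System.out.println("after restart:  " + restore(out));
    }
}
```

Each tablet appears in exactly two moves, out and back, which is where the "at most two interruptions" figure comes from, provided one server's return moves complete before the next server is drained.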


On Thu, Jan 16, 2014 at 7:35 PM, Josh Elser <jo...@gmail.com> wrote:

> On Thu, Jan 16, 2014 at 10:00 PM, Vikram Srivastava
> <vi...@cloudera.com> wrote:
> > Thanks for the replies. A couple of follow-up questions:
> > 1. What would a client experience during a safe shutdown? I mean both a
> > new client trying to read a tablet on the TServer going down, and an
> > existing client reading a tablet on the TServer that's going down.
>
> An existing client would talk to the master to figure out the new
> assignment. A new client would do the same. Both would poll the master
> while waiting for the new assignment.
>
> > 2. The reason I wanted to know about a method for controlled
> > reassignment while both TServers are running is so that I can bring the
> > tablets back to the original TServer, thereby ensuring that each tablet
> > is unavailable only once during the entire rolling restart process. If
> > there is no such method currently, I'd be happy to file a JIRA.
>
> Multiple hostings for a tablet would be considered a critical bug --
> it should never happen. Balancing tablets across tservers is something
> the master does regularly. You shouldn't have to do anything but ensure
> the processes are running.

Re: Accumulo rolling restart

Posted by Josh Elser <jo...@gmail.com>.
On Thu, Jan 16, 2014 at 10:00 PM, Vikram Srivastava
<vi...@cloudera.com> wrote:
> Thanks for the replies. A couple of follow-up questions:
> 1. What would a client experience during a safe shutdown? I mean both a new
> client trying to read a tablet on the TServer going down, and an existing
> client reading a tablet on the TServer that's going down.

An existing client would talk to the master to figure out the new
assignment. A new client would do the same. Both would poll the master
while waiting for the new assignment.
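The polling behaviour described here can be sketched as a retry loop with capped exponential backoff. This is a hypothetical illustration in plain Java: the lookup is an arbitrary Supplier standing in for whatever asks for the tablet's location, and the backoff constants are arbitrary. It is not the Accumulo client library's actual retry code.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch of a client waiting for a tablet to be reassigned.
// The real Accumulo client performs its own lookups and retries internally.
final class LocationPoller {

    // Calls `lookup` until it reports a location, sleeping between attempts with
    // exponential backoff capped at `maxSleepMs`. Gives up after `maxAttempts`.
    static String awaitLocation(Supplier<Optional<String>> lookup,
                                long initialSleepMs, long maxSleepMs, int maxAttempts)
            throws InterruptedException {
        long sleep = initialSleepMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<String> loc = lookup.get();
            if (loc.isPresent()) return loc.get();
            Thread.sleep(sleep);                       // tablet still unassigned: wait
            sleep = Math.min(sleep * 2, maxSleepMs);   // then back off further
        }
        throw new IllegalStateException("tablet still unassigned after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fake lookup: the tablet is "offline" for the first two lookups.
        Supplier<Optional<String>> fake = () ->
            ++calls[0] < 3 ? Optional.empty() : Optional.of("tserver2:9997");
        System.out.println(awaitLocation(fake, 10, 100, 5) + " after " + calls[0] + " lookups");
    }
}
```

The fake lookup reports the tablet as offline twice, so the call returns tserver2:9997 after three lookups.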

> 2. The reason I wanted to know about a method for controlled reassignment
> while both TServers are running is so that I can bring the tablets back to
> the original TServer, thereby ensuring that each tablet is unavailable only
> once during the entire rolling restart process. If there is no such method
> currently, I'd be happy to file a JIRA.

Multiple hostings for a tablet would be considered a critical bug --
it should never happen. Balancing tablets across tservers is something
the master does regularly. You shouldn't have to do anything but ensure
the processes are running.


Re: Accumulo rolling restart

Posted by Vikram Srivastava <vi...@cloudera.com>.
Thanks for the replies. A couple of follow-up questions:
1. What would a client experience during a safe shutdown? I mean both a new
client trying to read a tablet on the TServer going down, and an existing
client reading a tablet on the TServer that's going down.
2. The reason I wanted to know about a method for controlled reassignment
while both TServers are running is so that I can bring the tablets back to
the original TServer, thereby ensuring that each tablet is unavailable only
once during the entire rolling restart process. If there is no such method
currently, I'd be happy to file a JIRA.


On Thu, Jan 16, 2014 at 6:21 PM, Eric Newton <er...@gmail.com> wrote:

> And here's the command to do it:
>
> $ bin/accumulo admin stop server[:port]
>
>
> But recovery is pretty fast; killing tservers can be faster than doing an
> orderly shutdown.
>
> With 1.4, if I were restarting nodes on several racks, I would kill the
> loggers on a rack, flush all tables, and then restart all the tservers and
> loggers on that rack. Rinse and repeat.
>
> -Eric

Re: Accumulo rolling restart

Posted by Eric Newton <er...@gmail.com>.
And here's the command to do it:

$ bin/accumulo admin stop server[:port]


But recovery is pretty fast; killing tservers can be faster than doing an
orderly shutdown.
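For the orderly path, stopping servers one at a time with the command above could be scripted roughly as follows. This is a sketch only: the Accumulo install directory, the server list and the dryRun flag are assumptions, and restarting each tserver afterwards (init script, service manager, or similar) is site-specific, so it is left as a comment.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Sketch of an orderly, one-server-at-a-time stop using the admin command.
// The install directory and server names are assumptions; restarting each
// tserver afterwards is site-specific and therefore not implemented here.
final class RollingStop {

    // Builds: <home>/bin/accumulo admin stop <host[:port]>
    static List<String> stopCommand(String accumuloHome, String hostPort) {
        return List.of(accumuloHome + "/bin/accumulo", "admin", "stop", hostPort);
    }

    // Runs the stop command for each server in turn (or only collects the
    // commands when dryRun is true) and returns the commands it issued.
    static List<List<String>> run(String accumuloHome, List<String> servers, boolean dryRun)
            throws IOException, InterruptedException {
        List<List<String>> issued = new ArrayList<>();
        for (String server : servers) {
            List<String> cmd = stopCommand(accumuloHome, server);
            issued.add(cmd);
            if (dryRun) continue;
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            int exit = p.waitFor(); // wait for the admin command itself to exit
            if (exit != 0) throw new IOException("stop failed for " + server + " (exit " + exit + ")");
            // ... restart `server` here and wait for it to rejoin before continuing ...
        }
        return issued;
    }

    public static void main(String[] args) throws Exception {
        for (List<String> cmd : run("/opt/accumulo", List.of("ts1:9997", "ts2:9997"), true)) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

With dryRun set to true, main only prints the two commands it would run, which is a cheap way to check the server list before touching a live cluster.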

With 1.4, if I were restarting nodes on several racks, I would kill the
loggers on a rack, flush all tables, and then restart all the tservers and
loggers on that rack. Rinse and repeat.
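The per-rack 1.4 procedure is a simple loop over racks. The sketch below only produces that order as a list of step descriptions from a hypothetical rack-to-hosts map; how each step is actually carried out is left to the operator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the per-rack 1.4 procedure described above, as an ordered plan.
// The rack map is hypothetical input; steps are plain descriptions only.
final class RackRestartPlan {

    static List<String> plan(Map<String, List<String>> hostsByRack) {
        List<String> steps = new ArrayList<>();
        // One rack at a time, in sorted rack order ("rinse and repeat").
        for (Map.Entry<String, List<String>> rack : new TreeMap<>(hostsByRack).entrySet()) {
            String hosts = String.join(",", rack.getValue());
            steps.add("kill loggers on " + rack.getKey() + " [" + hosts + "]");
            steps.add("flush all tables");
            steps.add("restart tservers and loggers on " + rack.getKey() + " [" + hosts + "]");
        }
        return steps;
    }

    public static void main(String[] args) {
        plan(Map.of("rack2", List.of("h3"), "rack1", List.of("h1", "h2")))
            .forEach(System.out::println);
    }
}
```

Sorting the racks keeps the plan deterministic; the flush is repeated for every rack, matching the order of steps given above.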

-Eric



On Thu, Jan 16, 2014 at 9:09 PM, John Vines <vi...@apache.org> wrote:

> You can stop an individual tserver, which will do a safe shutdown of it and
> reassign its tablets. However, this won't work between releases due to
> potential version changes.

Re: Accumulo rolling restart

Posted by John Vines <vi...@apache.org>.
You can stop an individual tserver, which will do a safe shutdown of it and
reassign its tablets. However, this won't work between releases due to
potential version changes.

Sent from my phone, please pardon the typos and brevity.
On Jan 16, 2014 8:47 PM, "Vikram Srivastava" <vi...@cloudera.com> wrote:

> Hi,
>
> Is there a way to reassign a tablet from one TServer to another while both
> are running, with minimal impact on clients?
>
> My motivation is to use such a mechanism to do rolling restarts of
> TServers.
>
> Thanks,
>
> Vikram
>