Posted to user@hbase.apache.org by Michael Dagaev <mi...@gmail.com> on 2009/02/24 14:50:13 UTC

Question on region server/data node restart

Hi, all

As I understand it, I can stop a region server and a data node in a cluster
"semi-transparently" for clients, i.e. the requests being handled by that
region server at the time will fail, but the cluster will keep working.

If I then start the data node and region server again, I do not have to do
anything else to make them work.

Is that correct?

Thank you for your cooperation,
M.
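
For reference, the clean stop and start that the replies in this thread recommend (hadoop-daemon/hbase-daemon) might be scripted roughly as below. This is only a sketch: the install paths in HBASE_HOME and HADOOP_HOME, the DRY_RUN switch, the function names, and the ordering (region server stopped before its data node, started after it) are assumptions for illustration, not something stated in the thread.

```shell
#!/bin/sh
# Sketch: clean stop/start of one node's region server and data node.
# Assumptions (not from the thread): install paths, DRY_RUN, function names, ordering.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_node() {
    # Region server first, so it can shut down while HDFS is still reachable (assumed ordering).
    run "$HBASE_HOME/bin/hbase-daemon.sh" stop regionserver
    run "$HADOOP_HOME/bin/hadoop-daemon.sh" stop datanode
}

start_node() {
    # Data node first, then the region server, which stores its data in HDFS.
    run "$HADOOP_HOME/bin/hadoop-daemon.sh" start datanode
    run "$HBASE_HOME/bin/hbase-daemon.sh" start regionserver
}
```

With the default DRY_RUN=1, calling stop_node or start_node only prints the commands, which makes it easy to check the paths before running anything against a live cluster.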

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Correction, I was suggesting 0.18.2 (the svn branch) since it has many fixes
that Michael would need and it won't break anything for him (as 0.19.0 will
do with MR jobs).

J-D

On Wed, Feb 25, 2009 at 1:33 AM, stack <st...@duboce.net> wrote:

> Michael, as J-D suggests above, can you update to 0.19.0 hbase?  It's better
> about all of this stuff -- though not as reactive as 0.20.0 will be.
> St.Ack

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

     Thank you for the reminder, Stack.

I will look through the release notes and
probably raise the issue today at the team meeting.

M.

On Wed, Feb 25, 2009 at 8:33 AM, stack <st...@duboce.net> wrote:
> Michael, as J-D suggests above, can you update to 0.19.0 hbase?  It's better
> about all of this stuff -- though not as reactive as 0.20.0 will be.
> St.Ack

Re: Question on region server/data node restart

Posted by stack <st...@duboce.net>.

Michael, as J-D suggests above, can you update to 0.19.0 hbase?  It's better
about all of this stuff -- though not as reactive as 0.20.0 will be.
St.Ack

On Tue, Feb 24, 2009 at 8:33 AM, Michael Dagaev <mi...@gmail.com>wrote:

> No problem :)

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

No problem :)

On Tue, Feb 24, 2009 at 6:30 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Ok so that region server must have been holding .META., you will have to
> restart HBase.
>
> Sorry
>
> J-D

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

OK, so that region server must have been holding .META.; you will have to
restart HBase.

Sorry

J-D

On Tue, Feb 24, 2009 at 11:27 AM, Michael Dagaev
<mi...@gmail.com>wrote:

> Sorry, I mean that some requests fail when a region server is down in
> Hbase 0.18.1,
> which we are using now.
>
> Besides, when I started the stopped region server and stopped another one,
> not only "old" requests were stuck because of retries but new requests
> (e.g.
> issued by hbase shell) fail too.
>
> The master.jsp also fails with
>
> Trying to contact region server <...>:60020 for region .META.,,1, row
> '', but failed after 10 attempts.
> Exceptions: java.io.IOException: Call failed on local exception
>
> Thank you for your cooperation,
> M.
>
> On Tue, Feb 24, 2009 at 6:06 PM, Jean-Daniel Cryans <jd...@apache.org>
> wrote:
> > As I wrote, you should upgrade to 0.18 branch in SVN.
> >
> > J-D
> >
> > On Tue, Feb 24, 2009 at 11:04 AM, Michael Dagaev
> > <mi...@gmail.com>wrote:
> >
> >> I do not know if it was holding the ROOT or META region.
> >> It looks like requests may fail in Hbase 0.18 if a region server stops.
> >>
> >> Thanks,
> >> M.
> >>
> >> On Tue, Feb 24, 2009 at 5:40 PM, Jean-Daniel Cryans <jd...@apache.org>
> >> wrote:
> >> > Well this should not happen like that. Was the region server holding the
> >> > ROOT or META region? If so, well that's a bug corrected in 0.19.0 and
> >> > branch-0.18. I suggest you upgrade to that version if you don't want to
> >> > break your MR jobs.
> >> >
> >> > J-D
> >> >
> >> > On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev
> >> > <mi...@gmail.com>wrote:
> >> >
> >> >> What I see now is that the client gets an exception (see below) once a
> >> >> region server stops:
> >> >>
> >> >> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
> >> >> address listed in .META.
> >> >> ...
> >> >> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> >> >> Trying to contact region server <region server>:60020 for region
> >> >>
> >> >> I guess the exception occurred since the region server is down. Is it
> >> >> correct?
> >> >>
> >> >> Thank you for your cooperation,
> >> >> M.
> >> >>
> >> >> P. S. We are running version 0.18.1
> >> >>
> >> >> On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans <jd...@apache.org>
> >> >> wrote:
> >> >> > Correcting myself, no waiting time regards the time to figure the node is
> >> >> > dead. It will still have to fetch the region location in META.
> >> >> >
> >> >> > J-D
> >> >> >
> >> >> >
> >> >> > On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> >> >> >
> >> >> >> Well if a region server dies instead of being cleanly shut down, it takes
> >> >> >> in the worst case 180 seconds (a region server lease length) before the
> >> >> >> Master reassigns the regions. Clients trying to connect to that server will
> >> >> >> take IIRC 10 seconds to figure the node is down then the time to communicate
> >> >> >> with ROOT and META is under 1 sec. If META wasn't updated yet, it will retry
> >> >> >> all of that.
> >> >> >>
> >> >> >> In the next release (0.20.0), the master is notified by Zookeeper in the
> >> >> >> following seconds of a region server death and will proceed to reassign the
> >> >> >> regions immediately.
> >> >> >>
> >> >> >> If the client doesn't have the region in cache and META is updated with the
> >> >> >> region server death, there will be no waiting time.
> >> >> >>
> >> >> >> J-D
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <mi...@gmail.com> wrote:
> >> >> >>
> >> >> >>> Thanks, now it is clear.
> >> >> >>>
> >> >> >>> However, if a region server is down, it takes a lot of time to retry first,
> >> >> >>> to rescan the META region when the retries fail, rescan ROOT, etc. to
> >> >> >>> get eventually to another region server, which will handle the request.
> >> >> >>> Is it correct ?
> >> >> >>>
> >> >> >>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <jd...@apache.org>
> >> >> >>> wrote:
> >> >> >>> > This is why we have a META table, it holds the location info. See
> >> >> >>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
> >> >> >>> >
> >> >> >>> > J-D
> >> >> >>> >
> >> >> >>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <mi...@gmail.com> wrote:
> >> >> >>> >
> >> >> >>> >> Thanks, Jean-Daniel.
> >> >> >>> >>
> >> >> >>> >> I did run hbase-daemon stop regionserver and start regionserver
> >> >> >>> >> and saw the client retrying to connect to the restarted region server.
> >> >> >>> >>
> >> >> >>> >> How does it know to connect to another region server ? Maybe it stops
> >> >> >>> >> retrying, asks master, and get another region server to connect to.
> >> >> >>> >> Is it correct ?
> >> >> >>> >>
> >> >> >>> >> Thank you for your cooperation,
> >> >> >>> >> M.
> >> >> >>> >>
> >> >> >>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <jd...@apache.org>
> >> >> >>> >> wrote:
> >> >> >>> >> > Michael,
> >> >> >>> >> >
> >> >> >>> >> > Regards stopping those nodes, do it using hadoop-daemon/hbase-daemon to stop
> >> >> >>> >> > them cleanly. Requests from the clients will not "fail", they will simply be
> >> >> >>> >> > told to look elsewhere for the regions they have in cache. Unless you only
> >> >> >>> >> > have 1 region server...
> >> >> >>> >> >
> >> >> >>> >> > Regards starting the nodes, apart from the usual hadoop-daemon/hbase-daemon,
> >> >> >>> >> > no.
> >> >> >>> >> >
> >> >> >>> >> > J-D
> >> >> >>> >> >
> >> >> >>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <mi...@gmail.com> wrote:
> >> >> >>> >> >
> >> >> >>> >> >> Hi, all
> >> >> >>> >> >>
> >> >> >>> >> >>     As I understand, I can stop a region server and a data node in a cluster
> >> >> >>> >> >> "semi-transparently" for clients, i. e. the requests handled by the
> >> >> >>> >> >> region server at that time will fail, but cluster will be working.
> >> >> >>> >> >>
> >> >> >>> >> >> If I start the data node and region server I do not have to do anything to
> >> >> >>> >> >> make them work.
> >> >> >>> >> >>
> >> >> >>> >> >> Is it correct ?
> >> >> >>> >> >>
> >> >> >>> >> >> Thank you for your cooperation,
> >> >> >>> >> >> M.
> >> >> >>> >> >>
> >> >> >>> >> >
> >> >> >>> >>
> >> >> >>> >
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

Sorry, I meant that some requests fail when a region server is down in
HBase 0.18.1, which we are using now.

Besides, when I started the stopped region server again and stopped another one,
not only were "old" requests stuck because of retries, but new requests
(e.g. issued from the hbase shell) failed too.

The master.jsp also fails with

Trying to contact region server <...>:60020 for region .META.,,1, row
'', but failed after 10 attempts.
Exceptions: java.io.IOException: Call failed on local exception

Thank you for your cooperation,
M.
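
Editorial sketch: the "failed after 10 attempts" message above is what a
bounded retry loop reports when the target server never answers. The Python
below is only an illustration of that pattern, not HBase client code; the names
call_with_retries, RetriesExhausted and contact_dead_meta_server are
hypothetical, and the pause a real client would insert between attempts is
left out.

```python
class RetriesExhausted(Exception):
    """Hypothetical stand-in for the client's RetriesExhaustedException."""


def call_with_retries(operation, max_attempts=10):
    """Run operation() until it succeeds or max_attempts failures occur."""
    errors = []
    for _ in range(max_attempts):
        try:
            return operation()
        except IOError as exc:
            # Remember the failure and try again (no sleep in this sketch).
            errors.append(exc)
    raise RetriesExhausted(
        "failed after %d attempts; last error: %s" % (max_attempts, errors[-1])
    )


def contact_dead_meta_server():
    # Simulates a region server that never comes back.
    raise IOError("Call failed on local exception")
```

Calling call_with_retries(contact_dead_meta_server) raises RetriesExhausted,
which matches the symptom described in this message: if the server that holds
.META. stays unreachable, every lookup runs out of attempts.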

On Tue, Feb 24, 2009 at 6:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> As I wrote, you should upgrade to 0.18 branch in SVN.
>
> J-D

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

Sorry. I mean some req

On Tue, Feb 24, 2009 at 6:06 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> As I wrote, you should upgrade to 0.18 branch in SVN.
>
> J-D

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

As I wrote, you should upgrade to the 0.18 branch in SVN.

J-D

On Tue, Feb 24, 2009 at 11:04 AM, Michael Dagaev
<mi...@gmail.com>wrote:

> I do not know whether it was holding the ROOT or META region.
> It looks like requests may fail in HBase 0.18 if a region server stops.
>
> Thanks,
> M.

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

I do not know whether it was holding the ROOT or META region.
It looks like requests may fail in HBase 0.18 if a region server stops.

Thanks,
M.

On Tue, Feb 24, 2009 at 5:40 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Well this should not happen like that. Was the region server holding the
> ROOT or META region? If so, well that's a bug corrected in 0.19.0 and
> branch-0.18. I suggest you upgrade to that version if you don't want to
> break your MR jobs.
>
> J-D

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Well, this should not happen. Was the region server holding the
ROOT or META region? If so, that's a bug that was fixed in 0.19.0 and in
branch-0.18. I suggest you upgrade to branch-0.18 if you don't want to
break your MR jobs.

J-D

On Tue, Feb 24, 2009 at 10:33 AM, Michael Dagaev
<mi...@gmail.com>wrote:

> What I see now is that the client gets an exception (see below) once a
> region server stops:
>
> org.apache.hadoop.hbase.client.NoServerForRegionException: No server
> address listed in .META.
> ...
> Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server <region server>:60020 for region
>
> I guess the exception occurred since the region server is down. Is it
> correct?
>
> Thank you for your cooperation,
> M.
>
> P. S. We are running version 0.18.1

Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

What I see now is that the client gets an exception (see below) once a
region server stops:

org.apache.hadoop.hbase.client.NoServerForRegionException: No server
address listed in .META.
...
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException:
Trying to contact region server <region server>:60020 for region

I guess the exception occurred since the region server is down. Is it correct?

Thank you for your cooperation,
M.

P. S. We are running version 0.18.1

On Tue, Feb 24, 2009 at 5:07 PM, Jean-Daniel Cryans <jd...@apache.org> wrote:
> Correcting myself, no waiting time regards the time to figure the node is
> dead. It will still have to fetch the region location in META.
>
> J-D

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Correcting myself: "no waiting time" refers only to the time needed to figure
out that the node is dead. The client will still have to fetch the region
location from META.

J-D
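
Editorial sketch: the lookup flow discussed in this exchange (use the cached
region location, and re-read META when the cached server does not answer) can
be modelled in a few lines of Python. This is a simplified illustration under
stated assumptions, not the HBase client: locate_and_call, the dictionaries
standing in for the client cache and the META table, and the server names are
all hypothetical.

```python
def locate_and_call(region, cache, meta, live_servers, max_attempts=3):
    """Return (server, attempts) for the server that finally handles the call.

    cache        -- dict region -> server, the client's cached locations
    meta         -- dict region -> server, stand-in for the META table
    live_servers -- set of servers that currently answer requests
    """
    for attempt in range(1, max_attempts + 1):
        server = cache.get(region)
        if server is None:
            # Nothing cached: read the location from META and remember it.
            server = meta[region]
            cache[region] = server
        if server in live_servers:
            return server, attempt
        # The location is stale: drop it so the next attempt re-reads META.
        del cache[region]
    raise LookupError("no live server found for %s" % region)
```

With a stale cache entry and an already updated META, the client loses one
attempt on the dead server and then reaches the new one. With an empty cache it
goes straight to META, so no time is spent detecting the dead node. If META
still points at the dead server, every attempt fails until the master has
reassigned the region.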


On Tue, Feb 24, 2009 at 10:02 AM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Well, if a region server dies instead of being cleanly shut down, it takes,
> in the worst case, 180 seconds (a region server lease length) before the
> Master reassigns its regions. Clients trying to connect to that server will
> take, IIRC, 10 seconds to figure out that the node is down; the time to then
> communicate with ROOT and META is under 1 second. If META wasn't updated yet,
> the client will retry all of that.
>
> In the next release (0.20.0), the master is notified by ZooKeeper within
> seconds of a region server's death and will proceed to reassign the
> regions immediately.
>
> If the client doesn't have the region in its cache and META has been updated
> after the region server's death, there will be no waiting time.
>
> J-D
>
>
> On Tue, Feb 24, 2009 at 9:49 AM, Michael Dagaev <mi...@gmail.com>wrote:
>
>> Thanks, now it is clear.
>>
>> However, if a region server is down, it takes a lot of time to retry
>> first,
>> to rescan the META region when the retries fail, rescan ROOT, etc. to
>> get eventually to another region server, which will handle the request.
>> Is it correct ?
>>
>> On Tue, Feb 24, 2009 at 4:36 PM, Jean-Daniel Cryans <jd...@apache.org>
>> wrote:
>> > This is why we have a META table, it holds the location info. See
>> > http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client
>> >
>> > J-D
>> >
>> > On Tue, Feb 24, 2009 at 9:28 AM, Michael Dagaev <
>> michael.dagaev@gmail.com>wrote:
>> >
>> >> Thanks, Jean-Daniel.
>> >>
>> >> I did run hbase-daemon stop regionserver and start regionserver
>> >> and saw the client retrying to connect to the restarted region server.
>> >>
>> >> How does it know to connect to another region server ? Maybe it stops
>> >> retrying, asks master, and get another region server to connect to.
>> >> Is it correct ?
>> >>
>> >> Thank you for your cooperation,
>> >> M.
>> >>
>> >> On Tue, Feb 24, 2009 at 3:56 PM, Jean-Daniel Cryans <
>> jdcryans@apache.org>
>> >> wrote:
>> >> > Michael,
>> >> >
>> >> > Regards stopping those nodes, do it using hadoop-daemon/hbase-daemon
>> to
>> >> stop
>> >> > them cleanly. Requests from the clients will not "fail", they will
>> simply
>> >> be
>> >> > told to look elsewhere for the regions they have in cache. Unless you
>> >> only
>> >> > have 1 region server...
>> >> >
>> >> > Regards starting the nodes, apart from the usual
>> >> hadoop-daemon/hbase-daemon,
>> >> > no.
>> >> >
>> >> > J-D
>> >> >
>> >> > On Tue, Feb 24, 2009 at 8:50 AM, Michael Dagaev <
>> >> michael.dagaev@gmail.com>wrote:
>> >> >
>> >> >> Hi, all
>> >> >>
>> >> >>     As I understand, I can stop a region server and a data node in a
>> >> >> cluster
>> >> >> "semi-transparently" for clients, i. e. the requests handled  by the
>> >> >> region server
>> >> >> at that time will fail, but cluster will be working.
>> >> >>
>> >> >> If I start the data node and region server  I do not have to do
>> anything
>> >> to
>> >> >> make
>> >> >> them work.
>> >> >>
>> >> >> Is it correct ?
>> >> >>
>> >> >> Thank you for your cooperation,
>> >> >> M.
>> >> >>
>> >> >
>> >>
>> >
>>
>
>

Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Well, if a region server dies instead of being cleanly shut down, it takes in
the worst case 180 seconds (the region server lease length) before the Master
reassigns its regions. Clients trying to connect to that server will take,
IIRC, 10 seconds to figure out that the node is down; after that,
communicating with ROOT and META takes under 1 second. If META has not been
updated yet, the client will retry all of that.

In the next release (0.20.0), the Master is notified by ZooKeeper within
seconds of a region server's death and will proceed to reassign the regions
immediately.

If the client doesn't have the region in its cache and META has already been
updated after the region server's death, there will be no waiting time.

J-D
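
For illustration only, here is a rough Python sketch of the worst-case
arithmetic above. It uses only the figures quoted in this message (180 s
lease, about 10 s to notice a dead server, under 1 s for the ROOT/META
lookup). The function name and the retry model are assumptions made for this
sketch; it is not HBase client code, and it ignores the client's retry limit
and pauses between retries, so a real request may give up long before this.

```python
def worst_case_client_wait(lease=180, detect=10, lookup=1):
    """Return (attempts, seconds) until a client sees an updated META.

    Assumed model: the server dies at t=0 and META is updated at t=lease.
    Each attempt spends `detect` seconds noticing the dead server, then
    `lookup` seconds re-reading the location via ROOT and META.
    """
    elapsed = 0
    attempts = 0
    while True:
        attempts += 1
        elapsed += detect               # notice that the cached server is down
        meta_updated = elapsed >= lease  # has the Master reassigned yet?
        elapsed += lookup               # ROOT/META lookup
        if meta_updated:
            return attempts, elapsed


if __name__ == "__main__":
    # Unclean death with the figures quoted above.
    print(worst_case_client_wait())
    # Region not in the client cache and META already updated:
    # only the lookup cost remains.
    print(worst_case_client_wait(lease=0, detect=0))
```

Under this model the client needs 17 attempts and about 187 seconds in the
worst case, which is dominated by the lease length rather than by the
lookups themselves.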


Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

Thanks, now it is clear.

However, if a region server is down, the client takes a long time to reach
another region server that can handle the request: it first retries, then
rescans the META region when the retries fail, then rescans ROOT, and so on.
Is that correct?


Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

This is why we have a META table: it holds the region location info. See
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture#client

J-D
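
For illustration only, a toy Python model of the idea: the client caches
region locations, and when a cached location turns out to be stale it drops
the entry and asks META again. The class, method, region, and server names
are hypothetical and are not the HBase client API. The real client also
locates META through ROOT, which this sketch leaves out.

```python
class RegionLocator:
    """Toy client-side locator: a location cache in front of a META lookup."""

    def __init__(self, meta, live_servers):
        self.meta = meta                  # region -> server; stands in for META
        self.live_servers = live_servers  # servers that currently answer
        self.cache = {}                   # client-side location cache
        self.meta_lookups = 0

    def locate(self, region):
        # Only go to META when the location is not cached.
        if region not in self.cache:
            self.meta_lookups += 1
            self.cache[region] = self.meta[region]
        return self.cache[region]

    def call(self, region, max_attempts=3):
        for _ in range(max_attempts):
            server = self.locate(region)
            if server in self.live_servers:
                return server
            # Cached location is stale: drop it so the next try re-reads META.
            del self.cache[region]
        raise RuntimeError("no live server found for " + region)


if __name__ == "__main__":
    meta = {"region-a": "rs1:60020"}
    live = {"rs1:60020", "rs2:60020"}
    client = RegionLocator(meta, live)
    print(client.call("region-a"))   # served from rs1
    live.discard("rs1:60020")        # rs1 is stopped
    meta["region-a"] = "rs2:60020"   # Master reassigns; META is updated
    print(client.call("region-a"))   # stale cache entry dropped, rs2 found
```

If META has not been updated yet, every lookup returns the dead server again
and the call fails after max_attempts, which corresponds to the retries
discussed elsewhere in this thread.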


Re: Question on region server/data node restart

Posted by Michael Dagaev <mi...@gmail.com>.

Thanks, Jean-Daniel.

I ran hbase-daemon stop regionserver and then start regionserver,
and saw the client retrying to connect to the restarted region server.

How does the client know to connect to another region server? Maybe it stops
retrying, asks the master, and gets another region server to connect to.
Is that correct?

Thank you for your cooperation,
M.


Re: Question on region server/data node restart

Posted by Jean-Daniel Cryans <jd...@apache.org>.

Michael,

Regarding stopping those nodes: use hadoop-daemon/hbase-daemon to stop them
cleanly. Requests from the clients will not "fail"; the clients will simply
be told to look elsewhere for the regions they have in cache. Unless you only
have 1 region server...

Regarding starting the nodes: no, you do not have to do anything apart from
the usual hadoop-daemon/hbase-daemon.

J-D
