Posted to derby-user@db.apache.org by Oskar Zinger <os...@yahoo.com> on 2013/03/27 02:05:46 UTC

Re: NullPointerException when Shutting Down Derby

Here is an update...

I started using the stopMaster / stopSlave connection URL attributes before shutting down Derby in replication mode, along with a 1000 ms sleep, and everything seems to be working reliably now.

Not sure what is going on here, but there is a work-around.
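
For anyone following along, here is a minimal sketch of what such a work-around might look like in Java. It is not the code from this post. The connection URL attributes `stopMaster=true`, `stopSlave=true` and `shutdown=true` are Derby's; the class name, method names, database name, and the position of the pause between the two steps are assumptions made for illustration. Derby reports a successful shutdown by throwing an SQLException, so the sketch records the SQLState of each attempt rather than treating every exception as a failure.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper: stop replication explicitly, pause, then shut the engine down.
final class ReplicationShutdown {

    // URL that shuts down the whole Derby engine, including all databases.
    static final String ENGINE_SHUTDOWN_URL = "jdbc:derby:;shutdown=true";

    // URL that asks the master side to stop replicating the given database.
    static String stopMasterUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";stopMaster=true";
    }

    // URL that asks the slave side to stop replicating the given database.
    static String stopSlaveUrl(String dbName) {
        return "jdbc:derby:" + dbName + ";stopSlave=true";
    }

    // Opens (and immediately closes) a connection with the given URL.
    // Returns null if a connection was returned, otherwise the SQLState
    // of the exception so the caller can decide whether it means success.
    static String attempt(String url) {
        try (Connection c = DriverManager.getConnection(url)) {
            return null;
        } catch (SQLException e) {
            return e.getSQLState();
        }
    }

    // Assumed ordering: stop replication, sleep, then shut down the engine.
    // The post only says a 1000 ms sleep was added, not exactly where.
    static void stopReplicationThenShutdown(String dbName, boolean isMaster, long pauseMillis)
            throws InterruptedException {
        String stopUrl = isMaster ? stopMasterUrl(dbName) : stopSlaveUrl(dbName);
        System.out.println("stop replication: SQLState=" + attempt(stopUrl));
        Thread.sleep(pauseMillis);
        System.out.println("engine shutdown: SQLState=" + attempt(ENGINE_SHUTDOWN_URL));
    }
}
```

A call such as `ReplicationShutdown.stopReplicationThenShutdown("mydb", true, 1000)` would be made on the master side, and the same call with `false` on the slave side; `mydb` is a placeholder.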

I also tried to modify Derby code to bypass all of the NullPointerExceptions (one after another), but on the next start-up I could no longer start replication.

Thanks,
Oskar


________________________________
 From: Oskar Zinger <os...@yahoo.com>
To: Derby Discussion <de...@db.apache.org> 
Sent: Monday, February 25, 2013 2:42 PM
Subject: Re: NullPointerException when Shuting Down Derby
 

Hi Dag,

Imagine a two-server system, S1 and S2. One is the "designated" primary (S1) and the other is the secondary (S2). Here is the scenario and the sequence of events:

Note: The designated primary will always take back control as the primary server in the cluster.


1. (S1) the primary starts and starts Derby - right now it's a stand-alone server
2. (S2) the secondary starts and starts Derby - now it will set up replication, will execute startSlave in a new thread, and execute startMaster
3. (S1) the designated primary now gets shut down or crashes
4. (S2) the secondary server detects this, assumes the role of primary and stops Derby (shutdown of the entire Derby engine - including all databases - NOT using stopMaster / stopSlave), then starts Derby as the new master (primary)
5. (S1) the designated primary now comes back and wants to take back control as the primary - that's where the problem happens - we call it failback, and a couple of things happen:
      -- (S1) starts first as a secondary of the cluster - it needs to resync configuration and database; now (S2) Derby is Primary, (S1) Derby is Secondary
      -- (S1) now sends a message to switch roles: (S1) Derby is going to shut down (NullPointerException) and restart, and (S2) is going to shut down and restart (but cannot set up replication because of the NPE on S1)

Basically, the failback works the same way as step 4, where there is no NPE. And the strangest thing is that this only happens on a 1-processor system; it's not possible to reproduce on a 2-processor system.
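
As a rough illustration of step 2 above, the sketch below builds the two replication URLs and issues startSlave from a separate thread. The `startSlave`, `startMaster`, `slaveHost` and `slavePort` attributes are Derby connection URL attributes; the class name, method names, host, port and database name are hypothetical. Running startSlave in its own thread is taken from the description above; the reason suggested in the code comment (that the call may not return until the master connects) is an assumption.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper for setting up replication between two Derby instances.
final class ReplicationStartup {

    // URL used on the slave side to put the database into slave mode.
    static String startSlaveUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startSlave=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // URL used on the master side; it names the slave it should ship the log to.
    static String startMasterUrl(String dbName, String slaveHost, int slavePort) {
        return "jdbc:derby:" + dbName + ";startMaster=true;slaveHost=" + slaveHost
                + ";slavePort=" + slavePort;
    }

    // Issues the startSlave connection attempt on a background thread so the
    // caller is not held up (assumption: the attempt may wait for the master).
    // Derby may report the outcome through an SQLException, so it is logged
    // rather than rethrown.
    static Thread startSlaveInBackground(String url) {
        Thread t = new Thread(() -> {
            try (Connection c = DriverManager.getConnection(url)) {
                System.out.println("startSlave returned a connection");
            } catch (SQLException e) {
                System.out.println("startSlave finished: SQLState=" + e.getSQLState()
                        + " " + e.getMessage());
            }
        }, "derby-start-slave");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With this in place, the secondary might call `ReplicationStartup.startSlaveInBackground(ReplicationStartup.startSlaveUrl("mydb", "s2.example.com", 4851))` and the master side would then connect with the matching `startMasterUrl(...)`; the host name, port and database name here are placeholders.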

Thanks,
Oskar



________________________________
 From: Dag Wanvik <da...@oracle.com>
To: Derby Discussion <de...@db.apache.org> 
Sent: Sunday, February 24, 2013 10:42 PM
Subject: Re: NullPointerException when Shuting Down Derby
 



On 31.01.2013 06:13, Oskar Zinger wrote:

> This is only happening in a specific scenario when a host application server fails back, so what it does is stop a service that manages the Derby network server, and restart it.

So, is this an attempt to shut down the ex-slave (now the failed-over
master) after the old master has been (re)started? It would
perhaps be helpful if you could explain the replication scenario in
some detail, since replication contains much code specific to
replication.

Thanks,
Dag

Re: NullPointerException when Shutting Down Derby

Posted by Dag Wanvik <da...@oracle.com>.

Thanks, Oskar. It's good to hear you have found a work-around. These
cases are tricky to root-cause unless we have a repro, but at least now
we have a description of your scenario, so with some effort we could try
to recreate it.

Thanks,
Dag
