Posted to user@oozie.apache.org by Anand Vidwansa <an...@gmail.com> on 2014/02/18 10:16:38 UTC

Oozie HA support with derby

Hi,

I wanted to know if there is a way to configure Oozie HA support with Derby
as the database.
I understand that Oozie HA support expects an HA database.
But is there a way to use Derby's native replication to replicate data between
two hosts and switch to the secondary store in case Oozie fails over?

Any help is appreciated!

Thanks,
Anand

Re: Oozie HA support with derby

Posted by Anand Vidwansa <an...@gmail.com>.
> The Oozie servers in your Oozie HA setup actually are all active; that is,
> they are all processing jobs at the same time -- there is no failover.

I see. So, this is more of an active-active configuration.

I got answers for all my questions.
Thanks a lot Robert! Your help is greatly appreciated!

Regards,
Anand


On Fri, Feb 21, 2014 at 1:07 AM, Robert Kanter <rk...@cloudera.com> wrote:

> >
> > Is there any known data migration tool which can migrate data from derby
> > to let's say mysql?
>
> Migrating the Oozie data out of Derby to another database is somewhat
> tricky.  You can take a look at this procedure given on the Cloudera
> Community forums, but I can't guarantee that it will work and I can't
> really help you with it:
>
> http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-embedded-derby-to-mysql-what-is-the-best-way-to-go-about/m-p/5627#M195
>
> I'd recommend that you just start over with a new empty database in
> MySQL/Oracle/Postgres.  You won't lose any of the actual
> workflows/coordinators themselves, just the historical data; and any
> currently running or queued up workflows/coordinators will have to be
> resubmitted.
>
> > I also have one more question. Since traditionally oozie database is
> > local, a node outage can make the database unavailable.
> > Can we have the oozie database on an nfs mount, so that in case of oozie
> > server process outage, the secondary process which will run on another
> > node, can access the database using nfs mount as well?
>
> The Oozie servers in your Oozie HA setup actually are all active; that is,
> they are all processing jobs at the same time -- there is no failover.  As
> such, the database isn't "local"; that only really has meaning for Derby.
> When using MySQL/Oracle/Postgres, the database is always "remote" to each
> of the Oozie servers; it may even be a good idea to put it on a different
> machine from any of the Oozie servers.  I'm not an expert on these
> databases, but I don't think you can back them with an NFS mount; each of
> these databases has its own HA solution that you should look into.  So,
> there's no need to worry about an Oozie server process outage with a
> secondary process on another node; in Oozie HA, all Oozie servers are equal
> -- there is no leader.  Does this make sense?
>

Re: Oozie HA support with derby

Posted by Robert Kanter <rk...@cloudera.com>.
>
> Is there any known data migration tool which can migrate data from derby
> to let's say mysql?

Migrating the Oozie data out of Derby to another database is somewhat
tricky.  You can take a look at this procedure given on the Cloudera
Community forums, but I can't guarantee that it will work and I can't
really help you with it:
http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Oozie-embedded-derby-to-mysql-what-is-the-best-way-to-go-about/m-p/5627#M195

I'd recommend that you just start over with a new empty database in
MySQL/Oracle/Postgres.  You won't lose any of the actual
workflows/coordinators themselves, just the historical data; and any
currently running or queued up workflows/coordinators will have to be
resubmitted.

> I also have one more question. Since traditionally oozie database is local, a
> node outage can make the database unavailable.
> Can we have the oozie database on an nfs mount, so that in case of oozie server
> process outage, the secondary process which will run on another node, can
> access the database using nfs mount as well?

The Oozie servers in your Oozie HA setup actually are all active; that is,
they are all processing jobs at the same time -- there is no failover.  As
such, the database isn't "local"; that only really has meaning for Derby.
When using MySQL/Oracle/Postgres, the database is always "remote" to each
of the Oozie servers; it may even be a good idea to put it on a different
machine from any of the Oozie servers.  I'm not an expert on these
databases, but I don't think you can back them with an NFS mount; each of
these databases has its own HA solution that you should look into.  So,
there's no need to worry about an Oozie server process outage with a
secondary process on another node; in Oozie HA, all Oozie servers are equal
-- there is no leader.  Does this make sense?
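
To make the "all servers are active and share one remote database" idea
concrete, here is a minimal sketch in Python. It is only an illustration and
not how Oozie itself is implemented: SQLite stands in for MySQL/Oracle/Postgres,
and the "jobs" table, the server names "oozie-1"/"oozie-2", and the
claim_next_job helper are all hypothetical. Each simulated server holds its own
connection to the same database, and any of them can pick up work; there is no
leader and no failover step.

```python
import os
import sqlite3
import tempfile


def connect(path):
    # Each simulated server opens its own connection to the shared database.
    # isolation_level=None puts sqlite3 in autocommit mode so that the
    # explicit BEGIN/COMMIT statements below control the transactions.
    return sqlite3.connect(path, isolation_level=None)


def claim_next_job(conn, server):
    """Atomically claim one unowned job; return its id, or None if none is left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    row = conn.execute(
        "SELECT id FROM jobs WHERE owner IS NULL ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    conn.execute("UPDATE jobs SET owner = ? WHERE id = ?", (server, row[0]))
    conn.execute("COMMIT")
    return row[0]


def run_demo():
    # One shared database file (a stand-in for the remote database).
    path = os.path.join(tempfile.mkdtemp(), "shared.db")
    setup = connect(path)
    setup.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, owner TEXT)")
    setup.executemany("INSERT INTO jobs (id) VALUES (?)", [(i,) for i in range(1, 7)])
    setup.close()

    # Two equal servers, each with its own connection; both stay active.
    servers = {name: connect(path) for name in ("oozie-1", "oozie-2")}
    claimed = {name: [] for name in servers}
    progress = True
    while progress:
        progress = False
        for name, conn in servers.items():
            job = claim_next_job(conn, name)
            if job is not None:
                claimed[name].append(job)
                progress = True
    for conn in servers.values():
        conn.close()
    return claimed


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only that the database must accept a connection
from every server at once, which is why it has to live outside any single
Oozie server.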



On Thu, Feb 20, 2014 at 8:36 AM, Anand Vidwansa <an...@gmail.com> wrote:

> Thanks a lot for the prompt reply Robert!
> I do have lot of data in my derby db which I need to migrate now to either
> of mysql/oracle/postgres.
> Is there any known data migration tool which can migrate data from derby to
> let's say mysql?
>
> I also have one more question. Since traditionally oozie database is local,
> a node outage can make the
> database unavailable.
> Can we have the oozie database on an nfs mount, so that in case of oozie
> server process outage,
> the secondary process which will run on another node, can access the
> database using nfs mount as well?
>
> Thanks,
> Anand

Re: Oozie HA support with derby

Posted by Anand Vidwansa <an...@gmail.com>.
Thanks a lot for the prompt reply, Robert!
I do have a lot of data in my Derby database, which I now need to migrate to
MySQL, Oracle, or Postgres.
Is there any known data migration tool that can migrate data from Derby to,
let's say, MySQL?

I also have one more question. Since the Oozie database is traditionally local,
a node outage can make the database unavailable.
Can we have the Oozie database on an NFS mount, so that in case of an Oozie
server process outage, the secondary process, which will run on another node,
can access the database through the NFS mount as well?

Thanks,
Anand


On Tue, Feb 18, 2014 at 11:36 PM, Robert Kanter <rk...@cloudera.com>wrote:

> Hi,
>
> You can run Oozie without an HA database.  A non-HA MySQL, Oracle, or
> Postgres database will work just fine, other than if the database goes
> down, your Oozie becomes unavailable (i.e. the database is a single point
> of failure).
>
> The reason you can't use Derby for Oozie HA is because it doesn't support
> multiple concurrent connections.  In Oozie HA, each Oozie server connects
> to the database, so there are multiple connections; Derby doesn't allow
> that so only one of the Oozie servers would be able to connect.
>
> - Robert

Re: Oozie HA support with derby

Posted by Robert Kanter <rk...@cloudera.com>.
Hi,

You can run Oozie without an HA database.  A non-HA MySQL, Oracle, or
Postgres database will work just fine, other than if the database goes
down, your Oozie becomes unavailable (i.e. the database is a single point
of failure).
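
As a rough sketch of what pointing Oozie at a plain (non-HA) MySQL database
involves, the Python below generates the relevant oozie-site.xml properties.
The oozie.service.JPAService.jdbc.* property names are the ones Oozie reads
for its database connection; the host name, port, database name, user name,
password, and the two helper functions are assumptions made up for the
example, so substitute your own values and check the driver class against
the MySQL connector you actually install.

```python
import xml.etree.ElementTree as ET


def jpa_properties(host, port, db, user, password):
    """Return the JPAService settings that point Oozie at a MySQL database."""
    return {
        "oozie.service.JPAService.jdbc.driver": "com.mysql.jdbc.Driver",
        "oozie.service.JPAService.jdbc.url": f"jdbc:mysql://{host}:{port}/{db}",
        "oozie.service.JPAService.jdbc.username": user,
        "oozie.service.JPAService.jdbc.password": password,
    }


def to_site_xml(props):
    """Render the properties in Hadoop-style <configuration> XML."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    print(to_site_xml(jpa_properties("db.example.com", 3306, "oozie", "oozie", "secret")))
```

In an HA setup every Oozie server would carry the same JDBC URL, so they all
connect to the one database.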

The reason you can't use Derby for Oozie HA is that it doesn't support
multiple concurrent connections from separate processes, at least in the
embedded mode that Oozie uses by default.  In Oozie HA, each Oozie server
connects to the database, so there are multiple connections; Derby doesn't
allow that, so only one of the Oozie servers would be able to connect.

- Robert


On Tue, Feb 18, 2014 at 1:16 AM, Anand Vidwansa <an...@gmail.com> wrote:

> Hi,
>
> I wanted to know if there is a way to configure oozie HA support with derby
> as database.
> I understand oozie HA support expects an HA database.
> But, is there a way to use native replication in derby to replicate data
> between
> two hosts and switch to secondary store in case oozie fails over?
>
> Any help is appreciated!
>
> Thanks,
> Anand
>