Posted to user@cassandra.apache.org by Vikram Kone <vi...@gmail.com> on 2015/08/15 17:49:54 UTC

How to run any application on Cassandra cluster in high availability mode

Hi,
We are planning to install Azkaban in solo server mode on a 24-node
Cassandra cluster so that we can schedule Spark jobs with intricate
dependency chains. The problem is that Cassandra has a no-SPOF
architecture, i.e. any node can become the master for the cluster, whereas
Azkaban is not a peer-to-peer architecture where any node can become the
master. Only a single node can be the master at any given time.

What are our options here? Are there any frameworks or tools out there that
would allow any application to run on a cluster of machines with high
availability?
Should I be looking at something like ZooKeeper for this? Or maybe Mesos?

Re: How to run any application on Cassandra cluster in high availability mode

Posted by Ken Hancock <ke...@schange.com>.
Off-topic to the Cassandra list, but corosync/pacemaker comes to mind for
automatic service switchover between nodes.

For monitoring and alerting, there are almost too many to mention...

On Tue, Aug 18, 2015 at 2:45 PM, Vikram Kone <vi...@gmail.com> wrote:

> Hi John,
> I have posted the same question on the Azkaban Google group, but there has
> been no response so far :(
> If I want to go the old-school way of monitor, alert, and start the process
> somewhere else, how can I do this? Are there any ready-made tools for this
> kind of general-purpose monitoring and alerting for services on Linux?

Re: How to run any application on Cassandra cluster in high availability mode

Posted by Otis Gospodnetić <ot...@gmail.com>.
Hi Vikram,

Running a monitor somewhere other than on the Cassandra node itself... hmm...
then you'd miss out on JVM metrics, OS metrics, the ability to do transaction
tracing, on-demand profiling, etc., which are all nice things to have when
you are troubleshooting issues or performance, doing stress tests, tuning,
and optimizing...

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Aug 18, 2015 at 2:45 PM, Vikram Kone <vi...@gmail.com> wrote:

> Hi John,
> I have posted the same question on the Azkaban Google group, but there has
> been no response so far :(
> If I want to go the old-school way of monitor, alert, and start the process
> somewhere else, how can I do this? Are there any ready-made tools for this
> kind of general-purpose monitoring and alerting for services on Linux?

Re: How to run any application on Cassandra cluster in high availability mode

Posted by Vikram Kone <vi...@gmail.com>.
Hi John,
I have posted the same question on the Azkaban Google group, but there has
been no response so far :(
If I want to go the old-school way of monitor, alert, and start the process
somewhere else, how can I do this? Are there any ready-made tools for this
kind of general-purpose monitoring and alerting for services on Linux?

On Sun, Aug 16, 2015 at 9:38 AM, Prem Yadav <ip...@gmail.com> wrote:

> MySQL is there just to save the state of things. I suppose it is very
> lightweight. Why not just install MySQL on one of the nodes or on a VM
> somewhere?

Re: How to run any application on Cassandra cluster in high availability mode

Posted by Prem Yadav <ip...@gmail.com>.
MySQL is there just to save the state of things. I suppose it is very
lightweight. Why not just install MySQL on one of the nodes or on a VM
somewhere?


On Sun, Aug 16, 2015 at 3:39 PM, John Wong <go...@gmail.com> wrote:

> Sorry, I meant integration with Cassandra (based on the docs, it suggests
> MySQL by default).

Re: How to run any application on Cassandra cluster in high availability mode

Posted by John Wong <go...@gmail.com>.
Sorry, I meant integration with Cassandra (based on the docs, it suggests
MySQL by default).

On Sunday, August 16, 2015, John Wong <go...@gmail.com> wrote:

> There is no leader in Cassandra. I suggest you ask the Azkaban community
> about integration with Azkaban and Azkaban HA.


-- 
Sent from Jeff Dean's printf() mobile console

Re: How to run any application on Cassandra cluster in high availability mode

Posted by John Wong <go...@gmail.com>.
There is no leader in Cassandra. I suggest you ask the Azkaban community
about integration with Azkaban and Azkaban HA.

On Sunday, August 16, 2015, Vikram Kone <vi...@gmail.com> wrote:

> Can't we use ZooKeeper for leader election among the Cassandra nodes and,
> based on which node is the leader, run Azkaban (or any app instance, for
> that matter) on that Cassandra server? I'm thinking that I can copy the
> application folder to all nodes and then determine which one runs it using
> ZooKeeper. Is that possible?

-- 
Sent from Jeff Dean's printf() mobile console

Re: How to run any application on Cassandra cluster in high availability mode

Posted by Vikram Kone <vi...@gmail.com>.
Can't we use ZooKeeper for leader election among the Cassandra nodes and, based on which node is the leader, run Azkaban (or any app instance, for that matter) on that Cassandra server? I'm thinking that I can copy the application folder to all nodes and then determine which one runs it using ZooKeeper. Is that possible?
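
For illustration, the usual ZooKeeper election recipe can be sketched without a ZooKeeper server: each candidate registers an ephemeral sequential znode, and the candidate with the lowest sequence number is the leader. The snippet below is only an in-memory Python model of that rule, not a ZooKeeper client. The znode prefix `candidate_` and the host names are hypothetical. A real deployment would rely on a client library recipe (for example kazoo for Python or Apache Curator for Java) talking to a live ensemble.

```python
def sequence_of(znode_name):
    """Return the integer sequence suffix of a sequential znode name."""
    return int(znode_name.rsplit("_", 1)[1])


def elect_leader(znodes):
    """Given {znode_name: host}, return the host holding the lowest sequence.

    Returns None when no candidate is registered.
    """
    if not znodes:
        return None
    lowest = min(znodes, key=sequence_of)
    return znodes[lowest]


# Hypothetical election state: three nodes have registered as candidates.
candidates = {
    "candidate_0000000007": "cass-node-03",
    "candidate_0000000005": "cass-node-11",
    "candidate_0000000009": "cass-node-20",
}

leader = elect_leader(candidates)

# Simulate the leader's session expiring: its ephemeral znode disappears,
# so the election is re-evaluated over the remaining candidates.
survivors = {name: host for name, host in candidates.items() if host != leader}
next_leader = elect_leader(survivors)
```

In this model, whichever host is returned by `elect_leader` would be the one that starts the Azkaban process; when its znode vanishes, the next-lowest candidate takes over.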

Sent from Outlook




On Sun, Aug 16, 2015 at 6:47 AM -0700, "John Wong" <go...@gmail.com> wrote:

> Hi
>
> I am not familiar with Azkaban, and this is probably a better question for
> the Azkaban community, IMO. But there seem to be two modes
> (http://azkaban.github.io/azkaban/docs/2.5/): solo server mode and
> two-server mode. Either way, I think it is still a SPOF? If there is no
> election and it simply runs as a process, my 2 cents would be to monitor,
> alert, and start the process somewhere else. Better yet, don't install the
> process on a Cassandra node. Keep each instance for one purpose only. If you
> run in a cloud like AWS, you can easily autoscale with min 1, max 1.
>
>
> Note: In a peer-to-peer architecture, there is simply no concept of a
> master. You can start with some seed nodes for discovery. It depends on how
> you design discovery.

Re: How to run any application on Cassandra cluster in high availability mode

Posted by John Wong <go...@gmail.com>.
Hi

I am not familiar with Azkaban, and this is probably a better question for
the Azkaban community, IMO. But there seem to be two modes
(http://azkaban.github.io/azkaban/docs/2.5/): solo server mode and
two-server mode. Either way, I think it is still a SPOF? If there is no
election and it simply runs as a process, my 2 cents would be to monitor,
alert, and start the process somewhere else. Better yet, don't install the
process on a Cassandra node. Keep each instance for one purpose only. If you
run in a cloud like AWS, you can easily autoscale with min 1, max 1.
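
A minimal sketch of the "monitor, alert, and start the process" idea, using only the Python standard library. It is an assumption-laden illustration, not an Azkaban feature: the `supervise` helper and the failing command are hypothetical stand-ins, and the sketch restarts the process on the same machine. Starting it on a different machine would need a cluster manager or cloud autoscaling on top.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, backoff_seconds=0.0):
    """Run cmd, restarting it each time it exits with a non-zero status.

    Returns the list of exit codes observed, oldest first. Stops after a
    clean exit (code 0) or once max_restarts restarts have been used.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        proc = subprocess.run(cmd)
        exit_codes.append(proc.returncode)
        if proc.returncode == 0:
            break
        # A real setup would raise an alert here (mail, pager, chat hook).
        print(f"attempt {attempt + 1}: exited with {proc.returncode}",
              file=sys.stderr)
        if attempt < max_restarts:
            time.sleep(backoff_seconds)
    return exit_codes


if __name__ == "__main__":
    # Hypothetical stand-in for the scheduler process: it fails immediately.
    failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
    print(supervise(failing, max_restarts=2))
```

The cap on restarts is a deliberate choice: a process that keeps crashing should end in an alert to a human rather than an endless restart loop.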


Note: In a peer-to-peer architecture, there is simply no concept of a master.
You can start with some seed nodes for discovery. It depends on how you
design discovery.

On Sat, Aug 15, 2015 at 11:49 AM, Vikram Kone <vi...@gmail.com> wrote:

> Hi,
> We are planning to install Azkaban in solo server mode on a 24-node
> Cassandra cluster so that we can schedule Spark jobs with intricate
> dependency chains. The problem is that Cassandra has a no-SPOF
> architecture, i.e. any node can become the master for the cluster, whereas
> Azkaban is not a peer-to-peer architecture where any node can become the
> master. Only a single node can be the master at any given time.
>
> What are our options here? Are there any frameworks or tools out there that
> would allow any application to run on a cluster of machines with high
> availability?
> Should I be looking at something like ZooKeeper for this? Or maybe Mesos?