
Posted to dev@brooklyn.apache.org by jade mackay <ja...@gmail.com> on 2015/04/18 12:31:28 UTC

Unsuccessful Cassandra deployment using YAML from blueprint-library

Hi,

I am trying to start a Cassandra cluster on Amazon EC2 using
cassandra-blueprint.yaml (slightly modified) from
https://github.com/brooklyncentral/blueprint-library.git:


name: cassandra-cluster-app-defserv
services:
- type: brooklyn.entity.nosql.cassandra.CassandraCluster
  name: Cassandra Cluster
  brooklyn.config:
    cluster.initial.size: 2
    cluster.initial.quorumSize: 1
    provisioning.properties:
      minCores: 1
      minRam: 100

Everything looks fine. I can SSH into the nodes and run nodetool status,
which gives reasonable output:

Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  10.232.138.106  10.77 KB   50.0%  22fda260-3cd5-4342-bcb3-b2d4b38facc5  -5997542197209433990                     rack1
UN  10.254.20.58    14.04 KB   50.0%  1a1ea5b4-5285-4f88-8e31-c9a4538f6a62  3225829839645341818                      rack1

However, after a few minutes the instances are shut down:

2015-04-18 08:51:57,255 INFO  Launching CassandraNodeImpl{id=PRYt1W19}:
cluster BrooklynCluster, hostname (public)
ec2-52-12-82-75.us-west-2.compute.amazonaws.com, hostname (subnet)
ec2-52-12-82-75.us-west-2.compute.amazonaws.com, seeds
ec2-52-12-82-75.us-west-2.compute.amazonaws.com (from
[CassandraNodeImpl{id=PRYt1W19}])
2015-04-18 08:51:59,626 INFO  Launching CassandraNodeImpl{id=zbsCBjGS}:
delaying launch of non-first node by 59s 994ms to prevent schema
disagreements

... good, and then:

2015-04-18 08:57:11,547 WARN  Error invoking start at
CassandraNodeImpl{id=PRYt1W19}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=PRYt1W19}
2015-04-18 08:57:11,547 WARN  Cluster CassandraClusterImpl{id=Nz5UaPes}
lost all its seeds while starting! Subsequent failure likely, but changing
seeds during startup would risk split-brain:
seeds=[CassandraNodeImpl{id=PRYt1W19}]

... and now the shutdown cascade starts:

2015-04-18 08:59:20,930 WARN
 brooklyn.management.internal.LocalEntityManager@4dab43cb call to stop
management of unknown entity (already unmanaged?)
CassandraNodeImpl{id=PRYt1W19}; skipping, and all descendants
2015-04-18 08:59:20,934 INFO  Stopped application
BasicApplicationImpl{id=DwHO5Z9Y}


Any advice would be appreciated.

P.S. Is this the correct forum for this query?

Thanks,
Jade

Re: Unsuccessful Cassandra deployment using YAML from blueprint-library

Posted by Svetoslav Neykov <sv...@cloudsoftcorp.com>.

Hi Jade,
In cases like this it's useful to check the logs on the machine. "Timeout waiting for SERVICE_UP" means that Brooklyn doesn't see the process running.
The output of the Cassandra process is kept in "cassandra-console.log" in the runtime directory (see run.dir in the Sensors tab of any of the cluster nodes). You can also check whether the process is still running on the machine. As Alex suggested, check whether a single CassandraNode works.
For more help troubleshooting this, you can stop by our IRC channel, #brooklyncentral on FreeNode.

Best,
Svet.

> On 19.04.2015, at 5:42, jade mackay <ja...@gmail.com> wrote:

Re: Unsuccessful Cassandra deployment using YAML from blueprint-library

Posted by jade mackay <ja...@gmail.com>.

Hi Alex,

The cluster was shutting down because it catches fire and I was launching
it from the command line rather than from the web console.
When launched from the web console, the cluster persists and data propagates
across the nodes, despite being on fire with all nodes quarantined.
Incidentally, I can use the cluster effector to expand the cluster but not
to shrink it.

The top-level "cassandra-cluster-app" summary:

Required entity not healthy: CassandraClusterImpl{id=DUHV0IoT}
Failure running task invoking start[locations] on 1 node (FPi4GFmk)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/BiAPu1MO/activities/subtask/FPi4GFmk>:
Error invoking start at CassandraClusterImpl{id=DUHV0IoT}: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}

and the "Cassandra Cluster" summary:

start failed with error: java.lang.IllegalStateException: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}
Failure running task starting 2 nodes (parallel) (jmlBKmM8)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/DUHV0IoT/activities/subtask/jmlBKmM8>:
2 of 2 parallel child tasks failed, 2 errors including: Error invoking start
at CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}

And one of the (aptly named) burning nodes "CassandraNode:D0wG":

The software process for this entity does not appear to be running
Failure running task post-start (qBmoHFo2)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/D0wGMtHJ/activities/subtask/qBmoHFo2>:

For reference, the blueprint:
name: cassandra-cluster-app
services:
- type: brooklyn.entity.nosql.cassandra.CassandraCluster
  name: Cassandra Cluster
  brooklyn.config:
    cluster.initial.size: 2
    cluster.initial.quorumSize: 1
    provisioning.properties:
      minCores: 1
      minRam: 512
location: aws-oregon

Again, any tips on hunting the issue down would be appreciated.

Cheers,
Jade

On 18 April 2015 at 23:30, Alex Heneveld <al...@cloudsoftcorp.com>
wrote:


-- 
Jade Mackay
e: jademackay@gmail.com
m: +64-(0)22-319-0847

Re: Unsuccessful Cassandra deployment using YAML from blueprint-library

Posted by Alex Heneveld <al...@cloudsoftcorp.com>.

Hi Jade,

Yes, this is the right place for your question. Getting the Cassandra
start-up sequence right took some work, especially in different clouds with
different notions of public and private networks, but this was hammered
out a while ago and it has been pretty reliable since then, including in
AWS, I thought. Some questions and ideas:

Does a single CassandraNode work?
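
One way to run that test is a single-node blueprint along the lines of the
sketch below. It assumes the CassandraNode type from the same
brooklyn.entity.nosql.cassandra package as the CassandraCluster used above,
and reuses the aws-oregon location name from Jade's reply; the application
name and the minRam value are illustrative assumptions, not a tested
configuration.

# Hypothetical single-node blueprint, used only to isolate the failure.
name: cassandra-single-node-test        # illustrative name (assumption)
location: aws-oregon                    # location name from Jade's blueprint
services:
- type: brooklyn.entity.nosql.cassandra.CassandraNode
  name: Cassandra Node
  brooklyn.config:
    provisioning.properties:
      minCores: 1
      minRam: 512                       # assumed value, in line with the RAM suggestion in this message

If this lone node also reports "Timeout waiting for SERVICE_UP", that would
point at the node itself (memory, networking, install) rather than at the
cluster's seed handling.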

The other strange thing is that it is shutting down the application. A
policy might shut down failed nodes (though I think by default these are
"quarantined", i.e. kept around for investigation rather than deleted
outright), but the *application* should only be shut down if that is
manually initiated. Can you grep the logs for "DwHO5Z9Y" to see what
triggered its shutdown?

Finally, another thing to try is giving it a bit more RAM: maybe 100 MB
is just too low, and that's why the cluster is failing. Try "512m".

Best
Alex



On 18/04/2015 11:31, jade mackay wrote: