Posted to dev@brooklyn.apache.org by jade mackay <ja...@gmail.com> on 2015/04/18 12:31:28 UTC
Unsuccessful cassandra deployment using yaml from blueprint-library
Hi,
I am trying to start a Cassandra cluster on Amazon EC2 using
cassandra-blueprint.yaml (slightly modified) from
https://github.com/brooklyncentral/blueprint-library.git:
name: cassandra-cluster-app-defserv
services:
- type: brooklyn.entity.nosql.cassandra.CassandraCluster
  name: Cassandra Cluster
  brooklyn.config:
    cluster.initial.size: 2
    cluster.initial.quorumSize: 1
    provisioning.properties:
      minCores: 1
      minRam: 100
Everything looks fine. I can ssh into the nodes and run nodetool status,
which gives reasonable output:
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Owns   Host ID                               Token                 Rack
UN  10.232.138.106  10.77 KB  50.0%  22fda260-3cd5-4342-bcb3-b2d4b38facc5  -5997542197209433990  rack1
UN  10.254.20.58    14.04 KB  50.0%  1a1ea5b4-5285-4f88-8e31-c9a4538f6a62  3225829839645341818   rack1
However, after a few minutes the instances are shut down:
2015-04-18 08:51:57,255 INFO Launching CassandraNodeImpl{id=PRYt1W19}:
cluster BrooklynCluster, hostname (public)
ec2-52-12-82-75.us-west-2.compute.amazonaws.com, hostname (subnet)
ec2-52-12-82-75.us-west-2.compute.amazonaws.com, seeds
ec2-52-12-82-75.us-west-2.compute.amazonaws.com (from
[CassandraNodeImpl{id=PRYt1W19}])
2015-04-18 08:51:59,626 INFO Launching CassandraNodeImpl{id=zbsCBjGS}:
delaying launch of non-first node by 59s 994ms to prevent schema
disagreements
...good.. and then:
2015-04-18 08:57:11,547 WARN Error invoking start at
CassandraNodeImpl{id=PRYt1W19}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=PRYt1W19}
2015-04-18 08:57:11,547 WARN Cluster CassandraClusterImpl{id=Nz5UaPes}
lost all its seeds while starting! Subsequent failure likely, but changing
seeds during startup would risk split-brain:
seeds=[CassandraNodeImpl{id=PRYt1W19}]
... and now the shutdown cascade starts.
2015-04-18 08:59:20,930 WARN
brooklyn.management.internal.LocalEntityManager@4dab43cb call to stop
management of unknown entity (already unmanaged?)
CassandraNodeImpl{id=PRYt1W19}; skipping, and all descendants
2015-04-18 08:59:20,934 INFO Stopped application
BasicApplicationImpl{id=DwHO5Z9Y}
Any advice would be appreciated.
p.s. Is this the correct forum for this query?
Thanks,
Jade
Re: Unsuccessful cassandra deployment using yaml from blueprint-library
Posted by Svetoslav Neykov <sv...@cloudsoftcorp.com>.
Hi Jade,
In cases like this it's useful to check the logs on the machine. "Timeout waiting for SERVICE_UP" means that Brooklyn doesn't see the process running.
The output of the Cassandra process is kept in "cassandra-console.log" in the runtime directory (see run.dir in the Sensors tab for any of the cluster nodes). You can also check whether the process is still running on the machine. As Alex suggested, check whether a single CassandraNode works.
You can stop by our IRC channel, #brooklyncentral on FreeNode, for more help troubleshooting this.
Best,
Svet.
> On 19.04.2015, at 5:42, jade mackay <ja...@gmail.com> wrote:
Re: Unsuccessful cassandra deployment using yaml from blueprint-library
Posted by jade mackay <ja...@gmail.com>.
Hi Alex,
The cluster was shutting down because it goes "on fire" and I was launching
it from the command line rather than the web console.
When launched from the web console, the cluster persists and data propagates
across the nodes, despite being on fire with all nodes quarantined.
Incidentally, I can use the cluster effectors to expand the cluster but not
to shrink it.
The top level "cassandra-cluster-app" summary:
Required entity not healthy: CassandraClusterImpl{id=DUHV0IoT}
Failure running task invoking start[locations] on 1 node (FPi4GFmk)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/BiAPu1MO/activities/subtask/FPi4GFmk>:
Error invoking start at CassandraClusterImpl{id=DUHV0IoT}: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}
and the "Cassandra Cluster" summary:
start failed with error: java.lang.IllegalStateException: Node in cluster
CassandraClusterImpl{id=DUHV0IoT} failed: 2 of 2 parallel child tasks
failed, 2 errors including: Error invoking start at
CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}
Failure running task starting 2 nodes (parallel) (jmlBKmM8)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/DUHV0IoT/activities/subtask/jmlBKmM8>:
2 of 2 parallel child tasks failed, 2 errors including: Error invoking start
at CassandraNodeImpl{id=Ewkzxyc8}: Timeout waiting for SERVICE_UP from
CassandraNodeImpl{id=Ewkzxyc8}
And one of the (aptly named) burning nodes "CassandraNode:D0wG":
The software process for this entity does not appear to be running
Failure running task post-start (qBmoHFo2)
<http://localhost:8081/#/v1/applications/BiAPu1MO/entities/D0wGMtHJ/activities/subtask/qBmoHFo2>
For reference the blueprint:
name: cassandra-cluster-app
services:
- type: brooklyn.entity.nosql.cassandra.CassandraCluster
  name: Cassandra Cluster
  brooklyn.config:
    cluster.initial.size: 2
    cluster.initial.quorumSize: 1
    provisioning.properties:
      minCores: 1
      minRam: 512
location: aws-oregon
Again, any tips on hunting the issue down would be appreciated.
Cheers,
Jade
On 18 April 2015 at 23:30, Alex Heneveld <al...@cloudsoftcorp.com>
wrote:
--
Jade Mackay
e: jademackay@gmail.com
m: +64-(0)22-319-0847
Re: Unsuccessful cassandra deployment using yaml from blueprint-library
Posted by Alex Heneveld <al...@cloudsoftcorp.com>.
Hi Jade,
Yes, this is the right place for your question. Getting the Cassandra
start-up sequence took some work, especially in different clouds with
different notions of public and private networks, but this was hammered
out a while ago and it has been pretty reliable since then, including in
AWS, I thought. Some questions and ideas:
Does a single CassandraNode work?
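A minimal sketch of a single-node blueprint for that test, assuming the entity type brooklyn.entity.nosql.cassandra.CassandraNode and reusing the aws-oregon location and the 512 MB minRam that appear elsewhere in this thread. The application and service names are placeholders, and this is not a file from the blueprint library:

```yaml
# Hypothetical single-node test blueprint; names are illustrative only.
name: cassandra-single-node-test
services:
- type: brooklyn.entity.nosql.cassandra.CassandraNode
  name: Cassandra Node
  brooklyn.config:
    provisioning.properties:
      minCores: 1
      # minRam is given in MB; 512 matches the value tried later in the thread.
      minRam: 512
# aws-oregon is the named location used in this thread; substitute your own.
location: aws-oregon
```

If this single node reaches SERVICE_UP, the problem is more likely in the cluster start-up sequence than in the Cassandra install itself.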
The other strange thing is that it is shutting down the application. A
policy might shut down failed nodes -- though I think by default these
are "quarantined", i.e. kept around for investigation rather than outright
deleted -- but the *application* should only be shut down if that is
manually initiated. Can you grep the logs for "DwHO5Z9Y" to see what
triggered its shutdown?
Finally, another thing to try is giving it a bit more RAM. Maybe 100
(MB) is just too low, and that's why the cluster is failing. Try "512m".
Best
Alex
On 18/04/2015 11:31, jade mackay wrote: