Posted to user@karaf.apache.org by da3m0npr0c3ss <to...@gmail.com> on 2012/07/12 07:33:53 UTC
Cellar and horizontal scalability issue
Hello,
We've been trying to scale Cellar horizontally, but as we bring up more
nodes, there is so much network synchronization between them that
bringing up multiple nodes at the same time takes many minutes. I
have not yet been able to perform root cause analysis, but my hunch at this
point is the synchronization mechanism: the push/pull between nodes within a
Cellar group seems to cause a lot of network chatter in the cluster. From
looking at the code, it seems the first node of a group should push to the
data grid, and subsequent nodes should pull.
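To make the expected behaviour concrete, here is a minimal sketch of the "first node pushes, later nodes pull" rule as described above. It is not Cellar's code: the `Node` class, the `join_group` function, and the plain dict standing in for the Hazelcast data grid are all hypothetical names chosen for illustration, and the assumption that a pull merges the shared state into the local state is mine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Karaf instance and its local state."""
    name: str
    local_state: dict = field(default_factory=dict)


def join_group(grid: dict, group: str, node: Node) -> str:
    """Synchronize `node` with `group`; return which direction was used.

    `grid` is a plain dict standing in for the shared data grid
    (an assumption, not the Hazelcast API).
    """
    if group not in grid:
        # First node of the group: seed the grid from local state.
        grid[group] = dict(node.local_state)
        return "push"
    # Subsequent nodes: copy the shared state into local state
    # (assumed here to be a merge, not a replacement).
    node.local_state.update(grid[group])
    return "pull"


grid = {}
node1 = Node("node1", {"feature:cxf": "installed"})
node2 = Node("node2")
print(join_group(grid, "dos", node1))
print(join_group(grid, "dos", node2))
```

If every joining node pushed instead, each join would rewrite the shared state and trigger events on all members, which would be consistent with the chatter described above, though that remains a guess until the root cause is confirmed.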
The Hazelcast serialization exceptions mentioned in an earlier post (
http://karaf.922171.n3.nabble.com/Cellar-2-2-4-cellar-event-Hazelcast-serialization-exception-tp4024747.html
) may (or may not) contribute to the slow startup of the cluster.
I'll try and gather more data as well as perform root cause analysis, but
any insight would be appreciated.
Thanks,
John T.
--
View this message in context: http://karaf.922171.n3.nabble.com/Cellar-and-horizontal-scalability-issue-tp4025204.html
Sent from the Karaf - User mailing list archive at Nabble.com.
Re: Cellar and horizontal scalability issue
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi John,
Thanks for the update. It would be helpful if you could share your exact
use case (I will reproduce and investigate).
Regards
JB
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Cellar and horizontal scalability issue
Posted by da3m0npr0c3ss <to...@gmail.com>.
JB,
A bit more information. Bringing up all the nodes as part of the
*default* Cellar group is fine. Creating a new group and then using
the cluster commands to add other nodes to the group results in a
warning scrolling across almost all the nodes, e.g.:
On node1:
karaf@root> cluster:group-create dos
karaf@root> cluster:group-join dos node2
karaf@root> cluster:group-join dos node3
karaf@root> cluster:group-join dos node4
Fairly soon, the following exception will start scrolling through the karaf
logs:
2012-07-12 11:35:28,273 | WARN | ol-10-thread-177 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN | ol-10-thread-237 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN | ol-10-thread-211 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN | ol-10-thread-150 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
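To quantify how often each warning occurs, a small script along these lines could tally the messages. It is only a sketch: it assumes each log record sits on a single line with six pipe-separated fields (timestamp, level, thread, logger, bundle, message) as in the excerpt above, and `tally_warnings` is a hypothetical helper, not part of Karaf.

```python
from collections import Counter


def tally_warnings(lines):
    """Count WARN messages in pipe-delimited Karaf log lines."""
    counts = Counter()
    for line in lines:
        # Split into at most 6 fields so a '|' inside the message survives.
        fields = [part.strip() for part in line.split("|", 5)]
        if len(fields) < 6:
            continue  # not a complete log record; skip it
        level, message = fields[1], fields[5]
        if level == "WARN":
            counts[message] += 1
    return counts


sample = [
    "2012-07-12 11:35:28,273 | WARN | ol-10-thread-177 | RemoteEventHandler "
    "| 192 - org.apache.karaf.cellar.event - 2.2.4 "
    "| CELLAR EVENT: node is not part of the event cluster group",
    "2012-07-12 11:35:28,273 | WARN | ol-10-thread-237 | RemoteEventHandler "
    "| 192 - org.apache.karaf.cellar.event - 2.2.4 "
    "| CELLAR EVENT: node is not part of the event cluster group",
]
for message, count in tally_warnings(sample).items():
    print(count, message)
```

Running it over the full log from each node would show whether the warning rate differs between the node that issued the group commands and the nodes that were joined.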
hth,
jt
--
View this message in context: http://karaf.922171.n3.nabble.com/Cellar-and-horizontal-scalability-issue-tp4025204p4025221.html
Re: Cellar and horizontal scalability issue
Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks for the detailed information.
I will set up an environment similar to yours.
I'll keep you posted.
Regards
JB
On 07/12/2012 04:20 PM, da3m0npr0c3ss wrote:
> Hello JB,
>
> We're using Karaf 2.2.8 and Cellar 2.2.4. We're using the
> jre.properties.cxf as we're using CXF. We install 8 cellar nodes or so with
> the following features and bundles:
>
> * features:addurl mvn:org.apache.cxf.karaf/apache-cxf/2.6.1/xml/features
> * features:install cxf
> * features:addurl mvn:org.apache.camel.karaf/apache-camel/2.9.2/xml/features
> * features:install camel camel-blueprint camel-eventadmin camel-http4
> * osgi:install -s mvn:org.codehaus.jackson/jackson-core-asl/1.9.7 \
>     mvn:org.codehaus.jackson/jackson-jaxrs/1.9.7 \
>     mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.7 \
>     mvn:org.codehaus.jackson/jackson-xc/1.9.2 \
>     mvn:org.apache.httpcomponents/httpmime/4.1.2
> * features:addurl mvn:org.apache.karaf.cellar/apache-karaf-cellar/2.2.4/xml/features
> * features:install eventadmin wrapper webconsole cellar cellar-event cellar-webconsole
> * wrapper:install
>
> We're using the standard karaf, cellar, and hazelcast configurations.
>
> When starting the 8 nodes simultaneously, synchronization takes a long time
> (20 to 30 minutes), and the Hazelcast exceptions fill the log.
>
> thx,
> jt
>
> --
> View this message in context: http://karaf.922171.n3.nabble.com/Cellar-and-horizontal-scalability-issue-tp4025204p4025214.html
> Sent from the Karaf - User mailing list archive at Nabble.com.
>
--
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
Re: Cellar and horizontal scalability issue
Posted by da3m0npr0c3ss <to...@gmail.com>.
Hello JB,
We're using Karaf 2.2.8 and Cellar 2.2.4. We're using the
jre.properties.cxf as we're using CXF. We install 8 cellar nodes or so with
the following features and bundles:
* features:addurl mvn:org.apache.cxf.karaf/apache-cxf/2.6.1/xml/features
* features:install cxf
* features:addurl mvn:org.apache.camel.karaf/apache-camel/2.9.2/xml/features
* features:install camel camel-blueprint camel-eventadmin camel-http4
* osgi:install -s mvn:org.codehaus.jackson/jackson-core-asl/1.9.7 \
    mvn:org.codehaus.jackson/jackson-jaxrs/1.9.7 \
    mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.7 \
    mvn:org.codehaus.jackson/jackson-xc/1.9.2 \
    mvn:org.apache.httpcomponents/httpmime/4.1.2
* features:addurl mvn:org.apache.karaf.cellar/apache-karaf-cellar/2.2.4/xml/features
* features:install eventadmin wrapper webconsole cellar cellar-event cellar-webconsole
* wrapper:install
We're using the standard karaf, cellar, and hazelcast configurations.
When starting the 8 nodes simultaneously, synchronization takes a long time
(20 to 30 minutes), and the Hazelcast exceptions fill the log.
thx,
jt
--
View this message in context: http://karaf.922171.n3.nabble.com/Cellar-and-horizontal-scalability-issue-tp4025204p4025214.html