Posted to user@karaf.apache.org by da3m0npr0c3ss <to...@gmail.com> on 2012/07/12 07:33:53 UTC

Cellar and horizontal scalability issue

Hello,

  We've been trying to scale Cellar horizontally, but as we bring up more
nodes, there is so much network synchronization between the nodes that
bringing up multiple nodes at the same time takes many minutes.  I have not
yet been able to perform root cause analysis, but my hunch at this point is
the synchronization mechanism: the push/pull of nodes within a Cellar group
seems to cause a lot of network chatter in the cluster. Looking at the code,
it seems the first node of a group should push to the data grid, and
subsequent nodes should pull.
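
To make that expected behaviour concrete, here is a minimal sketch of "first
node pushes, later nodes pull". It is a toy model, not Cellar's code: the
Node class, the join_group function, the group name "example-group" and the
plain dict standing in for the data grid are all hypothetical.

```python
# Toy model of "first node pushes, later nodes pull".
# Nothing here is Cellar API; the dict stands in for the shared data grid.

class Node:
    def __init__(self, name, local_state):
        self.name = name
        self.local_state = dict(local_state)


def join_group(grid, group, node):
    """Synchronize a node with a group and return "push" or "pull"."""
    if group not in grid:
        # First member: publish local state so later members can copy it.
        grid[group] = dict(node.local_state)
        return "push"
    # Later member: replace local state with what the grid already holds.
    node.local_state = dict(grid[group])
    return "pull"


grid = {}
node1 = Node("node1", {"features": ["cellar", "cxf"]})
node2 = Node("node2", {"features": []})
actions = [
    join_group(grid, "example-group", node1),
    join_group(grid, "example-group", node2),
]
print(actions)
```

Under this model only one push should happen per group, so seeing many
pushes while nodes start simultaneously would be one thing to check.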

  The Hazelcast serialization exceptions mentioned in an earlier post (
http://karaf.922171.n3.nabble.com/Cellar-2-2-4-cellar-event-Hazelcast-serialization-exception-tp4024747.html
) may (or may not) contribute to the slow start up of the cluster.

  I'll try and gather more data as well as perform root cause analysis, but
any insight would be appreciated.

Thanks,
John T.

--
View this message in context: http://karaf.922171.n3.nabble.com/Cellar-and-horizontal-scalability-issue-tp4025204.html
Sent from the Karaf - User mailing list archive at Nabble.com.

Re: Cellar and horizontal scalability issue

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi John,

Thanks for the update. It would be helpful if you could share your exact
use case (I will try to reproduce and investigate).

Regards
JB


-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: Cellar and horizontal scalability issue

Posted by da3m0npr0c3ss <to...@gmail.com>.
JB,

  A bit more information.  Bringing up all the nodes as part of the
*default* Cellar group is fine.  Creating a new group and then using the
cluster commands to add other nodes to the group results in an exception
scrolling across almost all the nodes, e.g.:

On node1:

karaf@root> cluster:group-create dos
karaf@root> cluster:group-join dos node2
karaf@root> cluster:group-join dos node3
karaf@root> cluster:group-join dos node4

Fairly soon, the following exception will start scrolling through the karaf
logs:

2012-07-12 11:35:28,273 | WARN  | ol-10-thread-177 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN  | ol-10-thread-237 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN  | ol-10-thread-211 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
2012-07-12 11:35:28,273 | WARN  | ol-10-thread-150 | RemoteEventHandler | 192 - org.apache.karaf.cellar.event - 2.2.4 | CELLAR EVENT: node is not part of the event cluster group
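
To quantify how often this warning fires, something like the following could
be run over the log. It is a minimal sketch that assumes each entry sits on a
single line in the actual log file, with the six pipe-separated fields seen
above; the count_warnings function and the sample list are illustrative
names, not part of Karaf or Cellar.

```python
from collections import Counter


def count_warnings(lines):
    """Count WARN entries per (logger, message) pair in pipe-delimited lines."""
    counts = Counter()
    for line in lines:
        # Assumed fields: timestamp, level, thread, logger, bundle, message.
        # maxsplit=5 keeps any "|" inside the message itself intact.
        fields = [field.strip() for field in line.split("|", 5)]
        if len(fields) != 6 or fields[1] != "WARN":
            continue
        counts[(fields[3], fields[5])] += 1
    return counts


# Two of the entries quoted above, used as sample input.
sample = [
    "2012-07-12 11:35:28,273 | WARN  | ol-10-thread-177 | RemoteEventHandler "
    "| 192 - org.apache.karaf.cellar.event - 2.2.4 "
    "| CELLAR EVENT: node is not part of the event cluster group",
    "2012-07-12 11:35:28,273 | WARN  | ol-10-thread-237 | RemoteEventHandler "
    "| 192 - org.apache.karaf.cellar.event - 2.2.4 "
    "| CELLAR EVENT: node is not part of the event cluster group",
]

counts = count_warnings(sample)
for (logger, message), n in counts.most_common():
    print(n, logger, message)
```

For a real run, the sample list could be replaced with an open log file,
since iterating over a file object yields one line at a time.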

hth,
jt




Re: Cellar and horizontal scalability issue

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Thanks for the detailed information.

I will set up an environment similar to yours.
I'll keep you posted.

Regards
JB

On 07/12/2012 04:20 PM, da3m0npr0c3ss wrote:
> Hello JB,
>
>    We're using Karaf 2.2.8 and Cellar 2.2.4.  We use jre.properties.cxf
> because we use CXF.  We install about 8 Cellar nodes with the following
> features and bundles:
>
>      * features:addurl mvn:org.apache.cxf.karaf/apache-cxf/2.6.1/xml/features
>      * features:install cxf
>      * features:addurl mvn:org.apache.camel.karaf/apache-camel/2.9.2/xml/features
>      * features:install camel camel-blueprint camel-eventadmin camel-http4
>      * osgi:install -s mvn:org.codehaus.jackson/jackson-core-asl/1.9.7 \
>                        mvn:org.codehaus.jackson/jackson-jaxrs/1.9.7 \
>                        mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.7 \
>                        mvn:org.codehaus.jackson/jackson-xc/1.9.2 \
>                        mvn:org.apache.httpcomponents/httpmime/4.1.2
>      * features:addurl mvn:org.apache.karaf.cellar/apache-karaf-cellar/2.2.4/xml/features
>      * features:install eventadmin wrapper webconsole cellar cellar-event cellar-webconsole
>      * wrapper:install
>
>    We're using the standard karaf, cellar, and hazelcast configurations.
>
>    When the 8 nodes are started simultaneously, synchronization takes a long
> time (20 ~ 30 minutes), along with all the Hazelcast exceptions in the log.
>
> thx,
> jt
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: Cellar and horizontal scalability issue

Posted by da3m0npr0c3ss <to...@gmail.com>.
Hello JB,

  We're using Karaf 2.2.8 and Cellar 2.2.4.  We use jre.properties.cxf
because we use CXF.  We install about 8 Cellar nodes with the following
features and bundles:

    * features:addurl mvn:org.apache.cxf.karaf/apache-cxf/2.6.1/xml/features
    * features:install cxf
    * features:addurl mvn:org.apache.camel.karaf/apache-camel/2.9.2/xml/features
    * features:install camel camel-blueprint camel-eventadmin camel-http4
    * osgi:install -s mvn:org.codehaus.jackson/jackson-core-asl/1.9.7 \
                      mvn:org.codehaus.jackson/jackson-jaxrs/1.9.7 \
                      mvn:org.codehaus.jackson/jackson-mapper-asl/1.9.7 \
                      mvn:org.codehaus.jackson/jackson-xc/1.9.2 \
                      mvn:org.apache.httpcomponents/httpmime/4.1.2
    * features:addurl mvn:org.apache.karaf.cellar/apache-karaf-cellar/2.2.4/xml/features
    * features:install eventadmin wrapper webconsole cellar cellar-event cellar-webconsole
    * wrapper:install

  We're using the standard karaf, cellar, and hazelcast configurations.

  When the 8 nodes are started simultaneously, synchronization takes a long
time (20 ~ 30 minutes), along with all the Hazelcast exceptions in the log.

thx,
jt
