Posted to dev@helix.apache.org by "Joy (JIRA)" <ji...@apache.org> on 2014/10/29 00:19:33 UTC
[jira] [Created] (HELIX-535) Helix controller stops working with heavy configuration
Joy created HELIX-535:
-------------------------
Summary: Helix controller stops working with heavy configuration
Key: HELIX-535
URL: https://issues.apache.org/jira/browse/HELIX-535
Project: Apache Helix
Issue Type: Bug
Components: helix-core
Environment: machine:$ uname -a
Linux eat1-app373.stg 2.6.32-220.10.1.el6.x86_64 #1 SMP Fri Mar 9 12:37:51 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
JVM version: $ /export/apps/jdk/current/bin/java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
Reporter: Joy
The issue consistently comes up with a heavy configuration: a higher number of znodes, partitions, and databases.
The goal of our tests is to evaluate the performance of the Helix controller (in terms of controller latency) as the number of nodes, databases, and partitions increases.
In our test, we use multiple machines: one for ZooKeeper, one for the Helix controller, and the rest for dummy processes. The configuration is as below:
zkr <----------> helix
^
|
V
dummy processes
We intentionally kill the master dummy processes once every 30 seconds to simulate a failure event. Everything works fine with a light configuration such as 27 nodes + 1 database + 729 partitions. However, when the configuration is heavy, such as 81 nodes + 10 databases + 81 partitions for each database, the controller latency increases significantly after several failure events:
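To make the two configurations easier to compare, here is a minimal Python sketch that derives the total partition count and the partitions per node from the numbers quoted above. The names `ClusterConfig`, `LIGHT`, `HEAVY`, and `summarize` are hypothetical and are not part of Helix. The report does not state a replica count, so the figures are partitions, not replicas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Test configuration as described in the report (hypothetical helper)."""
    nodes: int
    databases: int
    partitions_per_db: int

    @property
    def total_partitions(self) -> int:
        # Every database carries the same number of partitions.
        return self.databases * self.partitions_per_db


# Values taken from the report; replica count is not given there.
LIGHT = ClusterConfig(nodes=27, databases=1, partitions_per_db=729)
HEAVY = ClusterConfig(nodes=81, databases=10, partitions_per_db=81)


def summarize(cfg: ClusterConfig) -> dict:
    """Return the derived sizes for one configuration."""
    return {
        "nodes": cfg.nodes,
        "databases": cfg.databases,
        "total_partitions": cfg.total_partitions,
        "partitions_per_node": cfg.total_partitions / cfg.nodes,
    }


if __name__ == "__main__":
    for name, cfg in (("light", LIGHT), ("heavy", HEAVY)):
        print(name, summarize(cfg))
```

Under these assumptions the heavy configuration has 810 partitions in total against 729 for the light one, so the main differences are the threefold node count and the tenfold database count rather than the raw number of partitions.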
Controller latency (ms)
First event:   182
Second event:  188
Third event:   200
Fourth event:  193
Fifth event:   200
Sixth event:   185
Seventh event: 189
Eighth event:  213
Ninth event:   1082209
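As a rough check on the latencies listed above, the following Python sketch flags any event whose latency exceeds a multiple of the median. The helper `find_outliers` and the threshold factor of 10 are assumptions chosen for illustration; they are not part of the original test harness.

```python
from statistics import median

# Controller latencies in milliseconds, one per failure event, from the report.
LATENCIES_MS = [182, 188, 200, 193, 200, 185, 189, 213, 1082209]


def find_outliers(latencies_ms, factor=10.0):
    """Return (event_number, latency_ms) pairs above factor * median.

    The median is used as the baseline because a single extreme value
    barely moves it, unlike the mean.
    """
    baseline = median(latencies_ms)
    return [
        (index + 1, value)
        for index, value in enumerate(latencies_ms)
        if value > factor * baseline
    ]


if __name__ == "__main__":
    print("median (ms):", median(LATENCIES_MS))
    for event, value in find_outliers(LATENCIES_MS):
        # 60000 ms per minute
        print(f"event {event}: {value} ms (~{value / 60000:.1f} min)")
```

With the reported data, only the ninth event is flagged; its latency of 1082209 ms corresponds to roughly 18 minutes, compared with a median of 193 ms.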
After this extremely long failure event, the Helix controller stops working. The controller log is attached.