
Posted to user@helix.apache.org by Manikumar Reddy <ku...@nmsworks.co.in> on 2013/06/17 16:01:18 UTC

Controller fault tolerance

Hi,

I am trying to understand the Helix controller/cluster manager fault
tolerance mechanism.
A single controller would be a single point of failure. What
options/techniques are available to achieve controller fault tolerance?
Any pointers, recipes or code snippets?

Regards,
Kumar

Re: Controller fault tolerance

Posted by kishore g <g....@gmail.com>.

Thanks Jason. I am guessing it's only the isLeader() method that returns
wrong results, since it compares names, but there is actually only one
active controller. Is my understanding correct? If yes, then giving each
controller a different name should work, right?


On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang <zz...@linkedin.com> wrote:

> This is a known bug in Helix:
> https://issues.apache.org/jira/browse/HELIX-123
>
> The problem is that we compare the instance name of the controller but
> not the session id, so if you start two controllers with the same name,
> isLeader() returns true for both. We will fix it shortly.
>
>  Thanks,
> Jason

Re: Controller fault tolerance

Posted by kishore g <g....@gmail.com>.

Hi Lance,

We have a test case that tests the scenario you described.
https://github.com/apache/incubator-helix/blob/master/helix-core/src/test/java/org/apache/helix/integration/TestStandAloneCMMain.java
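Failover is not instantaneous: a standby controller only takes over after ZooKeeper deletes the old leader's ephemeral node, and if the leader died without closing its session that happens only after the session timeout. A failover check that reads isLeader() once, right after stopping the leader, can therefore race the election. The sketch below is a minimal, hypothetical polling helper (`Poll.waitFor` is an illustrative name, not Helix API and not the code in the linked test):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical polling helper; not part of Helix or the linked test.
final class Poll {
  private Poll() {}

  // Re-evaluates `condition` every `pollMillis` until it returns true or
  // `timeoutMillis` has elapsed. Returns whether the condition became true.
  static boolean waitFor(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    while (!condition.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        return false; // timed out
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Stand-in for "the standby controller eventually reports isLeader() == true".
    long readyAt = System.currentTimeMillis() + 200;
    boolean ok = waitFor(() -> System.currentTimeMillis() >= readyAt, 2_000, 50);
    System.out.println(ok); // expected: true
  }
}
```

In a failover test the condition would be something like `() -> standbyManager.isLeader()` (where `standbyManager` is a hypothetical name for the HelixManager returned for the second controller), with a timeout comfortably above the ZooKeeper session timeout.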

Thanks,
Kishore G


On Wed, Jun 26, 2013 at 3:48 PM, Shi Lu <lu...@gmail.com> wrote:

> Hi Lance:
>
> Here is how the multiple controller leader election works:
>
> In the case where controllers x, y and z all try to control a cluster:
>
> 1. x, y and z all try to create the ZooKeeper ephemeral node
> /clusterName/CONTROLLER/LEADER.
>
> 2. Only one controller creates the ephemeral node successfully; it then
> starts controlling the cluster.
>
> 3. The other controllers fail to create the ephemeral node (it has already
> been created by the leader), so they register a ZooKeeper change listener
> on the /clusterName/CONTROLLER/LEADER ephemeral node. If that node goes
> away, they try to create it again, and whichever succeeds controls the
> cluster.
>
> So in the two-controller case, when you shut down controller A, it may
> take some time for controller B to start controlling the cluster.
>
> Can you share your test code?
>
> Thanks,
> -Shi

Re: Controller fault tolerance

Posted by Shi Lu <lu...@gmail.com>.

Hi Lance:

Here is how the multiple controller leader election works:

In the case where controllers x, y and z all try to control a cluster:

1. x, y and z all try to create the ZooKeeper ephemeral node
/clusterName/CONTROLLER/LEADER.

2. Only one controller creates the ephemeral node successfully; it then
starts controlling the cluster.

3. The other controllers fail to create the ephemeral node (it has already
been created by the leader), so they register a ZooKeeper change listener
on the /clusterName/CONTROLLER/LEADER ephemeral node. If that node goes
away, they try to create it again, and whichever succeeds controls the
cluster.

So in the two-controller case, when you shut down controller A, it may take
some time for controller B to start controlling the cluster.
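The create-or-watch procedure above can be modelled without a running ZooKeeper. The following is a minimal in-memory sketch, not Helix's implementation: `FakeZk`, `tryCreateEphemeral`, `watchDeletion`, `expireSession` and the `Controller` class are hypothetical names standing in for ZooKeeper ephemeral nodes, watches and session loss. It shows controller A winning, B registering a watch, and B taking over once A's session is gone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo of the create-or-watch election described above (hypothetical model).
final class ElectionDemo {
  public static void main(String[] args) {
    FakeZk zk = new FakeZk();
    Controller a = new Controller(zk, "myCluster", "controllerA", "session-1");
    Controller b = new Controller(zk, "myCluster", "controllerB", "session-2");
    a.start(); // creates /myCluster/CONTROLLER/LEADER
    b.start(); // node exists, so B watches it instead
    System.out.println(a.isLeader() + " " + b.isLeader()); // true false
    zk.expireSession("session-1"); // controller A goes away
    System.out.println(a.isLeader() + " " + b.isLeader()); // false true
  }
}

// Hypothetical in-memory stand-in for ZooKeeper: ephemeral nodes owned by a
// session, plus one-shot watches that fire when a node is deleted.
final class FakeZk {
  private final Map<String, String> ownerByPath = new HashMap<>();
  private final Map<String, List<Runnable>> watchers = new HashMap<>();

  // Succeeds only if the node does not exist yet.
  boolean tryCreateEphemeral(String path, String sessionId) {
    if (ownerByPath.containsKey(path)) {
      return false;
    }
    ownerByPath.put(path, sessionId);
    return true;
  }

  String owner(String path) {
    return ownerByPath.get(path);
  }

  void watchDeletion(String path, Runnable callback) {
    watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(callback);
  }

  // Losing a session deletes its ephemeral nodes and fires their watches.
  void expireSession(String sessionId) {
    List<String> deleted = new ArrayList<>();
    ownerByPath.entrySet().removeIf(e -> {
      boolean owned = e.getValue().equals(sessionId);
      if (owned) {
        deleted.add(e.getKey());
      }
      return owned;
    });
    for (String path : deleted) {
      List<Runnable> fired = watchers.remove(path); // one-shot: callers re-register
      if (fired != null) {
        fired.forEach(Runnable::run);
      }
    }
  }
}

final class Controller {
  private final FakeZk zk;
  private final String leaderPath;
  final String name;
  final String sessionId;

  Controller(FakeZk zk, String clusterName, String name, String sessionId) {
    this.zk = zk;
    this.leaderPath = "/" + clusterName + "/CONTROLLER/LEADER";
    this.name = name;
    this.sessionId = sessionId;
  }

  // Steps 1-3: try to create the leader node; if someone else holds it,
  // watch it and try again when it disappears.
  void start() {
    if (!zk.tryCreateEphemeral(leaderPath, sessionId)) {
      zk.watchDeletion(leaderPath, this::start);
    }
  }

  // Compares the session, not just the name (see the HELIX-123 discussion).
  boolean isLeader() {
    return sessionId.equals(zk.owner(leaderPath));
  }
}
```

In a real deployment the watch fires asynchronously, and if the leader process dies without closing its ZooKeeper session, its ephemeral node is only removed after the session timeout expires, which is one reason the handover is not instantaneous.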

Can you share your test code?

Thanks,
-Shi


On Wed, Jun 26, 2013 at 8:43 AM, Lance Co Ting Keh <la...@box.com> wrote:

> Hi guys,
>
> I tried naming the controllers differently. I first had one controller
> running, and it printed that it was the leader (isLeader() returned true).
> When I brought up a second controller with a different name, the first
> controller printed that it was NOT the leader, and the new controller
> became the leader. Then I shut off the current leader (the second
> controller), but the first controller kept printing that it was NOT the
> leader. Somehow, leader election happened once and did not happen again.
> The only way I'm creating the controllers is this:
>
>       controllerManager =
>           HelixControllerMain.startHelixController(zkAddress,
>               clusterName, "controller", HelixControllerMain.STANDALONE);
>
> AND
>
>       controllerManager =
>           HelixControllerMain.startHelixController(zkAddress,
>               clusterName, "controller2", HelixControllerMain.STANDALONE);
>
> and I'm checking with controllerManager.isLeader(). Am I doing
> something wrong?
>
> Thank you
> Lance

Re: Controller fault tolerance

Posted by Lance Co Ting Keh <la...@box.com>.

Hi guys,

I tried naming the controllers differently. I first had one controller
running, and it printed that it was the leader (isLeader() returned true).
When I brought up a second controller with a different name, the first
controller printed that it was NOT the leader, and the new controller
became the leader. Then I shut off the current leader (the second
controller), but the first controller kept printing that it was NOT the
leader. Somehow, leader election happened once and did not happen again.
The only way I'm creating the controllers is this:

      controllerManager =
          HelixControllerMain.startHelixController(zkAddress,
              clusterName, "controller", HelixControllerMain.STANDALONE);

AND

      controllerManager =
          HelixControllerMain.startHelixController(zkAddress,
              clusterName, "controller2", HelixControllerMain.STANDALONE);

and I'm checking with controllerManager.isLeader(). Am I doing something
wrong?

Thank you
Lance


On Fri, Jun 21, 2013 at 1:51 PM, Lance Co Ting Keh <la...@box.com> wrote:

> Thank you very much for the quick response guys

Re: Controller fault tolerance

Posted by Lance Co Ting Keh <la...@box.com>.

Thank you very much for the quick responses, guys.


On Fri, Jun 21, 2013 at 1:49 PM, Zhen Zhang <zz...@linkedin.com> wrote:

> Yes. Using different names for the controllers is a quick workaround.

Re: Controller fault tolerance

Posted by Zhen Zhang <zz...@linkedin.com>.

Yes. Using different names for the controllers is a quick workaround.
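If several controllers are launched from the same code, one hypothetical way to guarantee distinct names is to derive each name from the host name plus a random suffix. The `controller_<host>_<suffix>` scheme and the `uniqueControllerName` helper below are illustrative assumptions, not something Helix prescribes; for this workaround the names only need to differ.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.UUID;

// Hypothetical helper for the "use different names" workaround.
final class ControllerNames {
  // Builds a controller name that is unlikely to collide with another
  // controller's, even when several run on the same host.
  static String uniqueControllerName(String prefix) {
    String host;
    try {
      host = InetAddress.getLocalHost().getHostName();
    } catch (UnknownHostException e) {
      host = "unknown-host"; // fall back rather than fail startup
    }
    String suffix = UUID.randomUUID().toString().substring(0, 8);
    return prefix + "_" + host + "_" + suffix;
  }

  public static void main(String[] args) {
    String first = uniqueControllerName("controller");
    String second = uniqueControllerName("controller");
    System.out.println(first + " / " + second);
    // Each name would then be passed as the controller name, e.g.
    // HelixControllerMain.startHelixController(zkAddress, clusterName, first,
    //     HelixControllerMain.STANDALONE);
  }
}
```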

From: Lance Co Ting Keh <la...@box.com>
Reply-To: "user@helix.incubator.apache.org" <us...@helix.incubator.apache.org>
Date: Friday, June 21, 2013 1:47 PM
To: "user@helix.incubator.apache.org" <us...@helix.incubator.apache.org>
Subject: Re: Controller fault tolerance

Okay, thank you. But for now, the quick fix is to make sure the controllers are named differently?

Re: Controller fault tolerance

Posted by Lance Co Ting Keh <la...@box.com>.

Okay, thank you. But for now, the quick fix is to make sure the
controllers are named differently?


On Fri, Jun 21, 2013 at 1:44 PM, Zhen Zhang <zz...@linkedin.com> wrote:

> This is a known bug in Helix:
> https://issues.apache.org/jira/browse/HELIX-123
>
> The problem is that we compare the instance name of the controller but
> not the session id, so if you start two controllers with the same name,
> isLeader() returns true for both. We will fix it shortly.
>
>  Thanks,
> Jason

Re: Controller fault tolerance

Posted by Zhen Zhang <zz...@linkedin.com>.

This is a known bug in Helix:
https://issues.apache.org/jira/browse/HELIX-123

The problem is that we compare the instance name of the controller but not the session id, so if you start two controllers with the same name, isLeader() returns true for both. We will fix it shortly.
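To make the failure mode concrete, here is a hypothetical sketch contrasting a name-only leader check with one that also compares the ZooKeeper session id. `LeaderRecord`, `isLeaderByName` and `isLeaderBySession` are illustrative names; this is neither the actual Helix code nor the HELIX-123 fix.

```java
// Hypothetical illustration of the HELIX-123 failure mode; not Helix code.
final class LeaderCheck {
  // Buggy variant: only the instance name is compared, so every live
  // controller started with the leader's name believes it is the leader.
  // (mySessionId is accepted but ignored, which is exactly the problem.)
  static boolean isLeaderByName(LeaderRecord leader, String myName, String mySessionId) {
    return leader != null && leader.instanceName.equals(myName);
  }

  // Session-aware variant: a ZooKeeper session id identifies one client
  // connection, so only the controller that created the node matches.
  static boolean isLeaderBySession(LeaderRecord leader, String myName, String mySessionId) {
    return leader != null
        && leader.instanceName.equals(myName)
        && leader.sessionId.equals(mySessionId);
  }

  public static void main(String[] args) {
    // Two controllers were both started as "controller"; the one on session-1 won.
    LeaderRecord leader = new LeaderRecord("controller", "session-1");
    System.out.println(isLeaderByName(leader, "controller", "session-1"));    // true
    System.out.println(isLeaderByName(leader, "controller", "session-2"));    // true (the bug)
    System.out.println(isLeaderBySession(leader, "controller", "session-1")); // true
    System.out.println(isLeaderBySession(leader, "controller", "session-2")); // false
  }
}

// What the leader node is assumed to record about its creator (hypothetical).
final class LeaderRecord {
  final String instanceName;
  final String sessionId;

  LeaderRecord(String instanceName, String sessionId) {
    this.instanceName = instanceName;
    this.sessionId = sessionId;
  }
}
```

Giving each controller a distinct name makes even the name-only check tell the two apart, which is why renaming works as a stopgap.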

Thanks,
Jason

From: Lance Co Ting Keh <la...@box.com>
Reply-To: "user@helix.incubator.apache.org" <us...@helix.incubator.apache.org>
Date: Friday, June 21, 2013 1:39 PM
To: "user@helix.incubator.apache.org" <us...@helix.incubator.apache.org>
Subject: Re: Controller fault tolerance

Hi Kishore,

I tried starting two controllers programmatically like you mentioned:

controllerManager = HelixControllerMain.startHelixController(zkAddress,
          clusterName, "controller", HelixControllerMain.STANDALONE);

I then called isLeader() on both managers (http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader()), and both of them returned true. They're both connected to the same ZooKeeper instance and the same cluster. The controllers are running, so I'm not sure whether leader election is actually working properly or whether I'm misinterpreting the isLeader() function.


Thanks
Lance

Re: Controller fault tolerance

Posted by Lance Co Ting Keh <la...@box.com>.

Hi Kishore,

I tried starting two controllers programmatically like you mentioned:

controllerManager = HelixControllerMain.startHelixController(zkAddress,
          clusterName, "controller", HelixControllerMain.STANDALONE);


I then called isLeader() on both managers
(http://helix.incubator.apache.org/apidocs/reference/org/apache/helix/HelixManager.html#isLeader()),
and both of them returned true. They're both connected to the same
ZooKeeper instance and the same cluster. The controllers are running,
so I'm not sure whether leader election is actually working properly
or whether I'm misinterpreting the isLeader() function.
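When testing a setup like the one reported here, a test helper can poll until exactly one controller reports leadership, rather than checking each manager once. The minimal sketch below assumes each manager's check is passed as a `BooleanSupplier`. For a real `HelixManager` that would be a method reference such as `manager::isLeader`. `countLeaders` and `awaitSingleLeader` are hypothetical names, not Helix APIs. With the name-comparison bug described in this thread, two same-named controllers would make the helper time out.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical test helper; not part of Helix.
public class LeaderCheck {
    // Counts how many of the supplied checks currently report leadership.
    static long countLeaders(List<BooleanSupplier> leaderChecks) {
        return leaderChecks.stream().filter(BooleanSupplier::getAsBoolean).count();
    }

    // Polls until exactly one check reports leadership or the timeout expires.
    static boolean awaitSingleLeader(List<BooleanSupplier> leaderChecks,
                                     long timeoutMillis, long pollMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (countLeaders(leaderChecks) != 1) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: zero or several leaders
            }
            Thread.sleep(pollMillis);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-ins for two managers that both claim leadership.
        List<BooleanSupplier> bothClaim = List.of(() -> true, () -> true);
        System.out.println(awaitSingleLeader(bothClaim, 200, 50)); // false

        // Stand-ins for a healthy active/standby pair.
        List<BooleanSupplier> oneLeader = List.of(() -> true, () -> false);
        System.out.println(awaitSingleLeader(oneLeader, 200, 50)); // true
    }
}
```

With real managers the checks would be built as `List.of(manager1::isLeader, manager2::isLeader)`, since `isLeader()` takes no arguments and returns a boolean.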


Thanks
Lance




Re: Controller fault tolerance

Posted by Manikumar Reddy <ku...@nmsworks.co.in>.

Hi Kishore,

Thanks for the quick response.

Regards,
Kumar


Re: Controller fault tolerance

Posted by kishore g <g....@gmail.com>.

Hi Kumar,

You can start multiple controllers; only one of them will be active and
the rest will be in standby mode. If the active controller fails, one
of the standbys will become active and start managing the cluster.

You can start the controllers either from the command line or programmatically.

Command line:

./run-helix-controller.sh --zkSvr localhost:2199 --cluster <clustername>

Using the Helix API:

controllerManager = HelixControllerMain.startHelixController(zkAddress,
          clusterName, "controller", HelixControllerMain.STANDALONE);

Hope this helps.

thanks,
Kishore G
