Posted to user@helix.apache.org by jianjie feng <au...@gmail.com> on 2015/01/16 04:57:15 UTC

HELP

hi,
  we are trying to use Helix to manage our cluster (300+ nodes) and we
now have a problem, please help!

  let me describe it.

  our cluster is made up of servers and clients; servers are partitioned
into groups (partitions in Helix) and clients are partitioned
accordingly; now we are trying to add some fault tolerance, like this:

  1) if one server node fails, trigger a state transition (server side)
and do something like print a log, trigger an alarm and restart the server
process;

  2) then, trigger a state transition on all client nodes belonging to this
partition and do something like kick out the failed server and release the
failed server's resources on the client;

  could someone please tell me how to implement this using Helix? thanks!

  it would be even better if you could show me some code samples

  thanks!

Re: HELP

Posted by Zhen Zhang <zz...@linkedin.com>.

Hi Jianjie,

There are a few things we need to make clear when using Helix to manage your cluster.

  1.  What the resource is and how it is partitioned. Based on your description, the resource seems to be a set of machines (servers and clients).
  2.  Who hosts the resource. Helix is about resource assignment in distributed systems. For example, if you have a database, it may be partitioned and hosted by a set of nodes. In your case, it’s not clear who hosts the resource.
  3.  What state model you are going to use.
  4.  Failure handling. In your description, if a server fails, a state transition will be triggered on both servers and clients. It’s not clear which node should receive the notification.

Once we are clear on these, it should be fairly straightforward to use Helix. You may also be interested in looking at a few simple examples under the recipes folder (https://github.com/apache/helix/tree/master/recipes).
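
To make the four questions above concrete, here is a minimal plain-Java sketch of one possible set of answers. It does not use the Helix API at all, and every name in it (FailoverSketch, Coordinator, Client, FailureListener, reportFailure) is hypothetical. It assumes the resource is a set of server groups, each server has an ONLINE/OFFLINE state, and the clients of a partition are the ones notified when a server in that partition goes OFFLINE. In a real Helix deployment the coordinator role would be played by the Helix controller, and the recipes linked above show the actual API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java sketch; NOT the Helix API. All names here are hypothetical.
public class FailoverSketch {
    enum State { ONLINE, OFFLINE }

    // Hypothetical callback for nodes that want to hear about failures in a partition.
    interface FailureListener {
        void onServerOffline(String partition, String server);
    }

    // A client keeps connections to the servers of the partition it belongs to.
    static class Client implements FailureListener {
        final Set<String> connections = new TreeSet<>();

        void connect(String server) {
            connections.add(server);
        }

        @Override
        public void onServerOffline(String partition, String server) {
            // "Kick" the failed server; releasing its client-side resources would go here.
            connections.remove(server);
        }
    }

    // Stand-in for the coordinator: tracks server state per partition and fans out failures.
    static class Coordinator {
        private final Map<String, Map<String, State>> serverStates = new HashMap<>();
        private final Map<String, List<FailureListener>> listeners = new HashMap<>();
        final List<String> alarms = new ArrayList<>();

        void addServer(String partition, String server) {
            serverStates.computeIfAbsent(partition, p -> new HashMap<>()).put(server, State.ONLINE);
        }

        void subscribe(String partition, FailureListener listener) {
            listeners.computeIfAbsent(partition, p -> new ArrayList<>()).add(listener);
        }

        State stateOf(String partition, String server) {
            return serverStates.getOrDefault(partition, Collections.emptyMap()).get(server);
        }

        // Step 1: ONLINE -> OFFLINE on the server side. Step 2: notify that partition's clients.
        void reportFailure(String partition, String server) {
            if (stateOf(partition, server) != State.ONLINE) {
                return; // unknown server, or the failure was already handled
            }
            serverStates.get(partition).put(server, State.OFFLINE);
            // Logging, alarming and restarting the server process would be hooked in here.
            alarms.add("ALARM " + partition + "/" + server + " ONLINE->OFFLINE");
            for (FailureListener l : listeners.getOrDefault(partition, Collections.emptyList())) {
                l.onServerOffline(partition, server);
            }
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        coordinator.addServer("partition0", "server1");
        coordinator.addServer("partition0", "server2");

        Client client = new Client();
        client.connect("server1");
        client.connect("server2");
        coordinator.subscribe("partition0", client);

        coordinator.reportFailure("partition0", "server1");
        System.out.println(coordinator.alarms);
        System.out.println(client.connections);
    }
}
```

Only the clients subscribed to the failed server's partition are called back, which is one way to answer the question of who should receive the notification. A repeated failure report for a server that is already OFFLINE is ignored, so the alarm and the client-side cleanup each run once per failure.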

Thanks,
Jason
