Posted to user@helix.apache.org by Shekhar Bansal <sh...@yahoo.com> on 2017/06/19 14:24:15 UTC

HDFS read load distribution using helix

I have a standalone Java app (containerised) that reads data from HDFS, does some transformations, and writes the results to remote storage. I want to make it scalable by launching multiple instances of this app. My problem is how to assign tasks among these instances. Can Helix solve this problem?
If yes, can you please help me with the following?
   1. I referred to the Helix quickstart example and created one resource per file, but node1 was assigned master for all resources. Is this because of the simple StateModelDefinition used in the quickstart example, am I using it the wrong way, or is it a limitation of Helix?

   2. I want to avoid running a separate controller process. If I start the controller as part of setup, will Helix be able to elect a master controller (in standalone mode)? Is it advisable to run tens of controllers in distributed mode?

   3. I schedule my app every five minutes using a Kubernetes cron job. Is it advisable to use Helix for such short-lived processes?


Thanks
Shekhar

Re: HDFS read load distribution using helix

Posted by kishore g <g....@gmail.com>.
   1. Currently, Helix ensures even distribution of partitions within a
   resource, not across resources. Is it possible for you to add your tasks
   as partitions of the same resource?
   2 & 3. Yes, you can start the controller as part of your process. But
   since you said you launch this on Kubernetes every 5 minutes, I suggest
   keeping the controller and ZooKeeper running all the time. Controllers are
   lightweight, and you can get away with a very entry-level container
   spec. It's OK to launch Helix participants every 5 minutes.
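To illustrate point 1, here is a toy model in plain Java (no Helix dependency) contrasting one single-partition resource per file with one resource whose partitions are the files. It is not Helix's rebalancer: `PlacementDemo` and its methods are hypothetical names, and the placement rule (round-robin over the sorted node list, computed separately for each resource) is an assumption chosen only to mimic the "even within a resource, not across resources" behavior described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model, NOT Helix's actual rebalancer. Assumption: each resource is
 * placed independently, round-robin over the sorted node list, so balance
 * holds within a resource but not across resources.
 */
public class PlacementDemo {

    /** Places the partitions of ONE resource round-robin over the sorted nodes. */
    public static Map<String, String> placeResource(String resource, int numPartitions, List<String> nodes) {
        List<String> sorted = new ArrayList<>(nodes);
        Collections.sort(sorted);
        Map<String, String> masterOf = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            masterOf.put(resource + "_" + p, sorted.get(p % sorted.size()));
        }
        return masterOf;
    }

    /** Layout (a): one single-partition resource per file, each placed on its own. */
    public static Map<String, String> resourcePerFile(List<String> files, List<String> nodes) {
        Map<String, String> all = new TreeMap<>();
        for (String f : files) {
            all.putAll(placeResource(f, 1, nodes)); // every resource starts at the first node
        }
        return all;
    }

    /** Layout (b): one resource ("hdfsFiles", a hypothetical name) with one partition per file. */
    public static Map<String, String> partitionPerFile(List<String> files, List<String> nodes) {
        return placeResource("hdfsFiles", files.size(), nodes);
    }

    /** Counts how many partitions each node is master for. */
    public static Map<String, Integer> mastersPerNode(Map<String, String> masterOf) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : masterOf.values()) {
            counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node1", "node2", "node3");
        List<String> files = List.of("f1", "f2", "f3", "f4", "f5", "f6");
        System.out.println("resource per file:  " + mastersPerNode(resourcePerFile(files, nodes)));
        System.out.println("partition per file: " + mastersPerNode(partitionPerFile(files, nodes)));
    }
}
```

With six files and three nodes, the first layout puts all six masters on node1 (`{node1=6}`), while the second spreads them two per node (`{node1=2, node2=2, node3=2}`), which matches the symptom in the first question.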

You should consider using the Helix Task Framework
<http://helix.apache.org/0.6.7-docs/tutorial_task_framework.html>. It
provides all the functionality you need.
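The Task Framework's model is, roughly, a job split into independent tasks that are spread over the live participants. The sketch below only models that idea inside a single JVM using the standard library, assuming one task per HDFS file, round-robin assignment, and a fixed retry limit. `FileJobSimulator` and `runJob` are hypothetical names, and the real framework's scheduling, retry, and state handling differ (see the tutorial linked above).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiPredicate;

/** Single-JVM model of "one job, one task per file". NOT Helix code; all names are hypothetical. */
public class FileJobSimulator {

    /**
     * Hands one task per file to the instances round-robin. A failed attempt is
     * requeued until the file has been tried maxAttempts times.
     *
     * @param worker returns true if the instance processed the file successfully
     * @return file -> instance that completed it (files that exhausted retries are absent)
     */
    public static Map<String, String> runJob(List<String> files, List<String> instances,
                                             int maxAttempts, BiPredicate<String, String> worker) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("need at least one instance");
        }
        Deque<String> pending = new ArrayDeque<>(files);
        Map<String, Integer> attempts = new TreeMap<>();
        Map<String, String> completedBy = new TreeMap<>();
        int next = 0;
        while (!pending.isEmpty()) {
            String file = pending.poll();
            String instance = instances.get(next++ % instances.size()); // round-robin assignment
            int tried = attempts.merge(file, 1, Integer::sum);
            if (worker.test(instance, file)) {
                completedBy.put(file, instance);
            } else if (tried < maxAttempts) {
                pending.add(file); // retry later, possibly on a different instance
            }
        }
        return completedBy;
    }

    public static void main(String[] args) {
        List<String> files = List.of("f1", "f2", "f3", "f4");
        List<String> instances = List.of("app-0", "app-1");
        // Pretend app-1 cannot process f2 (e.g. a transient HDFS read error).
        Map<String, String> result = runJob(files, instances, 3,
                (inst, file) -> !(inst.equals("app-1") && file.equals("f2")));
        System.out.println(result);
    }
}
```

Running `main` prints `{f1=app-0, f2=app-0, f3=app-0, f4=app-1}`: f2 fails once on app-1 and succeeds when it is retried on app-0. Keeping one task per file is what lets short-lived instances pick up whatever work is still pending.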

