Posted to user@helix.apache.org by Diot Sébastien <s....@eurodata.de> on 2018/07/12 14:56:19 UTC

Can "resource weight" be taken into consideration when load-balancing using Helix?

Hi,

First message. I've just discovered Apache Helix while looking at Pinterest Rocksplicator. I was wondering whether Helix could replace our home-grown load-balancer.

We have a production cluster of 12 "large" Java application servers with a home-grown Java load-balancer that acts as a "registry", but not as a reverse-proxy. Clients call the load-balancer with the "entity ID" they want to access, and the load-balancer returns the URL of the application server they should (currently) use to access that entity. The entities are stored in a central DB, and the application servers' session pools provide a cache to reduce the load on the DB. The entities vary in size by a factor of 100, with the vast majority (95%) being "small", but a few being "very large". We have about 130K "entities", and about 7K different entities are accessed per day.

Firstly, I'm not sure whether this would be modeled as one single "task" with one "partition" per currently cached entity (dynamically added and removed), or as one task per entity (dynamically added and removed) with a single partition per task. Since basically all data is changed on each access and is stored in a central DB, we have no use for "replicas".

Secondly, since the "size" of each entity can vary a lot, our LB takes the entity size into consideration (together with CPU load and a few other factors) when computing the "load" of each node. So, can "resource weight" be taken into consideration when load-balancing using Helix?

Regards,

Sébastien Diot
Software Developer
Software Development edlohn

eurodata AG | Großblittersdorfer Str. 257-259 | D-66119 Saarbrücken
Phone +49 681 8808 768 | Fax +49 681 8808 787
s.diot@eurodata.de | www.eurodata.de | www.facebook.com/eurodata.de

HRB 101336 Amtsgericht Saarbrücken (district court) | VAT ID DE 182634634
Chairman of the Supervisory Board: Franz-Josef Wernze
Executive Board: Dieter Leinen


Re: Can "resource weight" be taken into consideration when load-balancing using Helix?

Posted by kishore g <g....@gmail.com>.
I mostly agree with the solution proposed by Åsmund. One addition would be
that you can have replicas for each partition, which would give you fault
tolerance. You can use FULL_AUTO mode and let Helix manage everything.

Regarding the statement "My understanding of helix is that it isn't trivial
to dynamically add a partition", it is possible to add partitions
dynamically and we heavily rely on that feature in Pinot.

thanks,
Kishore G

On Thu, Jul 12, 2018 at 11:06 AM, Åsmund Tokheim <as...@gmail.com> wrote:


Re: Can "resource weight" be taken into consideration when load-balancing using Helix?

Posted by Åsmund Tokheim <as...@gmail.com>.
Hi

I'm not that experienced with helix, so wait and see if anyone offers any
corrections.

My understanding of helix is that it isn't trivial to dynamically add a
partition, and you in any case wouldn't want thousands of partitions or
'tasks'.

For a problem like yours, I would define one task with, say, 100 partitions.
When the load balancer receives an entity ID, you could use something like
consistent hashing to identify which partition that ID belongs to.
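
As a rough sketch of the lookup described above, the Java snippet below maps
an entity ID to one of a fixed number of partitions. It uses a plain CRC32
hash modulo the partition count rather than a true consistent-hash ring,
which should be enough as long as the partition count never changes. The
class name EntityPartitioner and its method are hypothetical and not part of
Helix.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper (not a Helix class): maps an entity ID to a partition index.
final class EntityPartitioner {
    private final int numPartitions;

    EntityPartitioner(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        this.numPartitions = numPartitions;
    }

    // Returns a stable index in [0, numPartitions) for the given entity ID.
    int partitionFor(String entityId) {
        CRC32 crc = new CRC32();
        crc.update(entityId.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is non-negative too.
        return (int) (crc.getValue() % numPartitions);
    }

    public static void main(String[] args) {
        EntityPartitioner partitioner = new EntityPartitioner(100);
        // "entity-42" is a made-up ID used only for illustration.
        System.out.println(partitioner.partitionFor("entity-42"));
    }
}
```

With a fixed partition count, an entity always lands in the same partition;
only the placement of partitions on nodes changes when the cluster is
rebalanced.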

That would also reduce your need for resource weights to some degree:
averaged over a thousand random entities, each partition should carry roughly
the same load. I'm not aware of any concept like resource/partition weights,
but you can probably achieve the same effect by using a custom rebalancer.
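
To illustrate what a weight-aware placement could look like, here is a
minimal Java sketch of a greedy heuristic: partitions are sorted by weight,
heaviest first, and each one goes to the node with the lowest accumulated
load. It does not use any Helix API. WeightedAssignment, the weights, and the
node names are made up for illustration, and a real custom rebalancer would
have to plug similar logic into Helix's own interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Helix class): greedy, weight-aware placement.
final class WeightedAssignment {

    // Returns partition -> node, placing the heaviest partitions first on the
    // node that currently carries the least total weight.
    static Map<String, String> assign(Map<String, Integer> partitionWeights,
                                      List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        Map<String, Long> load = new LinkedHashMap<>();
        for (String node : nodes) {
            load.put(node, 0L);
        }

        // Sort partitions by descending weight (stable for equal weights).
        List<Map.Entry<String, Integer>> sorted =
                new ArrayList<>(partitionWeights.entrySet());
        sorted.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : sorted) {
            // Pick the least-loaded node; ties go to the earliest node in the list.
            String target = nodes.get(0);
            for (String node : nodes) {
                if (load.get(node) < load.get(target)) {
                    target = node;
                }
            }
            load.put(target, load.get(target) + entry.getValue());
            result.put(entry.getKey(), target);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("p0", 100); // one "very large" partition
        weights.put("p1", 1);
        weights.put("p2", 1);
        weights.put("p3", 1);
        // prints {p0=A, p1=B, p2=B, p3=B}
        System.out.println(assign(weights, Arrays.asList("A", "B")));
    }
}
```

In this example the single heavy partition gets node A to itself while the
three light ones share node B, which is the kind of outcome a size-aware load
balancer would aim for.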

Regards
Åsmund

