Posted to user@ignite.apache.org by Peter <gr...@gmx.de> on 2018/11/30 00:44:56 UTC

Fair queue polling policy?

Hello,

My aim is a queue for load balancing that is described in the
documentation
<https://apacheignite.readme.io/v2.6/docs/queue-and-set#section-cache-queues-and-load-balancing>:
create an "ideally balanced system where every node only takes the
number of jobs it can process, and not more."

I'm using JDK 8 and Ignite 2.6.0. I have successfully set up a two-node
Ignite cluster where node1 has the same CPU count (8) and the same RAM
as node2 but a slightly slower CPU (virtual vs. dedicated). I created
one unbounded queue in this system (no collection configuration, and no
cluster config except TcpDiscoveryVmIpFinder).

I call queue.put on both nodes at an equal rate and have one
non-Ignite thread per node that calls queue.take(). What I expect is
that both machines reach 100% CPU usage equally fast, as both machines
poll at their best frequency. But what I observe is that the slower
node (node1) gets approx. 5 times more items via queue.take than
node2. This leads to 10% CPU usage on node2 and 100% CPU usage on
node1, and I have never seen the case where they were equal.

What could be the reason? Is there a fair polling configuration or some
anti-affinity setting? Or is it required to call queue.take() inside a
Runnable submitted via ignite.compute()?

I also played with CollectionConfiguration.setCacheMode but the problem
persists. Any pointers are appreciated.

Kind Regards
Peter
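For reference, the setup described above can be condensed into a self-contained sketch. This is an illustration, not the actual cluster code: it uses java.util.concurrent.LinkedBlockingQueue as a local stand-in, which works because IgniteQueue also implements BlockingQueue, so the consumer loop is identical. In the real test the queue would come from ignite.queue(...) and the two consumers would run on different nodes; all names and counts here are made up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class QueueBalanceSketch {

    // Runs two blocking consumers against one queue, feeds it 'jobs'
    // items, and returns how many items each consumer took.
    public static int[] run(int jobs) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        AtomicInteger[] taken = { new AtomicInteger(), new AtomicInteger() };

        Thread[] consumers = new Thread[2];
        for (int n = 0; n < 2; n++) {
            final int id = n;
            consumers[n] = new Thread(() -> {
                try {
                    while (true) {
                        queue.take();               // blocks, like IgniteQueue.take()
                        taken[id].incrementAndGet();
                    }
                } catch (InterruptedException stop) {
                    // interrupted once all jobs have been consumed
                }
            });
            consumers[n].start();
        }

        for (int i = 0; i < jobs; i++) {
            queue.put(i); // in the real test, both nodes put at an equal rate
        }

        // wait until every job has been taken, then stop the consumers
        while (taken[0].get() + taken[1].get() < jobs) {
            Thread.sleep(5);
        }
        for (Thread t : consumers) t.interrupt();
        for (Thread t : consumers) t.join();

        return new int[] { taken[0].get(), taken[1].get() };
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run(1000);
        System.out.println("consumer1=" + counts[0] + " consumer2=" + counts[1]);
    }
}
```

With a local in-memory queue the split is roughly even; the point of the thread is that the distributed version skews heavily toward one node.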


Re: Fair queue polling policy?

Posted by Peter <gr...@gmx.de>.
Hello Stan,

I have thought about this a bit more. The problem is that I used round
robin and populate the distributed queue on node1 and node2, BUT
queue.take is still not called equally from both servers, which means
that Ignite is artificially increasing network traffic, IMO.

> It’s probably because the first value (when the queue is empty)
always has the same key

Shouldn't the implementation prefer polling clients that are local to
the "put" if the queue is empty?

Regards
Peter


-- 
GraphHopper.com - fast and flexible route planning


Re: Fair queue polling policy?

Posted by Peter <gr...@gmx.de>.
Hello Stan,

Thanks for your detailed answer on this topic.

I think you are right that this is not a bug, and currently I do not
see a problem with this, except that the documentation
<https://apacheignite.readme.io/v2.6/docs/queue-and-set#section-cache-queues-and-load-balancing>
is a bit misleading: "Given this approach, threads on remote nodes will
only start working on the next job when they have completed the previous
one, hence creating ideally balanced system where every node only takes
the number of jobs it can process, and not more."

I think that, without round robin, "creating ideally balanced system"
is not 100% true.

Allow me a further question, as I have speed problems for objects in
the KB-size range. What overhead can be expected from the distributed
queue? Can I assume roughly the same numbers as when sending these
objects via HTTP (round robin), or can it be several times slower, as I
currently observe in my test environment?

Regards
Peter



RE: Fair queue polling policy?

Posted by Stanislav Lukyanov <st...@gmail.com>.
I think what you’re talking about isn’t fairness, it’s round-robinness.
You can’t distribute a single piece of work among multiple nodes fairly – one gets it and others don’t.
Yes, it could use a different node each time, but I don’t really see a use case for that.

The queue itself isn’t a load balancer implementation; it doesn’t even need to care about fairness.
All it needs to do is implement the queue interface efficiently.

I think I can explain the fact that one node gets the data most of the time.
It’s probably because the first value (when the queue is empty) always has the same key – and always ends up on the same node.
So the behavior is not that the same client gets the value – it’s that the same server always stores the first (second, third) value.
When all the servers try to get and remove the same value, the one closest to it (i.e. the one storing it) wins.
We probably could randomize the distribution – but it’s going to cost us in terms of code complexity and, maybe, performance. 

Overall, I don’t think it’s a bug in Ignite, and we would need a solid justification to change the behavior.

Do you have a use case where a random distribution is important?

Stan
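The explanation above can be illustrated with a toy model (this is my assumption about the mechanism, not actual Ignite internals): if the head entry always lands on the same server, the consumer co-located with it effectively reacts faster than remote ones and wins nearly every contention round. Here "distance" is modeled as a fixed reaction delay per consumer; all names and delays are made up.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HeadContentionSketch {

    // Each round, exactly one item is available; both consumers race to
    // remove it (compareAndSet), each after its own reaction delay.
    public static int[] contend(int rounds, long localDelayMs, long remoteDelayMs)
            throws InterruptedException {
        int[] wins = new int[2];
        for (int r = 0; r < rounds; r++) {
            AtomicBoolean head = new AtomicBoolean(true); // the single head entry
            Thread local = grab(head, localDelayMs, wins, 0);
            Thread remote = grab(head, remoteDelayMs, wins, 1);
            local.join();
            remote.join();
        }
        return wins;
    }

    private static Thread grab(AtomicBoolean head, long delayMs, int[] wins, int id) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs); // "network distance" to the head entry
            } catch (InterruptedException e) {
                return;
            }
            if (head.compareAndSet(true, false)) { // first one to arrive wins
                synchronized (wins) { wins[id]++; }
            }
        });
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] wins = contend(50, 0, 20);
        System.out.println("co-located wins=" + wins[0] + ", remote wins=" + wins[1]);
    }
}
```

With a 20 ms handicap the co-located consumer takes essentially everything; with closer delays the split becomes less extreme.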




Re: Fair queue polling policy?

Posted by Peter <gr...@gmx.de>.
Hello,

I have found this discussion
<http://apache-ignite-users.70518.x6.nabble.com/Clearing-a-distributed-queue-hangs-after-taking-down-one-node-td7353.html>
about the same topic, and indeed the example there works and the queues
poll fairly.

And when I tweak the sleep after put and take so that the queue stays
mostly empty all the time, I can reproduce the unfair behaviour!
https://github.com/karussell/igniteexample/blob/master/src/main/java/test/IgniteTest.java

I'm not sure if this is a bug, as it should be the responsibility of
the client to avoid overloading itself. E.g. in my case this happened
because I allowed too many threads for the tasks on the polling side,
leading to too frequent polling, which leads to this mostly empty queue.

But IMO it should be clarified in the documentation, as one expects
round robin behaviour even for empty queues. And e.g. in low latency
environments and/or environments with many clients this could cause
problems. I have created an issue about it here:
https://issues.apache.org/jira/browse/IGNITE-10496

Kind Regards
Peter
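The "avoid overloading itself" idea can be sketched like this (an assumption about how one might apply it, not an Ignite feature): gate queue.take() behind a Semaphore so a node only polls when it actually has a free worker. A plain BlockingQueue stands in for IgniteQueue, and all names are illustrative.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class ThrottledConsumerSketch {

    // Takes 'items' entries from the queue, but only polls when one of
    // 'workers' slots is free, so polling speed tracks processing speed.
    public static int consume(BlockingQueue<Integer> queue, int items, int workers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        Semaphore slots = new Semaphore(workers);
        int processed = 0;
        for (int i = 0; i < items; i++) {
            slots.acquire();             // wait for a free worker before polling
            Integer job = queue.take();  // with Ignite: the IgniteQueue's take()
            pool.execute(() -> {
                try {
                    Thread.sleep(1);     // simulate processing the job
                } catch (InterruptedException ignored) {
                } finally {
                    slots.release();     // free the slot, allowing the next poll
                }
            });
            processed++;
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 100; i++) queue.put(i);
        System.out.println("processed=" + consume(queue, 100, 4));
    }
}
```

A node throttled this way cannot drain the queue faster than it can process, regardless of how fast its take() calls would otherwise succeed.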
