Posted to user@ignite.apache.org by Pascoe Scholle <pa...@gmail.com> on 2019/09/03 10:43:58 UTC

Job Stealing node not stealing jobs

Hi there,

I asked this question before, but under a different, already resolved
topic, so I am posting it again under a more suitable title. I hope
that's OK.

We have tried to configure two compute server nodes, one of which is running
on a weaker machine. The node running on the more powerful machine always
finishes its tasks long before the weaker node and then sits idle.

The node is not even sending a steal request, so I must have configured
something wrong.

I have attached the code for both nodes. If you could kindly point out what
I am missing, I would really appreciate it!

Re: Job Stealing node not stealing jobs

Posted by Ilya Kasnacheev <il...@gmail.com>.
Hello!

Can you throw together a reproducer project that demonstrates the incorrect
behavior? We would look into it and raise a ticket if needed.

Thanks!


-- 
Ilya Kasnacheev


On Tue, 10 Sep 2019 at 13:01, Pascoe Scholle <pa...@gmail.com> wrote:

> Thanks for the prompt response. I have looked at the
> WeightedRandomLoadBalancingSpi. It does not look like one can set the
> number of parallel jobs, though, and this is a big requirement. Also, it is
> inevitable that some nodes will sit idle, due to the nature of the jobs
> that will be deployed on them, and the job stealer just seems like the
> perfect solution. Regardless, I have used the code provided for the job
> stealing SPI on the docs page and it isn't functioning as intended.
>
>
> On Tue, 10 Sep 2019 at 11:34, Stephen Darlington <
> stephen.darlington@gridgain.com> wrote:
>
>> I don’t know the answer to your job stealing question, but I do wonder if
>> that’s the right configuration for your requirements. Why not use the
>> weighted load balancer (
>> https://apacheignite.readme.io/docs/load-balancing)? That’s designed to
>> work in cases where nodes are of differing sizes.
>>
>> Regards,
>> Stephen
>>
>> On 10 Sep 2019, at 10:19, Pascoe Scholle <pa...@gmail.com>
>> wrote:
>>
>> Hello,
>>
>> Is there any update on this?
>>
>> We have not been able to resolve this issue.
>>
>> Kind regards
>>
>>
>> On Wed, 04 Sep 2019 at 07:44, Pascoe Scholle <
>> pascoescholletrash@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have attached a small Scala project. Just set the build path to src
>>> after building and compiling with sbt.
>>>
>>> We want to execute processes that run outside the JVM. These
>>> processes can be extremely memory-intensive, which is why I am limiting
>>> the number of parallel jobs that can be executed on a machine.
>>>
>>> I have one desktop that has a lot more memory available and can thus
>>> execute more jobs in parallel. As all jobs take roughly the same amount of
>>> time, this machine will have completed its jobs much sooner. I want it to
>>> then take jobs from the nodes started on weaker machines once it has
>>> completed all its tasks.
>>>
>>> Does that make sense?
>>>
>>> Hope this helps.
>>>
>>> BR,
>>> Pascoe
>>>
>>> On Tue, 3 Sep 2019 at 17:29, Andrei Aleksandrov <ae...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Some remarks about the job stealing SPI:
>>>>
>>>> 1) You have some nodes that can process the tasks of a compute job.
>>>> 2) Tasks are executed in the public thread pool by default:
>>>> https://apacheignite.readme.io/docs/thread-pools#section-public-pool
>>>> 3) If a node's thread pool is busy, then some tasks of the compute job
>>>> can be executed on another node.
>>>>
>>>> It will not work in the following cases:
>>>>
>>>> 1) If you choose a specific node for your compute task.
>>>> 2) If you do an affinity call (the same as above, but the node is
>>>> chosen by affinity mapping).
>>>>
>>>> Regarding your case:
>>>>
>>>> It's not clear to me what exactly you are trying to do. Possibly job
>>>> stealing didn't work because your weak node had already begun executing
>>>> some tasks in the public pool and simply took longer than the faster one.
>>>>
>>>> Could you please share your full reproducer for investigation?
>>>>
>>>> BR,
>>>> Andrei
>>>>
>>>> 9/3/2019 1:43 PM, Pascoe Scholle wrote:
>>>> > Hi there,
>>>> >
>>>> > I asked this question before, but under a different, already resolved
>>>> > topic, so I am posting it again under a more suitable title. I hope
>>>> > that's OK.
>>>> >
>>>> > We have tried to configure two compute server nodes, one of which is
>>>> > running on a weaker machine. The node running on the more powerful
>>>> > machine always finishes its tasks long before the weaker node and then
>>>> > sits idle.
>>>> >
>>>> > The node is not even sending a steal request, so I must have
>>>> > configured something wrong.
>>>> >
>>>> > I have attached the code for both nodes. If you could kindly point out
>>>> > what I am missing, I would really appreciate it!
>>>> >
>>>> >
>>>>
>>>
>>
>>

Re: Job Stealing node not stealing jobs

Posted by Pascoe Scholle <pa...@gmail.com>.
Thanks for the prompt response. I have looked at the
WeightedRandomLoadBalancingSpi. It does not look like one can set the
number of parallel jobs, though, and this is a big requirement. Also, it is
inevitable that some nodes will sit idle, due to the nature of the jobs
that will be deployed on them, and the job stealer just seems like the
perfect solution. Regardless, I have used the code provided for the job
stealing SPI on the docs page and it isn't functioning as intended.
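
For reference, here is a minimal Java sketch of the kind of configuration
under discussion: a JobStealingCollisionSpi paired with a
JobStealingFailoverSpi, with the number of parallel jobs capped through
setActiveJobsThreshold. The class name JobStealingNode, the command-line
argument and all threshold values are illustrative assumptions, not the
attached code, and the sketch needs ignite-core on the classpath.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.IgniteConfiguration;
import org.apache.ignite.spi.collision.jobstealing.JobStealingCollisionSpi;
import org.apache.ignite.spi.failover.jobstealing.JobStealingFailoverSpi;

public class JobStealingNode {
    public static void main(String[] args) {
        // Hypothetical per-machine limit, e.g. 2 on the weak node and 8 on the strong one.
        int maxParallelJobs = args.length > 0 ? Integer.parseInt(args[0]) : 2;

        JobStealingCollisionSpi collisionSpi = new JobStealingCollisionSpi();

        // Maximum number of jobs this node executes at the same time.
        collisionSpi.setActiveJobsThreshold(maxParallelJobs);

        // Waiting-queue size at or below which this node tries to steal jobs (assumed value).
        collisionSpi.setWaitJobsThreshold(0);

        // Lifetime of a steal request in milliseconds (assumed value).
        collisionSpi.setMessageExpireTime(1000);

        // How many times a single job may be stolen (assumed value).
        collisionSpi.setMaximumStealingAttempts(10);

        // Allow this node to steal jobs from other nodes.
        collisionSpi.setStealingEnabled(true);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setCollisionSpi(collisionSpi);

        // Routes stolen jobs to the node that requested them.
        cfg.setFailoverSpi(new JobStealingFailoverSpi());

        Ignite ignite = Ignition.start(cfg);
        System.out.println("Started node " + ignite.cluster().localNode().id());
    }
}
```

As far as the SPI's Javadoc describes it, the collision SPI has to be used
together with JobStealingFailoverSpi and metrics updates have to stay
enabled, so those are plausible first things to check when no steal request
is ever sent.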



Re: Job Stealing node not stealing jobs

Posted by Stephen Darlington <st...@gridgain.com>.
I don’t know the answer to your job stealing question, but I do wonder if that’s the right configuration for your requirements. Why not use the weighted load balancer (https://apacheignite.readme.io/docs/load-balancing)? That’s designed to work in cases where nodes are of differing sizes.

Regards,
Stephen




Re: Job Stealing node not stealing jobs

Posted by Pascoe Scholle <pa...@gmail.com>.
Hello,

Is there any update on this?

We have not been able to resolve this issue.

Kind regards



Re: Job Stealing node not stealing jobs

Posted by Pascoe Scholle <pa...@gmail.com>.
Hi,

I have attached a small Scala project. Just set the build path to src after
building and compiling with sbt.

We want to execute processes that run outside the JVM. These processes
can be extremely memory-intensive, which is why I am limiting the
number of parallel jobs that can be executed on a machine.

I have one desktop that has a lot more memory available and can thus
execute more jobs in parallel. As all jobs take roughly the same amount of
time, this machine will have completed its jobs much sooner. I want it to
then take jobs from the nodes started on weaker machines once it has
completed all its tasks.

Does that make sense?

Hope this helps.

BR,
Pascoe


Re: Job Stealing node not stealing jobs

Posted by Andrei Aleksandrov <ae...@gmail.com>.
Hi,

Some remarks about the job stealing SPI:

1) You have some nodes that can process the tasks of a compute job.
2) Tasks are executed in the public thread pool by default:
https://apacheignite.readme.io/docs/thread-pools#section-public-pool
3) If a node's thread pool is busy, then some tasks of the compute job can be
executed on another node.

It will not work in the following cases:

1) If you choose a specific node for your compute task.
2) If you do an affinity call (the same as above, but the node is
chosen by affinity mapping).

Regarding your case:

It's not clear to me what exactly you are trying to do. Possibly job stealing
didn't work because your weak node had already begun executing some tasks in
the public pool and simply took longer than the faster one.
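
To make the distinction above concrete, here is a small Java sketch of the
three submission styles. It is only an illustration: the class name
SubmissionStyles, the cache name demoCache and the affinity key 1 are
made up, it starts a single default-configured node, and whether a given
job is actually stolen still depends on the SPI configuration.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterGroup;
import org.apache.ignite.lang.IgniteRunnable;

public class SubmissionStyles {
    public static void main(String[] args) {
        // Starts a node with the default configuration and stops it on exit.
        try (Ignite ignite = Ignition.start()) {
            IgniteRunnable job = () -> System.out.println("job executed");

            // Submitted to the whole cluster: the target node is picked by load
            // balancing, which is the situation the remarks describe as open
            // to job stealing.
            ignite.compute().run(job);

            // Pinned to one specific node (case 1 in the list above).
            ClusterGroup oneNode = ignite.cluster().forNode(ignite.cluster().localNode());
            ignite.compute(oneNode).run(job);

            // Affinity call: the node is chosen by where the key lives (case 2).
            // "demoCache" and the key 1 are hypothetical.
            ignite.getOrCreateCache("demoCache");
            ignite.compute().affinityRun("demoCache", 1, job);
        }
    }
}
```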

Could you please share your full reproducer for investigation?

BR,
Andrei
