
Posted to user@ignite.apache.org by ezhuravlev <e....@gmail.com> on 2017/09/01 17:48:43 UTC

Re: Task management - MapReduce & ForkJoin performance penalty

Hi,

I've added Thread.sleep(200) to the jobs to simulate a small load.
Here is what I got:
1 node: 1 task, 2000 jobs, ~25 sec
2 nodes (on the same machine): 1 task, 2000 jobs, ~13 sec

My point is that this overhead will not be noticeable with real jobs.
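A rough sanity check on these numbers, as a sketch that assumes each job's cost is dominated by its 200 ms sleep and that jobs run fully in parallel up to the number of threads executing them: the best case is ceil(jobs / threads) × 200 ms, with zero framework overhead. The thread counts used below (16 and 32) are hypothetical, since the pool sizes in the test were not stated.

```java
import java.time.Duration;

/** Back-of-the-envelope bound for N sleeping jobs executed on a fixed number of threads. */
final class SleepJobEstimate {
    /**
     * Best-case wall-clock time when {@code jobs} jobs, each taking {@code perJob},
     * run in "waves" of at most {@code threads} concurrent jobs. Ignores all scheduling overhead.
     */
    static Duration idealWallClock(int jobs, Duration perJob, int threads) {
        long waves = (jobs + threads - 1L) / threads; // ceil(jobs / threads)
        return perJob.multipliedBy(waves);
    }

    public static void main(String[] args) {
        Duration sleep = Duration.ofMillis(200);
        // Hypothetical thread counts: 16 on one node, 32 across two nodes.
        System.out.println(idealWallClock(2000, sleep, 16).toMillis() + " ms"); // 25000 ms
        System.out.println(idealWallClock(2000, sleep, 32).toMillis() + " ms"); // 12600 ms
    }
}
```

If the machine did run about 16 jobs concurrently per node, the measured ~25 sec and ~13 sec would sit within a second of this ideal, which is consistent with the overhead being small once jobs do real work.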

As for configuration changes, you could change the public thread pool size with
igniteConfiguration.setPublicThreadPoolSize(). By default, it is set to the CPU
count (including hyperthreading), but for small tasks you can make it bigger, for
example, CPU count * 2.

Also, Ignite contains a benchmarks module; you can check it and find
benchmarks for ComputeGrid there.

All the best,
Evgenii



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

hi @yakov


yakov wrote
> Yes, however, you can still return results from each job and use it.
> Please
> see javadoc for org.apache.ignite.compute.ComputeJobResult#getData

Yes, it's good to have that option, at least at the "result" step.
But I'm still very curious why the overhead is so big when results are
collected at the reduce step.

I did a quick profiling run (without deep analysis), and it seems to me
that with cached results the MapReduce process has a big overhead in the
org.jsr166.* package: there are a lot of invocations of "get/poll" and "put"
methods on the concurrent map and deque implementations
(ConcurrentHashMap8, ConcurrentLinkedHashMap, ConcurrentLinkedDeque8):
<http://apache-ignite-users.70518.x6.nabble.com/file/t1316/Ignite-JProfile.jpg> 
 
Here is the ConcurrentLinkedDeque8.poll call, which stands out in particular:
<http://apache-ignite-users.70518.x6.nabble.com/file/t1316/Ignite-JProfile-ConcurrentLinkedDeque8-poll.jpg> 

I'll try to dig into it more in my spare time...




Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Yakov Zhdanov <yz...@apache.org>.

Yes; however, you can still return results from each job and use them. Please
see the javadoc for org.apache.ignite.compute.ComputeJobResult#getData.
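A minimal sketch of how the two pieces can fit together, assuming Ignite 2.x (SumTask and SquareJob are hypothetical names): the task carries @ComputeTaskNoResultCache, so Ignite does not keep every ComputeJobResult around for reduce(). Instead, the task folds each job's value into a running total inside result(), reading it with ComputeJobResult#getData.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobAdapter;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeJobResultPolicy;
import org.apache.ignite.compute.ComputeTaskNoResultCache;
import org.apache.ignite.compute.ComputeTaskSplitAdapter;

/** Sums per-job values without letting Ignite cache every ComputeJobResult. */
@ComputeTaskNoResultCache
class SumTask extends ComputeTaskSplitAdapter<Integer, Long> {
    /** Running total; LongAdder in case result() is invoked from several threads. */
    private final LongAdder sum = new LongAdder();

    @Override protected Collection<? extends ComputeJob> split(int gridSize, Integer jobCnt) {
        List<ComputeJob> jobs = new ArrayList<>(jobCnt);
        for (int i = 0; i < jobCnt; i++)
            jobs.add(new SquareJob(i));
        return jobs;
    }

    @Override public ComputeJobResultPolicy result(ComputeJobResult res, List<ComputeJobResult> rcvd) {
        if (res.getException() != null)
            return super.result(res, rcvd); // keep the adapter's default failure/failover handling

        Integer val = res.getData(); // the value returned by SquareJob.execute()
        sum.add(val);
        return ComputeJobResultPolicy.WAIT; // keep waiting for the remaining jobs
    }

    @Override public Long reduce(List<ComputeJobResult> results) {
        // With @ComputeTaskNoResultCache, do not rely on 'results'; use the running total instead.
        return sum.sum();
    }

    /** Static nested (not inner) class, so the job does not drag the task instance along. */
    static class SquareJob extends ComputeJobAdapter {
        private final int x;

        SquareJob(int x) {
            this.x = x;
        }

        @Override public Object execute() {
            return x * x;
        }
    }
}
```

This pattern only works when per-job results can be combined incrementally (sums, counts, min/max and similar); the total lives in the task instance on the node that started the task.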

--Yakov

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

yakov wrote
> What are your timings now?

On two local nodes, after the JVM is warmed up (~100 executions), it runs
in 30 ms on average, instead of 6 sec when results are returned in the
result/reduce phase. This is a huge improvement!
Now I can take it as a basis and start adding additional behavior on
top of it (task status persistence, custom job exception handling, etc.).

Just to check my understanding: after reading the javadoc, I understand that
this annotation is meant exactly for use cases where the results of distributed
jobs don't need to be returned to the caller/reducer, although these jobs
can still read from and write to the distributed cache in a collocated manner (once
https://issues.apache.org/jira/browse/IGNITE-5037 is implemented), right?





Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Yakov Zhdanov <yz...@apache.org>.

You are welcome!

What are your timings now?

--Yakov

2017-09-07 15:01 GMT+03:00 ihorps <ih...@gmail.com>:

> hi @yakov
>
>
> yakov wrote
> > Try attaching @ComputeTaskNoResultCache to your task.
>
> Thank you for the hint. It speeds up task management processing
> drastically!

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

hi @yakov


yakov wrote
> Try attaching @ComputeTaskNoResultCache to your task.

Thank you for the hint. It speeds up task management processing drastically!




Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Yakov Zhdanov <yz...@apache.org>.

Try attaching @ComputeTaskNoResultCache to your task.

I've also filed a ticket: https://issues.apache.org/jira/browse/IGNITE-6284

As for point 2: I meant empty runnables submitted to a JDK thread pool
executor. Each submission requires acquiring a lock and notifying a pool
thread, so the overhead is very significant compared to executing a no-op
runnable. Ignite processes job execution requests coming from remote nodes
in the public pool
(org.apache.ignite.configuration.IgniteConfiguration#getPublicThreadPoolSize),
and jobs mapped to the local node are submitted to this pool as well.
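A JDK-only sketch of the effect described above (no Ignite code involved; the class and method names are illustrative): it times N trivial tasks executed inline versus handed one by one to a fixed thread pool. Results vary with hardware, pool size and JIT warm-up, so a single run is only indicative, but the pooled variant typically costs far more per task because every execute() call goes through the executor's work queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Compares running trivial tasks inline with handing each one to a thread pool. */
final class NoOpSubmitCost {
    /** Runs n trivial "jobs" on the calling thread; returns elapsed nanoseconds. */
    static long runInline(int n, AtomicInteger counter) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++)
            counter.incrementAndGet(); // the job body itself is (almost) free
        return System.nanoTime() - start;
    }

    /** Hands n trivial jobs to a fixed pool one by one and waits for all; returns elapsed nanoseconds. */
    static long runInPool(int n, int threads, AtomicInteger counter) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(n);
        try {
            long start = System.nanoTime();
            for (int i = 0; i < n; i++) {
                // Once the core threads exist, execute() offers the task to a LinkedBlockingQueue:
                // a lock acquisition, plus waking an idle worker when the queue was empty.
                pool.execute(() -> {
                    counter.incrementAndGet();
                    done.countDown();
                });
            }
            done.await();
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 100_000;
        AtomicInteger counter = new AtomicInteger();
        long inline = runInline(n, counter);
        long pooled = runInPool(n, Runtime.getRuntime().availableProcessors(), counter);
        System.out.printf("inline: %d us, pool: %d us%n", inline / 1_000, pooled / 1_000);
    }
}
```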

--Yakov

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

hi @yakov

Thank you for your feedback.

1. Yes, warming up the JVM is what I missed at the beginning (no doubt
about that at all). I can confirm that the average gets better after a few
dozen runs.
2. Did you mean IgniteRunnable/IgniteCallable here (efficiency for a
no-op task/job)? I'd like to use the MapReduce framework (especially once
IGNITE-5037 is implemented) because it gives me task management almost out
of the box: collocated execution, failover, distribution. If I went with
IgniteRunnable/IgniteCallable, I'd have to port our proprietary code, which
we implemented on Hazelcast, and this is what I want to avoid.
3. I can confirm that it runs ~1 sec faster after the JVM is warmed up (4000
jobs, 1 task and two local nodes).
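Warming up before measuring, as in point 1, can be sketched as a small JDK-only harness: run the workload a number of times untimed so the JIT can compile the hot paths, then time many iterations and report the median. The class name, iteration counts and placeholder workload are illustrative assumptions; for serious measurements, a dedicated harness such as JMH handles warm-up and dead-code elimination more rigorously.

```java
import java.util.Arrays;

/** Minimal warm-up-then-measure loop. */
final class WarmupBench {
    /** Sink for the placeholder workload so the JIT cannot drop it as dead code. */
    static volatile long sink;

    /**
     * Runs {@code action} {@code warmupIters} times without timing, then times
     * {@code measuredIters} runs and returns each duration in nanoseconds.
     */
    static long[] measure(Runnable action, int warmupIters, int measuredIters) {
        for (int i = 0; i < warmupIters; i++)
            action.run();

        long[] times = new long[measuredIters];
        for (int i = 0; i < measuredIters; i++) {
            long start = System.nanoTime();
            action.run();
            times[i] = System.nanoTime() - start;
        }
        return times;
    }

    /** Upper median; less sensitive than the mean to GC pauses and other outliers. */
    static long median(long[] times) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Placeholder workload; in the benchmark discussed here it would be the task execution call.
        Runnable workload = () -> {
            long s = 0;
            for (int i = 0; i < 1_000_000; i++)
                s += i;
            sink = s;
        };
        long[] times = measure(workload, 100, 100);
        System.out.printf("median: %.3f ms%n", median(times) / 1e6);
    }
}
```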

For now I'm satisfied with the "first-try-touch" experience. Looking forward
to IGNITE-5037.

The next topic is to compare memory consumption. As I wrote in previous
comments, when running 4000 no-op jobs with the MapReduce API I observed
at least 2 GB of additional memory being used for such a run. I still have
to investigate why, and how it works in more detail.

Thanks!  




Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Yakov Zhdanov <yz...@apache.org>.

Guys,

I see the following issues with the benchmark:

1. There is only one iteration. I would put it in a loop and measure at
least a hundred iterations.
2. No-op jobs are not a real-world example at all =) Job requests are
processed in a thread pool executor, which is not very efficient for such a
use case.
3. Job is an inner class inside NoOpTask. Evgeniy, can you please change it
to a static class and check if it helps here?
4. Also, multi-node tests should be run on multiple machines (preferably, 1
node per host).
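Point 3 is about what gets shipped with each job. A JDK-only sketch (all names are illustrative stand-ins for the NoOpTask mentioned above): an inner class keeps a hidden reference to its enclosing instance, so serializing an inner-class job also serializes the task and everything it holds, while a static nested class carries only its own fields. Ignite uses its own marshaller rather than java.io serialization, so its byte counts will differ; the sketch only illustrates the captured reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class InnerVsStaticJob {
    /** Stand-in for a task class holding state that should not travel with each job. */
    static class NoOpTask implements Serializable {
        final byte[] bigState = new byte[1 << 20]; // 1 MiB of task-side state (illustrative)

        /** Inner class: each instance holds a hidden reference to the enclosing NoOpTask. */
        class InnerJob implements Serializable {
            Object execute() { return null; }
        }

        /** Static nested class: no reference to the enclosing task. */
        static class StaticJob implements Serializable {
            Object execute() { return null; }
        }
    }

    /** Returns the size in bytes of obj's java.io-serialized form. */
    static int serializedSize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        NoOpTask task = new NoOpTask();
        System.out.println("inner job:  " + serializedSize(task.new InnerJob()) + " bytes");
        System.out.println("static job: " + serializedSize(new NoOpTask.StaticJob()) + " bytes");
    }
}
```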

--Yakov

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Evgenii Zhuravlev <e....@gmail.com>.

But of course, that could change. If the wiki doesn't have information about
it, the community hasn't decided yet.

2017-09-05 17:46 GMT+03:00 Evgenii Zhuravlev <e....@gmail.com>:

> I think it was planned at the end of October.

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Evgenii Zhuravlev <e....@gmail.com>.

I think it is planned for the end of October.

Evgenii

2017-09-05 17:41 GMT+03:00 ihorps <ih...@gmail.com>:

> hi, @ezhuravlev
>
> This is what I'm looking for, many thanks!
>
> Some hints when v2.3 is planned to be release (I can't find it on wiki)?
>
> I'd rather wait for this API in Ignite then implementing it by myself an
> throw it later such as I'm in evaluation/prototype phase now.

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

hi, @ezhuravlev

This is what I'm looking for, many thanks!

Any hints on when v2.3 is planned to be released (I can't find it on the wiki)?

I'd rather wait for this API in Ignite than implement it myself and
throw it away later, since I'm in the evaluation/prototype phase now.

Best regards,
ihorps




Re: Task management - MapReduce & ForkJoin performance penalty

Posted by Evgenii Zhuravlev <e....@gmail.com>.

Hi,

Here is a ticket for exactly what you want; it's in progress right now:
https://issues.apache.org/jira/browse/IGNITE-5037

If you don't want to wait until it is implemented, you can use
affinityCall(...) or affinityRun(...) and reduce the results yourself once
they are returned.
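A minimal sketch of that workaround, assuming Ignite 2.x and an existing cache with Integer keys and String values (the cache layout, ValueLengthJob and totalLength are hypothetical): each key's job is sent with affinityCallAsync so it runs on the node that owns the key, and the caller performs the reduce step itself by summing the returned futures.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.lang.IgniteCallable;
import org.apache.ignite.lang.IgniteFuture;
import org.apache.ignite.resources.IgniteInstanceResource;

/** Collocated "map" via affinityCallAsync plus a hand-written "reduce" on the caller. */
final class AffinityMapReduce {
    /** Hypothetical job: length of the String stored under one key, computed where the key lives. */
    static class ValueLengthJob implements IgniteCallable<Integer> {
        /** Injected by Ignite on the node where the job runs. */
        @IgniteInstanceResource
        private transient Ignite ignite;

        private final String cacheName;
        private final Integer key;

        ValueLengthJob(String cacheName, Integer key) {
            this.cacheName = cacheName;
            this.key = key;
        }

        @Override public Integer call() {
            IgniteCache<Integer, String> cache = ignite.cache(cacheName);
            // The job was routed to the key's primary node, so this read should be served locally.
            String val = cache.get(key);
            return val == null ? 0 : val.length();
        }
    }

    /** Map: one collocated job per key. Reduce: sum the per-key results on the calling node. */
    static long totalLength(Ignite ignite, String cacheName, List<Integer> keys) {
        List<IgniteFuture<Integer>> futs = new ArrayList<>(keys.size());
        for (Integer key : keys)
            futs.add(ignite.compute().affinityCallAsync(cacheName, key, new ValueLengthJob(cacheName, key)));

        long total = 0;
        for (IgniteFuture<Integer> fut : futs)
            total += fut.get();
        return total;
    }
}
```

Submitting all jobs asynchronously before waiting on any future lets them run in parallel across nodes; the reduce step is ordinary code on the caller rather than a ComputeTask.reduce() implementation.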

Evgenii

2017-09-04 21:52 GMT+03:00 ihorps <ih...@gmail.com>:

> I'd like to create a ComputeTask (ComputeTaskAdapter or
> ComputeTaskSplitAdapter), which would spawn ComputeJob with affinity key
> (let's say in constructor) and execute them on node with co-located data.
>
> So is this possible to do such somehow? I couldn't find for now how it can
> be done in elegant way...

Re: Task management - MapReduce & ForkJoin performance penalty

Posted by ihorps <ih...@gmail.com>.

hi @ezhuravlev

Thank you for your reply, very appreciated!

I can confirm that with real business logic in the jobs, it actually
scales horizontally quite well: adding more nodes makes the whole task
finish faster.

One more thing I'm looking at now is running tasks with the MapReduce API
in a collocated fashion. As far as I understood from the documentation
(Collocate Computing and Data,
<https://apacheignite.readme.io/docs/collocate-compute-and-data>), this is
possible only by calling affinityCall(...) or affinityRun(...), which take
an IgniteCallable or IgniteRunnable.
I'd like to create a ComputeTask (ComputeTaskAdapter or
ComputeTaskSplitAdapter) that would spawn ComputeJobs with an affinity key
(let's say passed in the constructor) and execute them on the nodes holding
the collocated data.

So is it possible to do this somehow? So far I couldn't find an elegant
way to do it...

Thank you in advance.


