Posted to user@ignite.apache.org by kmandalas <Ky...@IRIworldwide.com> on 2017/04/03 08:29:01 UTC

Re: Distributed Closures VS Executor Service

Hello Christo and thank you for your feedback. 

For the moment we have a relational database that is not distributed or
partitioned. We have given some thought to configuring at least some
database tables as a cache store for Ignite, and it is not something that we
rule out. Just for the moment, due to some network I/O performance issues on
the cloud platform that will host the solution, we cannot say much.

However, there is a strong chance that at least one or two tables involved
in the flow I described will be loaded into cache. For this reason I would
like to ask the following:

1) What is the best policy for loading a table of about ~1,000,000 rows into
cache and querying it later with ANSI SQL? This table will grow periodically
(after an ETL procedure) until it reaches a final upper limit of about
~3,000,000 rows. So I would like the optimal way to stream it into cache
initially, on startup of a future deployment, and to update it after each
ETL procedure.

2) If we use the MapReduce & ForkJoin approach, how can we combine it with
affinity? There are examples for distributed closures, but I do not see any
for ComputeTask/ComputeJobAdapter etc. Ideally, each job should perform an
ANSI SQL query on the table that will be loaded and maintained in cache, but
only on the rows that it keeps locally.

 



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11653.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Dear all,

I have created a new topic, "Ignite SQL Indexing Performance problems", since
the conversation has deviated a lot from the original subject.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p12347.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello,

I don't know about H2; I sent you a link from apacheignite.readme.io which
says that the IN clause does not use indexes when run on Ignite. Also, in my
previous post I sent a simple query using an IN clause that has very poor
performance on Ignite, even when run in local mode only, with fewer than 2
million entries.

So could you please provide concrete feedback?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p12335.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by Sergi Vladykin <se...@gmail.com>.
Actually, H2 can use an index for the IN operator, but using JOIN instead of
IN may still be useful.

Sergi


Re: Distributed Closures VS Executor Service

Posted by Nikolai Tikhonov <nt...@apache.org>.
Hi,

The page [1] also contains a workaround for the IN operation: you can use
JOIN instead of IN.

Sergi,

Is this tip still applicable for version 2.0?

1. https://apacheignite.readme.io/docs/sql-performance-and-
debugging#sql-performance-and-usability-considerations
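
Roughly, the rewrite from that page looks like the following untested sketch
(table and column names are taken from your logs; the ids are passed as a
single array argument):

import java.util.List;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.cache.query.QueryCursor;
import org.apache.ignite.cache.query.SqlFieldsQuery;

public class InToJoinExample {
    /** Replaces "categoryId IN (...)" with a JOIN on a table function. */
    static void query(IgniteCache<?, ?> cache) {
        SqlFieldsQuery qry = new SqlFieldsQuery(
            "SELECT v._KEY, v._VAL FROM SimulationInitialValues v " +
            "JOIN TABLE(id BIGINT = ?) cats ON v.categoryId = cats.id " +
            "WHERE v.type = ? AND v.week BETWEEN ? AND ?");

        qry.setArgs(new Object[] {new Long[] {38L, 39L, 40L}, 3, 1888, 1939});

        try (QueryCursor<List<?>> cur = cache.query(qry)) {
            for (List<?> row : cur) {
                // Process the _KEY/_VAL pair here.
            }
        }
    }
}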


Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
After investigating more, I ran into the following:

https://apacheignite.readme.io/docs/sql-performance-and-debugging#sql-performance-and-usability-considerations

So, because my query has an IN clause, no indexing will be used anyway.

Could you please elaborate more on this topic and on if/what will change in
version 2.0? Would it be better if I opened a different topic?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p12314.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello Andrey,

The issue with the indexes for such a plain query seems major, IMHO. So
adding index hints will be supported in version 2.0? Is this final?

However, even with no indexes, the query execution time is too high for
in-memory (time=8645 ms). I have streamed into cache a single table of
~1,600,000 rows/objects. Each object has ~50 fields, most of them Double
(they cannot be primitives because I need NULL as well for the business
logic executed on them).

As for the PARTITIONED approach, I am aware of it, but it does not fit well
with the nature of the calculations I have to run, which I described in my
previous posts in this thread.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p12263.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by Andrey Mashenkov <an...@gmail.com>.
Hi,

1. Your query uses the index "PPSP-IMDG-CACHE"."type_idx", which covers a
single field. Most probably this is a limitation of the H2 query planner.
Unfortunately, the current H2 version does not support index hints. This
should be fixed in Ignite 2.0.

2. SQL queries run in a single thread by default, but it is possible to run
a query on a PARTITIONED cache in multiple threads [1].

[1]
https://apacheignite.readme.io/docs/sql-performance-and-debugging#query-parallelism
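
For reference, a hypothetical configuration sketch (setQueryParallelism is
only available on CacheConfiguration starting with Ignite 2.x):

import org.apache.ignite.configuration.CacheConfiguration;

public class ParallelQueryConfig {
    static CacheConfiguration<Long, Object> cacheConfig() {
        CacheConfiguration<Long, Object> ccfg = new CacheConfiguration<>("PPSP-IMDG-CACHE");
        // Let a single SQL query scan this cache's local index with several threads.
        ccfg.setQueryParallelism(4);
        return ccfg;
    }
}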






-- 
Best regards,
Andrey V. Mashenkov

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello again,

I am testing the approach described in previous posts with REPLICATED cache
mode. The performance of the queries is extremely poor. Example:

[18:23:33,016][INFO
][grid-timeout-worker-#13%ppsp-cluster-IV%][IgniteKernal%ppsp-cluster-IV] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=b5843eb9, name=ppsp-cluster-IV, uptime=00:12:00:110]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=8]
    ^-- CPU [cur=8.4%, avg=9.03%, GC=0.1%]
    ^-- Heap [used=2722MB, free=55.68%, comm=6143MB]
    ^-- Non heap [used=229MB, free=-1%, comm=232MB]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=1, qSize=0]
    ^-- Outbound messages queue [size=0]
[18:24:03,584][WARN ][pub-#130%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8645 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 38, 39, 40,
41, 42, 43 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(38, 39, 40, 41, 42, 43))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:03,584][WARN ][pub-#131%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8701 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 25, 26, 27,
28, 29, 31 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(25, 26, 27, 28, 29, 31))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:03,584][WARN ][pub-#129%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8655 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 1, 2, 3, 4,
5, 6 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between 1888
and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(1, 2, 3, 4, 5, 6))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:03,584][WARN ][pub-#134%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8677 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 19, 20, 21,
22, 23, 24 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(19, 20, 21, 22, 23, 24))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:03,584][WARN ][pub-#132%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8687 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 32, 33, 34,
35, 36, 37 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(32, 33, 34, 35, 36, 37))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:03,584][WARN ][pub-#133%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=8655 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 12, 14, 15,
16, 17, 18 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(12, 14, 15, 16, 17, 18))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]
[18:24:14,339][INFO
][grid-timeout-worker-#15%ppsp-cluster-IV%][IgniteKernal%ppsp-cluster-IV] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=82a8c42c, name=ppsp-cluster-IV, uptime=00:13:06:486]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=8]
    ^-- CPU [cur=52.13%, avg=10.12%, GC=7.77%]
    ^-- Heap [used=4044MB, free=34.17%, comm=6143MB]
    ^-- Non heap [used=233MB, free=-1%, comm=237MB]
    ^-- Public thread pool [active=6, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=5, qSize=0]
    ^-- Outbound messages queue [size=0]
[18:24:47,660][INFO
][grid-timeout-worker-#13%ppsp-cluster-IV%][IgniteKernal%ppsp-cluster-IV] 
Metrics for local node (to disable set 'metricsLogFrequency' to 0)
    ^-- Node [id=b5843eb9, name=ppsp-cluster-IV, uptime=00:13:01:516]
    ^-- H/N/C [hosts=1, nodes=2, CPUs=8]
    ^-- CPU [cur=100%, avg=10.29%, GC=107.47%]
    ^-- Heap [used=5987MB, free=2.54%, comm=6143MB]
    ^-- Non heap [used=235MB, free=-1%, comm=238MB]
    ^-- Public thread pool [active=0, idle=0, qSize=0]
    ^-- System thread pool [active=0, idle=0, qSize=0]
    ^-- Outbound messages queue [size=0]




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p12168.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Moreover, even if I use custom affinity, what will happen if a node fails (or
a node is added)? Will I be aware of / have control over which node its cache
data is transferred to, so that I can send co-located jobs there?

P.S.: this table (SimulationInitialValues) is mainly read-only, with the
exception that new entries are added periodically by an ETL procedure, which
at some point will stop. The maximum number of rows will be around 5-6
million. Under these circumstances, I am thinking that maybe REPLICATED is a
good fit as well. Most likely we will have 4 nodes.

Also, for a query like the following: select from SimulationInitialValues
where categoryId in (1, 2, 3, 4) and geoChannelId in (5, 8, 12, 14, 15) and
type = ? and week between ? and ?, forced to local query mode only: what
kind of performance should I expect in terms of milliseconds?
       



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11863.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello Christo,

The problem with the original approach is this:

- I have a flat table structure which is mapped by the following POJO:

@Entity
@Table(name = "simulation_initial_values")
public class SimulationInitialValues extends IdEntity {

	@Basic
	@Column(name = "product_id")
	private Long productId;

	@Basic
	@Column(name = "geo_channel_id")
	private Long geoChannelId;

	@Basic
	@Column(name = "category_id", nullable = false)
	private Long categoryId;

	@Basic
	@Column(name = "week", nullable = false)
	private Integer week;

	@Basic
	@Column(name = "type_id", nullable = false)
	private Integer type;
        .
        .
        . more fields
}

We consider the combination of productId, geoChannelId, categoryId, week and
type to be the primary key.

Now, when I perform an in-memory query like: select from
SimulationInitialValues where categoryId in (...) and geoChannelId in (...)
and type = ? and week between ? and ?, I get back a List of
SimulationInitialValues objects.

Due to the nature of the calculations I have to perform, though, I must have
all the product rows of each distinct category.

So in the case of a PARTITIONED cache, if I do a broadcast, there is no
guarantee that the categories assigned to a job will have all of their rows
in the local cache.
        



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11861.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by christos <ch...@gridgain.com>.
Hi Kyriako,

Thinking about your original approach of using compute tasks assigned to
nodes, with all CATEGORIES required by a task being local, I believe you
could use the affinity interface to figure this out. You'd need to partition
your CATEGORIES cache and use the CATEGORY ID as the affinity key, then use
the Affinity interface to determine which node a CATEGORY is mapped to and
allocate same-node categories to tasks. Then you can direct each task to the
correct node and force a local query.

IgniteCache cache = ignite.cache(cacheName);
Affinity<Long> aff = ignite.affinity(cacheName);
// Get the partition ID for a given key.
int partId = aff.partition(categoryId);
// Get the primary node for the key. This is probably the one you need.
// (mapKeysToNodes(Collection) does the same for a whole batch of keys.)
ClusterNode node = aff.mapKeyToNode(categoryId);

Javadoc:
https://ignite.apache.org/releases/latest/javadoc/org/apache/ignite/cache/affinity/Affinity.html#mapKeysToNodes(java.util.Collection)
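
Putting it together, a rough, untested sketch (the cache and value-type
names are placeholders from your posts):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlQuery;
import org.apache.ignite.cluster.ClusterNode;

public class NodeAlignedQuery {
    /** Runs a category's work on the node that primarily owns that category. */
    static void runForCategory(Ignite ignite, String cacheName, Long categoryId) {
        ClusterNode node = ignite.affinity(cacheName).mapKeyToNode(categoryId);

        ignite.compute(ignite.cluster().forNode(node)).run(() -> {
            IgniteCache<Long, Object> cache = Ignition.localIgnite().cache(cacheName);

            SqlQuery<Long, Object> qry =
                new SqlQuery<>("SimulationInitialValues", "categoryId = ?");
            qry.setArgs(categoryId);
            qry.setLocal(true); // query only the data this node owns

            cache.query(qry).getAll().forEach(e -> {
                // ...run the calculations on e.getValue()...
            });
        });
    }
}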

I still think the original approach I suggested is easier and makes more
sense...




--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11817.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello Christo,

Thank you for your prompt response. I will check your idea and get back to
you and the community.



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11759.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by christos <ch...@gridgain.com>.
If I understand correctly, the SQL query is executed within every task on
each of the nodes, and it is not set to be a local query. Correct?

If so, then what you are really doing is executing the SQL queries from all
the nodes to all the nodes. This is bound to be inefficient.

In essence, what you want to do is make your tasks work at a local level
only. Why not just switch to simple distributed closures and broadcast the
same task to all the nodes, but configure the SQL query within the task to
execute as a local one - setLocal(true)? You are already using
ComputeTaskSplitAdapter, which abstracts away the only difference from
closures: the capability to automatically assign jobs to nodes. Broadcasting
the same task with a local query means the same task would execute on all
the nodes in parallel and perform only a local query on the node it is
running on. Then your tasks can proceed to do the required calculations,
write the results, etc.
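
A minimal sketch of what I mean, assuming the value type from your logs:

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlQuery;

public class BroadcastLocalQuery {
    /** Broadcasts one closure to all nodes; each node queries only its own data. */
    static void run(Ignite ignite, String cacheName) {
        ignite.compute().broadcast(() -> {
            IgniteCache<Long, Object> cache = Ignition.localIgnite().cache(cacheName);

            SqlQuery<Long, Object> qry = new SqlQuery<>(
                "SimulationInitialValues", "type = ? AND week BETWEEN ? AND ?");
            qry.setArgs(3, 1888, 1939);
            qry.setLocal(true); // each node scans only the partitions it owns

            cache.query(qry).getAll().forEach(e -> {
                // ...calculations, then write the results...
            });
        });
    }
}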



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11758.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hi Christo,

I have tried the following:

- When the app starts, I stream one (1) table needed for the distributed
jobs into a PARTITIONED cache.
- When a task is submitted, each job takes over some CATEGORYIDs, performs
an Ignite cache query (pasted below), gathers the results, runs calculations
on them, transforms them into other objects and saves them to another,
smaller table in the database. Finally, it returns a success flag
(true/false).

[13:29:09,520][WARN ][pub-#50%ppsp-cluster-IV%][IgniteH2Indexing] Query
execution is too long [time=5007 ms, sql='SELECT
"PPSP-IMDG-CACHE".SimulationInitialValues._KEY,
"PPSP-IMDG-CACHE".SimulationInitialValues._VAL FROM
"PPSP-IMDG-CACHE".SimulationInitialValues WHERE categoryId in ( 1, 2, 3, 4,
5, 6, 8 ) and geoChannelId in ( 3, 4, 5, 6 ) and type= 3 and week between
1888 and 1939', plan=
SELECT
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._KEY,
    "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES._VAL
FROM "PPSP-IMDG-CACHE".SIMULATIONINITIALVALUES
    /* "PPSP-IMDG-CACHE"."type_idx": TYPE = 3 */
WHERE ((WEEK >= 1888)
    AND (WEEK <= 1939))
    AND ((TYPE = 3)
    AND ((CATEGORYID IN(1, 2, 3, 4, 5, 6, 8))
    AND (GEOCHANNELID IN(3, 4, 5, 6))))
, parameters=[]]

Now, if I can align the CATEGORYIDs that each node holds in its local cache
with the CATEGORYIDs that it receives along with a job execution, then the
SQL query could be local-only. I have seen examples of affinityRun and
affinityCall, but these are for distributed closures only. I would like to
do something similar with the ComputeTask approach, because I need it in
order to control other things as well. A sketch of what I have in mind
follows the two classes below.

My *ComputeTask* is:
public class SimulationComputeTask extends
ComputeTaskSplitAdapter<SimulationMessage[], Boolean> {
...
}

*SimulationMessage* is:

@Getter
@Setter
@NoArgsConstructor
@AllArgsConstructor
public class SimulationMessage implements Serializable {
	private long projectId;
	private long scenarioId;
	private long[] categoryIds;
	private long[] geoChannels;
	private int channelType;
	private int weekStart;
	private int weekEnd;
}
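
And here is the sketch I mentioned above - untested, with SimulationJob
being a hypothetical ComputeJobAdapter subclass that runs the local query
for one category (I switch to ComputeTaskAdapter, since
ComputeTaskSplitAdapter does not let me choose the target nodes):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.ignite.Ignite;
import org.apache.ignite.cache.affinity.Affinity;
import org.apache.ignite.cluster.ClusterNode;
import org.apache.ignite.compute.ComputeJob;
import org.apache.ignite.compute.ComputeJobResult;
import org.apache.ignite.compute.ComputeTaskAdapter;
import org.apache.ignite.resources.IgniteInstanceResource;

public class SimulationComputeTask extends ComputeTaskAdapter<SimulationMessage[], Boolean> {
    @IgniteInstanceResource
    private Ignite ignite;

    @Override public Map<? extends ComputeJob, ClusterNode> map(
        List<ClusterNode> subgrid, SimulationMessage[] msgs) {

        Affinity<Long> aff = ignite.affinity("PPSP-IMDG-CACHE");
        Map<ComputeJob, ClusterNode> jobs = new HashMap<>();

        for (SimulationMessage msg : msgs) {
            for (long categoryId : msg.getCategoryIds()) {
                // Send each category's job to the node that primarily owns it,
                // so the job's SQL query can then be local-only.
                jobs.put(new SimulationJob(msg, categoryId), aff.mapKeyToNode(categoryId));
            }
        }
        return jobs;
    }

    @Override public Boolean reduce(List<ComputeJobResult> results) {
        return results.stream().allMatch(r -> Boolean.TRUE.equals(r.getData()));
    }
}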

 






--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11742.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by Christos Erotocritou <ch...@gridgain.com>.
Hi Kyriako,

Agree with Nikolai on both points.

Regarding point number 2, you want to use data-to-data and data-to-compute colocation (affinity). You basically need to ensure that when you are inserting data into the cache you use the same affinity key for all entries that need to be colocated, to ensure they end up on the same server. Then you can use an affinity run, or a broadcast with a local query enabled, to only execute the query locally. Now, in your case you said the data is in an external database, so I'm not sure how you would ensure that the data for each query is local.
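
For example, a key class along these lines would colocate all entries of a category on the same node (an illustrative sketch; the field names come from your entity):

import java.io.Serializable;
import org.apache.ignite.cache.affinity.AffinityKeyMapped;

public class SimulationKey implements Serializable {
    private long productId;
    private long geoChannelId;
    private int week;
    private int type;

    // All keys with the same categoryId map to the same partition,
    // and therefore to the same primary node.
    @AffinityKeyMapped
    private long categoryId;

    // ...constructor, equals() and hashCode() over all fields...
}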

Christos



Re: Distributed Closures VS Executor Service

Posted by kmandalas <Ky...@IRIworldwide.com>.
Hello Nikolai, all

About 1: yes, this was my perception as well, thanks for the confirmation

About 2: Even if all the nodes provide the result sets of local execution to
the query initiator, if we are talking about results being Lists containing
a couple of thousand POJOs, wouldn't it be a big overhead for these objects
to be transferred over the network? At least in some tests I have performed,
the response time is worse than querying the DB directly. Is there a way to
make sure each node has all the data that it will query locally - always in
combination with ComputeTaskAdapter?



--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Distributed-Closures-VS-Executor-Service-tp11192p11739.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.

Re: Distributed Closures VS Executor Service

Posted by Nikolai Tikhonov <nt...@apache.org>.
Hi,

1) The best way is to use the DataStreamer API. You can find more details in
[1]; a minimal sketch follows below.

2) The internal implementation of SQL works in a similar fashion. The query
will be parsed and split into multiple map queries and a single reduce query.
All the map queries are executed on all the data nodes where cache data
resides. All the nodes provide the result sets of local execution to the
query initiator (the reducing node), which in turn accomplishes the reduce
phase by properly merging the provided result sets.

1. https://apacheignite.readme.io/docs/data-streamers
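
A minimal, untested sketch (the cache name, value class and getId() accessor
are taken from your earlier posts and may need adjusting):

import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;

public class InitialLoad {
    /** Streams rows into the cache; also usable for the periodic post-ETL delta. */
    static void load(Ignite ignite, Iterable<SimulationInitialValues> rows) {
        try (IgniteDataStreamer<Long, SimulationInitialValues> streamer =
                 ignite.dataStreamer("PPSP-IMDG-CACHE")) {
            // Let the post-ETL load update keys that already exist in the cache.
            streamer.allowOverwrite(true);

            for (SimulationInitialValues row : rows)
                streamer.addData(row.getId(), row);
        } // close() flushes any remaining buffered entries.
    }
}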
