You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@ignite.apache.org by spoutnik_be <sp...@skynet.be> on 2019/09/12 17:26:36 UTC

Ignite query performance with lots of joins

Hi,

I posted a message related to  H2 & Ignite performances
<https://stackoverflow.com/questions/57902194/h2-ignite-query-performance>  
on stackoverflow.

As the main issue is about a difference in execution times between H2 &
Ignite so I thought it would be good to relay it here.

Thanks for your support,
L.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite query performance with lots of joins

Posted by spoutnik_be <sp...@skynet.be>.

Hi,

Bad news is that there is no short term solution, good news are that the
issue/case is acknowledged and that Ignite is moving forward.

See approach:
https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine





--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite query performance with lots of joins

Posted by Павлухин Иван <vo...@gmail.com>.

Hi,

I checked provided test data. I was able to speedup a query execution
with Ignite about 2 times on my machine by using extra configuration
property System.setProperty("IGNITE_MAX_INDEX_PAYLOAD_SIZE", "256");
See a following documentation section about configuring index inline
size [1]. You can try the same in your environment. Shortly inline
size is needed to tune an indexed search speed. By default ignite
index pages can contain very limited pieces of indexed values (default
inline size is 10 bytes). If indexed values do not fit inline size
then actual values will be searched in another page (data page). It
can lead to a performance degradation.

> Not sure how to interpret the above statement. The support for SQL is an attractive feature of Ignite/Gridgain, but if it doesn't perform on a single node with little data I don't see how it will perform on a multi-node cluster.

Actually data distribution is a tradeoff. And usually it sounds as
"doing more work with more resources". And a gain here is not linear.
But as final result you can reach higher throughput by adding more
computational resources. Of course it depends on a particular
workload. Complex joins might be not good candidate here.

[1] https://apacheignite-sql.readme.io/docs/performance-and-debugging#section-increasing-index-inline-size

ср, 18 сент. 2019 г. в 11:03, spoutnik_be <sp...@skynet.be>:
>
> Unfortunately, I am nowhere near Silicon Valley these days ;-)
>
> Any update on possible optimizations that could bring us somewhat closer
> than H2 timings?
>
> Thanks, L.
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

Re: Ignite query performance with lots of joins

Posted by spoutnik_be <sp...@skynet.be>.

Unfortunately, I am nowhere near Silicon Valley these days ;-)

Any update on possible optimizations that could bring us somewhat closer
than H2 timings?

Thanks, L. 



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite query performance with lots of joins

Posted by Denis Magda <dm...@apache.org>.

Ignite SQL engine cannot be considered as a competitor (in a single-node
scenario) to old-good RDBMS like Postgres or MySQL. Those DBs have been
being developed for decades and are optimized for single-machine
deployments. While our efforts (Ignite community) were put into distributed
optimizations when an application has to scale out and utilize RAM in the
distributed fashion. That's why we have that recommendation to avoid any
misunderstanding. Probably, it needs to be rewritten a bit for more clarity.

Btw, if you are around Silicon Valley, step by my session at PostgresConf
this week:
https://postgresconf.org/conferences/SV2019/program/proposals/postgresql-with-in-memory-computing-faster-transactions-and-analytics-d6bb1225-4721-46ce-beda-e44f5e7c333e

As for your specific case, I think that a primary bottleneck is a number of
JOINs. It makes sense to rewrite this query first and then scale-out for
bigger benefits.

Ivan Pavluknin, Stan Lukyanov, Pavel Vinokurov, could you folks please
check this thread out and this repo suggesting any optimizations?
https://github.com/spoutnik-be/h2-ignite-perf.git

-
Denis

On Mon, Sep 16, 2019 at 6:08 AM spoutnik_be <sp...@skynet.be> wrote:

> About the quote on Stackoverflow:
> "Ignite/GridGain is optimized for multi-nodes deployments with RAM as a
> primary storage. Don’t try to compare a single-node GridGain cluster to a
> relational database that was optimized for such single-node configurations.
> You should deploy a multi-node GridGain cluster with the whole copy of data
> in RAM."
>
> Not sure how to interpret the above statement. The support for SQL is an
> attractive feature of Ignite/Gridgain, but if it doesn't perform on a
> single
> node with little data I don't see how it will perform on a multi-node
> cluster.
>
> What would be then your recommendation? Should we implement a SQL converter
> to translate queries into something else Ignite could run faster?
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Ignite query performance with lots of joins

Posted by Denis Magda <dm...@apache.org>.

Ignite SQL engine cannot be considered as a competitor (in a single-node
scenario) to old-good RDBMS like Postgres or MySQL. Those DBs have been
being developed for decades and are optimized for single-machine
deployments. While our efforts (Ignite community) were put into distributed
optimizations when an application has to scale out and utilize RAM in the
distributed fashion. That's why we have that recommendation to avoid any
misunderstanding. Probably, it needs to be rewritten a bit for more clarity.

Btw, if you are around Silicon Valley, step by my session at PostgresConf
this week:
https://postgresconf.org/conferences/SV2019/program/proposals/postgresql-with-in-memory-computing-faster-transactions-and-analytics-d6bb1225-4721-46ce-beda-e44f5e7c333e

As for your specific case, I think that a primary bottleneck is a number of
JOINs. It makes sense to rewrite this query first and then scale-out for
bigger benefits.

Ivan Pavluknin, Stan Lukyanov, Pavel Vinokurov, could you folks please
check this thread out and this repo suggesting any optimizations?
https://github.com/spoutnik-be/h2-ignite-perf.git

-
Denis

On Mon, Sep 16, 2019 at 6:08 AM spoutnik_be <sp...@skynet.be> wrote:

> About the quote on Stackoverflow:
> "Ignite/GridGain is optimized for multi-nodes deployments with RAM as a
> primary storage. Don’t try to compare a single-node GridGain cluster to a
> relational database that was optimized for such single-node configurations.
> You should deploy a multi-node GridGain cluster with the whole copy of data
> in RAM."
>
> Not sure how to interpret the above statement. The support for SQL is an
> attractive feature of Ignite/Gridgain, but if it doesn't perform on a
> single
> node with little data I don't see how it will perform on a multi-node
> cluster.
>
> What would be then your recommendation? Should we implement a SQL converter
> to translate queries into something else Ignite could run faster?
>
>
>
>
>
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>

Re: Ignite query performance with lots of joins

Posted by spoutnik_be <sp...@skynet.be>.

About the quote on Stackoverflow:
"Ignite/GridGain is optimized for multi-nodes deployments with RAM as a
primary storage. Don’t try to compare a single-node GridGain cluster to a
relational database that was optimized for such single-node configurations.
You should deploy a multi-node GridGain cluster with the whole copy of data
in RAM."

Not sure how to interpret the above statement. The support for SQL is an
attractive feature of Ignite/Gridgain, but if it doesn't perform on a single
node with little data I don't see how it will perform on a multi-node
cluster.

What would be then your recommendation? Should we implement a SQL converter
to translate queries into something else Ignite could run faster?






--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Ignite query performance with lots of joins

Posted by spoutnik_be <sp...@skynet.be>.

@Daniel Magda: following you comment on Stackoverflow, continuing the
discussion here.

This is most probably related to the number of joins, we'll evolve to a less
normalized model for querying from Ignite.

Couple of questions still remain:
- How many joins can Ignite/H2 handle without significant performance drop
(read something about H2 CBO out if more than 7 joins)?
- How to correctly interpret explain plan, especially with ANALYZE option
and its scanCount?
- Does order of query matters, Oracle CBO uses statistics to re-order joins
while this is not visible in H2 explain plans.

Any hints/info welcome ...

Thanks,
L.



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/