Posted to dev@ignite.apache.org by "steve.hostettler@gmail.com" <st...@gmail.com> on 2019/11/15 09:53:39 UTC

Re: New SQL execution engine

Dear all,

would it then also be possible to have parallel (//) execution of SQL queries
on a single node with that approach?
My understanding is that, for the moment, SQL queries are
single-threaded on a given node if there is no affinity.

Best Regards



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Hi Roman,

Thanks a lot for the answer (and the pull request). As I said initially, I was under the impression that the cause was the lack of affinity.
I understand the reason and the current design, and I think we all agreed that this is not optimal and that it should be reworked in the new design, especially the silent nature of the behavior. That being said, more than a warning, having parallel (//) joins across partitions would be very helpful, but I understand that it is not straightforward.

As always, you guys are very responsive and helpful. Keep up the great work. I appreciate it.

-----Original Message-----
From: Roman Kondakov <ko...@mail.ru.INVALID> 
Sent: Monday, November 18, 2019 11:04 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try to explain.

When query parallelism = N is turned on, it means that each cache is divided into N parts from the SQL point of view. Every SQL query is executed independently over each particular part, and then results are merged together during the reducer step.

This is absolutely identical to distributed query execution, where instead of a single node with query parallelism = N, we have N nodes with query parallelism = 1. The SQL query is executed over each partition of the data on all nodes, and then the results are merged on the reducer.

As we can see, query parallelism is equivalent to the distributed query execution. When we do joins over distributed tables, we need to think about the collocation of data [1]. If data is not collocated, we get a wrong result. This happens silently, which is not good, IMO.

I reworked your example a bit in order to impose collocation on the joining key, and now the join returns the correct result [2].

The current approach to configuration and query execution looks very uncomfortable and should be completely redesigned in the new engine.

[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1

--
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostettler@gmail.com wrote:
> Actually, I am now wondering whether this is not just a bug that I
> should record as such, since the behavior is different with and
> without parallelism and there is no warning during execution or in the API.
>
> Any thoughts?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try 
to explain.

When query parallelism = N is turned on, it means that each cache is 
divided into N parts from the SQL point of view. Every SQL query is 
executed independently over each particular part, and then results are 
merged together during the reducer step.

This is absolutely identical to distributed query execution, where
instead of a single node with query parallelism = N, we have N nodes
with query parallelism = 1. The SQL query is executed over each partition
of the data on all nodes, and then the results are merged on the reducer.

As we can see, query parallelism is equivalent to the distributed query 
execution. When we do joins over distributed tables, we need to think 
about the collocation of data [1]. If data is not collocated, we get a 
wrong result. This happens silently, which is not good, IMO.
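
A minimal sketch of the effect described above, in plain Java with no Ignite dependency. It only simulates the idea: two tables are split into N segments, the join runs inside each segment, and the per-segment counts are summed as a stand-in for the reducer. The Person and City types, the sample rows, and the modulo-based segment assignment are assumptions made for illustration; they are not taken from Ignite's implementation or from the example project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

class SegmentedJoinSketch {
    /** Hypothetical row types used only for this illustration. */
    static final class Person {
        final int id;
        final int cityId;
        Person(int id, int cityId) { this.id = id; this.cityId = cityId; }
    }

    static final class City {
        final int id;
        City(int id) { this.id = id; }
    }

    static List<Person> samplePersons() {
        return List.of(new Person(0, 0), new Person(1, 0), new Person(2, 1), new Person(3, 1));
    }

    static List<City> sampleCities() {
        return List.of(new City(0), new City(1));
    }

    /** Reference result: join over the whole data set (one segment). */
    static int joinAll(List<Person> persons, List<City> cities) {
        int matches = 0;
        for (Person p : persons)
            for (City c : cities)
                if (p.cityId == c.id)
                    matches++;
        return matches;
    }

    /** Splits rows into n segments; a row goes to segment (key mod n). */
    static <T> List<List<T>> split(List<T> rows, int n, ToIntFunction<T> key) {
        List<List<T>> segments = new ArrayList<>();
        for (int i = 0; i < n; i++)
            segments.add(new ArrayList<>());
        for (T row : rows)
            segments.get(Math.floorMod(key.applyAsInt(row), n)).add(row);
        return segments;
    }

    /** Joins segment i of persons only with segment i of cities, then sums the counts. */
    static int joinSegmented(List<Person> persons, List<City> cities, int n,
                             ToIntFunction<Person> personSegmentKey) {
        List<List<Person>> ps = split(persons, n, personSegmentKey);
        List<List<City>> cs = split(cities, n, c -> c.id);
        int total = 0;
        for (int i = 0; i < n; i++)
            total += joinAll(ps.get(i), cs.get(i));
        return total;
    }

    public static void main(String[] args) {
        List<Person> persons = samplePersons();
        List<City> cities = sampleCities();
        System.out.println("one segment:                " + joinAll(persons, cities));
        System.out.println("2 segments, by person id:   " + joinSegmented(persons, cities, 2, p -> p.id));
        System.out.println("2 segments, by join key:    " + joinSegmented(persons, cities, 2, p -> p.cityId));
    }
}
```

With these sample rows the three lines print 4, 2 and 4: segmenting persons by their own id silently drops two matches, while segmenting them by the join key gives the same count as the unsegmented join.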

I reworked your example a bit in order to impose collocation on the
joining key, and now the join returns the correct result [2].
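
One common way to impose collocation on a join key in Ignite is to make that column the affinity key of the cache key class. The fragment below is only a sketch of that idea; the PersonKey class and its fields are hypothetical, and the actual change in [2] may look different. It assumes ignite-core on the classpath, and it presumably only helps if both caches use the same affinity function and the same query parallelism.

```java
import java.util.Objects;

import org.apache.ignite.cache.affinity.AffinityKeyMapped;
import org.apache.ignite.cache.query.annotations.QuerySqlField;

/** Hypothetical cache key for a "person" table joined to a "city" table on cityId. */
public class PersonKey {
    @QuerySqlField(index = true)
    private final long id;

    /** Marks the join column as the affinity key, so persons are placed by cityId. */
    @AffinityKeyMapped
    @QuerySqlField(index = true)
    private final long cityId;

    public PersonKey(long id, long cityId) {
        this.id = id;
        this.cityId = cityId;
    }

    @Override public boolean equals(Object o) {
        if (this == o)
            return true;
        if (!(o instanceof PersonKey))
            return false;
        PersonKey other = (PersonKey) o;
        return id == other.id && cityId == other.cityId;
    }

    @Override public int hashCode() {
        return Objects.hash(id, cityId);
    }
}
```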

The current approach to configuration and query execution looks very
uncomfortable and should be completely redesigned in the new engine.

[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1

-- 
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostettler@gmail.com wrote:
> Actually, I am now wondering whether this is not just a bug that I should
> record as such, since the behavior is different with and without
> parallelism and there is no warning during execution or in the API.
>
> Any thoughts?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Ivan,

Thanks, that is good news. I use Ignite as a platform rather than directly to run an in-house application, so these kinds of things make the generic code less generic 😊.

Thanks a lot for the great work.

-----Original Message-----
From: Ivan Pavlukhin <vo...@gmail.com> 
Sent: Monday, November 18, 2019 10:13 AM
To: dev <de...@ignite.apache.org>
Subject: Re: New SQL execution engine

Steve,

Yep, unfortunately query parallelism in its current flavor is counter-intuitive. But it was designed that way =( As Roman wrote:
> And of course this feature should also be available in the new engine, though its architecture may be changed.
The architecture of parallel execution will definitely be reconsidered. Currently we are aiming to do it so that in a one-node cluster a query returns the same results regardless of parallelism.

Sat, 16 Nov 2019 at 12:48, steve.hostettler@gmail.com
<st...@gmail.com>:
>
> Actually, I am now wondering whether this is not just a bug that I
> should record as such, since the behavior is different with and
> without parallelism and there is no warning during execution or in the API.
>
> Any thoughts?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



--
Best regards,
Ivan Pavlukhin

Re: New SQL execution engine

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Steve,

Yep, unfortunately query parallelism in its current flavor is
counter-intuitive. But it was designed that way =( As Roman wrote:
> And of course this feature should also be available in the new engine, though its architecture may be changed.
The architecture of parallel execution will definitely be
reconsidered. Currently we are aiming to do it so that in a one-node
cluster a query returns the same results regardless of parallelism.

Sat, 16 Nov 2019 at 12:48, steve.hostettler@gmail.com
<st...@gmail.com>:
>
> Actually, I am now wondering whether this is not just a bug that I should
> record as such, since the behavior is different with and without
> parallelism and there is no warning during execution or in the API.
>
> Any thoughts?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

RE: New SQL execution engine

Posted by "steve.hostettler@gmail.com" <st...@gmail.com>.

Actually, I am now wondering whether this is not just a bug that I should
record as such, since the behavior is different with and without
parallelism and there is no warning during execution or in the API.

Any thoughts?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Hi Roman,

Actually, it does not work as I expected. Please see https://github.com/hostettler/igniteParallelQueries
Run mvn clean install and then java -jar target/ignite-parallel-query-1.0.0-SNAPSHOT-jar-with-dependencies.jar

This demonstrates that the query does not return the same result with and without the flag. I understand that it is probably because I did not set an affinity, but it is very counter-intuitive.

Am I missing something?

-----Original Message-----
From: Roman Kondakov <ko...@mail.ru.INVALID> 
Sent: Friday, November 15, 2019 11:46 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi Steve,

it is possible to execute queries in parallel even in the current engine; see the docs here [1]. And of course this feature should also be available in the new engine, though its architecture may be changed.

[1]
https://apacheignite.readme.io/v2.0/docs/sql-performance-and-debugging#query-parallelism

-- 
Kind Regards
Roman Kondakov

On 15.11.2019 12:53, steve.hostettler@gmail.com wrote:
> Dear all,
>
> would it then also be possible to have parallel (//) execution of SQL queries
> on a single node with that approach?
> My understanding is that, for the moment, SQL queries are
> single-threaded on a given node if there is no affinity.
>
> Best Regards
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hi Steve,

it is possible to execute queries in parallel even in the current
engine; see the docs here [1]. And of course this feature should also be
available in the new engine, though its architecture may be changed.

[1] 
https://apacheignite.readme.io/v2.0/docs/sql-performance-and-debugging#query-parallelism
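
As a rough illustration of the setting described in [1], query parallelism is configured per cache through CacheConfiguration.setQueryParallelism. The fragment below is a sketch, assuming ignite-core and ignite-indexing are on the classpath; the cache name, the key and value types, and the value 4 are placeholders chosen for illustration only.

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.CacheConfiguration;

public class QueryParallelismSketch {
    public static void main(String[] args) {
        // Hypothetical cache; only the setQueryParallelism call matters here.
        CacheConfiguration<Long, String> cfg = new CacheConfiguration<>("myCache");
        cfg.setIndexedTypes(Long.class, String.class);

        // Ask for SQL queries on this cache to run over 4 segments per node.
        cfg.setQueryParallelism(4);

        // Start a node with the default configuration and create the cache on it.
        try (Ignite ignite = Ignition.start()) {
            ignite.getOrCreateCache(cfg);
        }
    }
}
```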


-- 
Kind Regards
Roman Kondakov

On 15.11.2019 12:53, steve.hostettler@gmail.com wrote:
> Dear all,
>
> would it then also be possible to have parallel (//) execution of SQL queries
> on a single node with that approach?
> My understanding is that, for the moment, SQL queries are
> single-threaded on a given node if there is no affinity.
>
> Best Regards
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/