You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Igor Seliverstov <gv...@apache.org> on 2019/09/27 08:44:59 UTC

New SQL execution engine

Hi Igniters!

As you might know currently we have many open issues relating to current H2 based engine and its execution flow.

Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.

Most of the issues cannot be solved without whole engine redesign. 

So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084

I'll appreciate if you share your thoughts on top of that.

Regards,
Igor

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Alexey.

Thanks for the details.

> Now, as for alternatives for Apache Calcite

I want to discuss our *requirements* for the new engine first.
Can we do it?
The main reason to do it - We should avoid wrong technical decision.
We made one with H2 and we shouldn't do it again.

> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.

Thanks!
Looking forward for IEP details.

В Пт, 27/09/2019 в 16:04 +0300, Alexey Goncharuk пишет:
> Nikolay, Maxim,
> 
> Asking to provide a list of issues with the current H2 is pointless because
> it has a fundamental architectural flow, not just a bunch of bugs:
> 
> Currently, the query execution is limited to a two-phase map-reduce task
> (with an optional remote cursor when 'distributed joins' flag is enabled)
> and only a limited subset of queries can be executed. You can easily see
> that if you try to draw how three non-collocated caches should be joined on
> an arbitrary condition.
> 
> H2 cannot solve this problem because H2 is a local database and is not
> designed to execute distributed queries, let alone the fact that it is not
> designed to be embedded to other projects as an execution engine. Because
> of this, H2 upgrade is a huge pain which leads to issues up to broken
> compilation. This is exactly the reason why the ticket with index use for
> IN() expression [1] has only been fixed in 2.7, one can see the amount of
> changes needed for a simple version upgrade.
> 
> Now, as for alternatives for Apache Calcite - I personally spent quite a
> large amount of time looking for alternatives but did not find any even
> remotely matching the abilities and flexibility of Calcite, but did not
> find any. As folks noted before, Calcite is specifically designed to have
> flexible optimization rules and support distributed query execution, which
> is already proved by real-life projects. If you have any other framework in
> mind that should be considered - please let the community know, I believe
> it will be a more productive discussion than now.
> 
> As for the IEP content - I agree, we should have a more detailed
> description of steps and technical information there, but I believe this
> will be improved further.
> 
> --AG
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-4150
> 
> 
> 
> пт, 27 сент. 2019 г. в 15:33, Maxim Muzafarov <mm...@apache.org>:
> 
> > Folks,
> > 
> > I agree with Nikolay, the idea of replacing the H2 engine with the
> > most suitable one is reasonable. But since such change is major we
> > should have a strong argumentation on it even for members with are
> > working outside the SQL-team.
> > 
> > I think it is really necessary to have:
> > 
> > 1. The list of issues related to the current engine (H2) which from
> > different points of view and for different developers must seem
> > unsolvable. For example, `... the H2 execution plan is hard-wired with
> > H2 internals and can't be easily transformed` seems doesn't have a
> > strong technical argumentation.
> > After this step, we should have a clear understanding that the engine
> > change is required.
> > 
> > 2. Why only the Apache Calcite? It seems to me we should have a table
> > with a comparison of different engines with the pros and cons of each
> > other. A brief search shows me that we may have a few options here.
> > After this step, we should have a clear understanding of why we choose
> > this dependency prior to another.
> > 
> > 3. We should also have a migration decomposition and step by step
> > actions to do. I haven't found such a decomposition on IEP-37 page. Do
> > we have one? What the implementation phases will be? What components
> > will be changed? What a new API would be and would it be? What
> > problems we are expecting e.g performance degradation on prototype
> > implementation? `Risks and Assumptions` topic doesn't seem to be a
> > good described.
> > After this step, we should have a clear and obvious a new feature
> > implementation plan.
> > 
> > Let's have a strong technical discussion.
> > 
> > On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <ni...@apache.org> wrote:
> > > 
> > > Hello, Roman.
> > > 
> > > All I see is links to two tickets:
> > > 
> > > IGNITE-11448 - Open
> > > IGNITE-6085 - Closed
> > > 
> > > Other issues described poorly and have not ticket links.
> > > We can't discuss such a huge change as an execution engine replacement
> > 
> > with descrition like:
> > > 
> > > "No data co-location control, i.e. arbitrary data can be returned
> > 
> > silently" or
> > > "Low control on how query executes internally, as a result we have
> > 
> > limited possibility to implement improvements/fixes."
> > > 
> > > I think we need some reproducer that shows issue.
> > > Tech details also should be added.
> > > 
> > > Let's make these descriptions more specific.
> > > Let's discuss how we want to fix them with the new engine.
> > > 
> > > 
> > > В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > > > Hello Nikolay,
> > > > 
> > > > please see IEP--37 [1]. Issues are there.
> > > > 
> > > > 
> > > > [1]
> > > > 
> > 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > > 
> > > >

Re: New SQL execution engine

Posted by Alexey Goncharuk <al...@gmail.com>.

Nikolay, Maxim,

Asking to provide a list of issues with the current H2 is pointless because
it has a fundamental architectural flow, not just a bunch of bugs:

Currently, the query execution is limited to a two-phase map-reduce task
(with an optional remote cursor when 'distributed joins' flag is enabled)
and only a limited subset of queries can be executed. You can easily see
that if you try to draw how three non-collocated caches should be joined on
an arbitrary condition.

H2 cannot solve this problem because H2 is a local database and is not
designed to execute distributed queries, let alone the fact that it is not
designed to be embedded to other projects as an execution engine. Because
of this, H2 upgrade is a huge pain which leads to issues up to broken
compilation. This is exactly the reason why the ticket with index use for
IN() expression [1] has only been fixed in 2.7, one can see the amount of
changes needed for a simple version upgrade.

Now, as for alternatives for Apache Calcite - I personally spent quite a
large amount of time looking for alternatives but did not find any even
remotely matching the abilities and flexibility of Calcite, but did not
find any. As folks noted before, Calcite is specifically designed to have
flexible optimization rules and support distributed query execution, which
is already proved by real-life projects. If you have any other framework in
mind that should be considered - please let the community know, I believe
it will be a more productive discussion than now.

As for the IEP content - I agree, we should have a more detailed
description of steps and technical information there, but I believe this
will be improved further.

--AG

[1] https://issues.apache.org/jira/browse/IGNITE-4150

пт, 27 сент. 2019 г. в 15:33, Maxim Muzafarov <mm...@apache.org>:

> Folks,
>
> I agree with Nikolay, the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such change is major we
> should have a strong argumentation on it even for members with are
> working outside the SQL-team.
>
> I think it is really necessary to have:
>
> 1. The list of issues related to the current engine (H2) which from
> different points of view and for different developers must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` seems doesn't have a
> strong technical argumentation.
> After this step, we should have a clear understanding that the engine
> change is required.
>
> 2. Why only the Apache Calcite? It seems to me we should have a table
> with a comparison of different engines with the pros and cons of each
> other. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency prior to another.
>
> 3. We should also have a migration decomposition and step by step
> actions to do. I haven't found such a decomposition on IEP-37 page. Do
> we have one? What the implementation phases will be? What components
> will be changed? What a new API would be and would it be? What
> problems we are expecting e.g performance degradation on prototype
> implementation? `Risks and Assumptions` topic doesn't seem to be a
> good described.
> After this step, we should have a clear and obvious a new feature
> implementation plan.
>
> Let's have a strong technical discussion.
>
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <ni...@apache.org> wrote:
> >
> > Hello, Roman.
> >
> > All I see is links to two tickets:
> >
> > IGNITE-11448 - Open
> > IGNITE-6085 - Closed
> >
> > Other issues described poorly and have not ticket links.
> > We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> >
> > "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> > "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >
> > I think we need some reproducer that shows issue.
> > Tech details also should be added.
> >
> > Let's make these descriptions more specific.
> > Let's discuss how we want to fix them with the new engine.
> >
> >
> > В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > > Hello Nikolay,
> > >
> > > please see IEP--37 [1]. Issues are there.
> > >
> > >
> > > [1]
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > >
> > >
>

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Thanks, Andrey! 

Will take a loo, shortly.

В Пт, 27/09/2019 в 17:19 +0300, Andrey Mashenkov пишет:
> Issues can't be resolved without changes in H2.
> Hope, this will be enough.
> 
> https://issues.apache.org/jira/browse/IGNITE-10598
> https://issues.apache.org/jira/browse/IGNITE-11473
> https://issues.apache.org/jira/browse/IGNITE-11444
> https://issues.apache.org/jira/browse/IGNITE-5289
> https://issues.apache.org/jira/browse/IGNITE-10855
> https://issues.apache.org/jira/browse/IGNITE-11341
> https://issues.apache.org/jira/browse/IGNITE-7526
> https://issues.apache.org/jira/browse/IGNITE-9480
> https://issues.apache.org/jira/browse/IGNITE-9616
> https://issues.apache.org/jira/browse/IGNITE-11891
> https://issues.apache.org/jira/browse/IGNITE-6202
> https://issues.apache.org/jira/browse/IGNITE-11448
> https://issues.apache.org/jira/browse/IGNITE-3911
> 
> 
> On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov <ni...@apache.org> wrote:
> 
> > Roman.
> > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> > 
> > What is SQL team?
> > I only know Ignite community :)
> > 
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> > 
> > My point is very simple:
> > 
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> > 
> > > How many tickets would satisfy you?
> > 
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> > 
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine.
> > 
> > We may come to the point when Calcite(or any other engine) brings us third
> > and other "main problems".
> > This is how it happens with H2.
> > 
> > Let's start from what we want to get with the engine and move forward from
> > this base.
> > What do you think?
> > 
> > 
> > 
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > Maxim, Nikolay,
> > > 
> > > I've listed two issues which show the ideological flaws of the current
> > > engine.
> > > 
> > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > executing queries which can not be fit in the hardcoded one pass
> > > map-reduce paradigm.
> > > 
> > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > major problem with the current engine: H2 query optimizer is very
> > > primitive and can not perform many useful optimizations.
> > > 
> > > These two points (single map-reduce execution and inflexible optimizer)
> > > are the main problems with the current engine. It means that our engine
> > > is currently  suitable for execution only a very limited subset of the
> > > typical SQL queries. For example it can not even run most of the TPC-H
> > > benchmark queries because they don't fit to the simple map-reduce
> > 
> > paradigm.
> > > 
> > > > All I see is links to two tickets:
> > > 
> > > How many tickets would satisfy you? I named two. And it looks like it is
> > > not enough from your point of view. Ok, so how many is enough? The set
> > > of problems caused by listed above tickets is infinite, therefore I can
> > > not create a ticket for each of them.
> > > > Tech details also should be added.
> > > 
> > > Tech details are in the tickets.
> > > 
> > > > We can't discuss such a huge change as an execution engine replacement
> > 
> > with descrition like:
> > > > "No data co-location control, i.e. arbitrary data can be returned
> > 
> > silently" or
> > > > "Low control on how query executes internally, as a result we have
> > 
> > limited possibility to implement improvements/fixes."
> > > 
> > > Why not? Don't you understand these problems? Or you don't think this is
> > > a problem?
> > > 
> > > > Let's make these descriptions more specific.
> > > 
> > > What do you mean by "more specific"? What is the criteria of the
> > > specific description?
> > > 
> > > 
> > > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > for you as it obvious for SQL team. So, please arrange your questions in
> > > a more constructive way.
> > > 
> > > Thank you!
> 
>

Re: New SQL execution engine

Posted by Andrey Mashenkov <an...@gmail.com>.

Issues can't be resolved without changes in H2.
Hope, this will be enough.

https://issues.apache.org/jira/browse/IGNITE-10598
https://issues.apache.org/jira/browse/IGNITE-11473
https://issues.apache.org/jira/browse/IGNITE-11444
https://issues.apache.org/jira/browse/IGNITE-5289
https://issues.apache.org/jira/browse/IGNITE-10855
https://issues.apache.org/jira/browse/IGNITE-11341
https://issues.apache.org/jira/browse/IGNITE-7526
https://issues.apache.org/jira/browse/IGNITE-9480
https://issues.apache.org/jira/browse/IGNITE-9616
https://issues.apache.org/jira/browse/IGNITE-11891
https://issues.apache.org/jira/browse/IGNITE-6202
https://issues.apache.org/jira/browse/IGNITE-11448
https://issues.apache.org/jira/browse/IGNITE-3911


On Fri, Sep 27, 2019 at 4:34 PM Nikolay Izhikov <ni...@apache.org> wrote:

> Roman.
>
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > for you as it obvious for SQL team. So, please arrange your questions in
> > a more constructive way.
>
> What is SQL team?
> I only know Ignite community :)
>
> Please, share you knowledge in IEP.
> I want to join to the process of engine *selection*.
> It should start with the requirements to such engine.
> Can you write it in IEP, please?
>
> My point is very simple:
>
> 1. We made the wrong decision with H2
> 2. We should make a well-thought decision about the new engine.
>
> > How many tickets would satisfy you?
>
> You write about "issueS" with the H2.
> All I see is one open ticket.
> IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
>
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine.
>
> We may come to the point when Calcite(or any other engine) brings us third
> and other "main problems".
> This is how it happens with H2.
>
> Let's start from what we want to get with the engine and move forward from
> this base.
> What do you think?
>
>
>
> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > Maxim, Nikolay,
> >
> > I've listed two issues which show the ideological flaws of the current
> > engine.
> >
> > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > executing queries which can not be fit in the hardcoded one pass
> > map-reduce paradigm.
> >
> > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > major problem with the current engine: H2 query optimizer is very
> > primitive and can not perform many useful optimizations.
> >
> > These two points (single map-reduce execution and inflexible optimizer)
> > are the main problems with the current engine. It means that our engine
> > is currently  suitable for execution only a very limited subset of the
> > typical SQL queries. For example it can not even run most of the TPC-H
> > benchmark queries because they don't fit to the simple map-reduce
> paradigm.
> >
> > > All I see is links to two tickets:
> >
> > How many tickets would satisfy you? I named two. And it looks like it is
> > not enough from your point of view. Ok, so how many is enough? The set
> > of problems caused by listed above tickets is infinite, therefore I can
> > not create a ticket for each of them.
> > > Tech details also should be added.
> >
> > Tech details are in the tickets.
> >
> > > We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> > > "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> > > "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >
> > Why not? Don't you understand these problems? Or you don't think this is
> > a problem?
> >
> > > Let's make these descriptions more specific.
> >
> > What do you mean by "more specific"? What is the criteria of the
> > specific description?
> >
> >
> >
> > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > for you as it obvious for SQL team. So, please arrange your questions in
> > a more constructive way.
> >
> > Thank you!
>


-- 
Best regards,
Andrey V. Mashenkov

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Andrey.

Thanks, it's more clear now.

> I agree, we should make IEP clear to everyone in community who want to be involved in IEP implementation at first.

Great! 
Looking forward for IEP clarification.


В Пт, 27/09/2019 в 18:07 +0300, Andrey Mashenkov пишет:
> Nikolay, Igor.
> 
> Implementing from scratch is an option, of course.
> If we decide to go this way then we definitely won't to spend long nights
> to invent "yet another SQL parser" with all the stuff related to query
> rewrite rules (e.g. IN -> JOIN) or type casting \ validation \ conversion.
> 
> We thought about step-by-step H2 replacing.
> 1. We've tried to make POC with parser replacement to generated one from
> SQL grammar with ASM,
> but this approach looks slow, AFAIR. Gridgainers, anybody, have smth on
> this?
> 
> 2. Then we need a planner with all the rules.
> Of course we will need to write rules optimized for "Distributed" execution
> in anyway, but I doubt anybody want to write common-rules that already has
> Calcite.
> We can copy-paste, but what for?
> 
> 3. Then we have to implement execution pipeline.
> Possibly, we can adopt new query plans for H2 execution, but then we will
> still have same pain with resolving H2 internal issues (e.g. OOM).
> H2 approach is outdated, it doesn't fit Ignite needs as distributes system.
> 
> With Calcite we can concentrate on 2 and (mostly) 3 points and reuse
> their architectural abstracts, otherwise we should reinvent those abstracts
> through long discussions on dev-list.
> 
> I agree, we should make IEP clear to everyone in community who want to be
> involved in IEP implementation at first.
> Both approaches ("from scratch" and  "with Calcite") are risky, so
> 
> Can we try to make an additional engine "beta"-implementation and allow
> users fallback to old engine until a new one will be decided to become
> mature enough.
> 
> 
> 
> 
> On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor <gv...@gmail.com>
> wrote:
> 
> > Nikolay,
> > 
> > The main issue - there is no *selection*.
> > 
> > There is a field of knowledge - relational algebra, which describes how to
> > transform relational expressions saving their semantics, and a couple of
> > implementations (Calcite is only one written in Java).
> > 
> > There are only two alternatives:
> > 
> > 1) Implementing white papers from scratch
> > 2) Adopting Calcite to our needs.
> > 
> > The second way was chosen by several other projects, there is experience,
> > there is a list of known issues (like using indexes) so, almost everything
> > is already done for us.
> > 
> > Implementing a planner is a big deal, I think anybody understands it
> > there. That's why our proposal to reuse others experience is obvious.
> > 
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > 
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic
> > from my point of view.
> > 
> > Regards,
> > Igor
> > 
> > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org>
> > 
> > написал(а):
> > > 
> > > Roman.
> > > 
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > > for you as it obvious for SQL team. So, please arrange your questions
> > 
> > in
> > > > a more constructive way.
> > > 
> > > What is SQL team?
> > > I only know Ignite community :)
> > > 
> > > Please, share you knowledge in IEP.
> > > I want to join to the process of engine *selection*.
> > > It should start with the requirements to such engine.
> > > Can you write it in IEP, please?
> > > 
> > > My point is very simple:
> > > 
> > > 1. We made the wrong decision with H2
> > > 2. We should make a well-thought decision about the new engine.
> > > 
> > > > How many tickets would satisfy you?
> > > 
> > > You write about "issueS" with the H2.
> > > All I see is one open ticket.
> > > IEP doesn't provide enough information.
> > > So it's not about the number of tickets, it's about
> > > 
> > > > These two points (single map-reduce execution and inflexible optimizer)
> > > > are the main problems with the current engine.
> > > 
> > > We may come to the point when Calcite(or any other engine) brings us
> > 
> > third and other "main problems".
> > > This is how it happens with H2.
> > > 
> > > Let's start from what we want to get with the engine and move forward
> > 
> > from this base.
> > > What do you think?
> > > 
> > > 
> > > 
> > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > > Maxim, Nikolay,
> > > > 
> > > > I've listed two issues which show the ideological flaws of the current
> > > > engine.
> > > > 
> > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > > executing queries which can not be fit in the hardcoded one pass
> > > > map-reduce paradigm.
> > > > 
> > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > > major problem with the current engine: H2 query optimizer is very
> > > > primitive and can not perform many useful optimizations.
> > > > 
> > > > These two points (single map-reduce execution and inflexible optimizer)
> > > > are the main problems with the current engine. It means that our engine
> > > > is currently  suitable for execution only a very limited subset of the
> > > > typical SQL queries. For example it can not even run most of the TPC-H
> > > > benchmark queries because they don't fit to the simple map-reduce
> > 
> > paradigm.
> > > > 
> > > > > All I see is links to two tickets:
> > > > 
> > > > How many tickets would satisfy you? I named two. And it looks like it
> > 
> > is
> > > > not enough from your point of view. Ok, so how many is enough? The set
> > > > of problems caused by listed above tickets is infinite, therefore I can
> > > > not create a ticket for each of them.
> > > > > Tech details also should be added.
> > > > 
> > > > Tech details are in the tickets.
> > > > 
> > > > > We can't discuss such a huge change as an execution engine replacement
> > 
> > with descrition like:
> > > > > "No data co-location control, i.e. arbitrary data can be returned
> > 
> > silently" or
> > > > > "Low control on how query executes internally, as a result we have
> > 
> > limited possibility to implement improvements/fixes."
> > > > 
> > > > Why not? Don't you understand these problems? Or you don't think this
> > 
> > is
> > > > a problem?
> > > > 
> > > > > Let's make these descriptions more specific.
> > > > 
> > > > What do you mean by "more specific"? What is the criteria of the
> > > > specific description?
> > > > 
> > > > 
> > > > 
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > > for you as it obvious for SQL team. So, please arrange your questions
> > 
> > in
> > > > a more constructive way.
> > > > 
> > > > Thank you!
> > 
> > 
> 
>

Re: New SQL execution engine

Posted by Andrey Mashenkov <an...@gmail.com>.

Nikolay, Igor.

Implementing from scratch is an option, of course.
If we decide to go this way then we definitely won't to spend long nights
to invent "yet another SQL parser" with all the stuff related to query
rewrite rules (e.g. IN -> JOIN) or type casting \ validation \ conversion.

We thought about step-by-step H2 replacing.
1. We've tried to make POC with parser replacement to generated one from
SQL grammar with ASM,
but this approach looks slow, AFAIR. Gridgainers, anybody, have smth on
this?

2. Then we need a planner with all the rules.
Of course we will need to write rules optimized for "Distributed" execution
in anyway, but I doubt anybody want to write common-rules that already has
Calcite.
We can copy-paste, but what for?

3. Then we have to implement execution pipeline.
Possibly, we can adopt new query plans for H2 execution, but then we will
still have same pain with resolving H2 internal issues (e.g. OOM).
H2 approach is outdated, it doesn't fit Ignite needs as distributes system.

With Calcite we can concentrate on 2 and (mostly) 3 points and reuse
their architectural abstracts, otherwise we should reinvent those abstracts
through long discussions on dev-list.

I agree, we should make IEP clear to everyone in community who want to be
involved in IEP implementation at first.
Both approaches ("from scratch" and  "with Calcite") are risky, so

Can we try to make an additional engine "beta"-implementation and allow
users fallback to old engine until a new one will be decided to become
mature enough.




On Fri, Sep 27, 2019 at 5:08 PM Seliverstov Igor <gv...@gmail.com>
wrote:

> Nikolay,
>
> The main issue - there is no *selection*.
>
> There is a field of knowledge - relational algebra, which describes how to
> transform relational expressions saving their semantics, and a couple of
> implementations (Calcite is only one written in Java).
>
> There are only two alternatives:
>
> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.
>
> The second way was chosen by several other projects, there is experience,
> there is a list of known issues (like using indexes) so, almost everything
> is already done for us.
>
> Implementing a planner is a big deal, I think anybody understands it
> there. That's why our proposal to reuse others experience is obvious.
>
> If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> The main question isn't "WHAT" but "HOW" - that's the discussion topic
> from my point of view.
>
> Regards,
> Igor
>
> > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org>
> написал(а):
> >
> > Roman.
> >
> >> Nikolay, Maxim, I understand that our arguments may not be as obvious
> >> for you as it obvious for SQL team. So, please arrange your questions
> in
> >> a more constructive way.
> >
> > What is SQL team?
> > I only know Ignite community :)
> >
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> >
> > My point is very simple:
> >
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> >
> >> How many tickets would satisfy you?
> >
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> >
> >> These two points (single map-reduce execution and inflexible optimizer)
> >> are the main problems with the current engine.
> >
> > We may come to the point when Calcite(or any other engine) brings us
> third and other "main problems".
> > This is how it happens with H2.
> >
> > Let's start from what we want to get with the engine and move forward
> from this base.
> > What do you think?
> >
> >
> >
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> >> Maxim, Nikolay,
> >>
> >> I've listed two issues which show the ideological flaws of the current
> >> engine.
> >>
> >> 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> >> executing queries which can not be fit in the hardcoded one pass
> >> map-reduce paradigm.
> >>
> >> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> >> major problem with the current engine: H2 query optimizer is very
> >> primitive and can not perform many useful optimizations.
> >>
> >> These two points (single map-reduce execution and inflexible optimizer)
> >> are the main problems with the current engine. It means that our engine
> >> is currently  suitable for execution only a very limited subset of the
> >> typical SQL queries. For example it can not even run most of the TPC-H
> >> benchmark queries because they don't fit to the simple map-reduce
> paradigm.
> >>
> >>> All I see is links to two tickets:
> >>
> >> How many tickets would satisfy you? I named two. And it looks like it
> is
> >> not enough from your point of view. Ok, so how many is enough? The set
> >> of problems caused by listed above tickets is infinite, therefore I can
> >> not create a ticket for each of them.
> >>> Tech details also should be added.
> >>
> >> Tech details are in the tickets.
> >>
> >>> We can't discuss such a huge change as an execution engine replacement
> with descrition like:
> >>> "No data co-location control, i.e. arbitrary data can be returned
> silently" or
> >>> "Low control on how query executes internally, as a result we have
> limited possibility to implement improvements/fixes."
> >>
> >> Why not? Don't you understand these problems? Or you don't think this
> is
> >> a problem?
> >>
> >>> Let's make these descriptions more specific.
> >>
> >> What do you mean by "more specific"? What is the criteria of the
> >> specific description?
> >>
> >>
> >>
> >> Nikolay, Maxim, I understand that our arguments may not be as obvious
> >> for you as it obvious for SQL team. So, please arrange your questions
> in
> >> a more constructive way.
> >>
> >> Thank you!
>
>

-- 
Best regards,
Andrey V. Mashenkov

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

> I think, we should discuss the idea in general.

Everybody likes the idea so far :)
The issues in details, as usual.


В Пт, 27/09/2019 в 19:03 +0300, Seliverstov Igor пишет:
> Nikolay,
> 
> > What project hosted Calcite based engine?
> 
> 
> Currently the prototype is placed in my personal Ignite fork. I need an appropriate ticket before pushing it to ASF git repository. 
> At first, I think, we should discuss the idea in general.
> 
> > Personally, I'm against the support of two independent implementation of SQL engine for several releases.
> 
> 
> I don’t like the idea to have two engines too. But even development the engine on top of Calcite library is still a big deal. 
> I not sure it will be ready, no, I sure it WONT be ready by Ignite3 release. So I mentioned the option to have two engines at the same time.
> 
> > Let's start with the IEP clarification and replace the SQL engine with the best one for Ignite good.
> 
> Of course, but anyway it’s good to make familiar with a couple of examples it already describes and clarify some additional questions the community may ask.
> 
> Regards,
> Igor
> 
> > 27 сент. 2019 г., в 18:22, Nikolay Izhikov <ni...@apache.org> написал(а):
> > 
> > Igor.
> > 
> > > There is no decision, here we should decide.
> > 
> > Great.
> > 
> > > At now Calcite based engine is placed in different module
> > 
> > What project hosted Calcite based engine?
> > 
> > > It’s possible to develop it as an experimental extension at first (not a replacement)
> > 
> > For me, Ignite 3 are the place where the new engine has to be placed.
> > Personally, I'm against the support of two independent implementation of SQL engine for several releases.
> > 
> > Ignite has too many partially implemented features to include on more :)
> > 
> > Let's start with the IEP clarification and replace the SQL engine with the best one for Ignite good.
> > 
> > 
> > В Пт, 27/09/2019 в 18:08 +0300, Seliverstov Igor пишет:
> > > Nikolay,
> > > 
> > > At last we have better questions.
> > > 
> > > There is no decision, here we should decide.
> > > 
> > > Doing nothing isn’t a decision, it’s just doing nothing
> > > 
> > > Spark Catalyst is a good example, but under the hood it has absolutely the same idea, but adopted to Spark. Calcite is the same, but general. That’s why it’s better start point.
> > > 
> > > Implementing an engine from scratch is really cool, but looks like inventing a bicycle, don’t think it makes sense. At least I against this option.
> > > 
> > > I added requirements to IEP (as you asked), you may see it’s in DRAFT state and will be complemented by details.
> > > 
> > > We have some thoughts on how to make smooth replacement, but at first we should decide what to replace and what with.
> > > 
> > > At now Calcite based engine is placed in different module, we checked it can build execution graph for both local and distributed cases, it has good expandability. 
> > > We talked to Calcite community to identify possible future issues and everything points to the fact it’s the best option. 
> > > It’s possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite on production environment.
> > > 
> > > Regards,
> > > Igor
> > > 
> > > 
> > > > 27 сент. 2019 г., в 17:25, Nikolay Izhikov <ni...@apache.org> написал(а):
> > > > 
> > > > Igor.
> > > > 
> > > > > The main issue - there is no *selection*.
> > > > 
> > > > 1. I don't remember community decision about this.
> > > > 
> > > > 2. We should avoid to make such long-term decision so quickly.
> > > > We done this kind of decision with H2 and come to the point when we should review it.
> > > > 
> > > > > 1) Implementing white papers from scratch
> > > > > 2) Adopting Calcite to our needs.
> > > > 
> > > > The third option don't fix issues we have with H2.
> > > > The fourth option I know is using spark-catalyst.
> > > > 
> > > > What is wrong with writing engine from scratch?
> > > > 
> > > > I ask you to start with engine requirements.
> > > > Can we, please, discuss it?
> > > > 
> > > > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > > > 
> > > > We have alternative for now - H2 based engine.
> > > > 
> > > > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > > > 
> > > > When we make a decision about engine we can discuss roadmap for replacement.
> > > > One more time - replacement of SQL engine to some more customizable make sense for me.
> > > > But, this kind of decisions need carefull discussion.
> > > > 
> > > > В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
> > > > > Nikolay,
> > > > > 
> > > > > The main issue - there is no *selection*.
> > > > > 
> > > > > There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
> > > > > 
> > > > > There are only two alternatives:
> > > > > 
> > > > > 1) Implementing white papers from scratch
> > > > > 2) Adopting Calcite to our needs.
> > > > > 
> > > > > The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
> > > > > 
> > > > > Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
> > > > > 
> > > > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > > > > 
> > > > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > > > > 
> > > > > Regards,
> > > > > Igor
> > > > > 
> > > > > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
> > > > > > 
> > > > > > Roman.
> > > > > > 
> > > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > > > > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > > > > > a more constructive way.
> > > > > > 
> > > > > > What is SQL team?
> > > > > > I only know Ignite community :)
> > > > > > 
> > > > > > Please, share you knowledge in IEP.
> > > > > > I want to join to the process of engine *selection*.
> > > > > > It should start with the requirements to such engine.
> > > > > > Can you write it in IEP, please?
> > > > > > 
> > > > > > My point is very simple:
> > > > > > 
> > > > > > 1. We made the wrong decision with H2
> > > > > > 2. We should make a well-thought decision about the new engine.
> > > > > > 
> > > > > > > How many tickets would satisfy you?
> > > > > > 
> > > > > > You write about "issueS" with the H2.
> > > > > > All I see is one open ticket.
> > > > > > IEP doesn't provide enough information.
> > > > > > So it's not about the number of tickets, it's about
> > > > > > 
> > > > > > > These two points (single map-reduce execution and inflexible optimizer) 
> > > > > > > are the main problems with the current engine.
> > > > > > 
> > > > > > We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> > > > > > This is how it happens with H2.
> > > > > > 
> > > > > > Let's start from what we want to get with the engine and move forward from this base.
> > > > > > What do you think?
> > > > > > 
> > > > > > 
> > > > > > 
> > > > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > > > > > Maxim, Nikolay,
> > > > > > > 
> > > > > > > I've listed two issues which show the ideological flaws of the current 
> > > > > > > engine.
> > > > > > > 
> > > > > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
> > > > > > > executing queries which can not be fit in the hardcoded one pass 
> > > > > > > map-reduce paradigm.
> > > > > > > 
> > > > > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
> > > > > > > major problem with the current engine: H2 query optimizer is very 
> > > > > > > primitive and can not perform many useful optimizations.
> > > > > > > 
> > > > > > > These two points (single map-reduce execution and inflexible optimizer) 
> > > > > > > are the main problems with the current engine. It means that our engine 
> > > > > > > is currently  suitable for execution only a very limited subset of the 
> > > > > > > typical SQL queries. For example it can not even run most of the TPC-H 
> > > > > > > benchmark queries because they don't fit to the simple map-reduce paradigm.
> > > > > > > 
> > > > > > > > All I see is links to two tickets:
> > > > > > > 
> > > > > > > How many tickets would satisfy you? I named two. And it looks like it is 
> > > > > > > not enough from your point of view. Ok, so how many is enough? The set 
> > > > > > > of problems caused by listed above tickets is infinite, therefore I can 
> > > > > > > not create a ticket for each of them.
> > > > > > > > Tech details also should be added.
> > > > > > > 
> > > > > > > Tech details are in the tickets.
> > > > > > > 
> > > > > > > > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > > > > > > > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > > > > > > > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> > > > > > > 
> > > > > > > Why not? Don't you understand these problems? Or you don't think this is 
> > > > > > > a problem?
> > > > > > > 
> > > > > > > > Let's make these descriptions more specific.
> > > > > > > 
> > > > > > > What do you mean by "more specific"? What is the criteria of the 
> > > > > > > specific description?
> > > > > > > 
> > > > > > > 
> > > > > > > 
> > > > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > > > > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > > > > > a more constructive way.
> > > > > > > 
> > > > > > > Thank you!
> > > > > 
> > > > > 
> > > 
> > > 
> 
>

Re: New SQL execution engine

Posted by Seliverstov Igor <gv...@gmail.com>.

Nikolay,

> What project hosted Calcite based engine?


Currently the prototype is placed in my personal Ignite fork. I need an appropriate ticket before pushing it to ASF git repository. 
At first, I think, we should discuss the idea in general.

> Personally, I'm against the support of two independent implementation of SQL engine for several releases.


I don’t like the idea to have two engines too. But even development the engine on top of Calcite library is still a big deal. 
I not sure it will be ready, no, I sure it WONT be ready by Ignite3 release. So I mentioned the option to have two engines at the same time.

> Let's start with the IEP clarification and replace the SQL engine with the best one for Ignite good.

Of course, but anyway it’s good to make familiar with a couple of examples it already describes and clarify some additional questions the community may ask.

Regards,
Igor

> 27 сент. 2019 г., в 18:22, Nikolay Izhikov <ni...@apache.org> написал(а):
> 
> Igor.
> 
>> There is no decision, here we should decide.
> 
> Great.
> 
>> At now Calcite based engine is placed in different module
> 
> What project hosted Calcite based engine?
> 
>> It’s possible to develop it as an experimental extension at first (not a replacement)
> 
> For me, Ignite 3 are the place where the new engine has to be placed.
> Personally, I'm against the support of two independent implementation of SQL engine for several releases.
> 
> Ignite has too many partially implemented features to include on more :)
> 
> Let's start with the IEP clarification and replace the SQL engine with the best one for Ignite good.
> 
> 
> В Пт, 27/09/2019 в 18:08 +0300, Seliverstov Igor пишет:
>> Nikolay,
>> 
>> At last we have better questions.
>> 
>> There is no decision, here we should decide.
>> 
>> Doing nothing isn’t a decision, it’s just doing nothing
>> 
>> Spark Catalyst is a good example, but under the hood it has absolutely the same idea, but adopted to Spark. Calcite is the same, but general. That’s why it’s better start point.
>> 
>> Implementing an engine from scratch is really cool, but looks like inventing a bicycle, don’t think it makes sense. At least I against this option.
>> 
>> I added requirements to IEP (as you asked), you may see it’s in DRAFT state and will be complemented by details.
>> 
>> We have some thoughts on how to make smooth replacement, but at first we should decide what to replace and what with.
>> 
>> At now Calcite based engine is placed in different module, we checked it can build execution graph for both local and distributed cases, it has good expandability. 
>> We talked to Calcite community to identify possible future issues and everything points to the fact it’s the best option. 
>> It’s possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite on production environment.
>> 
>> Regards,
>> Igor
>> 
>> 
>>> 27 сент. 2019 г., в 17:25, Nikolay Izhikov <ni...@apache.org> написал(а):
>>> 
>>> Igor.
>>> 
>>>> The main issue - there is no *selection*.
>>> 
>>> 1. I don't remember community decision about this.
>>> 
>>> 2. We should avoid to make such long-term decision so quickly.
>>> We done this kind of decision with H2 and come to the point when we should review it.
>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adopting Calcite to our needs.
>>> 
>>> The third option don't fix issues we have with H2.
>>> The fourth option I know is using spark-catalyst.
>>> 
>>> What is wrong with writing engine from scratch?
>>> 
>>> I ask you to start with engine requirements.
>>> Can we, please, discuss it?
>>> 
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>> 
>>> We have alternative for now - H2 based engine.
>>> 
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>>> 
>>> When we make a decision about engine we can discuss roadmap for replacement.
>>> One more time - replacement of SQL engine to some more customizable make sense for me.
>>> But, this kind of decisions need carefull discussion.
>>> 
>>> В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
>>>> Nikolay,
>>>> 
>>>> The main issue - there is no *selection*.
>>>> 
>>>> There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
>>>> 
>>>> There are only two alternatives:
>>>> 
>>>> 1) Implementing white papers from scratch
>>>> 2) Adopting Calcite to our needs.
>>>> 
>>>> The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
>>>> 
>>>> Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
>>>> 
>>>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>>>> 
>>>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>>>> 
>>>> Regards,
>>>> Igor
>>>> 
>>>>> 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
>>>>> 
>>>>> Roman.
>>>>> 
>>>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>>>> for you as it obvious for SQL team. So, please arrange your questions in 
>>>>>> a more constructive way.
>>>>> 
>>>>> What is SQL team?
>>>>> I only know Ignite community :)
>>>>> 
>>>>> Please, share you knowledge in IEP.
>>>>> I want to join to the process of engine *selection*.
>>>>> It should start with the requirements to such engine.
>>>>> Can you write it in IEP, please?
>>>>> 
>>>>> My point is very simple:
>>>>> 
>>>>> 1. We made the wrong decision with H2
>>>>> 2. We should make a well-thought decision about the new engine.
>>>>> 
>>>>>> How many tickets would satisfy you?
>>>>> 
>>>>> You write about "issueS" with the H2.
>>>>> All I see is one open ticket.
>>>>> IEP doesn't provide enough information.
>>>>> So it's not about the number of tickets, it's about
>>>>> 
>>>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>>>> are the main problems with the current engine.
>>>>> 
>>>>> We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
>>>>> This is how it happens with H2.
>>>>> 
>>>>> Let's start from what we want to get with the engine and move forward from this base.
>>>>> What do you think?
>>>>> 
>>>>> 
>>>>> 
>>>>> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
>>>>>> Maxim, Nikolay,
>>>>>> 
>>>>>> I've listed two issues which show the ideological flaws of the current 
>>>>>> engine.
>>>>>> 
>>>>>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
>>>>>> executing queries which can not be fit in the hardcoded one pass 
>>>>>> map-reduce paradigm.
>>>>>> 
>>>>>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
>>>>>> major problem with the current engine: H2 query optimizer is very 
>>>>>> primitive and can not perform many useful optimizations.
>>>>>> 
>>>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>>>> are the main problems with the current engine. It means that our engine 
>>>>>> is currently  suitable for execution only a very limited subset of the 
>>>>>> typical SQL queries. For example it can not even run most of the TPC-H 
>>>>>> benchmark queries because they don't fit to the simple map-reduce paradigm.
>>>>>> 
>>>>>>> All I see is links to two tickets:
>>>>>> 
>>>>>> How many tickets would satisfy you? I named two. And it looks like it is 
>>>>>> not enough from your point of view. Ok, so how many is enough? The set 
>>>>>> of problems caused by listed above tickets is infinite, therefore I can 
>>>>>> not create a ticket for each of them.
>>>>>>> Tech details also should be added.
>>>>>> 
>>>>>> Tech details are in the tickets.
>>>>>> 
>>>>>>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>>>>>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>>>>>>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>>>>> 
>>>>>> Why not? Don't you understand these problems? Or you don't think this is 
>>>>>> a problem?
>>>>>> 
>>>>>>> Let's make these descriptions more specific.
>>>>>> 
>>>>>> What do you mean by "more specific"? What is the criteria of the 
>>>>>> specific description?
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>>>> for you as it obvious for SQL team. So, please arrange your questions in 
>>>>>> a more constructive way.
>>>>>> 
>>>>>> Thank you!
>>>> 
>>>> 
>> 
>>

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Igor.

> There is no decision, here we should decide.

Great.

> At now Calcite based engine is placed in different module

What project hosted Calcite based engine?

> It’s possible to develop it as an experimental extension at first (not a replacement)

For me, Ignite 3 are the place where the new engine has to be placed.
Personally, I'm against the support of two independent implementation of SQL engine for several releases.

Ignite has too many partially implemented features to include on more :)

Let's start with the IEP clarification and replace the SQL engine with the best one for Ignite good.


В Пт, 27/09/2019 в 18:08 +0300, Seliverstov Igor пишет:
> Nikolay,
> 
> At last we have better questions.
> 
> There is no decision, here we should decide.
> 
> Doing nothing isn’t a decision, it’s just doing nothing
> 
> Spark Catalyst is a good example, but under the hood it has absolutely the same idea, but adopted to Spark. Calcite is the same, but general. That’s why it’s better start point.
> 
> Implementing an engine from scratch is really cool, but looks like inventing a bicycle, don’t think it makes sense. At least I against this option.
> 
> I added requirements to IEP (as you asked), you may see it’s in DRAFT state and will be complemented by details.
> 
> We have some thoughts on how to make smooth replacement, but at first we should decide what to replace and what with.
> 
> At now Calcite based engine is placed in different module, we checked it can build execution graph for both local and distributed cases, it has good expandability. 
> We talked to Calcite community to identify possible future issues and everything points to the fact it’s the best option. 
> It’s possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite on production environment.
> 
> Regards,
> Igor
> 
> 
> > 27 сент. 2019 г., в 17:25, Nikolay Izhikov <ni...@apache.org> написал(а):
> > 
> > Igor.
> > 
> > > The main issue - there is no *selection*.
> > 
> > 1. I don't remember community decision about this.
> > 
> > 2. We should avoid to make such long-term decision so quickly.
> > We done this kind of decision with H2 and come to the point when we should review it.
> > 
> > > 1) Implementing white papers from scratch
> > > 2) Adopting Calcite to our needs.
> > 
> > The third option don't fix issues we have with H2.
> > The fourth option I know is using spark-catalyst.
> > 
> > What is wrong with writing engine from scratch?
> > 
> > I ask you to start with engine requirements.
> > Can we, please, discuss it?
> > 
> > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > 
> > We have alternative for now - H2 based engine.
> > 
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > 
> > When we make a decision about engine we can discuss roadmap for replacement.
> > One more time - replacement of SQL engine to some more customizable make sense for me.
> > But, this kind of decisions need carefull discussion.
> > 
> > В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
> > > Nikolay,
> > > 
> > > The main issue - there is no *selection*.
> > > 
> > > There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
> > > 
> > > There are only two alternatives:
> > > 
> > > 1) Implementing white papers from scratch
> > > 2) Adopting Calcite to our needs.
> > > 
> > > The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
> > > 
> > > Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
> > > 
> > > If you have an alternative - you're welcome, I'll gratefully listen to you.
> > > 
> > > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> > > 
> > > Regards,
> > > Igor
> > > 
> > > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
> > > > 
> > > > Roman.
> > > > 
> > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > > > a more constructive way.
> > > > 
> > > > What is SQL team?
> > > > I only know Ignite community :)
> > > > 
> > > > Please, share you knowledge in IEP.
> > > > I want to join to the process of engine *selection*.
> > > > It should start with the requirements to such engine.
> > > > Can you write it in IEP, please?
> > > > 
> > > > My point is very simple:
> > > > 
> > > > 1. We made the wrong decision with H2
> > > > 2. We should make a well-thought decision about the new engine.
> > > > 
> > > > > How many tickets would satisfy you?
> > > > 
> > > > You write about "issueS" with the H2.
> > > > All I see is one open ticket.
> > > > IEP doesn't provide enough information.
> > > > So it's not about the number of tickets, it's about
> > > > 
> > > > > These two points (single map-reduce execution and inflexible optimizer) 
> > > > > are the main problems with the current engine.
> > > > 
> > > > We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> > > > This is how it happens with H2.
> > > > 
> > > > Let's start from what we want to get with the engine and move forward from this base.
> > > > What do you think?
> > > > 
> > > > 
> > > > 
> > > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > > > Maxim, Nikolay,
> > > > > 
> > > > > I've listed two issues which show the ideological flaws of the current 
> > > > > engine.
> > > > > 
> > > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
> > > > > executing queries which can not be fit in the hardcoded one pass 
> > > > > map-reduce paradigm.
> > > > > 
> > > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
> > > > > major problem with the current engine: H2 query optimizer is very 
> > > > > primitive and can not perform many useful optimizations.
> > > > > 
> > > > > These two points (single map-reduce execution and inflexible optimizer) 
> > > > > are the main problems with the current engine. It means that our engine 
> > > > > is currently  suitable for execution only a very limited subset of the 
> > > > > typical SQL queries. For example it can not even run most of the TPC-H 
> > > > > benchmark queries because they don't fit to the simple map-reduce paradigm.
> > > > > 
> > > > > > All I see is links to two tickets:
> > > > > 
> > > > > How many tickets would satisfy you? I named two. And it looks like it is 
> > > > > not enough from your point of view. Ok, so how many is enough? The set 
> > > > > of problems caused by listed above tickets is infinite, therefore I can 
> > > > > not create a ticket for each of them.
> > > > > > Tech details also should be added.
> > > > > 
> > > > > Tech details are in the tickets.
> > > > > 
> > > > > > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > > > > > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > > > > > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> > > > > 
> > > > > Why not? Don't you understand these problems? Or you don't think this is 
> > > > > a problem?
> > > > > 
> > > > > > Let's make these descriptions more specific.
> > > > > 
> > > > > What do you mean by "more specific"? What is the criteria of the 
> > > > > specific description?
> > > > > 
> > > > > 
> > > > > 
> > > > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > > > a more constructive way.
> > > > > 
> > > > > Thank you!
> > > 
> > > 
> 
>

Re: New SQL execution engine

Posted by Seliverstov Igor <gv...@gmail.com>.

Nikolay,

At last we have better questions.

There is no decision, here we should decide.

Doing nothing isn’t a decision, it’s just doing nothing

Spark Catalyst is a good example, but under the hood it has absolutely the same idea, but adopted to Spark. Calcite is the same, but general. That’s why it’s better start point.

Implementing an engine from scratch is really cool, but looks like inventing a bicycle, don’t think it makes sense. At least I against this option.

I added requirements to IEP (as you asked), you may see it’s in DRAFT state and will be complemented by details.

We have some thoughts on how to make smooth replacement, but at first we should decide what to replace and what with.

At now Calcite based engine is placed in different module, we checked it can build execution graph for both local and distributed cases, it has good expandability. 
We talked to Calcite community to identify possible future issues and everything points to the fact it’s the best option. 
It’s possible to develop it as an experimental extension at first (not a replacement) until we make sure that it works as expected. This way there are no risks for anybody who uses Ignite on production environment.

Regards,
Igor


> 27 сент. 2019 г., в 17:25, Nikolay Izhikov <ni...@apache.org> написал(а):
> 
> Igor.
> 
>> The main issue - there is no *selection*.
> 
> 1. I don't remember community decision about this.
> 
> 2. We should avoid to make such long-term decision so quickly.
> We done this kind of decision with H2 and come to the point when we should review it.
> 
>> 1) Implementing white papers from scratch
>> 2) Adopting Calcite to our needs.
> 
> The third option don't fix issues we have with H2.
> The fourth option I know is using spark-catalyst.
> 
> What is wrong with writing engine from scratch?
> 
> I ask you to start with engine requirements.
> Can we, please, discuss it?
> 
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
> 
> We have alternative for now - H2 based engine.
> 
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> 
> When we make a decision about engine we can discuss roadmap for replacement.
> One more time - replacement of SQL engine to some more customizable make sense for me.
> But, this kind of decisions need carefull discussion.
> 
> В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
>> Nikolay,
>> 
>> The main issue - there is no *selection*.
>> 
>> There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
>> 
>> There are only two alternatives:
>> 
>> 1) Implementing white papers from scratch
>> 2) Adopting Calcite to our needs.
>> 
>> The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
>> 
>> Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
>> 
>> If you have an alternative - you're welcome, I'll gratefully listen to you.
>> 
>> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>> 
>> Regards,
>> Igor
>> 
>>> 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
>>> 
>>> Roman.
>>> 
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>> for you as it obvious for SQL team. So, please arrange your questions in 
>>>> a more constructive way.
>>> 
>>> What is SQL team?
>>> I only know Ignite community :)
>>> 
>>> Please, share you knowledge in IEP.
>>> I want to join to the process of engine *selection*.
>>> It should start with the requirements to such engine.
>>> Can you write it in IEP, please?
>>> 
>>> My point is very simple:
>>> 
>>> 1. We made the wrong decision with H2
>>> 2. We should make a well-thought decision about the new engine.
>>> 
>>>> How many tickets would satisfy you?
>>> 
>>> You write about "issueS" with the H2.
>>> All I see is one open ticket.
>>> IEP doesn't provide enough information.
>>> So it's not about the number of tickets, it's about
>>> 
>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>> are the main problems with the current engine.
>>> 
>>> We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
>>> This is how it happens with H2.
>>> 
>>> Let's start from what we want to get with the engine and move forward from this base.
>>> What do you think?
>>> 
>>> 
>>> 
>>> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
>>>> Maxim, Nikolay,
>>>> 
>>>> I've listed two issues which show the ideological flaws of the current 
>>>> engine.
>>>> 
>>>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
>>>> executing queries which can not be fit in the hardcoded one pass 
>>>> map-reduce paradigm.
>>>> 
>>>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
>>>> major problem with the current engine: H2 query optimizer is very 
>>>> primitive and can not perform many useful optimizations.
>>>> 
>>>> These two points (single map-reduce execution and inflexible optimizer) 
>>>> are the main problems with the current engine. It means that our engine 
>>>> is currently  suitable for execution only a very limited subset of the 
>>>> typical SQL queries. For example it can not even run most of the TPC-H 
>>>> benchmark queries because they don't fit to the simple map-reduce paradigm.
>>>> 
>>>>> All I see is links to two tickets:
>>>> 
>>>> How many tickets would satisfy you? I named two. And it looks like it is 
>>>> not enough from your point of view. Ok, so how many is enough? The set 
>>>> of problems caused by listed above tickets is infinite, therefore I can 
>>>> not create a ticket for each of them.
>>>>> Tech details also should be added.
>>>> 
>>>> Tech details are in the tickets.
>>>> 
>>>>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>>>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>>>>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>>> 
>>>> Why not? Don't you understand these problems? Or you don't think this is 
>>>> a problem?
>>>> 
>>>>> Let's make these descriptions more specific.
>>>> 
>>>> What do you mean by "more specific"? What is the criteria of the 
>>>> specific description?
>>>> 
>>>> 
>>>> 
>>>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>>>> for you as it obvious for SQL team. So, please arrange your questions in 
>>>> a more constructive way.
>>>> 
>>>> Thank you!
>> 
>>

Re: New SQL execution engine

Posted by Maxim Muzafarov <mm...@apache.org>.

Folks,
especially Ignite PMCs,

Are there any plans about how Ignite SQL will be evolved? It is a very
interesting thread on how Ignite SQL as a product will be developed
for the near future e.g. supporting new standards etc.

According to documentation Ignite complies with SQL ANSI-99 [2] but in
fact (correct me if I'm wrong) it doesn't support recursive queries
[1] (the issue mentioned by Andrey), right? Will it be solvable by the
new engine?

[1] https://issues.apache.org/jira/browse/IGNITE-5475
[2] http://ignite.apache.org/use-cases/database/sql-database.html

On Fri, 27 Sep 2019 at 17:22, Nikolay Izhikov <ni...@apache.org> wrote:
>
> Igor.
>
> > The main issue - there is no *selection*.
>
> 1. I don't remember community decision about this.
>
> 2. We should avoid to make such long-term decision so quickly.
> We done this kind of decision with H2 and come to the point when we should review it.
>
> > 1) Implementing white papers from scratch
> > 2) Adopting Calcite to our needs.
>
> The third option don't fix issues we have with H2.
> The fourth option I know is using spark-catalyst.
>
> What is wrong with writing engine from scratch?
>
> I ask you to start with engine requirements.
> Can we, please, discuss it?
>
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
>
> We have alternative for now - H2 based engine.
>
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
>
> When we make a decision about engine we can discuss roadmap for replacement.
> One more time - replacement of SQL engine to some more customizable make sense for me.
> But, this kind of decisions need carefull discussion.
>
> В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
> > Nikolay,
> >
> > The main issue - there is no *selection*.
> >
> > There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
> >
> > There are only two alternatives:
> >
> > 1) Implementing white papers from scratch
> > 2) Adopting Calcite to our needs.
> >
> > The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
> >
> > Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
> >
> > If you have an alternative - you're welcome, I'll gratefully listen to you.
> >
> > The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> >
> > Regards,
> > Igor
> >
> > > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
> > >
> > > Roman.
> > >
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > > for you as it obvious for SQL team. So, please arrange your questions in
> > > > a more constructive way.
> > >
> > > What is SQL team?
> > > I only know Ignite community :)
> > >
> > > Please, share you knowledge in IEP.
> > > I want to join to the process of engine *selection*.
> > > It should start with the requirements to such engine.
> > > Can you write it in IEP, please?
> > >
> > > My point is very simple:
> > >
> > > 1. We made the wrong decision with H2
> > > 2. We should make a well-thought decision about the new engine.
> > >
> > > > How many tickets would satisfy you?
> > >
> > > You write about "issueS" with the H2.
> > > All I see is one open ticket.
> > > IEP doesn't provide enough information.
> > > So it's not about the number of tickets, it's about
> > >
> > > > These two points (single map-reduce execution and inflexible optimizer)
> > > > are the main problems with the current engine.
> > >
> > > We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> > > This is how it happens with H2.
> > >
> > > Let's start from what we want to get with the engine and move forward from this base.
> > > What do you think?
> > >
> > >
> > >
> > > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > > Maxim, Nikolay,
> > > >
> > > > I've listed two issues which show the ideological flaws of the current
> > > > engine.
> > > >
> > > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of
> > > > executing queries which can not be fit in the hardcoded one pass
> > > > map-reduce paradigm.
> > > >
> > > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second
> > > > major problem with the current engine: H2 query optimizer is very
> > > > primitive and can not perform many useful optimizations.
> > > >
> > > > These two points (single map-reduce execution and inflexible optimizer)
> > > > are the main problems with the current engine. It means that our engine
> > > > is currently  suitable for execution only a very limited subset of the
> > > > typical SQL queries. For example it can not even run most of the TPC-H
> > > > benchmark queries because they don't fit to the simple map-reduce paradigm.
> > > >
> > > > > All I see is links to two tickets:
> > > >
> > > > How many tickets would satisfy you? I named two. And it looks like it is
> > > > not enough from your point of view. Ok, so how many is enough? The set
> > > > of problems caused by listed above tickets is infinite, therefore I can
> > > > not create a ticket for each of them.
> > > > > Tech details also should be added.
> > > >
> > > > Tech details are in the tickets.
> > > >
> > > > > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > > > > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > > > > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> > > >
> > > > Why not? Don't you understand these problems? Or you don't think this is
> > > > a problem?
> > > >
> > > > > Let's make these descriptions more specific.
> > > >
> > > > What do you mean by "more specific"? What is the criteria of the
> > > > specific description?
> > > >
> > > >
> > > >
> > > > Nikolay, Maxim, I understand that our arguments may not be as obvious
> > > > for you as it obvious for SQL team. So, please arrange your questions in
> > > > a more constructive way.
> > > >
> > > > Thank you!
> >
> >

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Igor.

> The main issue - there is no *selection*.

1. I don't remember community decision about this.

2. We should avoid to make such long-term decision so quickly.
We done this kind of decision with H2 and come to the point when we should review it.

> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.

The third option don't fix issues we have with H2.
The fourth option I know is using spark-catalyst.

What is wrong with writing engine from scratch?

I ask you to start with engine requirements.
Can we, please, discuss it?

> If you have an alternative - you're welcome, I'll gratefully listen to you.

We have alternative for now - H2 based engine.

> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.

When we make a decision about engine we can discuss roadmap for replacement.
One more time - replacement of SQL engine to some more customizable make sense for me.
But, this kind of decisions need carefull discussion.

В Пт, 27/09/2019 в 17:08 +0300, Seliverstov Igor пишет:
> Nikolay,
> 
> The main issue - there is no *selection*.
> 
> There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).
> 
> There are only two alternatives:
> 
> 1) Implementing white papers from scratch
> 2) Adopting Calcite to our needs.
> 
> The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.
> 
> Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.
> 
> If you have an alternative - you're welcome, I'll gratefully listen to you.
> 
> The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.
> 
> Regards,
> Igor
> 
> > 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
> > 
> > Roman.
> > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > a more constructive way.
> > 
> > What is SQL team?
> > I only know Ignite community :)
> > 
> > Please, share you knowledge in IEP.
> > I want to join to the process of engine *selection*.
> > It should start with the requirements to such engine.
> > Can you write it in IEP, please?
> > 
> > My point is very simple:
> > 
> > 1. We made the wrong decision with H2
> > 2. We should make a well-thought decision about the new engine.
> > 
> > > How many tickets would satisfy you?
> > 
> > You write about "issueS" with the H2.
> > All I see is one open ticket.
> > IEP doesn't provide enough information.
> > So it's not about the number of tickets, it's about
> > 
> > > These two points (single map-reduce execution and inflexible optimizer) 
> > > are the main problems with the current engine.
> > 
> > We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> > This is how it happens with H2.
> > 
> > Let's start from what we want to get with the engine and move forward from this base.
> > What do you think?
> > 
> > 
> > 
> > В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> > > Maxim, Nikolay,
> > > 
> > > I've listed two issues which show the ideological flaws of the current 
> > > engine.
> > > 
> > > 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
> > > executing queries which can not be fit in the hardcoded one pass 
> > > map-reduce paradigm.
> > > 
> > > 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
> > > major problem with the current engine: H2 query optimizer is very 
> > > primitive and can not perform many useful optimizations.
> > > 
> > > These two points (single map-reduce execution and inflexible optimizer) 
> > > are the main problems with the current engine. It means that our engine 
> > > is currently  suitable for execution only a very limited subset of the 
> > > typical SQL queries. For example it can not even run most of the TPC-H 
> > > benchmark queries because they don't fit to the simple map-reduce paradigm.
> > > 
> > > > All I see is links to two tickets:
> > > 
> > > How many tickets would satisfy you? I named two. And it looks like it is 
> > > not enough from your point of view. Ok, so how many is enough? The set 
> > > of problems caused by listed above tickets is infinite, therefore I can 
> > > not create a ticket for each of them.
> > > > Tech details also should be added.
> > > 
> > > Tech details are in the tickets.
> > > 
> > > > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > > > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > > > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> > > 
> > > Why not? Don't you understand these problems? Or you don't think this is 
> > > a problem?
> > > 
> > > > Let's make these descriptions more specific.
> > > 
> > > What do you mean by "more specific"? What is the criteria of the 
> > > specific description?
> > > 
> > > 
> > > 
> > > Nikolay, Maxim, I understand that our arguments may not be as obvious 
> > > for you as it obvious for SQL team. So, please arrange your questions in 
> > > a more constructive way.
> > > 
> > > Thank you!
> 
>

Re: New SQL execution engine

Posted by Seliverstov Igor <gv...@gmail.com>.

Nikolay,

The main issue - there is no *selection*.

There is a field of knowledge - relational algebra, which describes how to transform relational expressions saving their semantics, and a couple of implementations (Calcite is only one written in Java).

There are only two alternatives:

1) Implementing white papers from scratch
2) Adopting Calcite to our needs.

The second way was chosen by several other projects, there is experience, there is a list of known issues (like using indexes) so, almost everything is already done for us.

Implementing a planner is a big deal, I think anybody understands it there. That's why our proposal to reuse others experience is obvious.

If you have an alternative - you're welcome, I'll gratefully listen to you.

The main question isn't "WHAT" but "HOW" - that's the discussion topic from my point of view.

Regards,
Igor

> 27 сент. 2019 г., в 16:37, Nikolay Izhikov <ni...@apache.org> написал(а):
> 
> Roman.
> 
>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>> for you as it obvious for SQL team. So, please arrange your questions in 
>> a more constructive way.
> 
> What is SQL team?
> I only know Ignite community :)
> 
> Please, share you knowledge in IEP.
> I want to join to the process of engine *selection*.
> It should start with the requirements to such engine.
> Can you write it in IEP, please?
> 
> My point is very simple:
> 
> 1. We made the wrong decision with H2
> 2. We should make a well-thought decision about the new engine.
> 
>> How many tickets would satisfy you?
> 
> You write about "issueS" with the H2.
> All I see is one open ticket.
> IEP doesn't provide enough information.
> So it's not about the number of tickets, it's about
> 
>> These two points (single map-reduce execution and inflexible optimizer) 
>> are the main problems with the current engine.
> 
> We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
> This is how it happens with H2.
> 
> Let's start from what we want to get with the engine and move forward from this base.
> What do you think?
> 
> 
> 
> В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
>> Maxim, Nikolay,
>> 
>> I've listed two issues which show the ideological flaws of the current 
>> engine.
>> 
>> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
>> executing queries which can not be fit in the hardcoded one pass 
>> map-reduce paradigm.
>> 
>> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
>> major problem with the current engine: H2 query optimizer is very 
>> primitive and can not perform many useful optimizations.
>> 
>> These two points (single map-reduce execution and inflexible optimizer) 
>> are the main problems with the current engine. It means that our engine 
>> is currently  suitable for execution only a very limited subset of the 
>> typical SQL queries. For example it can not even run most of the TPC-H 
>> benchmark queries because they don't fit to the simple map-reduce paradigm.
>> 
>>> All I see is links to two tickets:
>> 
>> How many tickets would satisfy you? I named two. And it looks like it is 
>> not enough from your point of view. Ok, so how many is enough? The set 
>> of problems caused by listed above tickets is infinite, therefore I can 
>> not create a ticket for each of them.
>>> Tech details also should be added.
>> 
>> Tech details are in the tickets.
>> 
>>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>> 
>> Why not? Don't you understand these problems? Or you don't think this is 
>> a problem?
>> 
>>> Let's make these descriptions more specific.
>> 
>> What do you mean by "more specific"? What is the criteria of the 
>> specific description?
>> 
>> 
>> 
>> Nikolay, Maxim, I understand that our arguments may not be as obvious 
>> for you as it obvious for SQL team. So, please arrange your questions in 
>> a more constructive way.
>> 
>> Thank you!

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Roman.

> Nikolay, Maxim, I understand that our arguments may not be as obvious 
> for you as it obvious for SQL team. So, please arrange your questions in 
> a more constructive way.

What is SQL team?
I only know Ignite community :)

Please, share you knowledge in IEP.
I want to join to the process of engine *selection*.
It should start with the requirements to such engine.
Can you write it in IEP, please?

My point is very simple:

1. We made the wrong decision with H2
2. We should make a well-thought decision about the new engine.

> How many tickets would satisfy you?

You write about "issueS" with the H2.
All I see is one open ticket.
IEP doesn't provide enough information.
So it's not about the number of tickets, it's about

> These two points (single map-reduce execution and inflexible optimizer) 
> are the main problems with the current engine.

We may come to the point when Calcite(or any other engine) brings us third and other "main problems".
This is how it happens with H2.

Let's start from what we want to get with the engine and move forward from this base.
What do you think?



В Пт, 27/09/2019 в 16:15 +0300, Roman Kondakov пишет:
> Maxim, Nikolay,
> 
> I've listed two issues which show the ideological flaws of the current 
> engine.
> 
> 1. IGNITE-11448 - Open. This ticket describes the impossibility of 
> executing queries which can not be fit in the hardcoded one pass 
> map-reduce paradigm.
> 
> 2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
> major problem with the current engine: H2 query optimizer is very 
> primitive and can not perform many useful optimizations.
> 
> These two points (single map-reduce execution and inflexible optimizer) 
> are the main problems with the current engine. It means that our engine 
> is currently  suitable for execution only a very limited subset of the 
> typical SQL queries. For example it can not even run most of the TPC-H 
> benchmark queries because they don't fit to the simple map-reduce paradigm.
> 
> > All I see is links to two tickets:
> 
> How many tickets would satisfy you? I named two. And it looks like it is 
> not enough from your point of view. Ok, so how many is enough? The set 
> of problems caused by listed above tickets is infinite, therefore I can 
> not create a ticket for each of them.
> > Tech details also should be added.
> 
> Tech details are in the tickets.
> 
> > We can't discuss such a huge change as an execution engine replacement with descrition like:
> > "No data co-location control, i.e. arbitrary data can be returned silently" or
> > "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
> 
> Why not? Don't you understand these problems? Or you don't think this is 
> a problem?
> 
> > Let's make these descriptions more specific.
> 
> What do you mean by "more specific"? What is the criteria of the 
> specific description?
> 
> 
> 
> Nikolay, Maxim, I understand that our arguments may not be as obvious 
> for you as it obvious for SQL team. So, please arrange your questions in 
> a more constructive way.
> 
> Thank you!

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Maxim, Nikolay,

I've listed two issues which show the ideological flaws of the current 
engine.

1. IGNITE-11448 - Open. This ticket describes the impossibility of 
executing queries which can not be fit in the hardcoded one pass 
map-reduce paradigm.

2. IGNITE-6085 - Closed (won't fix) - This ticket describes the second 
major problem with the current engine: H2 query optimizer is very 
primitive and can not perform many useful optimizations.

These two points (single map-reduce execution and inflexible optimizer) 
are the main problems with the current engine. It means that our engine 
is currently  suitable for execution only a very limited subset of the 
typical SQL queries. For example it can not even run most of the TPC-H 
benchmark queries because they don't fit to the simple map-reduce paradigm.

> All I see is links to two tickets:
How many tickets would satisfy you? I named two. And it looks like it is 
not enough from your point of view. Ok, so how many is enough? The set 
of problems caused by listed above tickets is infinite, therefore I can 
not create a ticket for each of them.
> Tech details also should be added.

Tech details are in the tickets.

> We can't discuss such a huge change as an execution engine replacement with descrition like:
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
Why not? Don't you understand these problems? Or you don't think this is 
a problem?

> Let's make these descriptions more specific.
What do you mean by "more specific"? What is the criteria of the 
specific description?



Nikolay, Maxim, I understand that our arguments may not be as obvious 
for you as it obvious for SQL team. So, please arrange your questions in 
a more constructive way.

Thank you!
-- 
Kind Regards
Roman Kondakov

On 27.09.2019 15:32, Maxim Muzafarov wrote:
> Folks,
>
> I agree with Nikolay, the idea of replacing the H2 engine with the
> most suitable one is reasonable. But since such change is major we
> should have a strong argumentation on it even for members with are
> working outside the SQL-team.
>
> I think it is really necessary to have:
>
> 1. The list of issues related to the current engine (H2) which from
> different points of view and for different developers must seem
> unsolvable. For example, `... the H2 execution plan is hard-wired with
> H2 internals and can't be easily transformed` seems doesn't have a
> strong technical argumentation.
> After this step, we should have a clear understanding that the engine
> change is required.
>
> 2. Why only the Apache Calcite? It seems to me we should have a table
> with a comparison of different engines with the pros and cons of each
> other. A brief search shows me that we may have a few options here.
> After this step, we should have a clear understanding of why we choose
> this dependency prior to another.
>
> 3. We should also have a migration decomposition and step by step
> actions to do. I haven't found such a decomposition on IEP-37 page. Do
> we have one? What the implementation phases will be? What components
> will be changed? What a new API would be and would it be? What
> problems we are expecting e.g performance degradation on prototype
> implementation? `Risks and Assumptions` topic doesn't seem to be a
> good described.
> After this step, we should have a clear and obvious a new feature
> implementation plan.
>
> Let's have a strong technical discussion.
>
> On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <ni...@apache.org> wrote:
>> Hello, Roman.
>>
>> All I see is links to two tickets:
>>
>> IGNITE-11448 - Open
>> IGNITE-6085 - Closed
>>
>> Other issues described poorly and have not ticket links.
>> We can't discuss such a huge change as an execution engine replacement with descrition like:
>>
>> "No data co-location control, i.e. arbitrary data can be returned silently" or
>> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>>
>> I think we need some reproducer that shows issue.
>> Tech details also should be added.
>>
>> Let's make these descriptions more specific.
>> Let's discuss how we want to fix them with the new engine.
>>
>>
>> В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
>>> Hello Nikolay,
>>>
>>> please see IEP--37 [1]. Issues are there.
>>>
>>>
>>> [1]
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>>>
>>>

Re: New SQL execution engine

Posted by Maxim Muzafarov <mm...@apache.org>.

Folks,

I agree with Nikolay, the idea of replacing the H2 engine with the
most suitable one is reasonable. But since such change is major we
should have a strong argumentation on it even for members with are
working outside the SQL-team.

I think it is really necessary to have:

1. The list of issues related to the current engine (H2) which from
different points of view and for different developers must seem
unsolvable. For example, `... the H2 execution plan is hard-wired with
H2 internals and can't be easily transformed` seems doesn't have a
strong technical argumentation.
After this step, we should have a clear understanding that the engine
change is required.

2. Why only the Apache Calcite? It seems to me we should have a table
with a comparison of different engines with the pros and cons of each
other. A brief search shows me that we may have a few options here.
After this step, we should have a clear understanding of why we choose
this dependency prior to another.

3. We should also have a migration decomposition and step by step
actions to do. I haven't found such a decomposition on IEP-37 page. Do
we have one? What the implementation phases will be? What components
will be changed? What a new API would be and would it be? What
problems we are expecting e.g performance degradation on prototype
implementation? `Risks and Assumptions` topic doesn't seem to be a
good described.
After this step, we should have a clear and obvious a new feature
implementation plan.

Let's have a strong technical discussion.

On Fri, 27 Sep 2019 at 15:17, Nikolay Izhikov <ni...@apache.org> wrote:
>
> Hello, Roman.
>
> All I see is links to two tickets:
>
> IGNITE-11448 - Open
> IGNITE-6085 - Closed
>
> Other issues described poorly and have not ticket links.
> We can't discuss such a huge change as an execution engine replacement with descrition like:
>
> "No data co-location control, i.e. arbitrary data can be returned silently" or
> "Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."
>
> I think we need some reproducer that shows issue.
> Tech details also should be added.
>
> Let's make these descriptions more specific.
> Let's discuss how we want to fix them with the new engine.
>
>
> В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> > Hello Nikolay,
> >
> > please see IEP--37 [1]. Issues are there.
> >
> >
> > [1]
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> >

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Roman.

All I see is links to two tickets:

IGNITE-11448 - Open
IGNITE-6085 - Closed

Other issues described poorly and have not ticket links.
We can't discuss such a huge change as an execution engine replacement with descrition like:

"No data co-location control, i.e. arbitrary data can be returned silently" or
"Low control on how query executes internally, as a result we have limited possibility to implement improvements/fixes."

I think we need some reproducer that shows issue.
Tech details also should be added.

Let's make these descriptions more specific. 
Let's discuss how we want to fix them with the new engine.


В Пт, 27/09/2019 в 15:10 +0300, Roman Kondakov пишет:
> Hello Nikolay,
> 
> please see IEP--37 [1]. Issues are there.
> 
> 
> [1] 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> 
>

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hello Nikolay,

please see IEP--37 [1]. Issues are there.


[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084


-- 
Kind Regards
Roman Kondakov

On 27.09.2019 14:20, Nikolay Izhikov wrote:
> Hello, Roman.
>
>> Also Apache Calcite is commonly used in popular Apache projects
> I don't think it's a good point.
> H2 is also commonly used.
> But, it doesn't conform to Ignite requirements.
>
> Can you, please, write down issues and engine requirements to the IEP?
> So we can discuss each point separately.
>
>
> В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет:
>> Hello Nikolay.
>>
>> You've asked very good questions. I'll try to answer.
>>
>>> 1. What the exact issues with the H2 integration?
>>> Can you send a tickets links?
>>> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>> Current SQL engine is confined in the single-pass map-reduce algorithm.
>> This make impossible to execute complex queries which can not be
>> expressed with a single map-reduce pass like subqueries with aggregates
>> [1].  Another problem is that H2 optimizer is very primitive and not
>> able to perform many useful optimizations [2].
>>
>> Also Apache Calcite is commonly used in popular Apache projects like
>> Hive, Drill, Flink and others [3]. So it's mature and well battle tested
>> framework, while H2 is a toy database which is hardly ever used in the
>> real production systems.
>>
>>> 2. What are the requirements for the new SQL engine?
>>> We should write it down and discuss.
>> The main requirement is to fix the problems listed above. The new SQL
>> engine should be able to *effectively* execute SQL queries of the
>> *arbitrary complexity*. For example the new engine will be able to
>> perform distributed joins in a multiple ways [4], when current engine
>> can do it only in two ways: collocated and distributed (the latter is
>> usually not very efficient and needed to set manually).
>>
>>> 3. What options do we have?
>>> Are there any alternatives to Calcite on the market?
>>> We did the wrong choice that looked obvious one time.
>>> So we should carefully avoid it at this time.
>> I know the only one open source implementation of the efficient query
>> optimization strategy - and this is Apache Calcite. The alternative way
>> is to write our own query optimizer from scratch which is not a trivial
>> task at all.
>>
>>
>>> 4. What is improvements of Ignite we want to make with the new engine?
>> Ignite will be able to execute complex queries using optimal strategy. I
>> think this is a quite good improvement.
>>
>>
>> [1] https://issues.apache.org/jira/browse/IGNITE-11448
>> [2] https://issues.apache.org/jira/browse/IGNITE-6085
>> [3] https://calcite.apache.org/docs/powered_by.html
>> [4] https://www.memsql.com/blog/scaling-distributed-joins/

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Roman.

> Also Apache Calcite is commonly used in popular Apache projects

I don't think it's a good point.
H2 is also commonly used.
But, it doesn't conform to Ignite requirements.

Can you, please, write down issues and engine requirements to the IEP?
So we can discuss each point separately.


В Пт, 27/09/2019 в 13:56 +0300, Roman Kondakov пишет:
> Hello Nikolay.
> 
> You've asked very good questions. I'll try to answer.
> 
> > 1. What the exact issues with the H2 integration?
> > Can you send a tickets links?
> > Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
> 
> Current SQL engine is confined in the single-pass map-reduce algorithm. 
> This make impossible to execute complex queries which can not be 
> expressed with a single map-reduce pass like subqueries with aggregates 
> [1].  Another problem is that H2 optimizer is very primitive and not 
> able to perform many useful optimizations [2].
> 
> Also Apache Calcite is commonly used in popular Apache projects like 
> Hive, Drill, Flink and others [3]. So it's mature and well battle tested 
> framework, while H2 is a toy database which is hardly ever used in the 
> real production systems.
> 
> > 2. What are the requirements for the new SQL engine?
> > We should write it down and discuss.
> 
> The main requirement is to fix the problems listed above. The new SQL 
> engine should be able to *effectively* execute SQL queries of the 
> *arbitrary complexity*. For example the new engine will be able to 
> perform distributed joins in a multiple ways [4], when current engine 
> can do it only in two ways: collocated and distributed (the latter is 
> usually not very efficient and needed to set manually).
> 
> > 3. What options do we have?
> > Are there any alternatives to Calcite on the market?
> > We did the wrong choice that looked obvious one time.
> > So we should carefully avoid it at this time.
> 
> I know the only one open source implementation of the efficient query 
> optimization strategy - and this is Apache Calcite. The alternative way 
> is to write our own query optimizer from scratch which is not a trivial 
> task at all.
> 
> 
> > 4. What is improvements of Ignite we want to make with the new engine?
> 
> Ignite will be able to execute complex queries using optimal strategy. I 
> think this is a quite good improvement.
> 
> 
> [1] https://issues.apache.org/jira/browse/IGNITE-11448
> [2] https://issues.apache.org/jira/browse/IGNITE-6085
> [3] https://calcite.apache.org/docs/powered_by.html
> [4] https://www.memsql.com/blog/scaling-distributed-joins/

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hello Nikolay.

You've asked very good questions. I'll try to answer.

> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
Current SQL engine is confined in the single-pass map-reduce algorithm. 
This make impossible to execute complex queries which can not be 
expressed with a single map-reduce pass like subqueries with aggregates 
[1].  Another problem is that H2 optimizer is very primitive and not 
able to perform many useful optimizations [2].

Also Apache Calcite is commonly used in popular Apache projects like 
Hive, Drill, Flink and others [3]. So it's mature and well battle tested 
framework, while H2 is a toy database which is hardly ever used in the 
real production systems.

> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
The main requirement is to fix the problems listed above. The new SQL 
engine should be able to *effectively* execute SQL queries of the 
*arbitrary complexity*. For example the new engine will be able to 
perform distributed joins in a multiple ways [4], when current engine 
can do it only in two ways: collocated and distributed (the latter is 
usually not very efficient and needed to set manually).

> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
I know the only one open source implementation of the efficient query 
optimization strategy - and this is Apache Calcite. The alternative way 
is to write our own query optimizer from scratch which is not a trivial 
task at all.


> 4. What is improvements of Ignite we want to make with the new engine?
Ignite will be able to execute complex queries using optimal strategy. I 
think this is a quite good improvement.


[1] https://issues.apache.org/jira/browse/IGNITE-11448
[2] https://issues.apache.org/jira/browse/IGNITE-6085
[3] https://calcite.apache.org/docs/powered_by.html
[4] https://www.memsql.com/blog/scaling-distributed-joins/
-- 
Kind Regards
Roman Kondakov

On 27.09.2019 12:20, Nikolay Izhikov wrote:
> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2" label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
>
> В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
>> Hi Igniters!
>>
>> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
>>
>> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
>>
>> Most of the issues cannot be solved without whole engine redesign.
>>
>> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>>
>> I'll appreciate if you share your thoughts on top of that.
>>
>> Regards,
>> Igor

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Andrey.

> Ignite SQL layer has some issues that can't be fix with changes in Ignite
> only, and we are blocked with H2.

What are these issues?
Can you make it specific and send a tickets for this issues?

> 3. Replace H2 with smth else.

Actually, I support this decision in general.
But, to make a right choise for H2 replacement we should carefully discuss such huge replacement.

So far, I can't see any written down(in IEP) requirements for SQL engine.
Let's do it and discuss them.

В Пт, 27/09/2019 в 13:39 +0300, Andrey Mashenkov пишет:
> Hi Nikolay,
> 
> Let me add my 5- cent here.
> 
> Ignite SQL layer has some issues that can't be fix with changes in Ignite
> only, and we are blocked with H2.
> To resolve these issues we can:
> 1. Donate some changes to H2 and wait for it's next release. But there are
> more cons than pros and I think we can't rely on H2 project anymore.
> - There is no guarantee our changes will be approved by H2 community.
> - We definitely won't to depend on H2 product lifecycle.
> - New H2 features (like parallel multi-statement query processing in latest
> release) force Ignite for significant changes\refactoring in Ignite SQL
> layer with no visible benefits.
> Every next release it becomes harder to upgrade H2 dependency.
> - Latest H2 versions causes questions about their stability.
> 
> Hot issues are
> - Large intermediate results inside H2 internals can cause OOM for some
> kind of queries. Ignite can't handle this anyhow for now without reworking
> H2 code.
> - HashJoins
> - Ignite can't start multi-step queries, but 2-step (map-reduce) only.
> - It is not possible to apply optimizations on query plan as no logical
> plan actually doen't exists. H2 execution plan is hard-wired with H2
> internals and can't be easily transformed.
> Implementing a new good planner over H2 looks like a huge task.
> 
> 2. Fork H2.
> We already done this in GridGain (you can found H2 module in GridGain
> community edition) as fastest way to unblock work on SQL improvements.
> But this way doesn't look like a good one for Ignite, regarding our
> experience.
> - H2 code can't be included into Ignite at all.
> H2 license are MIT and EPL. From one side they can't be changed to Apache
> Licence. From other side Apache Foundation don't want to host any code
> licensed with other than Apache License.
> GridGain is ok with this, but Apache Foundation won't.
> 
> - We can made separate H2 fork project with it's own lifecycle with full
> control over it and publish it in Maven Central.
> This doen't seem like a big deal. But will causes additional difficulties
> in development, test and release processes of Ignite.
> This way seems bring much pain for every contributor.
> 
> 3. Replace H2 with smth else.
> E.g. with Apache Calcite.
> - Calcite is a framework and it is designed very flexible and extendable.
> - Every it's part can be replaced with our own implementation.
> - Apache License is out of the box =)
> 
> So, summary:
> 1-st way of pain we have now and it slows down Ignite SQL layer developing.
> 2-nd looks few better, but seems bring Ignite to nowhere in prospect.
> 3-rd is a risky, but promissory way.
> 
> 
> On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov <ni...@apache.org>
> wrote:
> 
> > Hello, Igor.
> > 
> > Thanks for starting this discussion.
> > 
> > I think we should take a step back in it and answer the following
> > questions:
> > 
> > 1. What the exact issues with the H2 integration?
> > Can you send a tickets links?
> > Can we label all H2 integration issues in JIRA? I propose to use "h2"
> > label.
> > 
> > 2. What are the requirements for the new SQL engine?
> > We should write it down and discuss.
> > 
> > 3. What options do we have?
> > Are there any alternatives to Calcite on the market?
> > We did the wrong choice that looked obvious one time.
> > So we should carefully avoid it at this time.
> > 
> > 4. What is improvements of Ignite we want to make with the new engine?
> > 
> > 
> > В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
> > > Hi Igniters!
> > > 
> > > As you might know currently we have many open issues relating to current
> > 
> > H2 based engine and its execution flow.
> > > 
> > > Some of them are critical (like impossibility to execute particular
> > 
> > queries), some of them are majors (like impossibility to execute particular
> > queries without pre-preparation your data to have a collocation) and many
> > minors.
> > > 
> > > Most of the issues cannot be solved without whole engine redesign.
> > > 
> > > So, here the proposal:
> > 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > 
> > > I'll appreciate if you share your thoughts on top of that.
> > > 
> > > Regards,
> > > Igor
> 
>

Re: New SQL execution engine

Posted by Andrey Mashenkov <an...@gmail.com>.

Hi Nikolay,

Let me add my 5- cent here.

Ignite SQL layer has some issues that can't be fix with changes in Ignite
only, and we are blocked with H2.
To resolve these issues we can:
1. Donate some changes to H2 and wait for it's next release. But there are
more cons than pros and I think we can't rely on H2 project anymore.
- There is no guarantee our changes will be approved by H2 community.
- We definitely won't to depend on H2 product lifecycle.
- New H2 features (like parallel multi-statement query processing in latest
release) force Ignite for significant changes\refactoring in Ignite SQL
layer with no visible benefits.
Every next release it becomes harder to upgrade H2 dependency.
- Latest H2 versions causes questions about their stability.

Hot issues are
- Large intermediate results inside H2 internals can cause OOM for some
kind of queries. Ignite can't handle this anyhow for now without reworking
H2 code.
- HashJoins
- Ignite can't start multi-step queries, but 2-step (map-reduce) only.
- It is not possible to apply optimizations on query plan as no logical
plan actually doen't exists. H2 execution plan is hard-wired with H2
internals and can't be easily transformed.
Implementing a new good planner over H2 looks like a huge task.

2. Fork H2.
We already done this in GridGain (you can found H2 module in GridGain
community edition) as fastest way to unblock work on SQL improvements.
But this way doesn't look like a good one for Ignite, regarding our
experience.
- H2 code can't be included into Ignite at all.
H2 license are MIT and EPL. From one side they can't be changed to Apache
Licence. From other side Apache Foundation don't want to host any code
licensed with other than Apache License.
GridGain is ok with this, but Apache Foundation won't.

- We can made separate H2 fork project with it's own lifecycle with full
control over it and publish it in Maven Central.
This doen't seem like a big deal. But will causes additional difficulties
in development, test and release processes of Ignite.
This way seems bring much pain for every contributor.

3. Replace H2 with smth else.
E.g. with Apache Calcite.
- Calcite is a framework and it is designed very flexible and extendable.
- Every it's part can be replaced with our own implementation.
- Apache License is out of the box =)

So, summary:
1-st way of pain we have now and it slows down Ignite SQL layer developing.
2-nd looks few better, but seems bring Ignite to nowhere in prospect.
3-rd is a risky, but promissory way.


On Fri, Sep 27, 2019 at 12:16 PM Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Igor.
>
> Thanks for starting this discussion.
>
> I think we should take a step back in it and answer the following
> questions:
>
> 1. What the exact issues with the H2 integration?
> Can you send a tickets links?
> Can we label all H2 integration issues in JIRA? I propose to use "h2"
> label.
>
> 2. What are the requirements for the new SQL engine?
> We should write it down and discuss.
>
> 3. What options do we have?
> Are there any alternatives to Calcite on the market?
> We did the wrong choice that looked obvious one time.
> So we should carefully avoid it at this time.
>
> 4. What is improvements of Ignite we want to make with the new engine?
>
>
> В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
> > Hi Igniters!
> >
> > As you might know currently we have many open issues relating to current
> H2 based engine and its execution flow.
> >
> > Some of them are critical (like impossibility to execute particular
> queries), some of them are majors (like impossibility to execute particular
> queries without pre-preparation your data to have a collocation) and many
> minors.
> >
> > Most of the issues cannot be solved without whole engine redesign.
> >
> > So, here the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> >
> > I'll appreciate if you share your thoughts on top of that.
> >
> > Regards,
> > Igor
>


-- 
Best regards,
Andrey V. Mashenkov

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Igor.

Thanks for starting this discussion.

I think we should take a step back in it and answer the following questions:

1. What the exact issues with the H2 integration?
Can you send a tickets links? 
Can we label all H2 integration issues in JIRA? I propose to use "h2" label.

2. What are the requirements for the new SQL engine?
We should write it down and discuss.

3. What options do we have?
Are there any alternatives to Calcite on the market?
We did the wrong choice that looked obvious one time. 
So we should carefully avoid it at this time.

4. What is improvements of Ignite we want to make with the new engine?


В Пт, 27/09/2019 в 08:44 +0000, Igor Seliverstov пишет:
> Hi Igniters!
> 
> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
> 
> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
> 
> Most of the issues cannot be solved without whole engine redesign. 
> 
> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> 
> I'll appreciate if you share your thoughts on top of that.
> 
> Regards,
> Igor

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Hi Roman,

Thanks a lot for the answer (and the pull request). As I said initially, I was under the impression that the reason was the lack of affinity.
I understand the reason and the current design and I think we all agreed that this is not optimal and that it should be reworked in the new design. Especially the sort of silent behavior. That being said, more than a warning : having joins in // inter partitions would be very helpful but I understand that it is not straightforward.

As always you guys are very reactive and helpful. Keep up the great work. Appreciate it.

-----Original Message-----
From: Roman Kondakov <ko...@mail.ru.INVALID> 
Sent: Monday, November 18, 2019 11:04 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try to explain.

When query parallelism = N is turned on, it means that each cache is divided into N parts from the SQL point of view. Every SQL query is executed independently over each particular part, and then results are merged together during the reducer step.

This is absolutely identical to the distributed query execution, where instead of a single node with query parallelism = N, we have N nodes with query parallelism = 1. SQL query is executed over each partition of data on all nodes and then results are merged on reducer.

As we can see, query parallelism is equivalent to the distributed query execution. When we do joins over distributed tables, we need to think about the collocation of data [1]. If data is not collocated, we get a wrong result. This happens silently, which is not good, IMO.

I reworked your example a bit in order to impose collocation on the joining key and now join returns correct result [2].

Current approach in configuration and query execution looks very uncomfortable and should be completely redesigned in the new engine.

[1] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapacheignite-sql.readme.io%2Fdocs%2Fdistributed-joins&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=82bDWI1PHUOzNz95A5F%2Flyiqlrb9aQ2vadxhE%2FK47LM%3D&amp;reserved=0

[2] https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhostettler%2FigniteParallelQueries%2Fpull%2F1&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=QCvNEKqGGyZYOXQbF0sG0DUCzYJCnKoWleFTMtngcsc%3D&amp;reserved=0

--
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostettler@gmail.com wrote:
> Actually I am now wondering whether this is not just a bug and that I 
> should record it as such. As the behavior is different with and 
> without the parallelism and there is no warning during execution or in the api.
>
> Any thought?
>
>
>
> --
> Sent from: 
> https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapach
> e-ignite-developers.2346864.n4.nabble.com%2F&amp;data=02%7C01%7CSteve.
> Hostettler%40wolterskluwer.com%7C68a93ad417fc4e70ed1808d76c0e9f53%7C8a
> c76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096682368420072&amp;sdata=
> LzUii%2BuNqHhS1YbFLNwpe7cn6XRRpKrrSO6wS5zNlSU%3D&amp;reserved=0

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hi, Steve

This behavior is actually not a bug, but this is not obvious. I'll try 
to explain.

When query parallelism = N is turned on, it means that each cache is 
divided into N parts from the SQL point of view. Every SQL query is 
executed independently over each particular part, and then results are 
merged together during the reducer step.

This is absolutely identical to the distributed query execution, where 
instead of a single node with query parallelism = N, we have N nodes 
with query parallelism = 1. SQL query is executed over each partition of 
data on all nodes and then results are merged on reducer.

As we can see, query parallelism is equivalent to the distributed query 
execution. When we do joins over distributed tables, we need to think 
about the collocation of data [1]. If data is not collocated, we get a 
wrong result. This happens silently, which is not good, IMO.

I reworked your example a bit in order to impose collocation on the 
joining key and now join returns correct result [2].

Current approach in configuration and query execution looks very 
uncomfortable and should be completely redesigned in the new engine.

[1] https://apacheignite-sql.readme.io/docs/distributed-joins

[2] https://github.com/hostettler/igniteParallelQueries/pull/1

-- 
Kind Regards
Roman Kondakov

On 16.11.2019 12:50, steve.hostettler@gmail.com wrote:
> Actually I am now wondering whether this is not just a bug and that I should
> record it as such. As the behavior is different with and without the
> parallelism and there is no warning during execution or in the api.
>
> Any thought?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Ivan,

Thanks that is good news. I use ignite as a platform and not directly to exec in-house application so these types of things are making the generic code less  generic 😊.

Thanks a lot for the great work.

-----Original Message-----
From: Ivan Pavlukhin <vo...@gmail.com> 
Sent: Monday, November 18, 2019 10:13 AM
To: dev <de...@ignite.apache.org>
Subject: Re: New SQL execution engine

Steve,

Yep, unfortunately query parallelism in current flavor is counter-intuitive. But it was designed so =( As Roman wrote
> And of course this feature should also be available in the new engine, though it's architecture may be changed.
The architecture of parallel execution will be definitely reconsidered. And currently we are targeted to do it so in one node cluster query will return the same results regardless parallelism.

сб, 16 нояб. 2019 г. в 12:48, steve.hostettler@gmail.com
<st...@gmail.com>:
>
> Actually I am now wondering whether this is not just a bug and that I 
> should record it as such. As the behavior is different with and 
> without the parallelism and there is no warning during execution or in the api.
>
> Any thought?
>
>
>
> --
> Sent from: 
> https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapach
> e-ignite-developers.2346864.n4.nabble.com%2F&amp;data=02%7C01%7CSteve.
> Hostettler%40wolterskluwer.com%7Cac6000fb14834d1abfa108d76c079273%7C8a
> c76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637096652092270800&amp;sdata=
> PcitGXmdx5DittW1RMAOEeneiLfKVrydUHL8uCKGi3g%3D&amp;reserved=0



--
Best regards,
Ivan Pavlukhin

Re: New SQL execution engine

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Steve,

Yep, unfortunately query parallelism in current flavor is
counter-intuitive. But it was designed so =( As Roman wrote
> And of course this feature should also be available in the new engine, though it's architecture may be changed.
The architecture of parallel execution will be definitely
reconsidered. And currently we are targeted to do it so in one node
cluster query will return the same results regardless parallelism.

сб, 16 нояб. 2019 г. в 12:48, steve.hostettler@gmail.com
<st...@gmail.com>:
>
> Actually I am now wondering whether this is not just a bug and that I should
> record it as such. As the behavior is different with and without the
> parallelism and there is no warning during execution or in the api.
>
> Any thought?
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/



-- 
Best regards,
Ivan Pavlukhin

RE: New SQL execution engine

Posted by "steve.hostettler@gmail.com" <st...@gmail.com>.

Actually I am now wondering whether this is not just a bug and that I should
record it as such. As the behavior is different with and without the
parallelism and there is no warning during execution or in the api.

Any thought?



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

RE: New SQL execution engine

Posted by "Hostettler, Steve" <St...@wolterskluwer.com>.

Hi Roman,

Actually it does not work as I expect it. Please see https://github.com/hostettler/igniteParallelQueries
Do mvn clean install and then java -jar target/ignite-parallel-query-1.0.0-SNAPSHOT-jar-with-dependencies.jar

This demonstrates that with or without the flag the query does not return the same result. I understand that it probably because I did not set an affinity but it is very counter-intuitive.

Am I missing something?

-----Original Message-----
From: Roman Kondakov <ko...@mail.ru.INVALID> 
Sent: Friday, November 15, 2019 11:46 AM
To: dev@ignite.apache.org
Subject: Re: New SQL execution engine

Hi Steve,

it is possible to execute queries in parallel even in the current engine, see docs here [1]. And of course this feature should also be available in the new engine, though it's architecture may be changed.

[1]
https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fapacheignite.readme.io%2Fv2.0%2Fdocs%2Fsql-performance-and-debugging%23query-parallelism&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C2b752425baeb422af60408d769b9159d%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637094115967087030&amp;sdata=eN7b2RCJegg8J9KQVK6TIFhcS6NG7j5pWKFxX9GWyYk%3D&amp;reserved=0

-- 
Kind Regards
Roman Kondakov

On 15.11.2019 12:53, steve.hostettler@gmail.com wrote:
> Dear all,
>
> would it be possible to also have then // execution of sql queries on single
> node with that approach?
> My understanding is that, for the moment, the SQL queries a re
> single-threaded for a given node if there is no affinity.
>
> Best Regards
>
>
>
> --
> Sent from: https://nam04.safelinks.protection.outlook.com/?url=http%3A%2F%2Fapache-ignite-developers.2346864.n4.nabble.com%2F&amp;data=02%7C01%7CSteve.Hostettler%40wolterskluwer.com%7C2b752425baeb422af60408d769b9159d%7C8ac76c91e7f141ffa89c3553b2da2c17%7C0%7C0%7C637094115967087030&amp;sdata=jXtLMt2dWYqM4KcRFkw4lby6K0o8glKnrLFgxZ96LbQ%3D&amp;reserved=0

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hi Steve,

it is possible to execute queries in parallel even in the current 
engine, see docs here [1]. And of course this feature should also be 
available in the new engine, though it's architecture may be changed.

[1] 
https://apacheignite.readme.io/v2.0/docs/sql-performance-and-debugging#query-parallelism


-- 
Kind Regards
Roman Kondakov

On 15.11.2019 12:53, steve.hostettler@gmail.com wrote:
> Dear all,
>
> would it be possible to also have then // execution of sql queries on single
> node with that approach?
> My understanding is that, for the moment, the SQL queries a re
> single-threaded for a given node if there is no affinity.
>
> Best Regards
>
>
>
> --
> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: New SQL execution engine

Posted by "steve.hostettler@gmail.com" <st...@gmail.com>.

Dear all,

would it be possible to also have then // execution of sql queries on single
node with that approach?
My understanding is that, for the moment, the SQL queries a re
single-threaded for a given node if there is no affinity.

Best Regards



--
Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/

Re: New SQL execution engine

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Nikolay,

Guys updated IEP [1]. Could you please check it? Are there any missing
parts needed at that stage?

[1] https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine

вт, 1 окт. 2019 г. в 12:19, Ivan Pavlukhin <vo...@gmail.com>:
>
> Folks,
>
> I marked IEP-33 as obsolete. Also now the IEP-37 we currently are
> working with has a pretty URL
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine
>
> вт, 1 окт. 2019 г. в 11:17, Seliverstov Igor <gv...@gmail.com>:
> >
> > Nikolay,
> >
> > The document you edited is wrong (and outdated).
> >
> > Since the author meant another idea, I decided not to change IEP-35 and
> > create a new one - IEP-37 (https://cwiki.apache.org/confluence/x/NBLABw).
> > It's already have a number of key requirements.
> >
> > Regards,
> > Igor
> >
> > вт, 1 окт. 2019 г., 6:14 Nikolay Izhikov <ni...@apache.org>:
> >
> > > Hello, Igniters.
> > >
> > > I extends IEP [1] with the tickets caused by H2 limitations.
> > >
> > > Please, let's write down requirements for engine in the IEP.
> > >
> > >
> > > https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure
> > >
> > > В Пн, 30/09/2019 в 17:20 -0700, Denis Magda пишет:
> > > > Ivan, we need more of these discussions, totally agree with you ;)
> > > >
> > > > I've updated the Motivation paragraph outlining some high-level users we
> > > > see by working with our users. Hope it helps. Let's carry on and let me
> > > > send a note to Apache Calcite community.
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Mon, Sep 30, 2019 at 1:56 AM Ivan Pavlukhin <vo...@gmail.com>
> > > wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > Thanks everyone for a hot discussion! Not every open source community
> > > > > has such open and boiling discussions. It means that people here
> > > > > really do care. And I am proud of it!
> > > > >
> > > > > As I understood, nobody is strictly against the proposed initiative.
> > > > > And I am glad that we can move forward (with some steps back along the
> > > > > way).
> > > > >
> > > > > пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
> > > > > >
> > > > > > Hello, Denis.
> > > > > >
> > > > > > Thanks for the clarifications.
> > > > > >
> > > > > > Sounds good for me.
> > > > > > All I try to say in this thread:
> > > > > > Guys, please, let's take a step back and write down
> > > requirements(what we
> > > > >
> > > > > want to get with SQL engine).
> > > > > > Which features and use-cases are primary for us.
> > > > > >
> > > > > > I'm sure you have done it, already during your research.
> > > > > >
> > > > > > Please, share it with the community.
> > > > > >
> > > > > > I'm pretty sure we would back to this document again and again during
> > > > >
> > > > > migration.
> > > > > > So good written design is worth it.
> > > > > >
> > > > > > В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > > > > > > Ignite mates, let me try to move the discussion in a constructive
> > > way.
> > > > >
> > > > > It
> > > > > > > looks like we set a wrong context from the very beginning.
> > > > > > >
> > > > > > > Before proposing this idea to the community, some of us were
> > > > > > > discussing/researching the topic in different groups (the one need
> > > to
> > > > >
> > > > > think
> > > > > > > it through first before even suggesting to consider changes of this
> > > > > > > magnitude). The day has come to share this idea with the whole
> > > > >
> > > > > community
> > > > > > > and outline the next actions. But (!) nobody is 100% sure that
> > > that's
> > > > >
> > > > > the
> > > > > > > right decision. Thus, this will be an *experiment*, some of our
> > > > >
> > > > > community
> > > > > > > members will be developing a *prototype* and only based on the
> > > > >
> > > > > prototype
> > > > > > > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey,
> > > > >
> > > > > hope
> > > > > > > that nothing has changed and we're on the same page here.
> > > > > > >
> > > > > > > Many technical and architectural reasons that justify this project
> > > have
> > > > > > > been shared but let me throw in my perspective. There is nothing
> > > wrong
> > > > >
> > > > > with
> > > > > > > H2, that was the right choice for that time.  Thanks to H2 and
> > > Ignite
> > > > >
> > > > > SQL
> > > > > > > APIs, our project is used across hundreds of deployments who are
> > > > > > > accelerating relational databases or use Ignite as a system of
> > > records.
> > > > > > > However, these days many more companies are migrating to
> > > *distributed*
> > > > > > > databases that speak SQL. For instance, if a couple of years ago 1
> > > out
> > > > >
> > > > > of
> > > > > > > 10 use cases needed support for multi-joins queries or queries with
> > > > > > > subselects or efficient memory usage then today there are 5 out of
> > > 10
> > > > >
> > > > > use
> > > > > > > cases of this kind; in the foreseeable future, it will be a 10 out
> > > of
> > > > >
> > > > > 10.
> > > > > > > So, the evolution is in progress -- the relational world goes
> > > > >
> > > > > distributed,
> > > > > > > it became exhaustive for both Ignite SQL maintainers and experts
> > > who
> > > > >
> > > > > help
> > > > > > > to tune it for production usage to keep pace with the evolution
> > > mostly
> > > > >
> > > > > due
> > > > > > > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be
> > > > >
> > > > > ready to
> > > > > > > face the future reality.
> > > > > > >
> > > > > > > Luckily, we don't need to rush and don't have the right to rush
> > > because
> > > > > > > hundreds existing users have already trusted their production
> > > > >
> > > > > environments
> > > > > > > to Ignite SQL and we need to roll out changes with such a big
> > > impact
> > > > > > > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped
> > > in
> > > > >
> > > > > and
> > > > > > > agreed to be the first contributors who will be *experimenting*
> > > with
> > > > >
> > > > > the
> > > > > > > new SQL engine. Let's support them; let's connect them with Apache
> > > > >
> > > > > Calcite
> > > > > > > community and see how this story evolves.  Folks, please keep the
> > > > >
> > > > > community
> > > > > > > aware of the progress, let us know when help is needed, some of us
> > > > >
> > > > > will be
> > > > > > > ready to support with development once you create a solid
> > > foundation
> > > > >
> > > > > for
> > > > > > > the prototype.
> > > > > > >
> > > > > > > -
> > > > > > > Denis
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <
> > > > >
> > > > > gvvinblade@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Igniters!
> > > > > > > >
> > > > > > > > As you might know currently we have many open issues relating to
> > > > >
> > > > > current
> > > > > > > > H2 based engine and its execution flow.
> > > > > > > >
> > > > > > > > Some of them are critical (like impossibility to execute
> > > particular
> > > > > > > > queries), some of them are majors (like impossibility to execute
> > > > >
> > > > > particular
> > > > > > > > queries without pre-preparation your data to have a collocation)
> > > and
> > > > >
> > > > > many
> > > > > > > > minors.
> > > > > > > >
> > > > > > > > Most of the issues cannot be solved without whole engine
> > > redesign.
> > > > > > > >
> > > > > > > > So, here the proposal:
> > > > > > > >
> > > > >
> > > > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > > > > > >
> > > > > > > > I'll appreciate if you share your thoughts on top of that.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Igor
> > > > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Ivan Pavlukhin
> > > > >
> > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin



-- 
Best regards,
Ivan Pavlukhin

Re: New SQL execution engine

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Folks,

I marked IEP-33 as obsolete. Also now the IEP-37 we currently are
working with has a pretty URL
https://cwiki.apache.org/confluence/display/IGNITE/IEP-37%3A+New+query+execution+engine

вт, 1 окт. 2019 г. в 11:17, Seliverstov Igor <gv...@gmail.com>:
>
> Nikolay,
>
> The document you edited is wrong (and outdated).
>
> Since the author meant another idea, I decided not to change IEP-35 and
> create a new one - IEP-37 (https://cwiki.apache.org/confluence/x/NBLABw).
> It's already have a number of key requirements.
>
> Regards,
> Igor
>
> вт, 1 окт. 2019 г., 6:14 Nikolay Izhikov <ni...@apache.org>:
>
> > Hello, Igniters.
> >
> > I extends IEP [1] with the tickets caused by H2 limitations.
> >
> > Please, let's write down requirements for engine in the IEP.
> >
> >
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure
> >
> > В Пн, 30/09/2019 в 17:20 -0700, Denis Magda пишет:
> > > Ivan, we need more of these discussions, totally agree with you ;)
> > >
> > > I've updated the Motivation paragraph outlining some high-level users we
> > > see by working with our users. Hope it helps. Let's carry on and let me
> > > send a note to Apache Calcite community.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Mon, Sep 30, 2019 at 1:56 AM Ivan Pavlukhin <vo...@gmail.com>
> > wrote:
> > >
> > > > Folks,
> > > >
> > > > Thanks everyone for a hot discussion! Not every open source community
> > > > has such open and boiling discussions. It means that people here
> > > > really do care. And I am proud of it!
> > > >
> > > > As I understood, nobody is strictly against the proposed initiative.
> > > > And I am glad that we can move forward (with some steps back along the
> > > > way).
> > > >
> > > > пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
> > > > >
> > > > > Hello, Denis.
> > > > >
> > > > > Thanks for the clarifications.
> > > > >
> > > > > Sounds good for me.
> > > > > All I try to say in this thread:
> > > > > Guys, please, let's take a step back and write down
> > requirements(what we
> > > >
> > > > want to get with SQL engine).
> > > > > Which features and use-cases are primary for us.
> > > > >
> > > > > I'm sure you have done it, already during your research.
> > > > >
> > > > > Please, share it with the community.
> > > > >
> > > > > I'm pretty sure we would back to this document again and again during
> > > >
> > > > migration.
> > > > > So good written design is worth it.
> > > > >
> > > > > В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > > > > > Ignite mates, let me try to move the discussion in a constructive
> > way.
> > > >
> > > > It
> > > > > > looks like we set a wrong context from the very beginning.
> > > > > >
> > > > > > Before proposing this idea to the community, some of us were
> > > > > > discussing/researching the topic in different groups (the one need
> > to
> > > >
> > > > think
> > > > > > it through first before even suggesting to consider changes of this
> > > > > > magnitude). The day has come to share this idea with the whole
> > > >
> > > > community
> > > > > > and outline the next actions. But (!) nobody is 100% sure that
> > that's
> > > >
> > > > the
> > > > > > right decision. Thus, this will be an *experiment*, some of our
> > > >
> > > > community
> > > > > > members will be developing a *prototype* and only based on the
> > > >
> > > > prototype
> > > > > > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey,
> > > >
> > > > hope
> > > > > > that nothing has changed and we're on the same page here.
> > > > > >
> > > > > > Many technical and architectural reasons that justify this project
> > have
> > > > > > been shared but let me throw in my perspective. There is nothing
> > wrong
> > > >
> > > > with
> > > > > > H2, that was the right choice for that time.  Thanks to H2 and
> > Ignite
> > > >
> > > > SQL
> > > > > > APIs, our project is used across hundreds of deployments who are
> > > > > > accelerating relational databases or use Ignite as a system of
> > records.
> > > > > > However, these days many more companies are migrating to
> > *distributed*
> > > > > > databases that speak SQL. For instance, if a couple of years ago 1
> > out
> > > >
> > > > of
> > > > > > 10 use cases needed support for multi-joins queries or queries with
> > > > > > subselects or efficient memory usage then today there are 5 out of
> > 10
> > > >
> > > > use
> > > > > > cases of this kind; in the foreseeable future, it will be a 10 out
> > of
> > > >
> > > > 10.
> > > > > > So, the evolution is in progress -- the relational world goes
> > > >
> > > > distributed,
> > > > > > it became exhaustive for both Ignite SQL maintainers and experts
> > who
> > > >
> > > > help
> > > > > > to tune it for production usage to keep pace with the evolution
> > mostly
> > > >
> > > > due
> > > > > > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be
> > > >
> > > > ready to
> > > > > > face the future reality.
> > > > > >
> > > > > > Luckily, we don't need to rush and don't have the right to rush
> > because
> > > > > > hundreds existing users have already trusted their production
> > > >
> > > > environments
> > > > > > to Ignite SQL and we need to roll out changes with such a big
> > impact
> > > > > > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped
> > in
> > > >
> > > > and
> > > > > > agreed to be the first contributors who will be *experimenting*
> > with
> > > >
> > > > the
> > > > > > new SQL engine. Let's support them; let's connect them with Apache
> > > >
> > > > Calcite
> > > > > > community and see how this story evolves.  Folks, please keep the
> > > >
> > > > community
> > > > > > aware of the progress, let us know when help is needed, some of us
> > > >
> > > > will be
> > > > > > ready to support with development once you create a solid
> > foundation
> > > >
> > > > for
> > > > > > the prototype.
> > > > > >
> > > > > > -
> > > > > > Denis
> > > > > >
> > > > > >
> > > > > > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <
> > > >
> > > > gvvinblade@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Igniters!
> > > > > > >
> > > > > > > As you might know currently we have many open issues relating to
> > > >
> > > > current
> > > > > > > H2 based engine and its execution flow.
> > > > > > >
> > > > > > > Some of them are critical (like impossibility to execute
> > particular
> > > > > > > queries), some of them are majors (like impossibility to execute
> > > >
> > > > particular
> > > > > > > queries without pre-preparation your data to have a collocation)
> > and
> > > >
> > > > many
> > > > > > > minors.
> > > > > > >
> > > > > > > Most of the issues cannot be solved without whole engine
> > redesign.
> > > > > > >
> > > > > > > So, here the proposal:
> > > > > > >
> > > >
> > > >
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > > > > >
> > > > > > > I'll appreciate if you share your thoughts on top of that.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Igor
> > > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan Pavlukhin
> > > >
> >



-- 
Best regards,
Ivan Pavlukhin

Re: New SQL execution engine

Posted by Seliverstov Igor <gv...@gmail.com>.

Nikolay,

The document you edited is wrong (and outdated).

Since the author meant another idea, I decided not to change IEP-35 and
create a new one - IEP-37 (https://cwiki.apache.org/confluence/x/NBLABw).
It's already have a number of key requirements.

Regards,
Igor

вт, 1 окт. 2019 г., 6:14 Nikolay Izhikov <ni...@apache.org>:

> Hello, Igniters.
>
> I extends IEP [1] with the tickets caused by H2 limitations.
>
> Please, let's write down requirements for engine in the IEP.
>
>
> https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure
>
> В Пн, 30/09/2019 в 17:20 -0700, Denis Magda пишет:
> > Ivan, we need more of these discussions, totally agree with you ;)
> >
> > I've updated the Motivation paragraph outlining some high-level users we
> > see by working with our users. Hope it helps. Let's carry on and let me
> > send a note to Apache Calcite community.
> >
> > -
> > Denis
> >
> >
> > On Mon, Sep 30, 2019 at 1:56 AM Ivan Pavlukhin <vo...@gmail.com>
> wrote:
> >
> > > Folks,
> > >
> > > Thanks everyone for a hot discussion! Not every open source community
> > > has such open and boiling discussions. It means that people here
> > > really do care. And I am proud of it!
> > >
> > > As I understood, nobody is strictly against the proposed initiative.
> > > And I am glad that we can move forward (with some steps back along the
> > > way).
> > >
> > > пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
> > > >
> > > > Hello, Denis.
> > > >
> > > > Thanks for the clarifications.
> > > >
> > > > Sounds good for me.
> > > > All I try to say in this thread:
> > > > Guys, please, let's take a step back and write down
> requirements(what we
> > >
> > > want to get with SQL engine).
> > > > Which features and use-cases are primary for us.
> > > >
> > > > I'm sure you have done it, already during your research.
> > > >
> > > > Please, share it with the community.
> > > >
> > > > I'm pretty sure we would back to this document again and again during
> > >
> > > migration.
> > > > So good written design is worth it.
> > > >
> > > > В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > > > > Ignite mates, let me try to move the discussion in a constructive
> way.
> > >
> > > It
> > > > > looks like we set a wrong context from the very beginning.
> > > > >
> > > > > Before proposing this idea to the community, some of us were
> > > > > discussing/researching the topic in different groups (the one need
> to
> > >
> > > think
> > > > > it through first before even suggesting to consider changes of this
> > > > > magnitude). The day has come to share this idea with the whole
> > >
> > > community
> > > > > and outline the next actions. But (!) nobody is 100% sure that
> that's
> > >
> > > the
> > > > > right decision. Thus, this will be an *experiment*, some of our
> > >
> > > community
> > > > > members will be developing a *prototype* and only based on the
> > >
> > > prototype
> > > > > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey,
> > >
> > > hope
> > > > > that nothing has changed and we're on the same page here.
> > > > >
> > > > > Many technical and architectural reasons that justify this project
> have
> > > > > been shared but let me throw in my perspective. There is nothing
> wrong
> > >
> > > with
> > > > > H2, that was the right choice for that time.  Thanks to H2 and
> Ignite
> > >
> > > SQL
> > > > > APIs, our project is used across hundreds of deployments who are
> > > > > accelerating relational databases or use Ignite as a system of
> records.
> > > > > However, these days many more companies are migrating to
> *distributed*
> > > > > databases that speak SQL. For instance, if a couple of years ago 1
> out
> > >
> > > of
> > > > > 10 use cases needed support for multi-joins queries or queries with
> > > > > subselects or efficient memory usage then today there are 5 out of
> 10
> > >
> > > use
> > > > > cases of this kind; in the foreseeable future, it will be a 10 out
> of
> > >
> > > 10.
> > > > > So, the evolution is in progress -- the relational world goes
> > >
> > > distributed,
> > > > > it became exhaustive for both Ignite SQL maintainers and experts
> who
> > >
> > > help
> > > > > to tune it for production usage to keep pace with the evolution
> mostly
> > >
> > > due
> > > > > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be
> > >
> > > ready to
> > > > > face the future reality.
> > > > >
> > > > > Luckily, we don't need to rush and don't have the right to rush
> because
> > > > > hundreds existing users have already trusted their production
> > >
> > > environments
> > > > > to Ignite SQL and we need to roll out changes with such a big
> impact
> > > > > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped
> in
> > >
> > > and
> > > > > agreed to be the first contributors who will be *experimenting*
> with
> > >
> > > the
> > > > > new SQL engine. Let's support them; let's connect them with Apache
> > >
> > > Calcite
> > > > > community and see how this story evolves.  Folks, please keep the
> > >
> > > community
> > > > > aware of the progress, let us know when help is needed, some of us
> > >
> > > will be
> > > > > ready to support with development once you create a solid
> foundation
> > >
> > > for
> > > > > the prototype.
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <
> > >
> > > gvvinblade@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi Igniters!
> > > > > >
> > > > > > As you might know currently we have many open issues relating to
> > >
> > > current
> > > > > > H2 based engine and its execution flow.
> > > > > >
> > > > > > Some of them are critical (like impossibility to execute
> particular
> > > > > > queries), some of them are majors (like impossibility to execute
> > >
> > > particular
> > > > > > queries without pre-preparation your data to have a collocation)
> and
> > >
> > > many
> > > > > > minors.
> > > > > >
> > > > > > Most of the issues cannot be solved without whole engine
> redesign.
> > > > > >
> > > > > > So, here the proposal:
> > > > > >
> > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > > > >
> > > > > > I'll appreciate if you share your thoughts on top of that.
> > > > > >
> > > > > > Regards,
> > > > > > Igor
> > > > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Ivan Pavlukhin
> > >
>

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Igniters.

I extends IEP [1] with the tickets caused by H2 limitations.

Please, let's write down requirements for engine in the IEP.

https://cwiki.apache.org/confluence/display/IGNITE/IEP-33%3A+New+SQL+executor+engine+infrastructure

В Пн, 30/09/2019 в 17:20 -0700, Denis Magda пишет:
> Ivan, we need more of these discussions, totally agree with you ;)
> 
> I've updated the Motivation paragraph outlining some high-level users we
> see by working with our users. Hope it helps. Let's carry on and let me
> send a note to Apache Calcite community.
> 
> -
> Denis
> 
> 
> On Mon, Sep 30, 2019 at 1:56 AM Ivan Pavlukhin <vo...@gmail.com> wrote:
> 
> > Folks,
> > 
> > Thanks everyone for a hot discussion! Not every open source community
> > has such open and boiling discussions. It means that people here
> > really do care. And I am proud of it!
> > 
> > As I understood, nobody is strictly against the proposed initiative.
> > And I am glad that we can move forward (with some steps back along the
> > way).
> > 
> > пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
> > > 
> > > Hello, Denis.
> > > 
> > > Thanks for the clarifications.
> > > 
> > > Sounds good for me.
> > > All I try to say in this thread:
> > > Guys, please, let's take a step back and write down requirements(what we
> > 
> > want to get with SQL engine).
> > > Which features and use-cases are primary for us.
> > > 
> > > I'm sure you have done it, already during your research.
> > > 
> > > Please, share it with the community.
> > > 
> > > I'm pretty sure we would back to this document again and again during
> > 
> > migration.
> > > So good written design is worth it.
> > > 
> > > В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > > > Ignite mates, let me try to move the discussion in a constructive way.
> > 
> > It
> > > > looks like we set a wrong context from the very beginning.
> > > > 
> > > > Before proposing this idea to the community, some of us were
> > > > discussing/researching the topic in different groups (the one need to
> > 
> > think
> > > > it through first before even suggesting to consider changes of this
> > > > magnitude). The day has come to share this idea with the whole
> > 
> > community
> > > > and outline the next actions. But (!) nobody is 100% sure that that's
> > 
> > the
> > > > right decision. Thus, this will be an *experiment*, some of our
> > 
> > community
> > > > members will be developing a *prototype* and only based on the
> > 
> > prototype
> > > > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey,
> > 
> > hope
> > > > that nothing has changed and we're on the same page here.
> > > > 
> > > > Many technical and architectural reasons that justify this project have
> > > > been shared but let me throw in my perspective. There is nothing wrong
> > 
> > with
> > > > H2, that was the right choice for that time.  Thanks to H2 and Ignite
> > 
> > SQL
> > > > APIs, our project is used across hundreds of deployments who are
> > > > accelerating relational databases or use Ignite as a system of records.
> > > > However, these days many more companies are migrating to *distributed*
> > > > databases that speak SQL. For instance, if a couple of years ago 1 out
> > 
> > of
> > > > 10 use cases needed support for multi-joins queries or queries with
> > > > subselects or efficient memory usage then today there are 5 out of 10
> > 
> > use
> > > > cases of this kind; in the foreseeable future, it will be a 10 out of
> > 
> > 10.
> > > > So, the evolution is in progress -- the relational world goes
> > 
> > distributed,
> > > > it became exhaustive for both Ignite SQL maintainers and experts who
> > 
> > help
> > > > to tune it for production usage to keep pace with the evolution mostly
> > 
> > due
> > > > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be
> > 
> > ready to
> > > > face the future reality.
> > > > 
> > > > Luckily, we don't need to rush and don't have the right to rush because
> > > > hundreds existing users have already trusted their production
> > 
> > environments
> > > > to Ignite SQL and we need to roll out changes with such a big impact
> > > > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped in
> > 
> > and
> > > > agreed to be the first contributors who will be *experimenting* with
> > 
> > the
> > > > new SQL engine. Let's support them; let's connect them with Apache
> > 
> > Calcite
> > > > community and see how this story evolves.  Folks, please keep the
> > 
> > community
> > > > aware of the progress, let us know when help is needed, some of us
> > 
> > will be
> > > > ready to support with development once you create a solid foundation
> > 
> > for
> > > > the prototype.
> > > > 
> > > > -
> > > > Denis
> > > > 
> > > > 
> > > > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <
> > 
> > gvvinblade@apache.org>
> > > > wrote:
> > > > 
> > > > > Hi Igniters!
> > > > > 
> > > > > As you might know currently we have many open issues relating to
> > 
> > current
> > > > > H2 based engine and its execution flow.
> > > > > 
> > > > > Some of them are critical (like impossibility to execute particular
> > > > > queries), some of them are majors (like impossibility to execute
> > 
> > particular
> > > > > queries without pre-preparation your data to have a collocation) and
> > 
> > many
> > > > > minors.
> > > > > 
> > > > > Most of the issues cannot be solved without whole engine redesign.
> > > > > 
> > > > > So, here the proposal:
> > > > > 
> > 
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > > > 
> > > > > I'll appreciate if you share your thoughts on top of that.
> > > > > 
> > > > > Regards,
> > > > > Igor
> > > > > 
> > 
> > 
> > 
> > --
> > Best regards,
> > Ivan Pavlukhin
> >

Re: New SQL execution engine

Posted by Denis Magda <dm...@apache.org>.

Ivan, we need more of these discussions, totally agree with you ;)

I've updated the Motivation paragraph outlining some high-level users we
see by working with our users. Hope it helps. Let's carry on and let me
send a note to Apache Calcite community.

-
Denis


On Mon, Sep 30, 2019 at 1:56 AM Ivan Pavlukhin <vo...@gmail.com> wrote:

> Folks,
>
> Thanks everyone for a hot discussion! Not every open source community
> has such open and boiling discussions. It means that people here
> really do care. And I am proud of it!
>
> As I understood, nobody is strictly against the proposed initiative.
> And I am glad that we can move forward (with some steps back along the
> way).
>
> пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
> >
> > Hello, Denis.
> >
> > Thanks for the clarifications.
> >
> > Sounds good for me.
> > All I try to say in this thread:
> > Guys, please, let's take a step back and write down requirements(what we
> want to get with SQL engine).
> > Which features and use-cases are primary for us.
> >
> > I'm sure you have done it, already during your research.
> >
> > Please, share it with the community.
> >
> > I'm pretty sure we would back to this document again and again during
> migration.
> > So good written design is worth it.
> >
> > В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > > Ignite mates, let me try to move the discussion in a constructive way.
> It
> > > looks like we set a wrong context from the very beginning.
> > >
> > > Before proposing this idea to the community, some of us were
> > > discussing/researching the topic in different groups (the one need to
> think
> > > it through first before even suggesting to consider changes of this
> > > magnitude). The day has come to share this idea with the whole
> community
> > > and outline the next actions. But (!) nobody is 100% sure that that's
> the
> > > right decision. Thus, this will be an *experiment*, some of our
> community
> > > members will be developing a *prototype* and only based on the
> prototype
> > > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey,
> hope
> > > that nothing has changed and we're on the same page here.
> > >
> > > Many technical and architectural reasons that justify this project have
> > > been shared but let me throw in my perspective. There is nothing wrong
> with
> > > H2, that was the right choice for that time.  Thanks to H2 and Ignite
> SQL
> > > APIs, our project is used across hundreds of deployments who are
> > > accelerating relational databases or use Ignite as a system of records.
> > > However, these days many more companies are migrating to *distributed*
> > > databases that speak SQL. For instance, if a couple of years ago 1 out
> of
> > > 10 use cases needed support for multi-joins queries or queries with
> > > subselects or efficient memory usage then today there are 5 out of 10
> use
> > > cases of this kind; in the foreseeable future, it will be a 10 out of
> 10.
> > > So, the evolution is in progress -- the relational world goes
> distributed,
> > > it became exhaustive for both Ignite SQL maintainers and experts who
> help
> > > to tune it for production usage to keep pace with the evolution mostly
> due
> > > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be
> ready to
> > > face the future reality.
> > >
> > > Luckily, we don't need to rush and don't have the right to rush because
> > > hundreds existing users have already trusted their production
> environments
> > > to Ignite SQL and we need to roll out changes with such a big impact
> > > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped in
> and
> > > agreed to be the first contributors who will be *experimenting* with
> the
> > > new SQL engine. Let's support them; let's connect them with Apache
> Calcite
> > > community and see how this story evolves.  Folks, please keep the
> community
> > > aware of the progress, let us know when help is needed, some of us
> will be
> > > ready to support with development once you create a solid foundation
> for
> > > the prototype.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <
> gvvinblade@apache.org>
> > > wrote:
> > >
> > > > Hi Igniters!
> > > >
> > > > As you might know currently we have many open issues relating to
> current
> > > > H2 based engine and its execution flow.
> > > >
> > > > Some of them are critical (like impossibility to execute particular
> > > > queries), some of them are majors (like impossibility to execute
> particular
> > > > queries without pre-preparation your data to have a collocation) and
> many
> > > > minors.
> > > >
> > > > Most of the issues cannot be solved without whole engine redesign.
> > > >
> > > > So, here the proposal:
> > > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > > >
> > > > I'll appreciate if you share your thoughts on top of that.
> > > >
> > > > Regards,
> > > > Igor
> > > >
>
>
>
> --
> Best regards,
> Ivan Pavlukhin
>

Re: New SQL execution engine

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Folks,

Thanks everyone for a hot discussion! Not every open source community
has such open and boiling discussions. It means that people here
really do care. And I am proud of it!

As I understood, nobody is strictly against the proposed initiative.
And I am glad that we can move forward (with some steps back along the
way).

пт, 27 сент. 2019 г. в 19:29, Nikolay Izhikov <ni...@apache.org>:
>
> Hello, Denis.
>
> Thanks for the clarifications.
>
> Sounds good for me.
> All I try to say in this thread:
> Guys, please, let's take a step back and write down requirements(what we want to get with SQL engine).
> Which features and use-cases are primary for us.
>
> I'm sure you have done it, already during your research.
>
> Please, share it with the community.
>
> I'm pretty sure we would back to this document again and again during migration.
> So good written design is worth it.
>
> В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> > Ignite mates, let me try to move the discussion in a constructive way. It
> > looks like we set a wrong context from the very beginning.
> >
> > Before proposing this idea to the community, some of us were
> > discussing/researching the topic in different groups (the one need to think
> > it through first before even suggesting to consider changes of this
> > magnitude). The day has come to share this idea with the whole community
> > and outline the next actions. But (!) nobody is 100% sure that that's the
> > right decision. Thus, this will be an *experiment*, some of our community
> > members will be developing a *prototype* and only based on the prototype
> > outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey, hope
> > that nothing has changed and we're on the same page here.
> >
> > Many technical and architectural reasons that justify this project have
> > been shared but let me throw in my perspective. There is nothing wrong with
> > H2, that was the right choice for that time.  Thanks to H2 and Ignite SQL
> > APIs, our project is used across hundreds of deployments who are
> > accelerating relational databases or use Ignite as a system of records.
> > However, these days many more companies are migrating to *distributed*
> > databases that speak SQL. For instance, if a couple of years ago 1 out of
> > 10 use cases needed support for multi-joins queries or queries with
> > subselects or efficient memory usage then today there are 5 out of 10 use
> > cases of this kind; in the foreseeable future, it will be a 10 out of 10.
> > So, the evolution is in progress -- the relational world goes distributed,
> > it became exhaustive for both Ignite SQL maintainers and experts who help
> > to tune it for production usage to keep pace with the evolution mostly due
> > to the H2-dependency. Thus, Ignite SQL has to evolve and has to be ready to
> > face the future reality.
> >
> > Luckily, we don't need to rush and don't have the right to rush because
> > hundreds existing users have already trusted their production environments
> > to Ignite SQL and we need to roll out changes with such a big impact
> > carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped in and
> > agreed to be the first contributors who will be *experimenting* with the
> > new SQL engine. Let's support them; let's connect them with Apache Calcite
> > community and see how this story evolves.  Folks, please keep the community
> > aware of the progress, let us know when help is needed, some of us will be
> > ready to support with development once you create a solid foundation for
> > the prototype.
> >
> > -
> > Denis
> >
> >
> > On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <gv...@apache.org>
> > wrote:
> >
> > > Hi Igniters!
> > >
> > > As you might know currently we have many open issues relating to current
> > > H2 based engine and its execution flow.
> > >
> > > Some of them are critical (like impossibility to execute particular
> > > queries), some of them are majors (like impossibility to execute particular
> > > queries without pre-preparation your data to have a collocation) and many
> > > minors.
> > >
> > > Most of the issues cannot be solved without whole engine redesign.
> > >
> > > So, here the proposal:
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > >
> > > I'll appreciate if you share your thoughts on top of that.
> > >
> > > Regards,
> > > Igor
> > >



-- 
Best regards,
Ivan Pavlukhin

Re: New SQL execution engine

Posted by Nikolay Izhikov <ni...@apache.org>.

Hello, Denis.

Thanks for the clarifications.

Sounds good for me.
All I try to say in this thread: 
Guys, please, let's take a step back and write down requirements(what we want to get with SQL engine).
Which features and use-cases are primary for us.

I'm sure you have done it, already during your research.

Please, share it with the community.

I'm pretty sure we would back to this document again and again during migration.
So good written design is worth it.

В Пт, 27/09/2019 в 09:10 -0700, Denis Magda пишет:
> Ignite mates, let me try to move the discussion in a constructive way. It
> looks like we set a wrong context from the very beginning.
> 
> Before proposing this idea to the community, some of us were
> discussing/researching the topic in different groups (the one need to think
> it through first before even suggesting to consider changes of this
> magnitude). The day has come to share this idea with the whole community
> and outline the next actions. But (!) nobody is 100% sure that that's the
> right decision. Thus, this will be an *experiment*, some of our community
> members will be developing a *prototype* and only based on the prototype
> outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey, hope
> that nothing has changed and we're on the same page here.
> 
> Many technical and architectural reasons that justify this project have
> been shared but let me throw in my perspective. There is nothing wrong with
> H2, that was the right choice for that time.  Thanks to H2 and Ignite SQL
> APIs, our project is used across hundreds of deployments who are
> accelerating relational databases or use Ignite as a system of records.
> However, these days many more companies are migrating to *distributed*
> databases that speak SQL. For instance, if a couple of years ago 1 out of
> 10 use cases needed support for multi-joins queries or queries with
> subselects or efficient memory usage then today there are 5 out of 10 use
> cases of this kind; in the foreseeable future, it will be a 10 out of 10.
> So, the evolution is in progress -- the relational world goes distributed,
> it became exhaustive for both Ignite SQL maintainers and experts who help
> to tune it for production usage to keep pace with the evolution mostly due
> to the H2-dependency. Thus, Ignite SQL has to evolve and has to be ready to
> face the future reality.
> 
> Luckily, we don't need to rush and don't have the right to rush because
> hundreds existing users have already trusted their production environments
> to Ignite SQL and we need to roll out changes with such a big impact
> carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped in and
> agreed to be the first contributors who will be *experimenting* with the
> new SQL engine. Let's support them; let's connect them with Apache Calcite
> community and see how this story evolves.  Folks, please keep the community
> aware of the progress, let us know when help is needed, some of us will be
> ready to support with development once you create a solid foundation for
> the prototype.
> 
> -
> Denis
> 
> 
> On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <gv...@apache.org>
> wrote:
> 
> > Hi Igniters!
> > 
> > As you might know currently we have many open issues relating to current
> > H2 based engine and its execution flow.
> > 
> > Some of them are critical (like impossibility to execute particular
> > queries), some of them are majors (like impossibility to execute particular
> > queries without pre-preparation your data to have a collocation) and many
> > minors.
> > 
> > Most of the issues cannot be solved without whole engine redesign.
> > 
> > So, here the proposal:
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
> > 
> > I'll appreciate if you share your thoughts on top of that.
> > 
> > Regards,
> > Igor
> >

Re: New SQL execution engine

Posted by Denis Magda <dm...@apache.org>.

Ignite mates, let me try to move the discussion in a constructive way. It
looks like we set a wrong context from the very beginning.

Before proposing this idea to the community, some of us were
discussing/researching the topic in different groups (the one need to think
it through first before even suggesting to consider changes of this
magnitude). The day has come to share this idea with the whole community
and outline the next actions. But (!) nobody is 100% sure that that's the
right decision. Thus, this will be an *experiment*, some of our community
members will be developing a *prototype* and only based on the prototype
outcomes we shall make a final decision. Igor, Roman, Ivan, Andrey, hope
that nothing has changed and we're on the same page here.

Many technical and architectural reasons that justify this project have
been shared but let me throw in my perspective. There is nothing wrong with
H2, that was the right choice for that time.  Thanks to H2 and Ignite SQL
APIs, our project is used across hundreds of deployments who are
accelerating relational databases or use Ignite as a system of records.
However, these days many more companies are migrating to *distributed*
databases that speak SQL. For instance, if a couple of years ago 1 out of
10 use cases needed support for multi-joins queries or queries with
subselects or efficient memory usage then today there are 5 out of 10 use
cases of this kind; in the foreseeable future, it will be a 10 out of 10.
So, the evolution is in progress -- the relational world goes distributed,
it became exhaustive for both Ignite SQL maintainers and experts who help
to tune it for production usage to keep pace with the evolution mostly due
to the H2-dependency. Thus, Ignite SQL has to evolve and has to be ready to
face the future reality.

Luckily, we don't need to rush and don't have the right to rush because
hundreds existing users have already trusted their production environments
to Ignite SQL and we need to roll out changes with such a big impact
carefully. So, I'm excited that Roman, Igor, Ivan, Andrey stepped in and
agreed to be the first contributors who will be *experimenting* with the
new SQL engine. Let's support them; let's connect them with Apache Calcite
community and see how this story evolves.  Folks, please keep the community
aware of the progress, let us know when help is needed, some of us will be
ready to support with development once you create a solid foundation for
the prototype.

-
Denis

On Fri, Sep 27, 2019 at 1:45 AM Igor Seliverstov <gv...@apache.org>
wrote:

> Hi Igniters!
>
> As you might know currently we have many open issues relating to current
> H2 based engine and its execution flow.
>
> Some of them are critical (like impossibility to execute particular
> queries), some of them are majors (like impossibility to execute particular
> queries without pre-preparation your data to have a collocation) and many
> minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor
>

Re: New SQL execution engine

Posted by Roman Kondakov <ko...@mail.ru.INVALID>.

Hi Igor!

In my opinion using Apache Calcite for distributed SQL query 
optimization and planning is much more promising approach than using H2. 
H2 is not suitable for distributed query execution and also it has very 
limited abilities for query optimization. While Apache Calcite is the 
open source implementation of Cascade/Volcano query optimization 
framework [1,2] (other implementations: MS SQL Server, Greenplum). The 
main advantage of this framework is it's extensibility - we can change 
the optimizer behavior by simply adding or removing optimization rules 
to it. Calcite has a cost based optimizer as well as heuristic one which 
can be useful in some situations.

The main challenges I see here:

1. Implementing the distributed query planning for Apache Calcite (it is 
was primarily developed for the single-node query optimization). We can 
reuse the solution of Apache Drill [3] guys here.

2. We need to implement a new distributed query execution engine. Apache 
Calcite is a query planning framework, but not the execution one, 
besides  it has some abilities for executing queries in the single-node 
case.

3. Secondary indexes are not supported by Calcite, so we need to 
overcome this problem somehow. AFAIK Apache Phoenix [4] guys implemented 
support of the secondary indexes as a sorted materialized views.

4. Apache Calcite is a cost-based optimizer - so we need to create our 
own cost model and gather statistics to be able to choose the most 
effective query execution plans.

5. What about deprecating our current query API which has a number of 
drawbacks like using shortcuts `List<?>' as a query result or multiple 
redundant flags in `SqlFieldsQuery` (collocated, lazy, etc) which are 
useless for the new query execution engine?

[1] 
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[2] 
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Volcano-graefe.pdf
[3] https://drill.apache.org/
[4] https://phoenix.apache.org/
-- 
Kind Regards
Roman Kondakov

On 27.09.2019 11:44, Igor Seliverstov wrote:
> Hi Igniters!
>
> As you might know currently we have many open issues relating to current H2 based engine and its execution flow.
>
> Some of them are critical (like impossibility to execute particular queries), some of them are majors (like impossibility to execute particular queries without pre-preparation your data to have a collocation) and many minors.
>
> Most of the issues cannot be solved without whole engine redesign.
>
> So, here the proposal: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=130028084
>
> I'll appreciate if you share your thoughts on top of that.
>
> Regards,
> Igor