You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Vladimir Ozerov <vo...@gridgain.com> on 2018/02/12 08:20:51 UTC

Re: [SparkDataFrame] Query Optimization. Prototype

Hi Nikolay,

I am not sure if ticket for DECIMAL column metadata exists. If you haven't
find one under "sql" component, please feel free to create it on your own.
As far as testing of joins, I think it makes sense to start working on it
when we finish ANSI compliance testing which is already in progress.

On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com>
wrote:

> Hello, Vladimir.
>
> Thank you for an answer.
>
> > Do you mean whether it is possible to read it from table metadata?
>
> Yes, you are right.
> I want to read scale and precision of DECIMAL column from table metadata.
>
> > This will be fixed at some point in future, but I do not have any dates
> at the moment.
>
> Is there ticket for it? I can't find it via jira search
>
> > at this moment we do not have complete list of all restrictions on our
> joins, because a lot of work is delegated to H2.
> > In some unsupported scenarios we throw an exception.
> > In other cases we return incorrect results silently (e.g. if you do not
> co-locate data and forgot to set "distributed joins" flag).
>
> Guys, Val, may be we should exclude join optimization from IGNITE-7077
> while we haven't all limitation on the hand?
>
> > We have a plan to perform excessive testing of joins (both co-located
> and distributed) and list all known limitations.
>
> Can I help somehow with this activity?
>
>
> В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > Hi Nikolay,
> >
> > Could you please clarify your question about scale and precision? Do you
> mean whether it is possible to read it from table metadata? If yes, it is
> not possible at the moment unfortunately - we do not store information
> about lengths, scales and precision, only actual data types are passed to
> H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in
> future, but I do not have any dates at the moment.
> >
> > Now about joins - Denis, I think you provided wrong link to our internal
> GridGain docs where we accumulate information about ANSI compatibility and
> which will are going to publish on Ignite WIKI when it is ready. In any
> case, this is not what Nikolay aksed about. The question was about
> limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing
> of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.
> >
> > Vladimir.
> >
> > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org> wrote:
> > > Agree. The unsupported functions should be mentioned on the page that
> will cover Ignite ANSI-99 compliance. We have first results available for
> CORE features of the specification:
> > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > >
> > > That’s on my radar. I’ll take care of this.
> > >
> > > —
> > > Denis
> > >
> > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org> wrote:
> > > >
> > > > I think we need a page listing the unsupported functions with
> explanation
> > > > why, which is either it does not make sense in Ignite or is planned
> in
> > > > future release.
> > > >
> > > > Sergey, do you think you will be able to do it?
> > > >
> > > > D.
> > > >
> > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> sergey.puchnin@gmail.com>
> > > > wrote:
> > > >
> > > >> yes, the Cust function is supporting both Ignite and H2.
> > > >>
> > > >> I've updated the documentation for next system functions:
> > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > >>
> > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > >>
> > > >> And for my mind, next functions aren't applicable for Ignite:
> > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE,
> DATABASE,
> > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY,
> CURRVAL,
> > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE,
> USER,
> > > >> H2VERSION
> > > >>
> > > >> Also an issue was created for review current documentation:
> > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > >>
> > > >> --
> > > >> BR,
> > > >> Serge
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > >>
> > >
> >
> >
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Denis Magda <dm...@apache.org>.
The testing of the SQL 99 *Core* specification is done and this page will be released once 2.4 goes live:
https://apacheignite-sql.readme.io/v2.3/docs/sql-conformance

However, Wiki page is already live:
https://en.wikipedia.org/wiki/SQL_compliance

—
Denis

> On Feb 12, 2018, at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com> wrote:
> 
> As far as testing of joins, I think it makes sense to start working on it
> when we finish ANSI compliance testing which is already in progress.


Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Valentin Kulichenko <va...@gmail.com>.
Sounds good. Let me know when you feel it's ready and I'll take a look.

-Val

On Tue, Feb 13, 2018 at 7:56 AM, Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Valentin.
>
> > Since other parts of data frames integration are already merged,
> > I think it's a good time to resurrect this thread?
> > Does it make sense to review it right now?
> > Or you want to make some more changes?
>
> I've already merged PR [1] with current master.
> So you can review it, if you with.
>
> But I need couple of days to double check all changes.
> Extends scaladoc, etc.
>
> I will write you when PR is fully ready.
>
> [1] https://github.com/apache/ignite/pull/3397
>
>
>
> В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > Nikolay,
> >
> > When you're talking about join optimization, what exactly are you
> referring to?
> >
> > Since other parts of data frames integration are already merged, I think
> it's a good time to resurrect this thread? Does it make sense to review it
> right now? Or you want to make some more changes?
> >
> > -Val
> >
> > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
> > > Hi Nikolay,
> > >
> > > I am not sure if ticket for DECIMAL column metadata exists. If you
> haven't find one under "sql" component, please feel free to create it on
> your own. As far as testing of joins, I think it makes sense to start
> working on it when we finish ANSI compliance testing which is already in
> progress.
> > >
> > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> nizhikov.dev@gmail.com> wrote:
> > > > Hello, Vladimir.
> > > >
> > > > Thank you for an answer.
> > > >
> > > > > Do you mean whether it is possible to read it from table metadata?
> > > >
> > > > Yes, you are right.
> > > > I want to read scale and precision of DECIMAL column from table
> metadata.
> > > >
> > > > > This will be fixed at some point in future, but I do not have any
> dates at the moment.
> > > >
> > > > Is there ticket for it? I can't find it via jira search
> > > >
> > > > > at this moment we do not have complete list of all restrictions on
> our joins, because a lot of work is delegated to H2.
> > > > > In some unsupported scenarios we throw an exception.
> > > > > In other cases we return incorrect results silently (e.g. if you
> do not co-locate data and forgot to set "distributed joins" flag).
> > > >
> > > > Guys, Val, may be we should exclude join optimization from
> IGNITE-7077 while we haven't all limitation on the hand?
> > > >
> > > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > >
> > > > Can I help somehow with this activity?
> > > >
> > > >
> > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > Hi Nikolay,
> > > > >
> > > > > Could you please clarify your question about scale and precision?
> Do you mean whether it is possible to read it from table metadata? If yes,
> it is not possible at the moment unfortunately - we do not store
> information about lengths, scales and precision, only actual data types are
> passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some
> point in future, but I do not have any dates at the moment.
> > > > >
> > > > > Now about joins - Denis, I think you provided wrong link to our
> internal GridGain docs where we accumulate information about ANSI
> compatibility and which will are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay aksed about. The question was
> about limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing
> of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.
> > > > >
> > > > > Vladimir.
> > > > >
> > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org>
> wrote:
> > > > > > Agree. The unsupported functions should be mentioned on the page
> that will cover Ignite ANSI-99 compliance. We have first results available
> for CORE features of the specification:
> > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > >
> > > > > > That’s on my radar. I’ll take care of this.
> > > > > >
> > > > > > —
> > > > > > Denis
> > > > > >
> > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org> wrote:
> > > > > > >
> > > > > > > I think we need a page listing the unsupported functions with
> explanation
> > > > > > > why, which is either it does not make sense in Ignite or is
> planned in
> > > > > > > future release.
> > > > > > >
> > > > > > > Sergey, do you think you will be able to do it?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> sergey.puchnin@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > > >>
> > > > > > >> I've updated the documentation for next system functions:
> > > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > >>
> > > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > >>
> > > > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE,
> DATABASE,
> > > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE,
> LINK_SCHEMA,
> > > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY,
> CURRVAL,
> > > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> TRUNCATE_VALUE, USER,
> > > > > > >> H2VERSION
> > > > > > >>
> > > > > > >> Also an issue was created for review current documentation:
> > > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > >>
> > > > > > >> --
> > > > > > >> BR,
> > > > > > >> Serge
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.
> com/
> > > > > > >>
> > > > > >
> > > > >
> > > > >
> >
> >
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Valentin.

> Since other parts of data frames integration are already merged, 
> I think it's a good time to resurrect this thread? 
> Does it make sense to review it right now? 
> Or you want to make some more changes?

I've already merged PR [1] with current master.
So you can review it, if you with.

But I need couple of days to double check all changes. 
Extends scaladoc, etc.

I will write you when PR is fully ready.

[1] https://github.com/apache/ignite/pull/3397



В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> Nikolay,
> 
> When you're talking about join optimization, what exactly are you referring to?
> 
> Since other parts of data frames integration are already merged, I think it's a good time to resurrect this thread? Does it make sense to review it right now? Or you want to make some more changes?
> 
> -Val
> 
> On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com> wrote:
> > Hi Nikolay,
> > 
> > I am not sure if ticket for DECIMAL column metadata exists. If you haven't find one under "sql" component, please feel free to create it on your own. As far as testing of joins, I think it makes sense to start working on it when we finish ANSI compliance testing which is already in progress.
> > 
> > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com> wrote:
> > > Hello, Vladimir.
> > > 
> > > Thank you for an answer.
> > > 
> > > > Do you mean whether it is possible to read it from table metadata?
> > > 
> > > Yes, you are right.
> > > I want to read scale and precision of DECIMAL column from table metadata.
> > > 
> > > > This will be fixed at some point in future, but I do not have any dates at the moment.
> > > 
> > > Is there ticket for it? I can't find it via jira search
> > > 
> > > > at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > > In some unsupported scenarios we throw an exception.
> > > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > > 
> > > Guys, Val, may be we should exclude join optimization from IGNITE-7077 while we haven't all limitation on the hand?
> > > 
> > > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > > 
> > > Can I help somehow with this activity?
> > > 
> > > 
> > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > Hi Nikolay,
> > > >
> > > > Could you please clarify your question about scale and precision? Do you mean whether it is possible to read it from table metadata? If yes, it is not possible at the moment unfortunately - we do not store information about lengths, scales and precision, only actual data types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in future, but I do not have any dates at the moment.
> > > >
> > > > Now about joins - Denis, I think you provided wrong link to our internal GridGain docs where we accumulate information about ANSI compatibility and which will are going to publish on Ignite WIKI when it is ready. In any case, this is not what Nikolay aksed about. The question was about limitation of our joins which has nothing to do with ANSI standard. Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2. In some unsupported scenarios we throw an exception. In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag). We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations. This would require writing a lot of unit tests to cover various scenarios. I think we will have this information in a matter of 1-2 months.
> > > >
> > > > Vladimir.
> > > >
> > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org> wrote:
> > > > > Agree. The unsupported functions should be mentioned on the page that will cover Ignite ANSI-99 compliance. We have first results available for CORE features of the specification:
> > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99 <https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > >
> > > > > That’s on my radar. I’ll take care of this.
> > > > >
> > > > > —
> > > > > Denis
> > > > >
> > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> > > > > >
> > > > > > I think we need a page listing the unsupported functions with explanation
> > > > > > why, which is either it does not make sense in Ignite or is planned in
> > > > > > future release.
> > > > > >
> > > > > > Sergey, do you think you will be able to do it?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <se...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > >>
> > > > > >> I've updated the documentation for next system functions:
> > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > >>
> > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > >>
> > > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE, DATABASE,
> > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY, CURRVAL,
> > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE, USER,
> > > > > >> H2VERSION
> > > > > >>
> > > > > >> Also an issue was created for review current documentation:
> > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > >>
> > > > > >> --
> > > > > >> BR,
> > > > > >> Serge
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > >>
> > > > >
> > > >
> > > >
> 
> 

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Valentin.

> In general, I don't see a reason to exclude anything (especially joins) from PR. Please finalize the change and pass it to me for review.

I made some minor improvements.
Please, review my PR [1]

[1] https://github.com/apache/ignite/pull/3397

В Вт, 13/02/2018 в 11:47 -0800, Valentin Kulichenko пишет:
> Nikolay,
> 
> Non-collocated joins should be used only if there is no way to collocate. Please read here for more info: https://apacheignite-sql.readme.io/docs/distributed-joins
> 
> As for limitations, I think Vladimir is talking more about syntax related stuff, i.e. what we do or don't support from SQL compliance perspective. We depend on H2 here and therefore don't have full knowledge, so I understand that it takes time to test and document everything. But requirement to have an index for a non-collocated join is introduced by Ignite and, if it's an expected one, should be documented. *Vladimir*, can you please comment on this?
> 
> In general, I don't see a reason to exclude anything (especially joins) from PR. Please finalize the change and pass it to me for review.
> 
> -Val
> 
> On Tue, Feb 13, 2018 at 11:10 AM, Nikolay Izhikov <ni...@apache.org> wrote:
> > Valentin,
> > 
> > > Looks like this is because you enabled non-collocated joins
> > 
> > But non-collocated joins is only way to be sure that join returns correct results.
> > So in my case it's OK to enable them.
> > Am I right?
> > 
> > > do we have this documented somewhere?
> > 
> > I'm asked that in previuous mail.
> > Vladimir Ozerov give me an answer [1] I quoted for you:
> > 
> > > Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > In some unsupported scenarios we throw an exception.
> > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > > This would require writing a lot of unit tests to cover various scenarios.
> > > I think we will have this information in a matter of 1-2 months.
> > 
> > So the answer is no, we haven't documentation for a join limitations.
> > 
> > That's why I propose to exclude join optimization from my PR until:
> > 
> > 1. We create documentation for all join limitations.
> > 2. Create the way to check is certain join satisfy current limitations.
> > 
> > [1] http://apache-ignite-developers.2346864.n4.nabble.com/SparkDataFrame-Query-Optimization-Prototype-tp26249p26361.html
> > 
> > В Вт, 13/02/2018 в 09:55 -0800, Valentin Kulichenko пишет:
> > > Nikolay,
> > >
> > > Looks like this is because you enabled non-collocated joins. I was not
> > > aware of this limitation though, do we have this documented somewhere?
> > >
> > > -Val
> > >
> > > On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov <ni...@apache.org>
> > > wrote:
> > >
> > > > Val,
> > > >
> > > > Source code check: https://github.com/apache/ignite/blob/master/modules/
> > > > indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/
> > > > GridH2CollocationModel.java#L382
> > > >
> > > > Stack trace:
> > > >
> > > > javax.cache.CacheException: Failed to prepare distributed join query: join
> > > > condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> > > >     __Z0.ID AS __C0_0,
> > > >     __Z0.VAL1 AS __C0_1,
> > > >     __Z1.ID AS __C0_2,
> > > >     __Z1.VAL2 AS __C0_3
> > > > FROM PUBLIC.JT1 __Z0
> > > >     /* PUBLIC.JT1.__SCAN_ */
> > > > INNER JOIN PUBLIC.JT2 __Z1
> > > >     /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> > > >     ON 1=1
> > > > WHERE __Z0.VAL1 = __Z1.VAL2]
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:
> > > > 384)
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > > GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> > > >         at org.apache.ignite.internal.processors.query.h2.sql.
> > > > GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> > > >         at org.apache.ignite.internal.processors.query.h2.
> > > > IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> > > >         at org.apache.ignite.internal.processors.query.h2.
> > > > IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> > > >         at org.apache.ignite.internal.processors.query.h2.
> > > > IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> > > >         at org.apache.ignite.internal.processors.query.
> > > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> > > >         at org.apache.ignite.internal.processors.query.
> > > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> > > >         at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> > > > apply(IgniteOutClosureX.java:36)
> > > >         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > > > executeQuery(GridQueryProcessor.java:2553)
> > > >         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > > > querySqlFields(GridQueryProcessor.java:2046)
> > > >         at org.apache.ignite.internal.processors.cache.
> > > > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
> > > >         at org.apache.ignite.internal.processors.cache.
> > > > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
> > > >         at org.apache.ignite.internal.processors.cache.
> > > > GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
> > > >         at org.apache.ignite.spark.JoinTestSpec.execSQL(
> > > > JoinTestSpec.scala:63)
> > > >
> > > >
> > > > В Вт, 13/02/2018 в 08:12 -0800, Valentin Kulichenko пишет:
> > > > > Nikolay,
> > > > >
> > > > > This doesn't make sense to me. Not having an index should not cause
> > > >
> > > > query to fail. What is the exception?
> > > > >
> > > > > -Val
> > > > >
> > > > > On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <ni...@apache.org>
> > > >
> > > > wrote:
> > > > > > Hello, Valentin.
> > > > > >
> > > > > > > When you're talking about join optimization, what exactly are you
> > > >
> > > > referring to?
> > > > > >
> > > > > > I'm referring to my PR [1]
> > > > > > Currently, it contains transformation from Spark joins to Ignite joins
> > > >
> > > > [2]
> > > > > >
> > > > > > But, if I understand Vladimir answer right, for now, we don't *fully*
> > > >
> > > > support SQL join queries.
> > > > > >
> > > > > > Sometimes it will work just right, in other cases, it will throw an
> > > >
> > > > exception due Ignite internal implementation.
> > > > > >
> > > > > > Please, see my example [3].
> > > > > > Query from line 4 will throw an exception.
> > > > > > The same query from line 10 will succeed, because of index creation.
> > > > > >
> > > > > > Both of them syntactically correct.
> > > > > >
> > > > > > > Unfortunately, at this moment we do not have complete list of all
> > > >
> > > > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > > In some unsupported scenarios we throw an exception.
> > > > > > > In other cases we return incorrect results silently (e.g. if you do
> > > >
> > > > not co-locate data and forgot to set "distributed joins" flag).
> > > > > > > We have a plan to perform excessive testing of joins (both
> > > >
> > > > co-located and distributed) and list all known limitations.
> > > > > > > This would require writing a lot of unit tests to cover various
> > > >
> > > > scenarios.
> > > > > > > I think we will have this information in a matter of 1-2 months.
> > > > > >
> > > > > > [1] https://github.com/apache/ignite/pull/3397
> > > > > > [2] https://github.com/apache/ignite/pull/3397/files#diff-
> > > >
> > > > 5a861613530bbce650efa50d553a0e92R227
> > > > > > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
> > > > > >
> > > > > > В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > > > > > > Nikolay,
> > > > > > >
> > > > > > > When you're talking about join optimization, what exactly are you
> > > >
> > > > referring to?
> > > > > > >
> > > > > > > Since other parts of data frames integration are already merged, I
> > > >
> > > > think it's a good time to resurrect this thread? Does it make sense to
> > > > review it right now? Or you want to make some more changes?
> > > > > > >
> > > > > > > -Val
> > > > > > >
> > > > > > > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <
> > > >
> > > > vozerov@gridgain.com> wrote:
> > > > > > > > Hi Nikolay,
> > > > > > > >
> > > > > > > > I am not sure if ticket for DECIMAL column metadata exists. If you
> > > >
> > > > haven't find one under "sql" component, please feel free to create it on
> > > > your own. As far as testing of joins, I think it makes sense to start
> > > > working on it when we finish ANSI compliance testing which is already in
> > > > progress.
> > > > > > > >
> > > > > > > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> > > >
> > > > nizhikov.dev@gmail.com> wrote:
> > > > > > > > > Hello, Vladimir.
> > > > > > > > >
> > > > > > > > > Thank you for an answer.
> > > > > > > > >
> > > > > > > > > > Do you mean whether it is possible to read it from table
> > > >
> > > > metadata?
> > > > > > > > >
> > > > > > > > > Yes, you are right.
> > > > > > > > > I want to read scale and precision of DECIMAL column from table
> > > >
> > > > metadata.
> > > > > > > > >
> > > > > > > > > > This will be fixed at some point in future, but I do not have
> > > >
> > > > any dates at the moment.
> > > > > > > > >
> > > > > > > > > Is there ticket for it? I can't find it via jira search
> > > > > > > > >
> > > > > > > > > > at this moment we do not have complete list of all
> > > >
> > > > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > > > > > In some unsupported scenarios we throw an exception.
> > > > > > > > > > In other cases we return incorrect results silently (e.g. if
> > > >
> > > > you do not co-locate data and forgot to set "distributed joins" flag).
> > > > > > > > >
> > > > > > > > > Guys, Val, may be we should exclude join optimization from
> > > >
> > > > IGNITE-7077 while we haven't all limitation on the hand?
> > > > > > > > >
> > > > > > > > > > We have a plan to perform excessive testing of joins (both
> > > >
> > > > co-located and distributed) and list all known limitations.
> > > > > > > > >
> > > > > > > > > Can I help somehow with this activity?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > > > > > > Hi Nikolay,
> > > > > > > > > >
> > > > > > > > > > Could you please clarify your question about scale and
> > > >
> > > > precision? Do you mean whether it is possible to read it from table
> > > > metadata? If yes, it is not possible at the moment unfortunately - we do
> > > > not store information about lengths, scales and precision, only actual data
> > > > types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed
> > > > at some point in future, but I do not have any dates at the moment.
> > > > > > > > > >
> > > > > > > > > > Now about joins - Denis, I think you provided wrong link to
> > > >
> > > > our internal GridGain docs where we accumulate information about ANSI
> > > > compatibility and which will are going to publish on Ignite WIKI when it is
> > > > ready. In any case, this is not what Nikolay aksed about. The question was
> > > > about limitation of our joins which has nothing to do with ANSI standard.
> > > > Unfortunately, at this moment we do not have complete list of all
> > > > restrictions on our joins, because a lot of work is delegated to H2. In
> > > > some unsupported scenarios we throw an exception. In other cases we return
> > > > incorrect results silently (e.g. if you do not co-locate data and forgot to
> > > > set "distributed joins" flag). We have a plan to perform excessive testing
> > > > of joins (both co-located and distributed) and list all known limitations.
> > > > This would require writing a lot of unit tests to cover various scenarios.
> > > > I think we will have this information in a matter of 1-2 months.
> > > > > > > > > >
> > > > > > > > > > Vladimir.
> > > > > > > > > >
> > > > > > > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <
> > > >
> > > > dmagda@apache.org> wrote:
> > > > > > > > > > > Agree. The unsupported functions should be mentioned on the
> > > >
> > > > page that will cover Ignite ANSI-99 compliance. We have first results
> > > > available for CORE features of the specification:
> > > > > > > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> > > >
> > > > 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> > > > net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > > > > > > >
> > > > > > > > > > > That’s on my radar. I’ll take care of this.
> > > > > > > > > > >
> > > > > > > > > > > —
> > > > > > > > > > > Denis
> > > > > > > > > > >
> > > > > > > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> > > >
> > > > dsetrakyan@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > I think we need a page listing the unsupported functions
> > > >
> > > > with explanation
> > > > > > > > > > > > why, which is either it does not make sense in Ignite or
> > > >
> > > > is planned in
> > > > > > > > > > > > future release.
> > > > > > > > > > > >
> > > > > > > > > > > > Sergey, do you think you will be able to do it?
> > > > > > > > > > > >
> > > > > > > > > > > > D.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> > > >
> > > > sergey.puchnin@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > yes, the Cust function is supporting both Ignite and H2.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I've updated the documentation for next system functions:
> > > > > > > > > > > > > CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > > > > > > > >
> > > > > > > > > > > > > https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > > > > > > > >
> > > > > > > > > > > > > And for my mind, next functions aren't applicable for
> > > >
> > > > Ignite:
> > > > > > > > > > > > > ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD,
> > > >
> > > > CSVWRITE, DATABASE,
> > > > > > > > > > > > > DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE,
> > > >
> > > > LINK_SCHEMA,
> > > > > > > > > > > > > MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT,
> > > >
> > > > READONLY, CURRVAL,
> > > > > > > > > > > > > AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM,
> > > >
> > > > SCHEMA,
> > > > > > > > > > > > > SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> > > >
> > > > TRUNCATE_VALUE, USER,
> > > > > > > > > > > > > H2VERSION
> > > > > > > > > > > > >
> > > > > > > > > > > > > Also an issue was created for review current
> > > >
> > > > documentation:
> > > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > BR,
> > > > > > > > > > > > > Serge
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Sent from: http://apache-ignite-
> > > >
> > > > developers.2346864.n4.nabble.com/
> > > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> 
> 

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

Non-collocated joins should be used only if there is no way to collocate.
Please read here for more info: https://apacheignite-sql.readme.io/docs/
distributed-joins

As for limitations, I think Vladimir is talking more about syntax related
stuff, i.e. what we do or don't support from SQL compliance perspective. We
depend on H2 here and therefore don't have full knowledge, so I understand
that it takes time to test and document everything. But requirement to have
an index for a non-collocated join is introduced by Ignite and, if it's an
expected one, should be documented. *Vladimir*, can you please comment on
this?

In general, I don't see a reason to exclude anything (especially joins)
from PR. Please finalize the change and pass it to me for review.

-Val

On Tue, Feb 13, 2018 at 11:10 AM, Nikolay Izhikov <ni...@apache.org>
wrote:

> Valentin,
>
> > Looks like this is because you enabled non-collocated joins
>
> But non-collocated joins is only way to be sure that join returns correct
> results.
> So in my case it's OK to enable them.
> Am I right?
>
> > do we have this documented somewhere?
>
> I'm asked that in previuous mail.
> Vladimir Ozerov give me an answer [1] I quoted for you:
>
> > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > In some unsupported scenarios we throw an exception.
> > In other cases we return incorrect results silently (e.g. if you do not
> co-locate data and forgot to set "distributed joins" flag).
> > We have a plan to perform excessive testing of joins (both co-located
> and distributed) and list all known limitations.
> > This would require writing a lot of unit tests to cover various
> scenarios.
> > I think we will have this information in a matter of 1-2 months.
>
> So the answer is no, we haven't documentation for a join limitations.
>
> That's why I propose to exclude join optimization from my PR until:
>
> 1. We create documentation for all join limitations.
> 2. Create the way to check is certain join satisfy current limitations.
>
> [1] http://apache-ignite-developers.2346864.n4.nabble.com/
> SparkDataFrame-Query-Optimization-Prototype-tp26249p26361.html
>
> В Вт, 13/02/2018 в 09:55 -0800, Valentin Kulichenko пишет:
> > Nikolay,
> >
> > Looks like this is because you enabled non-collocated joins. I was not
> > aware of this limitation though, do we have this documented somewhere?
> >
> > -Val
> >
> > On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov <ni...@apache.org>
> > wrote:
> >
> > > Val,
> > >
> > > Source code check: https://github.com/apache/igni
> te/blob/master/modules/
> > > indexing/src/main/java/org/apache/ignite/internal/processors
> /query/h2/opt/
> > > GridH2CollocationModel.java#L382
> > >
> > > Stack trace:
> > >
> > > javax.cache.CacheException: Failed to prepare distributed join query:
> join
> > > condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> > >     __Z0.ID AS __C0_0,
> > >     __Z0.VAL1 AS __C0_1,
> > >     __Z1.ID AS __C0_2,
> > >     __Z1.VAL2 AS __C0_3
> > > FROM PUBLIC.JT1 __Z0
> > >     /* PUBLIC.JT1.__SCAN_ */
> > > INNER JOIN PUBLIC.JT2 __Z1
> > >     /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> > >     ON 1=1
> > > WHERE __Z0.VAL1 = __Z1.VAL2]
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.joinedWithCollocated(GridH2Collocatio
> nModel.java:
> > > 384)
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> > >         at org.apache.ignite.internal.processors.query.h2.opt.
> > > GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> > >         at org.apache.ignite.internal.processors.query.h2.sql.
> > > GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> > >         at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> > >         at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> > >         at org.apache.ignite.internal.processors.query.h2.
> > > IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> > >         at org.apache.ignite.internal.processors.query.
> > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> > >         at org.apache.ignite.internal.processors.query.
> > > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> > >         at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> > > apply(IgniteOutClosureX.java:36)
> > >         at org.apache.ignite.internal.pro
> cessors.query.GridQueryProcessor.
> > > executeQuery(GridQueryProcessor.java:2553)
> > >         at org.apache.ignite.internal.pro
> cessors.query.GridQueryProcessor.
> > > querySqlFields(GridQueryProcessor.java:2046)
> > >         at org.apache.ignite.internal.processors.cache.
> > > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
> > >         at org.apache.ignite.internal.processors.cache.
> > > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
> > >         at org.apache.ignite.internal.processors.cache.
> > > GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
> > >         at org.apache.ignite.spark.JoinTestSpec.execSQL(
> > > JoinTestSpec.scala:63)
> > >
> > >
> > > В Вт, 13/02/2018 в 08:12 -0800, Valentin Kulichenko пишет:
> > > > Nikolay,
> > > >
> > > > This doesn't make sense to me. Not having an index should not cause
> > >
> > > query to fail. What is the exception?
> > > >
> > > > -Val
> > > >
> > > > On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <
> nizhikov@apache.org>
> > >
> > > wrote:
> > > > > Hello, Valentin.
> > > > >
> > > > > > When you're talking about join optimization, what exactly are you
> > >
> > > referring to?
> > > > >
> > > > > I'm referring to my PR [1]
> > > > > Currently, it contains transformation from Spark joins to Ignite
> joins
> > >
> > > [2]
> > > > >
> > > > > But, if I understand Vladimir answer right, for now, we don't
> *fully*
> > >
> > > support SQL join queries.
> > > > >
> > > > > Sometimes it will work just right, in other cases, it will throw an
> > >
> > > exception due Ignite internal implementation.
> > > > >
> > > > > Please, see my example [3].
> > > > > Query from line 4 will throw an exception.
> > > > > The same query from line 10 will succeed, because of index
> creation.
> > > > >
> > > > > Both of them syntactically correct.
> > > > >
> > > > > > Unfortunately, at this moment we do not have complete list of all
> > >
> > > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > In some unsupported scenarios we throw an exception.
> > > > > > In other cases we return incorrect results silently (e.g. if you
> do
> > >
> > > not co-locate data and forgot to set "distributed joins" flag).
> > > > > > We have a plan to perform excessive testing of joins (both
> > >
> > > co-located and distributed) and list all known limitations.
> > > > > > This would require writing a lot of unit tests to cover various
> > >
> > > scenarios.
> > > > > > I think we will have this information in a matter of 1-2 months.
> > > > >
> > > > > [1] https://github.com/apache/ignite/pull/3397
> > > > > [2] https://github.com/apache/ignite/pull/3397/files#diff-
> > >
> > > 5a861613530bbce650efa50d553a0e92R227
> > > > > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5
> baf2b
> > > > >
> > > > > В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > > > > > Nikolay,
> > > > > >
> > > > > > When you're talking about join optimization, what exactly are you
> > >
> > > referring to?
> > > > > >
> > > > > > Since other parts of data frames integration are already merged,
> I
> > >
> > > think it's a good time to resurrect this thread? Does it make sense to
> > > review it right now? Or you want to make some more changes?
> > > > > >
> > > > > > -Val
> > > > > >
> > > > > > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <
> > >
> > > vozerov@gridgain.com> wrote:
> > > > > > > Hi Nikolay,
> > > > > > >
> > > > > > > I am not sure if ticket for DECIMAL column metadata exists. If
> you
> > >
> > > haven't find one under "sql" component, please feel free to create it
> on
> > > your own. As far as testing of joins, I think it makes sense to start
> > > working on it when we finish ANSI compliance testing which is already
> in
> > > progress.
> > > > > > >
> > > > > > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> > >
> > > nizhikov.dev@gmail.com> wrote:
> > > > > > > > Hello, Vladimir.
> > > > > > > >
> > > > > > > > Thank you for an answer.
> > > > > > > >
> > > > > > > > > Do you mean whether it is possible to read it from table
> > >
> > > metadata?
> > > > > > > >
> > > > > > > > Yes, you are right.
> > > > > > > > I want to read scale and precision of DECIMAL column from
> table
> > >
> > > metadata.
> > > > > > > >
> > > > > > > > > This will be fixed at some point in future, but I do not
> have
> > >
> > > any dates at the moment.
> > > > > > > >
> > > > > > > > Is there ticket for it? I can't find it via jira search
> > > > > > > >
> > > > > > > > > at this moment we do not have complete list of all
> > >
> > > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > > > > In some unsupported scenarios we throw an exception.
> > > > > > > > > In other cases we return incorrect results silently (e.g.
> if
> > >
> > > you do not co-locate data and forgot to set "distributed joins" flag).
> > > > > > > >
> > > > > > > > Guys, Val, may be we should exclude join optimization from
> > >
> > > IGNITE-7077 while we haven't all limitation on the hand?
> > > > > > > >
> > > > > > > > > We have a plan to perform excessive testing of joins (both
> > >
> > > co-located and distributed) and list all known limitations.
> > > > > > > >
> > > > > > > > Can I help somehow with this activity?
> > > > > > > >
> > > > > > > >
> > > > > > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > > > > > Hi Nikolay,
> > > > > > > > >
> > > > > > > > > Could you please clarify your question about scale and
> > >
> > > precision? Do you mean whether it is possible to read it from table
> > > metadata? If yes, it is not possible at the moment unfortunately - we
> do
> > > not store information about lengths, scales and precision, only actual
> data
> > > types are passed to H2 (e.g. String, BigDecimal, etc.). This will be
> fixed
> > > at some point in future, but I do not have any dates at the moment.
> > > > > > > > >
> > > > > > > > > Now about joins - Denis, I think you provided wrong link to
> > >
> > > our internal GridGain docs where we accumulate information about ANSI
> > > compatibility and which will are going to publish on Ignite WIKI when
> it is
> > > ready. In any case, this is not what Nikolay aksed about. The question
> was
> > > about limitation of our joins which has nothing to do with ANSI
> standard.
> > > Unfortunately, at this moment we do not have complete list of all
> > > restrictions on our joins, because a lot of work is delegated to H2. In
> > > some unsupported scenarios we throw an exception. In other cases we
> return
> > > incorrect results silently (e.g. if you do not co-locate data and
> forgot to
> > > set "distributed joins" flag). We have a plan to perform excessive
> testing
> > > of joins (both co-located and distributed) and list all known
> limitations.
> > > This would require writing a lot of unit tests to cover various
> scenarios.
> > > I think we will have this information in a matter of 1-2 months.
> > > > > > > > >
> > > > > > > > > Vladimir.
> > > > > > > > >
> > > > > > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <
> > >
> > > dmagda@apache.org> wrote:
> > > > > > > > > > Agree. The unsupported functions should be mentioned on
> the
> > >
> > > page that will cover Ignite ANSI-99 compliance. We have first results
> > > available for CORE features of the specification:
> > > > > > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> > >
> > > 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> > > net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > > > > > >
> > > > > > > > > > That’s on my radar. I’ll take care of this.
> > > > > > > > > >
> > > > > > > > > > —
> > > > > > > > > > Denis
> > > > > > > > > >
> > > > > > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> > >
> > > dsetrakyan@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > I think we need a page listing the unsupported
> functions
> > >
> > > with explanation
> > > > > > > > > > > why, which is either it does not make sense in Ignite
> or
> > >
> > > is planned in
> > > > > > > > > > > future release.
> > > > > > > > > > >
> > > > > > > > > > > Sergey, do you think you will be able to do it?
> > > > > > > > > > >
> > > > > > > > > > > D.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> > >
> > > sergey.puchnin@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > yes, the Cust function is supporting both Ignite and
> H2.
> > > > > > > > > > > >
> > > > > > > > > > > > I've updated the documentation for next system
> functions:
> > > > > > > > > > > > CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > > > > > > >
> > > > > > > > > > > > https://apacheignite-sql.readm
> e.io/docs/system-functions
> > > > > > > > > > > >
> > > > > > > > > > > > And for my mind, next functions aren't applicable for
> > >
> > > Ignite:
> > > > > > > > > > > > ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD,
> > >
> > > CSVWRITE, DATABASE,
> > > > > > > > > > > > DATABASE_PATH, DISK_SPACE_USED, FILE_READ,
> FILE_WRITE,
> > >
> > > LINK_SCHEMA,
> > > > > > > > > > > > MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT,
> > >
> > > READONLY, CURRVAL,
> > > > > > > > > > > > AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL,
> ROWNUM,
> > >
> > > SCHEMA,
> > > > > > > > > > > > SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> > >
> > > TRUNCATE_VALUE, USER,
> > > > > > > > > > > > H2VERSION
> > > > > > > > > > > >
> > > > > > > > > > > > Also an issue was created for review current
> > >
> > > documentation:
> > > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > BR,
> > > > > > > > > > > > Serge
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > Sent from: http://apache-ignite-
> > >
> > > developers.2346864.n4.nabble.com/
> > > > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Valentin,

> Looks like this is because you enabled non-collocated joins

But non-collocated joins is only way to be sure that join returns correct results.
So in my case it's OK to enable them.
Am I right?

> do we have this documented somewhere?

I'm asked that in previuous mail.
Vladimir Ozerov give me an answer [1] I quoted for you:

> Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> In some unsupported scenarios we throw an exception.
> In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.

So the answer is no, we haven't documentation for a join limitations.

That's why I propose to exclude join optimization from my PR until:

1. We create documentation for all join limitations.
2. Create the way to check is certain join satisfy current limitations.

[1] http://apache-ignite-developers.2346864.n4.nabble.com/SparkDataFrame-Query-Optimization-Prototype-tp26249p26361.html

В Вт, 13/02/2018 в 09:55 -0800, Valentin Kulichenko пишет:
> Nikolay,
> 
> Looks like this is because you enabled non-collocated joins. I was not
> aware of this limitation though, do we have this documented somewhere?
> 
> -Val
> 
> On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov <ni...@apache.org>
> wrote:
> 
> > Val,
> > 
> > Source code check: https://github.com/apache/ignite/blob/master/modules/
> > indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/
> > GridH2CollocationModel.java#L382
> > 
> > Stack trace:
> > 
> > javax.cache.CacheException: Failed to prepare distributed join query: join
> > condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
> >     __Z0.ID AS __C0_0,
> >     __Z0.VAL1 AS __C0_1,
> >     __Z1.ID AS __C0_2,
> >     __Z1.VAL2 AS __C0_3
> > FROM PUBLIC.JT1 __Z0
> >     /* PUBLIC.JT1.__SCAN_ */
> > INNER JOIN PUBLIC.JT2 __Z1
> >     /* batched:broadcast PUBLIC.JT2.__SCAN_ */
> >     ON 1=1
> > WHERE __Z0.VAL1 = __Z1.VAL2]
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:
> > 384)
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.type(GridH2CollocationModel.java:549)
> >         at org.apache.ignite.internal.processors.query.h2.opt.
> > GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
> >         at org.apache.ignite.internal.processors.query.h2.sql.
> > GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
> >         at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
> >         at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
> >         at org.apache.ignite.internal.processors.query.h2.
> > IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
> >         at org.apache.ignite.internal.processors.query.
> > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
> >         at org.apache.ignite.internal.processors.query.
> > GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
> >         at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> > apply(IgniteOutClosureX.java:36)
> >         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > executeQuery(GridQueryProcessor.java:2553)
> >         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> > querySqlFields(GridQueryProcessor.java:2046)
> >         at org.apache.ignite.internal.processors.cache.
> > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
> >         at org.apache.ignite.internal.processors.cache.
> > IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
> >         at org.apache.ignite.internal.processors.cache.
> > GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
> >         at org.apache.ignite.spark.JoinTestSpec.execSQL(
> > JoinTestSpec.scala:63)
> > 
> > 
> > В Вт, 13/02/2018 в 08:12 -0800, Valentin Kulichenko пишет:
> > > Nikolay,
> > > 
> > > This doesn't make sense to me. Not having an index should not cause
> > 
> > query to fail. What is the exception?
> > > 
> > > -Val
> > > 
> > > On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <ni...@apache.org>
> > 
> > wrote:
> > > > Hello, Valentin.
> > > > 
> > > > > When you're talking about join optimization, what exactly are you
> > 
> > referring to?
> > > > 
> > > > I'm referring to my PR [1]
> > > > Currently, it contains transformation from Spark joins to Ignite joins
> > 
> > [2]
> > > > 
> > > > But, if I understand Vladimir answer right, for now, we don't *fully*
> > 
> > support SQL join queries.
> > > > 
> > > > Sometimes it will work just right, in other cases, it will throw an
> > 
> > exception due Ignite internal implementation.
> > > > 
> > > > Please, see my example [3].
> > > > Query from line 4 will throw an exception.
> > > > The same query from line 10 will succeed, because of index creation.
> > > > 
> > > > Both of them syntactically correct.
> > > > 
> > > > > Unfortunately, at this moment we do not have complete list of all
> > 
> > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > In some unsupported scenarios we throw an exception.
> > > > > In other cases we return incorrect results silently (e.g. if you do
> > 
> > not co-locate data and forgot to set "distributed joins" flag).
> > > > > We have a plan to perform excessive testing of joins (both
> > 
> > co-located and distributed) and list all known limitations.
> > > > > This would require writing a lot of unit tests to cover various
> > 
> > scenarios.
> > > > > I think we will have this information in a matter of 1-2 months.
> > > > 
> > > > [1] https://github.com/apache/ignite/pull/3397
> > > > [2] https://github.com/apache/ignite/pull/3397/files#diff-
> > 
> > 5a861613530bbce650efa50d553a0e92R227
> > > > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
> > > > 
> > > > В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > > > > Nikolay,
> > > > > 
> > > > > When you're talking about join optimization, what exactly are you
> > 
> > referring to?
> > > > > 
> > > > > Since other parts of data frames integration are already merged, I
> > 
> > think it's a good time to resurrect this thread? Does it make sense to
> > review it right now? Or you want to make some more changes?
> > > > > 
> > > > > -Val
> > > > > 
> > > > > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <
> > 
> > vozerov@gridgain.com> wrote:
> > > > > > Hi Nikolay,
> > > > > > 
> > > > > > I am not sure if ticket for DECIMAL column metadata exists. If you
> > 
> > haven't find one under "sql" component, please feel free to create it on
> > your own. As far as testing of joins, I think it makes sense to start
> > working on it when we finish ANSI compliance testing which is already in
> > progress.
> > > > > > 
> > > > > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> > 
> > nizhikov.dev@gmail.com> wrote:
> > > > > > > Hello, Vladimir.
> > > > > > > 
> > > > > > > Thank you for an answer.
> > > > > > > 
> > > > > > > > Do you mean whether it is possible to read it from table
> > 
> > metadata?
> > > > > > > 
> > > > > > > Yes, you are right.
> > > > > > > I want to read scale and precision of DECIMAL column from table
> > 
> > metadata.
> > > > > > > 
> > > > > > > > This will be fixed at some point in future, but I do not have
> > 
> > any dates at the moment.
> > > > > > > 
> > > > > > > Is there ticket for it? I can't find it via jira search
> > > > > > > 
> > > > > > > > at this moment we do not have complete list of all
> > 
> > restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > > > In some unsupported scenarios we throw an exception.
> > > > > > > > In other cases we return incorrect results silently (e.g. if
> > 
> > you do not co-locate data and forgot to set "distributed joins" flag).
> > > > > > > 
> > > > > > > Guys, Val, may be we should exclude join optimization from
> > 
> > IGNITE-7077 while we haven't all limitation on the hand?
> > > > > > > 
> > > > > > > > We have a plan to perform excessive testing of joins (both
> > 
> > co-located and distributed) and list all known limitations.
> > > > > > > 
> > > > > > > Can I help somehow with this activity?
> > > > > > > 
> > > > > > > 
> > > > > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > > > > Hi Nikolay,
> > > > > > > > 
> > > > > > > > Could you please clarify your question about scale and
> > 
> > precision? Do you mean whether it is possible to read it from table
> > metadata? If yes, it is not possible at the moment unfortunately - we do
> > not store information about lengths, scales and precision, only actual data
> > types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed
> > at some point in future, but I do not have any dates at the moment.
> > > > > > > > 
> > > > > > > > Now about joins - Denis, I think you provided wrong link to
> > 
> > our internal GridGain docs where we accumulate information about ANSI
> > compatibility and which will are going to publish on Ignite WIKI when it is
> > ready. In any case, this is not what Nikolay aksed about. The question was
> > about limitation of our joins which has nothing to do with ANSI standard.
> > Unfortunately, at this moment we do not have complete list of all
> > restrictions on our joins, because a lot of work is delegated to H2. In
> > some unsupported scenarios we throw an exception. In other cases we return
> > incorrect results silently (e.g. if you do not co-locate data and forgot to
> > set "distributed joins" flag). We have a plan to perform excessive testing
> > of joins (both co-located and distributed) and list all known limitations.
> > This would require writing a lot of unit tests to cover various scenarios.
> > I think we will have this information in a matter of 1-2 months.
> > > > > > > > 
> > > > > > > > Vladimir.
> > > > > > > > 
> > > > > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <
> > 
> > dmagda@apache.org> wrote:
> > > > > > > > > Agree. The unsupported functions should be mentioned on the
> > 
> > page that will cover Ignite ANSI-99 compliance. We have first results
> > available for CORE features of the specification:
> > > > > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> > 
> > 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> > net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > > > > > 
> > > > > > > > > That’s on my radar. I’ll take care of this.
> > > > > > > > > 
> > > > > > > > > —
> > > > > > > > > Denis
> > > > > > > > > 
> > > > > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> > 
> > dsetrakyan@apache.org> wrote:
> > > > > > > > > > 
> > > > > > > > > > I think we need a page listing the unsupported functions
> > 
> > with explanation
> > > > > > > > > > why, which is either it does not make sense in Ignite or
> > 
> > is planned in
> > > > > > > > > > future release.
> > > > > > > > > > 
> > > > > > > > > > Sergey, do you think you will be able to do it?
> > > > > > > > > > 
> > > > > > > > > > D.
> > > > > > > > > > 
> > > > > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> > 
> > sergey.puchnin@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > 
> > > > > > > > > > > yes, the Cust function is supporting both Ignite and H2.
> > > > > > > > > > > 
> > > > > > > > > > > I've updated the documentation for next system functions:
> > > > > > > > > > > CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > > > > > > 
> > > > > > > > > > > https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > > > > > > 
> > > > > > > > > > > And for my mind, next functions aren't applicable for
> > 
> > Ignite:
> > > > > > > > > > > ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD,
> > 
> > CSVWRITE, DATABASE,
> > > > > > > > > > > DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE,
> > 
> > LINK_SCHEMA,
> > > > > > > > > > > MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT,
> > 
> > READONLY, CURRVAL,
> > > > > > > > > > > AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM,
> > 
> > SCHEMA,
> > > > > > > > > > > SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> > 
> > TRUNCATE_VALUE, USER,
> > > > > > > > > > > H2VERSION
> > > > > > > > > > > 
> > > > > > > > > > > Also an issue was created for review current
> > 
> > documentation:
> > > > > > > > > > > https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > > > > > > 
> > > > > > > > > > > --
> > > > > > > > > > > BR,
> > > > > > > > > > > Serge
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > 
> > > > > > > > > > > --
> > > > > > > > > > > Sent from: http://apache-ignite-
> > 
> > developers.2346864.n4.nabble.com/
> > > > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > 
> > > > > 
> > > 
> > > 

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

Looks like this is because you enabled non-collocated joins. I was not
aware of this limitation though, do we have this documented somewhere?

-Val

On Tue, Feb 13, 2018 at 8:21 AM, Nikolay Izhikov <ni...@apache.org>
wrote:

> Val,
>
> Source code check: https://github.com/apache/ignite/blob/master/modules/
> indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/
> GridH2CollocationModel.java#L382
>
> Stack trace:
>
> javax.cache.CacheException: Failed to prepare distributed join query: join
> condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
>     __Z0.ID AS __C0_0,
>     __Z0.VAL1 AS __C0_1,
>     __Z1.ID AS __C0_2,
>     __Z1.VAL2 AS __C0_3
> FROM PUBLIC.JT1 __Z0
>     /* PUBLIC.JT1.__SCAN_ */
> INNER JOIN PUBLIC.JT2 __Z1
>     /* batched:broadcast PUBLIC.JT2.__SCAN_ */
>     ON 1=1
> WHERE __Z0.VAL1 = __Z1.VAL2]
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:
> 384)
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.type(GridH2CollocationModel.java:549)
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.type(GridH2CollocationModel.java:549)
>         at org.apache.ignite.internal.processors.query.h2.opt.
> GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
>         at org.apache.ignite.internal.processors.query.h2.sql.
> GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
>         at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
>         at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
>         at org.apache.ignite.internal.processors.query.h2.
> IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
>         at org.apache.ignite.internal.processors.query.
> GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
>         at org.apache.ignite.internal.processors.query.
> GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
>         at org.apache.ignite.internal.util.lang.IgniteOutClosureX.
> apply(IgniteOutClosureX.java:36)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> executeQuery(GridQueryProcessor.java:2553)
>         at org.apache.ignite.internal.processors.query.GridQueryProcessor.
> querySqlFields(GridQueryProcessor.java:2046)
>         at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
>         at org.apache.ignite.internal.processors.cache.
> IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
>         at org.apache.ignite.internal.processors.cache.
> GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
>         at org.apache.ignite.spark.JoinTestSpec.execSQL(
> JoinTestSpec.scala:63)
>
>
> В Вт, 13/02/2018 в 08:12 -0800, Valentin Kulichenko пишет:
> > Nikolay,
> >
> > This doesn't make sense to me. Not having an index should not cause
> query to fail. What is the exception?
> >
> > -Val
> >
> > On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <ni...@apache.org>
> wrote:
> > > Hello, Valentin.
> > >
> > > > When you're talking about join optimization, what exactly are you
> referring to?
> > >
> > > I'm referring to my PR [1]
> > > Currently, it contains transformation from Spark joins to Ignite joins
> [2]
> > >
> > > But, if I understand Vladimir answer right, for now, we don't *fully*
> support SQL join queries.
> > >
> > > Sometimes it will work just right, in other cases, it will throw an
> exception due Ignite internal implementation.
> > >
> > > Please, see my example [3].
> > > Query from line 4 will throw an exception.
> > > The same query from line 10 will succeed, because of index creation.
> > >
> > > Both of them syntactically correct.
> > >
> > > > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > > > In some unsupported scenarios we throw an exception.
> > > > In other cases we return incorrect results silently (e.g. if you do
> not co-locate data and forgot to set "distributed joins" flag).
> > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > > This would require writing a lot of unit tests to cover various
> scenarios.
> > > > I think we will have this information in a matter of 1-2 months.
> > >
> > > [1] https://github.com/apache/ignite/pull/3397
> > > [2] https://github.com/apache/ignite/pull/3397/files#diff-
> 5a861613530bbce650efa50d553a0e92R227
> > > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
> > >
> > > В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > > > Nikolay,
> > > >
> > > > When you're talking about join optimization, what exactly are you
> referring to?
> > > >
> > > > Since other parts of data frames integration are already merged, I
> think it's a good time to resurrect this thread? Does it make sense to
> review it right now? Or you want to make some more changes?
> > > >
> > > > -Val
> > > >
> > > > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <
> vozerov@gridgain.com> wrote:
> > > > > Hi Nikolay,
> > > > >
> > > > > I am not sure if ticket for DECIMAL column metadata exists. If you
> haven't find one under "sql" component, please feel free to create it on
> your own. As far as testing of joins, I think it makes sense to start
> working on it when we finish ANSI compliance testing which is already in
> progress.
> > > > >
> > > > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> nizhikov.dev@gmail.com> wrote:
> > > > > > Hello, Vladimir.
> > > > > >
> > > > > > Thank you for an answer.
> > > > > >
> > > > > > > Do you mean whether it is possible to read it from table
> metadata?
> > > > > >
> > > > > > Yes, you are right.
> > > > > > I want to read scale and precision of DECIMAL column from table
> metadata.
> > > > > >
> > > > > > > This will be fixed at some point in future, but I do not have
> any dates at the moment.
> > > > > >
> > > > > > Is there ticket for it? I can't find it via jira search
> > > > > >
> > > > > > > at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > > In some unsupported scenarios we throw an exception.
> > > > > > > In other cases we return incorrect results silently (e.g. if
> you do not co-locate data and forgot to set "distributed joins" flag).
> > > > > >
> > > > > > Guys, Val, may be we should exclude join optimization from
> IGNITE-7077 while we haven't all limitation on the hand?
> > > > > >
> > > > > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > > > >
> > > > > > Can I help somehow with this activity?
> > > > > >
> > > > > >
> > > > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > > > Hi Nikolay,
> > > > > > >
> > > > > > > Could you please clarify your question about scale and
> precision? Do you mean whether it is possible to read it from table
> metadata? If yes, it is not possible at the moment unfortunately - we do
> not store information about lengths, scales and precision, only actual data
> types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed
> at some point in future, but I do not have any dates at the moment.
> > > > > > >
> > > > > > > Now about joins - Denis, I think you provided wrong link to
> our internal GridGain docs where we accumulate information about ANSI
> compatibility and which will are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay aksed about. The question was
> about limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing
> of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.
> > > > > > >
> > > > > > > Vladimir.
> > > > > > >
> > > > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <
> dmagda@apache.org> wrote:
> > > > > > > > Agree. The unsupported functions should be mentioned on the
> page that will cover Ignite ANSI-99 compliance. We have first results
> available for CORE features of the specification:
> > > > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > > > >
> > > > > > > > That’s on my radar. I’ll take care of this.
> > > > > > > >
> > > > > > > > —
> > > > > > > > Denis
> > > > > > > >
> > > > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > I think we need a page listing the unsupported functions
> with explanation
> > > > > > > > > why, which is either it does not make sense in Ignite or
> is planned in
> > > > > > > > > future release.
> > > > > > > > >
> > > > > > > > > Sergey, do you think you will be able to do it?
> > > > > > > > >
> > > > > > > > > D.
> > > > > > > > >
> > > > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> sergey.puchnin@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > > > > >>
> > > > > > > > >> I've updated the documentation for next system functions:
> > > > > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > > > >>
> > > > > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > > > >>
> > > > > > > > >> And for my mind, next functions aren't applicable for
> Ignite:
> > > > > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD,
> CSVWRITE, DATABASE,
> > > > > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE,
> LINK_SCHEMA,
> > > > > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT,
> READONLY, CURRVAL,
> > > > > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM,
> SCHEMA,
> > > > > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> TRUNCATE_VALUE, USER,
> > > > > > > > >> H2VERSION
> > > > > > > > >>
> > > > > > > > >> Also an issue was created for review current
> documentation:
> > > > > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> BR,
> > > > > > > > >> Serge
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Sent from: http://apache-ignite-
> developers.2346864.n4.nabble.com/
> > > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > > >
> > > >
> > > >
> >
> >
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Val, 

Source code check: https://github.com/apache/ignite/blob/master/modules/indexing/src/main/java/org/apache/ignite/internal/processors/query/h2/opt/GridH2CollocationModel.java#L382

Stack trace:

javax.cache.CacheException: Failed to prepare distributed join query: join condition does not use index [joinedCache=SQL_PUBLIC_JT2, plan=SELECT
    __Z0.ID AS __C0_0,
    __Z0.VAL1 AS __C0_1,
    __Z1.ID AS __C0_2,
    __Z1.VAL2 AS __C0_3
FROM PUBLIC.JT1 __Z0
    /* PUBLIC.JT1.__SCAN_ */
INNER JOIN PUBLIC.JT2 __Z1
    /* batched:broadcast PUBLIC.JT2.__SCAN_ */
    ON 1=1
WHERE __Z0.VAL1 = __Z1.VAL2]
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.joinedWithCollocated(GridH2CollocationModel.java:384)
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.calculate(GridH2CollocationModel.java:308)
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.type(GridH2CollocationModel.java:549)
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.calculate(GridH2CollocationModel.java:257)
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.type(GridH2CollocationModel.java:549)
	at org.apache.ignite.internal.processors.query.h2.opt.GridH2CollocationModel.isCollocated(GridH2CollocationModel.java:691)
	at org.apache.ignite.internal.processors.query.h2.sql.GridSqlQuerySplitter.split(GridSqlQuerySplitter.java:239)
	at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.split(IgniteH2Indexing.java:1856)
	at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.parseAndSplit(IgniteH2Indexing.java:1818)
	at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.querySqlFields(IgniteH2Indexing.java:1569)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor$4.applyx(GridQueryProcessor.java:2037)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor$4.applyx(GridQueryProcessor.java:2032)
	at org.apache.ignite.internal.util.lang.IgniteOutClosureX.apply(IgniteOutClosureX.java:36)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor.executeQuery(GridQueryProcessor.java:2553)
	at org.apache.ignite.internal.processors.query.GridQueryProcessor.querySqlFields(GridQueryProcessor.java:2046)
	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:664)
	at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.query(IgniteCacheProxyImpl.java:615)
	at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.query(GatewayProtectedCacheProxy.java:382)
	at org.apache.ignite.spark.JoinTestSpec.execSQL(JoinTestSpec.scala:63)
	

В Вт, 13/02/2018 в 08:12 -0800, Valentin Kulichenko пишет:
> Nikolay,
> 
> This doesn't make sense to me. Not having an index should not cause query to fail. What is the exception?
> 
> -Val
> 
> On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <ni...@apache.org> wrote:
> > Hello, Valentin.
> > 
> > > When you're talking about join optimization, what exactly are you referring to?
> > 
> > I'm referring to my PR [1]
> > Currently, it contains transformation from Spark joins to Ignite joins [2]
> > 
> > But, if I understand Vladimir answer right, for now, we don't *fully* support SQL join queries.
> > 
> > Sometimes it will work just right, in other cases, it will throw an exception due Ignite internal implementation.
> > 
> > Please, see my example [3].
> > Query from line 4 will throw an exception.
> > The same query from line 10 will succeed, because of index creation.
> > 
> > Both of them syntactically correct.
> > 
> > > Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > In some unsupported scenarios we throw an exception.
> > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > > This would require writing a lot of unit tests to cover various scenarios.
> > > I think we will have this information in a matter of 1-2 months.
> > 
> > [1] https://github.com/apache/ignite/pull/3397
> > [2] https://github.com/apache/ignite/pull/3397/files#diff-5a861613530bbce650efa50d553a0e92R227
> > [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
> > 
> > В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > > Nikolay,
> > >
> > > When you're talking about join optimization, what exactly are you referring to?
> > >
> > > Since other parts of data frames integration are already merged, I think it's a good time to resurrect this thread? Does it make sense to review it right now? Or you want to make some more changes?
> > >
> > > -Val
> > >
> > > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com> wrote:
> > > > Hi Nikolay,
> > > >
> > > > I am not sure if ticket for DECIMAL column metadata exists. If you haven't find one under "sql" component, please feel free to create it on your own. As far as testing of joins, I think it makes sense to start working on it when we finish ANSI compliance testing which is already in progress.
> > > >
> > > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com> wrote:
> > > > > Hello, Vladimir.
> > > > >
> > > > > Thank you for an answer.
> > > > >
> > > > > > Do you mean whether it is possible to read it from table metadata?
> > > > >
> > > > > Yes, you are right.
> > > > > I want to read scale and precision of DECIMAL column from table metadata.
> > > > >
> > > > > > This will be fixed at some point in future, but I do not have any dates at the moment.
> > > > >
> > > > > Is there ticket for it? I can't find it via jira search
> > > > >
> > > > > > at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > > > > In some unsupported scenarios we throw an exception.
> > > > > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > > > >
> > > > > Guys, Val, may be we should exclude join optimization from IGNITE-7077 while we haven't all limitation on the hand?
> > > > >
> > > > > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > > > >
> > > > > Can I help somehow with this activity?
> > > > >
> > > > >
> > > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > > Hi Nikolay,
> > > > > >
> > > > > > Could you please clarify your question about scale and precision? Do you mean whether it is possible to read it from table metadata? If yes, it is not possible at the moment unfortunately - we do not store information about lengths, scales and precision, only actual data types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in future, but I do not have any dates at the moment.
> > > > > >
> > > > > > Now about joins - Denis, I think you provided wrong link to our internal GridGain docs where we accumulate information about ANSI compatibility and which will are going to publish on Ignite WIKI when it is ready. In any case, this is not what Nikolay aksed about. The question was about limitation of our joins which has nothing to do with ANSI standard. Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2. In some unsupported scenarios we throw an exception. In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag). We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations. This would require writing a lot of unit tests to cover various scenarios. I think we will have this information in a matter of 1-2 months.
> > > > > >
> > > > > > Vladimir.
> > > > > >
> > > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org> wrote:
> > > > > > > Agree. The unsupported functions should be mentioned on the page that will cover Ignite ANSI-99 compliance. We have first results available for CORE features of the specification:
> > > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99 <https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > > >
> > > > > > > That’s on my radar. I’ll take care of this.
> > > > > > >
> > > > > > > —
> > > > > > > Denis
> > > > > > >
> > > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> > > > > > > >
> > > > > > > > I think we need a page listing the unsupported functions with explanation
> > > > > > > > why, which is either it does not make sense in Ignite or is planned in
> > > > > > > > future release.
> > > > > > > >
> > > > > > > > Sergey, do you think you will be able to do it?
> > > > > > > >
> > > > > > > > D.
> > > > > > > >
> > > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <se...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > > > >>
> > > > > > > >> I've updated the documentation for next system functions:
> > > > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > > >>
> > > > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > > >>
> > > > > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE, DATABASE,
> > > > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY, CURRVAL,
> > > > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE, USER,
> > > > > > > >> H2VERSION
> > > > > > > >>
> > > > > > > >> Also an issue was created for review current documentation:
> > > > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> BR,
> > > > > > > >> Serge
> > > > > > > >>
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> --
> > > > > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > > >
> > >
> > >
> 
> 

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

This doesn't make sense to me. Not having an index should not cause query
to fail. What is the exception?

-Val

On Tue, Feb 13, 2018 at 8:07 AM, Nikolay Izhikov <ni...@apache.org>
wrote:

> Hello, Valentin.
>
> > When you're talking about join optimization, what exactly are you
> referring to?
>
> I'm referring to my PR [1]
> Currently, it contains transformation from Spark joins to Ignite joins [2]
>
> But, if I understand Vladimir answer right, for now, we don't *fully*
> support SQL join queries.
>
> Sometimes it will work just right, in other cases, it will throw an
> exception due Ignite internal implementation.
>
> Please, see my example [3].
> Query from line 4 will throw an exception.
> The same query from line 10 will succeed, because of index creation.
>
> Both of them syntactically correct.
>
> > Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2.
> > In some unsupported scenarios we throw an exception.
> > In other cases we return incorrect results silently (e.g. if you do not
> co-locate data and forgot to set "distributed joins" flag).
> > We have a plan to perform excessive testing of joins (both co-located
> and distributed) and list all known limitations.
> > This would require writing a lot of unit tests to cover various
> scenarios.
> > I think we will have this information in a matter of 1-2 months.
>
> [1] https://github.com/apache/ignite/pull/3397
> [2] https://github.com/apache/ignite/pull/3397/files#diff-
> 5a861613530bbce650efa50d553a0e92R227
> [3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b
>
> В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> > Nikolay,
> >
> > When you're talking about join optimization, what exactly are you
> referring to?
> >
> > Since other parts of data frames integration are already merged, I think
> it's a good time to resurrect this thread? Does it make sense to review it
> right now? Or you want to make some more changes?
> >
> > -Val
> >
> > On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com>
> wrote:
> > > Hi Nikolay,
> > >
> > > I am not sure if ticket for DECIMAL column metadata exists. If you
> haven't find one under "sql" component, please feel free to create it on
> your own. As far as testing of joins, I think it makes sense to start
> working on it when we finish ANSI compliance testing which is already in
> progress.
> > >
> > > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <
> nizhikov.dev@gmail.com> wrote:
> > > > Hello, Vladimir.
> > > >
> > > > Thank you for an answer.
> > > >
> > > > > Do you mean whether it is possible to read it from table metadata?
> > > >
> > > > Yes, you are right.
> > > > I want to read scale and precision of DECIMAL column from table
> metadata.
> > > >
> > > > > This will be fixed at some point in future, but I do not have any
> dates at the moment.
> > > >
> > > > Is there ticket for it? I can't find it via jira search
> > > >
> > > > > at this moment we do not have complete list of all restrictions on
> our joins, because a lot of work is delegated to H2.
> > > > > In some unsupported scenarios we throw an exception.
> > > > > In other cases we return incorrect results silently (e.g. if you
> do not co-locate data and forgot to set "distributed joins" flag).
> > > >
> > > > Guys, Val, may be we should exclude join optimization from
> IGNITE-7077 while we haven't all limitation on the hand?
> > > >
> > > > > We have a plan to perform excessive testing of joins (both
> co-located and distributed) and list all known limitations.
> > > >
> > > > Can I help somehow with this activity?
> > > >
> > > >
> > > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > > Hi Nikolay,
> > > > >
> > > > > Could you please clarify your question about scale and precision?
> Do you mean whether it is possible to read it from table metadata? If yes,
> it is not possible at the moment unfortunately - we do not store
> information about lengths, scales and precision, only actual data types are
> passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some
> point in future, but I do not have any dates at the moment.
> > > > >
> > > > > Now about joins - Denis, I think you provided wrong link to our
> internal GridGain docs where we accumulate information about ANSI
> compatibility and which will are going to publish on Ignite WIKI when it is
> ready. In any case, this is not what Nikolay aksed about. The question was
> about limitation of our joins which has nothing to do with ANSI standard.
> Unfortunately, at this moment we do not have complete list of all
> restrictions on our joins, because a lot of work is delegated to H2. In
> some unsupported scenarios we throw an exception. In other cases we return
> incorrect results silently (e.g. if you do not co-locate data and forgot to
> set "distributed joins" flag). We have a plan to perform excessive testing
> of joins (both co-located and distributed) and list all known limitations.
> This would require writing a lot of unit tests to cover various scenarios.
> I think we will have this information in a matter of 1-2 months.
> > > > >
> > > > > Vladimir.
> > > > >
> > > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org>
> wrote:
> > > > > > Agree. The unsupported functions should be mentioned on the page
> that will cover Ignite ANSI-99 compliance. We have first results available
> for CORE features of the specification:
> > > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/
> 45093646/ANSI+SQL+99 <https://ggsystems.atlassian.
> net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > > >
> > > > > > That’s on my radar. I’ll take care of this.
> > > > > >
> > > > > > —
> > > > > > Denis
> > > > > >
> > > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
> dsetrakyan@apache.org> wrote:
> > > > > > >
> > > > > > > I think we need a page listing the unsupported functions with
> explanation
> > > > > > > why, which is either it does not make sense in Ignite or is
> planned in
> > > > > > > future release.
> > > > > > >
> > > > > > > Sergey, do you think you will be able to do it?
> > > > > > >
> > > > > > > D.
> > > > > > >
> > > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
> sergey.puchnin@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > > >>
> > > > > > >> I've updated the documentation for next system functions:
> > > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > > >>
> > > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > > >>
> > > > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE,
> DATABASE,
> > > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE,
> LINK_SCHEMA,
> > > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY,
> CURRVAL,
> > > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID,
> TRUNCATE_VALUE, USER,
> > > > > > >> H2VERSION
> > > > > > >>
> > > > > > >> Also an issue was created for review current documentation:
> > > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > > >>
> > > > > > >> --
> > > > > > >> BR,
> > > > > > >> Serge
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.
> com/
> > > > > > >>
> > > > > >
> > > > >
> > > > >
> >
> >
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Valentin.

> When you're talking about join optimization, what exactly are you referring to?

I'm referring to my PR [1]
Currently, it contains transformation from Spark joins to Ignite joins [2]

But, if I understand Vladimir answer right, for now, we don't *fully* support SQL join queries.

Sometimes it will work just right, in other cases, it will throw an exception due Ignite internal implementation.

Please, see my example [3].
Query from line 4 will throw an exception.
The same query from line 10 will succeed, because of index creation.

Both of them syntactically correct.

> Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2. 
> In some unsupported scenarios we throw an exception. 
> In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag). 
> We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations. 
> This would require writing a lot of unit tests to cover various scenarios. 
> I think we will have this information in a matter of 1-2 months.

[1] https://github.com/apache/ignite/pull/3397
[2] https://github.com/apache/ignite/pull/3397/files#diff-5a861613530bbce650efa50d553a0e92R227
[3] https://gist.github.com/nizhikov/a4389fd78636869dd38c13920b5baf2b

В Пн, 12/02/2018 в 13:45 -0800, Valentin Kulichenko пишет:
> Nikolay,
> 
> When you're talking about join optimization, what exactly are you referring to?
> 
> Since other parts of data frames integration are already merged, I think it's a good time to resurrect this thread? Does it make sense to review it right now? Or you want to make some more changes?
> 
> -Val
> 
> On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com> wrote:
> > Hi Nikolay,
> > 
> > I am not sure if ticket for DECIMAL column metadata exists. If you haven't find one under "sql" component, please feel free to create it on your own. As far as testing of joins, I think it makes sense to start working on it when we finish ANSI compliance testing which is already in progress.
> > 
> > On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com> wrote:
> > > Hello, Vladimir.
> > > 
> > > Thank you for an answer.
> > > 
> > > > Do you mean whether it is possible to read it from table metadata?
> > > 
> > > Yes, you are right.
> > > I want to read scale and precision of DECIMAL column from table metadata.
> > > 
> > > > This will be fixed at some point in future, but I do not have any dates at the moment.
> > > 
> > > Is there ticket for it? I can't find it via jira search
> > > 
> > > > at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > > In some unsupported scenarios we throw an exception.
> > > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > > 
> > > Guys, Val, may be we should exclude join optimization from IGNITE-7077 while we haven't all limitation on the hand?
> > > 
> > > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > > 
> > > Can I help somehow with this activity?
> > > 
> > > 
> > > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > > Hi Nikolay,
> > > >
> > > > Could you please clarify your question about scale and precision? Do you mean whether it is possible to read it from table metadata? If yes, it is not possible at the moment unfortunately - we do not store information about lengths, scales and precision, only actual data types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in future, but I do not have any dates at the moment.
> > > >
> > > > Now about joins - Denis, I think you provided wrong link to our internal GridGain docs where we accumulate information about ANSI compatibility and which will are going to publish on Ignite WIKI when it is ready. In any case, this is not what Nikolay aksed about. The question was about limitation of our joins which has nothing to do with ANSI standard. Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2. In some unsupported scenarios we throw an exception. In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag). We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations. This would require writing a lot of unit tests to cover various scenarios. I think we will have this information in a matter of 1-2 months.
> > > >
> > > > Vladimir.
> > > >
> > > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org> wrote:
> > > > > Agree. The unsupported functions should be mentioned on the page that will cover Ignite ANSI-99 compliance. We have first results available for CORE features of the specification:
> > > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99 <https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > > >
> > > > > That’s on my radar. I’ll take care of this.
> > > > >
> > > > > —
> > > > > Denis
> > > > >
> > > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> > > > > >
> > > > > > I think we need a page listing the unsupported functions with explanation
> > > > > > why, which is either it does not make sense in Ignite or is planned in
> > > > > > future release.
> > > > > >
> > > > > > Sergey, do you think you will be able to do it?
> > > > > >
> > > > > > D.
> > > > > >
> > > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <se...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > > >>
> > > > > >> I've updated the documentation for next system functions:
> > > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > > >>
> > > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > > >>
> > > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE, DATABASE,
> > > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY, CURRVAL,
> > > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE, USER,
> > > > > >> H2VERSION
> > > > > >>
> > > > > >> Also an issue was created for review current documentation:
> > > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > > >>
> > > > > >> --
> > > > > >> BR,
> > > > > >> Serge
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > > >>
> > > > >
> > > >
> > > >
> 
> 

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Valentin Kulichenko <va...@gmail.com>.
Nikolay,

When you're talking about join optimization, what exactly are you referring
to?

Since other parts of data frames integration are already merged, I think
it's a good time to resurrect this thread? Does it make sense to review it
right now? Or you want to make some more changes?

-Val

On Mon, Feb 12, 2018 at 12:20 AM, Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Hi Nikolay,
>
> I am not sure if ticket for DECIMAL column metadata exists. If you haven't
> find one under "sql" component, please feel free to create it on your own.
> As far as testing of joins, I think it makes sense to start working on it
> when we finish ANSI compliance testing which is already in progress.
>
> On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com>
> wrote:
>
>> Hello, Vladimir.
>>
>> Thank you for an answer.
>>
>> > Do you mean whether it is possible to read it from table metadata?
>>
>> Yes, you are right.
>> I want to read scale and precision of DECIMAL column from table metadata.
>>
>> > This will be fixed at some point in future, but I do not have any dates
>> at the moment.
>>
>> Is there ticket for it? I can't find it via jira search
>>
>> > at this moment we do not have complete list of all restrictions on our
>> joins, because a lot of work is delegated to H2.
>> > In some unsupported scenarios we throw an exception.
>> > In other cases we return incorrect results silently (e.g. if you do not
>> co-locate data and forgot to set "distributed joins" flag).
>>
>> Guys, Val, may be we should exclude join optimization from IGNITE-7077
>> while we haven't all limitation on the hand?
>>
>> > We have a plan to perform excessive testing of joins (both co-located
>> and distributed) and list all known limitations.
>>
>> Can I help somehow with this activity?
>>
>>
>> В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
>> > Hi Nikolay,
>> >
>> > Could you please clarify your question about scale and precision? Do
>> you mean whether it is possible to read it from table metadata? If yes, it
>> is not possible at the moment unfortunately - we do not store information
>> about lengths, scales and precision, only actual data types are passed to
>> H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in
>> future, but I do not have any dates at the moment.
>> >
>> > Now about joins - Denis, I think you provided wrong link to our
>> internal GridGain docs where we accumulate information about ANSI
>> compatibility and which will are going to publish on Ignite WIKI when it is
>> ready. In any case, this is not what Nikolay aksed about. The question was
>> about limitation of our joins which has nothing to do with ANSI standard.
>> Unfortunately, at this moment we do not have complete list of all
>> restrictions on our joins, because a lot of work is delegated to H2. In
>> some unsupported scenarios we throw an exception. In other cases we return
>> incorrect results silently (e.g. if you do not co-locate data and forgot to
>> set "distributed joins" flag). We have a plan to perform excessive testing
>> of joins (both co-located and distributed) and list all known limitations.
>> This would require writing a lot of unit tests to cover various scenarios.
>> I think we will have this information in a matter of 1-2 months.
>> >
>> > Vladimir.
>> >
>> > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org>
>> wrote:
>> > > Agree. The unsupported functions should be mentioned on the page that
>> will cover Ignite ANSI-99 compliance. We have first results available for
>> CORE features of the specification:
>> > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/4509364
>> 6/ANSI+SQL+99 <https://ggsystems.atlassian.n
>> et/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
>> > >
>> > > That’s on my radar. I’ll take care of this.
>> > >
>> > > —
>> > > Denis
>> > >
>> > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <
>> dsetrakyan@apache.org> wrote:
>> > > >
>> > > > I think we need a page listing the unsupported functions with
>> explanation
>> > > > why, which is either it does not make sense in Ignite or is planned
>> in
>> > > > future release.
>> > > >
>> > > > Sergey, do you think you will be able to do it?
>> > > >
>> > > > D.
>> > > >
>> > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <
>> sergey.puchnin@gmail.com>
>> > > > wrote:
>> > > >
>> > > >> yes, the Cust function is supporting both Ignite and H2.
>> > > >>
>> > > >> I've updated the documentation for next system functions:
>> > > >> CASEWHEN Function, CAST, CONVERT, TABLE
>> > > >>
>> > > >> https://apacheignite-sql.readme.io/docs/system-functions
>> > > >>
>> > > >> And for my mind, next functions aren't applicable for Ignite:
>> > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE,
>> DATABASE,
>> > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
>> > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY,
>> CURRVAL,
>> > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
>> > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE,
>> USER,
>> > > >> H2VERSION
>> > > >>
>> > > >> Also an issue was created for review current documentation:
>> > > >> https://issues.apache.org/jira/browse/IGNITE-7496
>> > > >>
>> > > >> --
>> > > >> BR,
>> > > >> Serge
>> > > >>
>> > > >>
>> > > >>
>> > > >> --
>> > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
>> > > >>
>> > >
>> >
>> >
>>
>
>

Re: [SparkDataFrame] Query Optimization. Prototype

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Vladimir.

I've created ticket

https://issues.apache.org/jira/browse/IGNITE-7691

В Пн, 12/02/2018 в 11:20 +0300, Vladimir Ozerov пишет:
> Hi Nikolay,
> 
> I am not sure if ticket for DECIMAL column metadata exists. If you haven't find one under "sql" component, please feel free to create it on your own. As far as testing of joins, I think it makes sense to start working on it when we finish ANSI compliance testing which is already in progress.
> 
> On Wed, Jan 24, 2018 at 12:27 PM, Nikolay Izhikov <ni...@gmail.com> wrote:
> > Hello, Vladimir.
> > 
> > Thank you for an answer.
> > 
> > > Do you mean whether it is possible to read it from table metadata?
> > 
> > Yes, you are right.
> > I want to read scale and precision of DECIMAL column from table metadata.
> > 
> > > This will be fixed at some point in future, but I do not have any dates at the moment.
> > 
> > Is there ticket for it? I can't find it via jira search
> > 
> > > at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2.
> > > In some unsupported scenarios we throw an exception.
> > > In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag).
> > 
> > Guys, Val, may be we should exclude join optimization from IGNITE-7077 while we haven't all limitation on the hand?
> > 
> > > We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations.
> > 
> > Can I help somehow with this activity?
> > 
> > 
> > В Ср, 24/01/2018 в 12:08 +0300, Vladimir Ozerov пишет:
> > > Hi Nikolay,
> > >
> > > Could you please clarify your question about scale and precision? Do you mean whether it is possible to read it from table metadata? If yes, it is not possible at the moment unfortunately - we do not store information about lengths, scales and precision, only actual data types are passed to H2 (e.g. String, BigDecimal, etc.). This will be fixed at some point in future, but I do not have any dates at the moment.
> > >
> > > Now about joins - Denis, I think you provided wrong link to our internal GridGain docs where we accumulate information about ANSI compatibility and which will are going to publish on Ignite WIKI when it is ready. In any case, this is not what Nikolay aksed about. The question was about limitation of our joins which has nothing to do with ANSI standard. Unfortunately, at this moment we do not have complete list of all restrictions on our joins, because a lot of work is delegated to H2. In some unsupported scenarios we throw an exception. In other cases we return incorrect results silently (e.g. if you do not co-locate data and forgot to set "distributed joins" flag). We have a plan to perform excessive testing of joins (both co-located and distributed) and list all known limitations. This would require writing a lot of unit tests to cover various scenarios. I think we will have this information in a matter of 1-2 months.
> > >
> > > Vladimir.
> > >
> > > On Tue, Jan 23, 2018 at 11:45 PM, Denis Magda <dm...@apache.org> wrote:
> > > > Agree. The unsupported functions should be mentioned on the page that will cover Ignite ANSI-99 compliance. We have first results available for CORE features of the specification:
> > > > https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99 <https://ggsystems.atlassian.net/wiki/spaces/GG/pages/45093646/ANSI+SQL+99>
> > > >
> > > > That’s on my radar. I’ll take care of this.
> > > >
> > > > —
> > > > Denis
> > > >
> > > > > On Jan 23, 2018, at 10:31 AM, Dmitriy Setrakyan <ds...@apache.org> wrote:
> > > > >
> > > > > I think we need a page listing the unsupported functions with explanation
> > > > > why, which is either it does not make sense in Ignite or is planned in
> > > > > future release.
> > > > >
> > > > > Sergey, do you think you will be able to do it?
> > > > >
> > > > > D.
> > > > >
> > > > > On Tue, Jan 23, 2018 at 12:05 AM, Serge Puchnin <se...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> yes, the Cust function is supporting both Ignite and H2.
> > > > >>
> > > > >> I've updated the documentation for next system functions:
> > > > >> CASEWHEN Function, CAST, CONVERT, TABLE
> > > > >>
> > > > >> https://apacheignite-sql.readme.io/docs/system-functions
> > > > >>
> > > > >> And for my mind, next functions aren't applicable for Ignite:
> > > > >> ARRAY_GET, ARRAY_LENGTH, ARRAY_CONTAINS, CSVREAD, CSVWRITE, DATABASE,
> > > > >> DATABASE_PATH, DISK_SPACE_USED, FILE_READ, FILE_WRITE, LINK_SCHEMA,
> > > > >> MEMORY_FREE, MEMORY_USED, LOCK_MODE, LOCK_TIMEOUT, READONLY, CURRVAL,
> > > > >> AUTOCOMMIT, CANCEL_SESSION, IDENTITY, NEXTVAL, ROWNUM, SCHEMA,
> > > > >> SCOPE_IDENTITY, SESSION_ID, SET, TRANSACTION_ID, TRUNCATE_VALUE, USER,
> > > > >> H2VERSION
> > > > >>
> > > > >> Also an issue was created for review current documentation:
> > > > >> https://issues.apache.org/jira/browse/IGNITE-7496
> > > > >>
> > > > >> --
> > > > >> BR,
> > > > >> Serge
> > > > >>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> Sent from: http://apache-ignite-developers.2346864.n4.nabble.com/
> > > > >>
> > > >
> > >
> > >
> 
>