Posted to dev@ignite.apache.org by Sergey Kozlov <sk...@gridgain.com> on 2019/11/14 11:23:53 UTC

Collocated/replicatedOnly flags for Thin JDBC driver

Hi, Igniters

While testing the Thin JDBC driver I found a few things that would
be good to discuss:

collocated flag
==============

The flag helps to optimize queries against collocated data in advance. Here
is what we have:
1. Simple queries against single tables return weird results [3]. It means
that either a connection has to be reset on every switch between collocated
and non-collocated requests, or two connections have to be opened, one for
collocated and one for non-collocated requests.
2. The *distributedJoins* flag mostly covers these cases:
 - if it is true, we don't care how the data is distributed over the cluster
 - if not, we always operate on local data, whether it's collocated or not
3. There was an activity to remove it, but it was cancelled [2].
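
To make point 1 concrete, here is a minimal sketch of the two-connection
workaround. It only builds the connection strings. The class and method
names are hypothetical, and the ";name=value" parameter syntax is an
assumption that should be checked against the driver docs [1].

```java
/**
 * Hypothetical helper (not part of Ignite): because the collocated flag is
 * fixed per connection, an application keeps one URL, and therefore one
 * connection, per flag value.
 */
final class ThinJdbcUrls {
    private ThinJdbcUrls() {}

    /** Assumed URL shape: jdbc:ignite:thin://host:port;collocated=<flag>. */
    static String withCollocated(String hostPort, boolean collocated) {
        return "jdbc:ignite:thin://" + hostPort + ";collocated=" + collocated;
    }
}
```

Each URL would then be passed to DriverManager.getConnection(...), and every
query routed to the connection whose flag matches it.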


replicatedOnly flag
===============

The flag helps to optimize queries against replicated tables (caches): the
query always operates on local data.
1. But why can't we detect this while preparing the request for execution?
2. With the PRIMARY_SYNC write synchronization mode, using local node data
may lead to inconsistent results. Thus it could be implemented as an explicit
hint for a query, if the user wants that and accepts the possible risks, or
we could just re-use *distributedJoins=false*.
3. Same concern: it is set per JDBC connection, and changing the flag
requires a reconnect.
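
As an illustration of point 1, a toy sketch of how the decision could be
made while preparing a query. Everything here is hypothetical (the class,
the local CacheMode enum, the table-to-mode map); it is not how the Ignite
engine is implemented, only the shape of the check.

```java
import java.util.Map;
import java.util.Set;

/** Toy model: decide at prepare time whether a query touches only replicated tables. */
final class ReplicatedOnlySketch {
    /** Local stand-in for the cache mode of a table; an assumption for this sketch. */
    enum CacheMode { PARTITIONED, REPLICATED }

    private ReplicatedOnlySketch() {}

    /**
     * Returns true when every table referenced by the query is known and
     * backed by a REPLICATED cache, so the query could run on local data.
     */
    static boolean canRunLocally(Set<String> queryTables, Map<String, CacheMode> tableModes) {
        if (queryTables.isEmpty())
            return false;

        for (String table : queryTables) {
            // Unknown tables map to null and are treated as not replicated.
            if (tableModes.get(table) != CacheMode.REPLICATED)
                return false;
        }

        return true;
    }
}
```

Note that such a check says nothing about the PRIMARY_SYNC concern in point 2;
it only removes the need for the user to set the flag by hand.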

I guess both flags should be deprecated and removed (2.8?).

Ideally *distributedJoins* should be removed as well and replaced by query
hints (default *distributedJoins=true*)

1. https://apacheignite-sql.readme.io/docs/jdbc-driver
2. https://issues.apache.org/jira/browse/IGNITE-6296
3. https://issues.apache.org/jira/browse/IGNITE-12372
-- 
Sergey Kozlov
GridGain Systems
www.gridgain.com

Re: Collocated/replicatedOnly flags for Thin JDBC driver

Posted by Denis Magda <dm...@apache.org>.

Ivan,

Thanks for the details. I've created a ticket to clarify the section of the
docs related to the collocated flag:
https://issues.apache.org/jira/browse/IGNITE-12382

-
Denis


On Mon, Nov 18, 2019 at 1:07 AM Ivan Pavlukhin <vo...@gmail.com> wrote:

> Sergey, Denis,
>
> All the aforementioned flags are needed to work around shortcomings in
> our SQL engine.
>
> SqlFieldsQuery.collocated
> Actually, the javadoc describes this flag quite well. Some bits from my
> side. As we all know, Ignite by default uses the "collocated" query
> execution model. Basically, it means that joins are processed on each
> node locally, but a further reduction step processes the replies from
> all nodes participating in the query. GROUP BY is a variant of such a
> reduction. When SqlFieldsQuery.collocated is set to true, GROUP BY is
> executed on each node locally. The current engine cannot make this
> decision automatically (shortcomings...).
>
> SqlFieldsQuery.replicatedOnly
> It is deprecated in 2.8. The current engine infers it automatically.
> This was another shortcoming in previous releases.
>
> Ideally we should get rid of at least these 2 flags and
> SqlFieldsQuery.distributedJoins. We have this in the requirements for
> the new SQL engine.

Re: Collocated/replicatedOnly flags for Thin JDBC driver

Posted by Ivan Pavlukhin <vo...@gmail.com>.

Sergey, Denis,

All the aforementioned flags are needed to work around shortcomings in our
SQL engine.

SqlFieldsQuery.collocated
Actually, the javadoc describes this flag quite well. Some bits from my
side. As we all know, Ignite by default uses the "collocated" query
execution model. Basically, it means that joins are processed on each
node locally, but a further reduction step processes the replies from
all nodes participating in the query. GROUP BY is a variant of such a
reduction. When SqlFieldsQuery.collocated is set to true, GROUP BY is
executed on each node locally. The current engine cannot make this
decision automatically (shortcomings...).
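
A toy model of the difference, assuming a COUNT(*) ... GROUP BY query where
each node holds a list of group keys. This is plain Java, not Ignite code;
the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of GROUP BY reduction with and without the collocated flag. */
final class GroupBySketch {
    private GroupBySketch() {}

    /** Map step: one node counts its own rows per group key. */
    static Map<String, Integer> localCounts(List<String> nodeRows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : nodeRows)
            counts.merge(key, 1, Integer::sum);
        return counts;
    }

    /** collocated=false: the reducer merges the partial counts from all nodes. */
    static List<String> reduceMerged(List<List<String>> nodes) {
        Map<String, Integer> total = new TreeMap<>();
        for (List<String> rows : nodes)
            localCounts(rows).forEach((k, v) -> total.merge(k, v, Integer::sum));
        return format(total);
    }

    /** collocated=true: per-node groups are taken as final and only concatenated. */
    static List<String> reduceCollocated(List<List<String>> nodes) {
        List<String> out = new ArrayList<>();
        for (List<String> rows : nodes)
            out.addAll(format(localCounts(rows)));
        return out;
    }

    private static List<String> format(Map<String, Integer> counts) {
        List<String> out = new ArrayList<>();
        counts.forEach((k, v) -> out.add(k + "=" + v));
        return out;
    }
}
```

With nodes holding [a, a, b] and [a, b], reduceMerged gives [a=3, b=2], while
reduceCollocated gives [a=2, b=1, a=1, b=1]: the same group shows up twice.
When each key lives on exactly one node, e.g. [a, a, a] and [b, b], both
return [a=3, b=2], which is why the flag is only safe for data collocated by
the GROUP BY key.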

SqlFieldsQuery.replicatedOnly
It is deprecated in 2.8. The current engine infers it automatically.
This was another shortcoming in previous releases.

Ideally we should get rid of at least these 2 flags and
SqlFieldsQuery.distributedJoins. We have this in the requirements for
the new SQL engine.

On Sat, Nov 16, 2019 at 1:14 AM Denis Magda <dm...@apache.org> wrote:
>
> Ignite SQL experts,
>
> Could you remind all of us what was the primary reason for adding the
> flags? It was about corner cases if I'm not mistaken and it makes sense to
> review those usage scenarios/patterns again.
>
> The flags are used for both JDBC and ODBC drivers and I would remove them
> only if they became irrelevant.
>
>
> -
> Denis



-- 
Best regards,
Ivan Pavlukhin

Re: Collocated/replicatedOnly flags for Thin JDBC driver

Posted by Denis Magda <dm...@apache.org>.

Ignite SQL experts,

Could you remind all of us what was the primary reason for adding the
flags? It was about corner cases if I'm not mistaken and it makes sense to
review those usage scenarios/patterns again.

The flags are used for both JDBC and ODBC drivers and I would remove them
only if they became irrelevant.


-
Denis

