Posted to dev@ignite.apache.org by Sergi Vladykin <se...@gmail.com> on 2017/04/12 09:31:51 UTC

SQL on PARTITIONED vs REPLICATED cache

Guys,

I want to introduce another breaking change for 2.0.

Currently SQL is processed differently depending on whether we call the
`query` method on a partitioned cache or on a replicated one: on a replicated
cache we do not do any extra processing and execute the query as is on the
current node.

This behavior historically existed for performance reasons. But it is not
obvious and leads to wrong query results. This issue becomes even more
creepy with JDBC and ODBC drivers.

In 2.0 I want to execute all SQL queries the same way, through the whole
processing pipeline, to guarantee the correct result irrespective of the
cache that originated the query.

To keep the old behavior available (skip all the preprocessing and run the
query on the current node), add a flag isReplicatedOnly() on SqlQuery and
SqlFieldsQuery. It will be disabled by default, and if one knows that only
replicated tables participate in a query, one can enable it for better
performance.
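
A toy model may help show why local-only execution is safe for replicated
data but returns a partial result for partitioned data. This is a sketch,
not Ignite code: the ToyCluster class, its methods and the boolean
replicatedOnly parameter are all made up for illustration and only mimic the
idea of the proposed flag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (hypothetical, not Ignite code): every node holds one slice of
 * a partitioned table, so a scan of a single node sees only part of the data.
 */
final class ToyCluster {
    /** Rows of the partitioned table, one list per node. */
    private final List<List<String>> partitionedByNode;

    ToyCluster(List<List<String>> partitionedByNode) {
        this.partitionedByNode = partitionedByNode;
    }

    /** Old fast path: read only what the current node stores. */
    List<String> scanLocal(int currentNode) {
        return new ArrayList<>(partitionedByNode.get(currentNode));
    }

    /** Full pipeline: collect rows from every node and merge them. */
    List<String> scanDistributed() {
        List<String> merged = new ArrayList<>();
        for (List<String> nodeRows : partitionedByNode)
            merged.addAll(nodeRows);
        return merged;
    }

    /** Local execution happens only when the caller explicitly opts in. */
    List<String> query(int currentNode, boolean replicatedOnly) {
        return replicatedOnly ? scanLocal(currentNode) : scanDistributed();
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster(Arrays.asList(
            Arrays.asList("row-1", "row-2"),   // stored on node 0
            Arrays.asList("row-3"),            // stored on node 1
            Arrays.asList("row-4", "row-5"))); // stored on node 2

        // Opting in while the table is actually partitioned: partial result.
        System.out.println(cluster.query(0, true).size());  // 2
        // Default path through the whole pipeline: complete result.
        System.out.println(cluster.query(0, false).size()); // 5
    }
}
```

In this model the flag is purely an opt-in optimization: leaving it off is
always correct, while turning it on is correct only if every table in the
query really is replicated.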

Sergi

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Sergi Vladykin <se...@gmail.com>.

Yes, that is the correct explanation. I've created the issue:

https://issues.apache.org/jira/browse/IGNITE-4955

Sergi

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Got it, Denis. I think you are right.

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Denis Magda <dm...@apache.org>.

Dmitriy,

No, I think Sergi meant the type of the cache whose reference is used to execute the query. In my example

>> 2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM
>> partitionedCache … JOIN replicatedCache …”);

the *replicatedCache* reference is used to execute the query and, as I understand it, this is what causes the issue.

Sergi, please clarify.

—
Denis

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Denis, I think you meant that selecting from the replicated cache first is
the invalid scenario, but provided the wrong example. Here is the correct
example of the invalid query:

SELECT * FROM replicatedCache … JOIN partitionedCache …

I do agree, we should make the change, as long as we keep the flag to
enable the old behavior.

D.

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Denis Magda <dm...@apache.org>.

Sergi,

As far as I understand, you’re considering the example below:

IgniteCache partitionedCache = ...;
IgniteCache replicatedCache = …;

1. Valid scenario - *partitionedCache*.query(“SELECT * FROM partitionedCache … JOIN replicatedCache …”);
2. Error-prone scenario - *replicatedCache*.query(“SELECT * FROM partitionedCache … JOIN replicatedCache …”);

Do you mean 2. is the issue? If so, can’t we just detect on our own that all the caches are replicated and execute the query in a more optimal way? That would remove the need to add isReplicatedOnly().
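
The automatic detection suggested above could look roughly like the sketch
below. It is only an illustration under assumptions: the ReplicatedOnlyCheck
class, its Mode enum and the allReplicated method are hypothetical names,
and the sketch assumes the set of tables referenced by a query is already
known, which in practice would require parsing the query first.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Ignite. */
final class ReplicatedOnlyCheck {
    /** Simplified stand-in for a cache mode. */
    enum Mode { PARTITIONED, REPLICATED }

    /**
     * Returns true only if the query references at least one table and every
     * referenced table is known to be replicated. Unknown tables count as
     * not replicated, so the slower but correct path is the fallback.
     */
    static boolean allReplicated(List<String> tables, Map<String, Mode> modes) {
        if (tables.isEmpty())
            return false;

        for (String table : tables) {
            if (modes.get(table) != Mode.REPLICATED)
                return false;
        }

        return true;
    }

    public static void main(String[] args) {
        Map<String, Mode> modes = new HashMap<>();
        modes.put("replicatedCache", Mode.REPLICATED);
        modes.put("partitionedCache", Mode.PARTITIONED);

        // Only replicated tables: the local fast path would be safe.
        System.out.println(allReplicated(
            Arrays.asList("replicatedCache"), modes)); // true

        // A partitioned table participates: the full pipeline is required.
        System.out.println(allReplicated(
            Arrays.asList("partitionedCache", "replicatedCache"), modes)); // false
    }
}
```

Note that the result depends only on the tables in the query text, not on
which cache reference the query was issued through, which is the property
scenario 2 lacks today.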

—
Denis

> On Apr 12, 2017, at 7:07 AM, Andrey Mashenkov <an...@gmail.com> wrote:
> 
> Yes, it's reasonable.
> 
> On Wed, Apr 12, 2017 at 3:23 PM, Sergi Vladykin <se...@gmail.com>
> wrote:
> 
>> Good point, but I'm not sure. The difference is that on client node you
>> should not be able to enable isLocal, while isReplicatedOnly is perfectly
>> valid. What do you think?
>> 
>> Sergi
>> 
>> 2017-04-12 15:18 GMT+03:00 Andrey Mashenkov <an...@gmail.com>:
>> 
>>> Sergi,
>>> 
>>> Got it.
>>> 
>>> Will the query execution path and the results be the same with the
>>> isReplicatedOnly flag as with the isLocal flag turned on?
>>> If my understanding is correct, we will get the same results and there is no
>>> need to introduce a new flag.
>>> 
>>> 
>>> 
>>> On Wed, Apr 12, 2017 at 2:54 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
>>> wrote:
>>> 
>>>> Ok, let it be an exception. I'm just saying that the thing does not work
>>>> now.
>>>> 
>>>> Sergi
>>>> 
>>>> 2017-04-12 14:50 GMT+03:00 Andrey Mashenkov <andrey.mashenkov@gmail.com>:
>>>> 
>>>>> Sergi,
>>>>> 
>>>>> I wonder how that is possible.
>>>>> 
>>>>> It looks like it is impossible to run a query on a replicated cache but
>>>>> select data from a partitioned table. It will result in an
>>>>> IllegalStateException on stable topology or an IgniteCacheException on
>>>>> unstable topology.
>>>>> See the ReduceQueryExecutor.stableDataNodes() and
>>>>> replicatedUnstableDataNodes() methods.
>>>>> 
>>>>> BTW, an IllegalStateException with no message is confusing.
>>>>> 
>>>>> On Wed, Apr 12, 2017 at 2:36 PM, Sergi Vladykin <sergi.vladykin@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> Andrey,
>>>>>> 
>>>>>> Because if you run a query on a replicated cache but select data from a
>>>>>> partitioned table, you will get only a part of the result.
>>>>>> 
>>>>>> Igor,
>>>>>> 
>>>>>> You are mostly right, but
>>>>>> 
>>>>>> 1. Performance characteristics may change.
>>>>>> 2. The Ignite SQL processing pipeline may not support everything in H2 SQL
>>>>>> and may fail in some cases where it worked previously.
>>>>>> 
>>>>>> Because of this the change may affect existing applications, and I want to
>>>>>> have it in 2.0 to make it legal.
>>>>>> 
>>>>>> Sergi
>>>>>> 
>>>>>> 2017-04-12 14:10 GMT+03:00 Igor Sapego <is...@gridgain.com>:
>>>>>> 
>>>>>>> Also, is it really a breaking change if the results are wrong?
>>>>>>> To me it looks more like a bugfix, i.e. you can't break something
>>>>>>> that does not work properly.
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Igor
>>>>>>> 
>>>>>>> On Wed, Apr 12, 2017 at 2:04 PM, Andrey Mashenkov <
>>>>>>> andrey.mashenkov@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Sergi,
>>>>>>>> 
>>>>>>>> How can query to replicated cache leads to to wrong results?
>>>>>>>> Is it due to we can read backup entries?
>>>>>>>> 
>>>>>>>> On Wed, Apr 12, 2017 at 12:31 PM, Sergi Vladykin <
>>>>>>> sergi.vladykin@gmail.com
>>>>>>>>> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Guys,
>>>>>>>>> 
>>>>>>>>> I want to introduce another breaking change for 2.0.
>>>>>>>>> 
>>>>>>>>> Currently SQL is being processed differently when we call
>>> method
>>>>>>> `query`
>>>>>>>> on
>>>>>>>>> partitioned cache and on replicated: on replicated cache we
>> do
>>>> not
>>>>> do
>>>>>>> any
>>>>>>>>> extra processing and execute the query as is on current node.
>>>>>>>>> 
>>>>>>>>> This behavior historically existed for performance reasons.
>> But
>>>> it
>>>>> is
>>>>>>> not
>>>>>>>>> obvious and leads to wrong query results. This issue becomes
>>> even
>>>>>> more
>>>>>>>>> creepy with JDBC and ODBC drivers.
>>>>>>>>> 
>>>>>>>>> In 2.0 I want to execute all the SQL queries the same way
>>> through
>>>>> the
>>>>>>>> whole
>>>>>>>>> processing pipeline to guaranty the correct result
>>> irrespectively
>>>>> to
>>>>>>> the
>>>>>>>>> cache that was the query originator.
>>>>>>>>> 
>>>>>>>>> To be able to have the old behavior (skip all the
>> preprocessing
>>>> and
>>>>>> run
>>>>>>>>> query on current node) add a flag isReplicatedOnly() on
>>> SqlQuery
>>>>> and
>>>>>>>>> SqlFieldsQuery. It will be disabled by default and if one
>> knows
>>>>> that
>>>>>>> the
>>>>>>>>> only replicated tables participate in a query, then he can
>>> enable
>>>>> it
>>>>>>> for
>>>>>>>>> better performance.
>>>>>>>>> 
>>>>>>>>> Sergi
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Best regards,
>>>>>>>> Andrey V. Mashenkov
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Best regards,
>>>>> Andrey V. Mashenkov
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Best regards,
>>> Andrey V. Mashenkov
>>> 
>> 
> 
> 
> 
> -- 
> Best regards,
> Andrey V. Mashenkov

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Andrey Mashenkov <an...@gmail.com>.

Yes, it's reasonable.

On Wed, Apr 12, 2017 at 3:23 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Good point, but I'm not sure. The difference is that on client node you
> should not be able to enable isLocal, while isReplicatedOnly is perfectly
> valid. What do you think?
>
> Sergi



-- 
Best regards,
Andrey V. Mashenkov

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Sergi Vladykin <se...@gmail.com>.

Good point, but I'm not sure. The difference is that on a client node you
should not be able to enable isLocal, while isReplicatedOnly is perfectly
valid there. What do you think?
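
The difference between the two flags can be sketched with a toy model. This is only an illustration under stated assumptions, not Ignite code: the Node class and the queryLocal and queryReplicatedOnly methods are hypothetical names, and the routing rule for the replicated-only case (pick any server node) is an assumed reading of the proposal.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy model (hypothetical, not the Ignite API) of the isLocal vs isReplicatedOnly difference. */
class LocalVsReplicatedOnlyDemo {
    /** A server node holds a full copy of the replicated table; a client node holds nothing. */
    static final class Node {
        final boolean client;
        final List<String> replicatedRows;

        Node(boolean client, List<String> rows) {
            this.client = client;
            this.replicatedRows = client ? Collections.<String>emptyList() : rows;
        }
    }

    /** "Local" flag: only the current node is consulted, so a client node is rejected. */
    static List<String> queryLocal(Node current) {
        if (current.client)
            throw new IllegalStateException("Local query is not allowed on a client node: it holds no data.");

        return current.replicatedRows;
    }

    /** "Replicated only" flag: no query split, but the query is still routed to one node that has the data. */
    static List<String> queryReplicatedOnly(Node current, List<Node> cluster) {
        if (!current.client)
            return current.replicatedRows;

        // Assumption: any single server node can answer, because each one has the full table.
        for (Node n : cluster)
            if (!n.client)
                return n.replicatedRows;

        throw new IllegalStateException("No server nodes in the cluster.");
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        Node server = new Node(false, rows);
        Node client = new Node(true, rows);
        List<Node> cluster = Arrays.asList(client, server);

        System.out.println(queryReplicatedOnly(client, cluster));
        System.out.println(queryLocal(server));
    }
}
```

In this model the two flags give the same rows on a server node and differ only when the query starts on a client node.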

Sergi

2017-04-12 15:18 GMT+03:00 Andrey Mashenkov <an...@gmail.com>:

> Sergi,
>
> Got it.
>
> Will query execution and the results be the same with the isReplicatedOnly
> flag as with the isLocal flag turned on?
> If my understanding is correct, we will get the same results and there is no
> need to introduce a new flag.
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Andrey Mashenkov <an...@gmail.com>.

Sergi,

Got it.

Will query execution and the results be the same with the isReplicatedOnly
flag as with the isLocal flag turned on?
If my understanding is correct, we will get the same results and there is no
need to introduce a new flag.



On Wed, Apr 12, 2017 at 2:54 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Ok, let it be an exception. I'm just saying that the thing does not work
> now.
>
> Sergi



-- 
Best regards,
Andrey V. Mashenkov

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Sergi Vladykin <se...@gmail.com>.

Ok, let it be an exception. I'm just saying that the thing does not work
now.

Sergi

2017-04-12 14:50 GMT+03:00 Andrey Mashenkov <an...@gmail.com>:

> Sergi,
>
> I wonder how that is possible.
>
> It looks like it is impossible to run a query on a replicated cache that
> selects data from a partitioned table. It will result in an
> IllegalStateException on a stable topology or an IgniteCacheException on
> an unstable topology.
> See the ReduceQueryExecutor.stableDataNodes() and
> replicatedUnstableDataNodes() methods.
>
> BTW, an IllegalStateException with no message is confusing.
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Andrey Mashenkov <an...@gmail.com>.

Sergi,

I wonder how that is possible.

It looks like it is impossible to run a query on a replicated cache that
selects data from a partitioned table. It will result in an
IllegalStateException on a stable topology or an IgniteCacheException on
an unstable topology.
See the ReduceQueryExecutor.stableDataNodes() and
replicatedUnstableDataNodes() methods.

BTW, an IllegalStateException with no message is confusing.
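
To illustrate the point about the missing message, here is a minimal sketch of a check that fails with a descriptive text. It is not the code in ReduceQueryExecutor: the class, the CacheMode enum, and the validate method are hypothetical names, and the cache names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical check, for illustration only: reject a mixed query and explain why. */
class MixedCacheCheckDemo {
    enum CacheMode { PARTITIONED, REPLICATED }

    /** Rejects a query started from a replicated cache that reads a partitioned one. */
    static void validate(String originator, Map<String, CacheMode> cachesInQuery) {
        if (cachesInQuery.get(originator) != CacheMode.REPLICATED)
            return; // Originator is not replicated: nothing to check in this toy model.

        for (Map.Entry<String, CacheMode> e : cachesInQuery.entrySet())
            if (e.getValue() == CacheMode.PARTITIONED)
                throw new IllegalStateException("Query was started on replicated cache '" + originator
                    + "' but reads partitioned cache '" + e.getKey() + "'.");
    }

    public static void main(String[] args) {
        Map<String, CacheMode> caches = new LinkedHashMap<>();
        caches.put("replicatedCache", CacheMode.REPLICATED);
        caches.put("partitionedCache", CacheMode.PARTITIONED);

        try {
            validate("replicatedCache", caches);
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, the user can see which cache started the query and which table caused the failure.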





On Wed, Apr 12, 2017 at 2:36 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Andrey,
>
> Because if you run query on replicated cache, but select data from a
> partitioned table, you will get only a part of the result.
>
> Igor,
>
> You are mostly right, but
>
> 1. Performance characteristics may change.
> 2. The Ignite SQL processing pipeline may not support everything in H2 SQL
> and may fail in some cases where it worked previously.
>
> Because of this the change may affect existing applications and I want to
> have it in 2.0 to make it legal.
>
> Sergi



-- 
Best regards,
Andrey V. Mashenkov

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Sergi Vladykin <se...@gmail.com>.

Andrey,

Because if you run a query on a replicated cache but select data from a
partitioned table, you will get only part of the result.
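
A toy model of why the result is partial. This is only a sketch under stated assumptions, not Ignite code: the class and method names are hypothetical, there are three server nodes, and a key is owned by the node numbered key % 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model (hypothetical names): counting rows of a PARTITIONED table in two ways. */
class PartialResultDemo {
    static final int NODES = 3;

    /** Rows of the partitioned table that live on the given node (key % NODES picks the owner). */
    static List<Integer> partitionedRowsOn(int node, List<Integer> allKeys) {
        List<Integer> rows = new ArrayList<>();

        for (int key : allKeys)
            if (Math.floorMod(key, NODES) == node)
                rows.add(key);

        return rows;
    }

    /** Old behavior for a replicated originator: run the query as is on the current node only. */
    static int countOnCurrentNodeOnly(int currentNode, List<Integer> allKeys) {
        return partitionedRowsOn(currentNode, allKeys).size();
    }

    /** Full pipeline: run the map step on every node, then reduce (sum) on the originator. */
    static int countThroughFullPipeline(List<Integer> allKeys) {
        int total = 0;

        for (int node = 0; node < NODES; node++)
            total += partitionedRowsOn(node, allKeys).size();

        return total;
    }

    public static void main(String[] args) {
        List<Integer> keys = Arrays.asList(0, 1, 2, 3, 4, 5, 6, 7, 8);

        System.out.println("current node only: " + countOnCurrentNodeOnly(0, keys));
        System.out.println("full pipeline:     " + countThroughFullPipeline(keys));
    }
}
```

In this model the current node sees only its own share of the partitioned table, so the local count is a fraction of the count produced by the full map and reduce steps.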

Igor,

You are mostly right, but

1. Performance characteristics may change.
2. The Ignite SQL processing pipeline may not support everything in H2 SQL
and may fail in some cases where it worked previously.

Because of this the change may affect existing applications and I want to
have it in 2.0 to make it legal.

Sergi

2017-04-12 14:10 GMT+03:00 Igor Sapego <is...@gridgain.com>:

> Also, is it really a breaking change if the results are wrong?
> To me it looks more like a bugfix, i.e. you can't break something
> that does not work properly.
>
> Best Regards,
> Igor

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Igor Sapego <is...@gridgain.com>.

Also, is it really a breaking change if the results are wrong?
To me it looks more like a bugfix, i.e. you can't break something
that does not work properly.

Best Regards,
Igor

On Wed, Apr 12, 2017 at 2:04 PM, Andrey Mashenkov <
andrey.mashenkov@gmail.com> wrote:

> Sergi,
>
> How can a query to a replicated cache lead to wrong results?
> Is it because we can read backup entries?
>
> --
> Best regards,
> Andrey V. Mashenkov
>

Re: SQL on PARTITIONED vs REPLICATED cache

Posted by Andrey Mashenkov <an...@gmail.com>.

Sergi,

How can a query to a replicated cache lead to wrong results?
Is it because we can read backup entries?

On Wed, Apr 12, 2017 at 12:31 PM, Sergi Vladykin <se...@gmail.com>
wrote:

> Guys,
>
> I want to introduce another breaking change for 2.0.
>
> Currently SQL is being processed differently when we call method `query` on
> partitioned cache and on replicated: on replicated cache we do not do any
> extra processing and execute the query as is on current node.
>
> This behavior historically existed for performance reasons. But it is not
> obvious and leads to wrong query results. This issue becomes even more
> creepy with JDBC and ODBC drivers.
>
> In 2.0 I want to execute all SQL queries the same way through the whole
> processing pipeline to guarantee the correct result irrespective of the
> cache that was the query originator.
>
> To keep the old behavior available (skip all the preprocessing and run the
> query on the current node), add a flag isReplicatedOnly() on SqlQuery and
> SqlFieldsQuery. It will be disabled by default, and if one knows that only
> replicated tables participate in a query, then one can enable it for
> better performance.
>
> Sergi
>



-- 
Best regards,
Andrey V. Mashenkov