You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ignite.apache.org by Taras Ledkov <tl...@gridgain.com> on 2019/02/21 13:27:25 UTC

SQL: INSERT with hidden columns _key, _val and check the type of input objects

Hi,

Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:

Ignite doesn't check a type of input objects when hidden columns _key, 
_value is used in a DML statements.
I describe the current behavior for example:

1. Cache configuration:  'setIndexedTypes(PersonKey.class, Person.class))'
2.  PersonKey type contains 'int id' field.
3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'

Cases:
1. Invalid value object type:
- Any value object may be passed as a query parameter
- Query is executed without an error and returns '1' (one row updated);
- There is not inserted row at the 'SELECT * FROM test' results.
- cache.get(key) returns inserted object;

2. Invalid key object type:
2.1 Non-primitive object is passed and binary representation doesn't 
contain 'id' field.
- Query is executed without error and returns '1' (one row updated);
- The inserted row is available by 'SELECT *' and the row contains id = 
null;
2.2 Non-primitive object is passed and binary representation contains 
'id' field.
- The inserted row is available by 'SELECT *' and the row contains 
expected 'id' field;
- The cache entry cannot be gathered by 'cache.get' operation with the 
corresponding 'PersonKey(id)' (keys differ).

I propose to check type of the user's input object.

I guess that using _key/_val columns works close to 'cache.put()' but it 
looks like significant usability issue.
To confuse the 'PersonKey.class.getName()' and 
'node.binary().builder("PersonKey")' is a typical mistake of Ignite 
newcomers.

One more argument for check: SQL INSERT sematic means the row is 
inserted into the specified TABLE, not into the cache.
So, throw IgniteSQLException is expected behavior in this case, i think.

[1]. https://issues.apache.org/jira/browse/IGNITE-5250

-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

I was wrong about unwrapped _val.

Lets introduce 
IgniteSystemProperties.IGNITE_ALLOW_KEY_VAL_COLUMNS_AT_DML to switch on 
current behavior
and disallow composite _key, _val columns at the INSERT/MERGE/UPDATE 
statements by default.

27.02.2019 13:19, Vladimir Ozerov пишет:

> I do not think this should be deferred, even though it changes default
> behavior. Clean and simple semantics is much more important. In this
> regards DML was created incorrectly in the first place. We will fix it,
> leaving hidden fallback mode for those users who use this strange semantics.
>
> ср, 27 февр. 2019 г. в 12:57, Ilya Kasnacheev <il...@gmail.com>:
>
>> Hello!
>>
>>> UPDATE table SET _VAL=? WHERE ...   // Disallow
>> Breaking change and as such should be deferred to 3.0.
>>
>> All of our tables have types, so we can disallow doing _VAL=? where
>> parameter object is not of table's type, and semantics break down here -
>> you INSERT object in cache, get "1" rows updated but can't select this row
>> from table.
>> But we probably should not disallow _VAL=? where parameter object IS of
>> table's type, since there may be users whose workflow depends on that and
>> it isn't fixable easily.
>>
>> For example, they can have objects of which only subset of fields is
>> indexed, the rest is not. Then they are inserting them via SQL as shown.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> ср, 27 февр. 2019 г. в 12:10, Vladimir Ozerov <vo...@gridgain.com>:
>>
>>> Hi Taras,
>>>
>>> As far as your original question :-) I would say that user should have
>> only
>>> one way to update data with DML - through plain attributes. That is, if
>> we
>>> have a composite value with attributes "a" and "b", then we should:
>>> UPDATE table SET a=?, b=? WHERE ... // Allow
>>> UPDATE table SET _VAL=? WHERE ...   // Disallow
>>>
>>> But if the value is an attribute itself (e.g. in case of primitive), then
>>> DML should be allowed on it for sure:
>>> UPDATE table SET _VAL=? WHERE ...   // Allow
>>>
>>> What do you think?
>>>
>>> On Sat, Feb 23, 2019 at 6:50 PM Denis Magda <dm...@gridgain.com> wrote:
>>>
>>>> Vladimir,
>>>>
>>>> Ok, agreed, let's not boil the ocean...at least for now ;)
>>>>
>>>> --
>>>> Denis Magda
>>>>
>>>>
>>>> On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vozerov@gridgain.com
>>>> wrote:
>>>>
>>>>> Denis,
>>>>>
>>>>> Yes, this is what my answer was about - you cannot have SQL without
>>>>> defining fields in advance. Because it breaks a lot of standard SQL
>>>>> invariants and virtually makes the whole language unusable. For
>>> instance,
>>>>> think of product behavior in the following cases:
>>>>> 1) User queries an empty cache with a query "SELECT a FROM table" -
>>> what
>>>>> should happen - exception or empty result? How would I know whether
>>> field
>>>>> "a" will appear in future?
>>>>> 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
>>>>> understand whether it is possible or not to add a column without
>> strict
>>>>> schema?
>>>>> 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will
>>> add
>>>> an
>>>>> object with field "c" after that?
>>>>> 4) User connects to Ignite from Tableau and navigates through schema
>> -
>>>> what
>>>>> should be shown?
>>>>>
>>>>> That is, you cannot have SQL without schema because it is at the very
>>>> heart
>>>>> of the technology. But you can have schema-less noSQL database.
>>>>>
>>>>> Let's do not invent a hybrid with tons of corner cases and separate
>>>>> learning curve. It should be enough just to rethink and simplify our
>>>>> configuration - reshape QueryEntity, deprecate all SQL annotations,
>>> allow
>>>>> only one table per cache, allow to define SQL script to be executed
>> on
>>>>> cache start or so.
>>>>>
>>>>> As far as schemaless - it is viable approach for sure, but should be
>>>>> considered either outside of SQL (e.g. a kind of predicate/criteria
>> API
>>>>> which can be merged with ScanQuery) or as a special datatype in SQL
>>>>> ecosystem (like is is done with JSON in many RDBMS databases).
>>>>>
>>>>> Vladimir.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org>
>>> wrote:
>>>>>> Vladimir,
>>>>>>
>>>>>> That's understood. I'm just thinking of a use case different from
>> the
>>>> DDL
>>>>>> approach where the schema is defined initially. Let's say that
>>> someone
>>>>>> configured caches with CacheConfiguration and now puts an Object in
>>> the
>>>>>> cache. For that person, it would be helpful to skip the Annotations
>>> or
>>>>>> QueryEntities approaches for queryable fields definitions (not even
>>>>>> indexes). For instance, the person might simply query some fields
>>> with
>>>>> the
>>>>>> primary index in the WHERE clause and this shouldn't require any
>>> extra
>>>>>> settings. Yes, it's clear that it might be extremely challenging to
>>>>> support
>>>>>> but imagine how usable the API could become if we can get rid of
>>>>>> Annotations and QueryEntities.
>>>>>>
>>>>>> Basically, my idea is that all of the objects and their fields
>> stored
>>>> in
>>>>>> the caches should be visible to SQL w/o extra settings. If someone
>>>> wants
>>>>> to
>>>>>> create indexes then use DDL which was designed for this.
>>>>>>
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <
>>> vozerov@gridgain.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Denis,
>>>>>>>
>>>>>>> SQL is a language with strict schema what was one of significant
>>>>> factors
>>>>>> of
>>>>>>> it's worldwide success. I doubt we will ever have SQL without
>>>>>>> configuration/definiton, because otherwise it will be not SQL,
>> but
>>>>>>> something else (e.g. document-oriented, JSON, whatever).
>>>>>>>
>>>>>>> On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org>
>>>> wrote:
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> Do we want to preserve the annotation-based configuration?
>> There
>>>> are
>>>>>> too
>>>>>>>> many ways to configure SQL indexes/fields.
>>>>>>>>
>>>>>>>> For instance, if our new SQL API could see and access all of
>> the
>>>>> fields
>>>>>>>> out-of-the-box (without any extra settings) and DDL will be
>> used
>>> to
>>>>>>> define
>>>>>>>> indexed fields then that would be a huge usability improvement.
>>>>>>>>
>>>>>>>> -
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <
>>> tledkov@gridgain.com
>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Lets discuss SQL DML (INSERT/UPDATE) current behavior
>> specific:
>>>>>>>>> Ignite doesn't check a type of input objects when hidden
>>> columns
>>>>>> _key,
>>>>>>>>> _value is used in a DML statements.
>>>>>>>>> I describe the current behavior for example:
>>>>>>>>>
>>>>>>>>> 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
>>>>>>>> Person.class))'
>>>>>>>>> 2.  PersonKey type contains 'int id' field.
>>>>>>>>> 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?,
>> ?)'
>>>>>>>>> Cases:
>>>>>>>>> 1. Invalid value object type:
>>>>>>>>> - Any value object may be passed as a query parameter
>>>>>>>>> - Query is executed without an error and returns '1' (one row
>>>>>> updated);
>>>>>>>>> - There is not inserted row at the 'SELECT * FROM test'
>>> results.
>>>>>>>>> - cache.get(key) returns inserted object;
>>>>>>>>>
>>>>>>>>> 2. Invalid key object type:
>>>>>>>>> 2.1 Non-primitive object is passed and binary representation
>>>>> doesn't
>>>>>>>>> contain 'id' field.
>>>>>>>>> - Query is executed without error and returns '1' (one row
>>>>> updated);
>>>>>>>>> - The inserted row is available by 'SELECT *' and the row
>>>> contains
>>>>>> id =
>>>>>>>>> null;
>>>>>>>>> 2.2 Non-primitive object is passed and binary representation
>>>>> contains
>>>>>>>>> 'id' field.
>>>>>>>>> - The inserted row is available by 'SELECT *' and the row
>>>> contains
>>>>>>>>> expected 'id' field;
>>>>>>>>> - The cache entry cannot be gathered by 'cache.get' operation
>>>> with
>>>>>> the
>>>>>>>>> corresponding 'PersonKey(id)' (keys differ).
>>>>>>>>>
>>>>>>>>> I propose to check type of the user's input object.
>>>>>>>>>
>>>>>>>>> I guess that using _key/_val columns works close to
>>> 'cache.put()'
>>>>> but
>>>>>>> it
>>>>>>>>> looks like significant usability issue.
>>>>>>>>> To confuse the 'PersonKey.class.getName()' and
>>>>>>>>> 'node.binary().builder("PersonKey")' is a typical mistake of
>>>> Ignite
>>>>>>>>> newcomers.
>>>>>>>>>
>>>>>>>>> One more argument for check: SQL INSERT sematic means the row
>>> is
>>>>>>>>> inserted into the specified TABLE, not into the cache.
>>>>>>>>> So, throw IgniteSQLException is expected behavior in this
>>> case, i
>>>>>>> think.
>>>>>>>>> [1]. https://issues.apache.org/jira/browse/IGNITE-5250
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Taras Ledkov
>>>>>>>>> Mail-To: tledkov@gridgain.com
>>>>>>>>>
>>>>>>>>>
-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Taras Ledkov <tl...@gridgain.com>.

Hi,

Unwrapped key works properly now, but unwrapped value doesn't.

Example:
         execSql("CREATE TABLE test_0 (id integer primary key, name 
varchar) WITH \"WRAP_VALUE\"");
         execSql("INSERT INTO test_0 (id, _val) VALUES (?, ?)", 0, "0");

The exception is thrown:
IgniteSQLException: Value conversion failed [column=_VAL, 
from=java.lang.String, to=java.lang.Object]

So, I propose to choose one of the following:
- disallow use the hidden columns _key,  _val in the INSERT/UPDATE/MERGE
- check the input object type.

27.02.2019 13:19, Vladimir Ozerov пишет:
> I do not think this should be deferred, even though it changes default
> behavior. Clean and simple semantics is much more important. In this
> regards DML was created incorrectly in the first place. We will fix it,
> leaving hidden fallback mode for those users who use this strange semantics.
>
> ср, 27 февр. 2019 г. в 12:57, Ilya Kasnacheev <il...@gmail.com>:
>
>> Hello!
>>
>>> UPDATE table SET _VAL=? WHERE ...   // Disallow
>> Breaking change and as such should be deferred to 3.0.
>>
>> All of our tables have types, so we can disallow doing _VAL=? where
>> parameter object is not of table's type, and semantics break down here -
>> you INSERT object in cache, get "1" rows updated but can't select this row
>> from table.
>> But we probably should not disallow _VAL=? where parameter object IS of
>> table's type, since there may be users whose workflow depends on that and
>> it isn't fixable easily.
>>
>> For example, they can have objects of which only subset of fields is
>> indexed, the rest is not. Then they are inserting them via SQL as shown.
>>
>> Regards,
>> --
>> Ilya Kasnacheev
>>
>>
>> ср, 27 февр. 2019 г. в 12:10, Vladimir Ozerov <vo...@gridgain.com>:
>>
>>> Hi Taras,
>>>
>>> As far as your original question :-) I would say that user should have
>> only
>>> one way to update data with DML - through plain attributes. That is, if
>> we
>>> have a composite value with attributes "a" and "b", then we should:
>>> UPDATE table SET a=?, b=? WHERE ... // Allow
>>> UPDATE table SET _VAL=? WHERE ...   // Disallow
>>>
>>> But if the value is an attribute itself (e.g. in case of primitive), then
>>> DML should be allowed on it for sure:
>>> UPDATE table SET _VAL=? WHERE ...   // Allow
>>>
>>> What do you think?
>>>
>>> On Sat, Feb 23, 2019 at 6:50 PM Denis Magda <dm...@gridgain.com> wrote:
>>>
>>>> Vladimir,
>>>>
>>>> Ok, agreed, let's not boil the ocean...at least for now ;)
>>>>
>>>> --
>>>> Denis Magda
>>>>
>>>>
>>>> On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vozerov@gridgain.com
>>>> wrote:
>>>>
>>>>> Denis,
>>>>>
>>>>> Yes, this is what my answer was about - you cannot have SQL without
>>>>> defining fields in advance. Because it breaks a lot of standard SQL
>>>>> invariants and virtually makes the whole language unusable. For
>>> instance,
>>>>> think of product behavior in the following cases:
>>>>> 1) User queries an empty cache with a query "SELECT a FROM table" -
>>> what
>>>>> should happen - exception or empty result? How would I know whether
>>> field
>>>>> "a" will appear in future?
>>>>> 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
>>>>> understand whether it is possible or not to add a column without
>> strict
>>>>> schema?
>>>>> 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will
>>> add
>>>> an
>>>>> object with field "c" after that?
>>>>> 4) User connects to Ignite from Tableau and navigates through schema
>> -
>>>> what
>>>>> should be shown?
>>>>>
>>>>> That is, you cannot have SQL without schema because it is at the very
>>>> heart
>>>>> of the technology. But you can have schema-less noSQL database.
>>>>>
>>>>> Let's do not invent a hybrid with tons of corner cases and separate
>>>>> learning curve. It should be enough just to rethink and simplify our
>>>>> configuration - reshape QueryEntity, deprecate all SQL annotations,
>>> allow
>>>>> only one table per cache, allow to define SQL script to be executed
>> on
>>>>> cache start or so.
>>>>>
>>>>> As far as schemaless - it is viable approach for sure, but should be
>>>>> considered either outside of SQL (e.g. a kind of predicate/criteria
>> API
>>>>> which can be merged with ScanQuery) or as a special datatype in SQL
>>>>> ecosystem (like is is done with JSON in many RDBMS databases).
>>>>>
>>>>> Vladimir.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org>
>>> wrote:
>>>>>> Vladimir,
>>>>>>
>>>>>> That's understood. I'm just thinking of a use case different from
>> the
>>>> DDL
>>>>>> approach where the schema is defined initially. Let's say that
>>> someone
>>>>>> configured caches with CacheConfiguration and now puts an Object in
>>> the
>>>>>> cache. For that person, it would be helpful to skip the Annotations
>>> or
>>>>>> QueryEntities approaches for queryable fields definitions (not even
>>>>>> indexes). For instance, the person might simply query some fields
>>> with
>>>>> the
>>>>>> primary index in the WHERE clause and this shouldn't require any
>>> extra
>>>>>> settings. Yes, it's clear that it might be extremely challenging to
>>>>> support
>>>>>> but imagine how usable the API could become if we can get rid of
>>>>>> Annotations and QueryEntities.
>>>>>>
>>>>>> Basically, my idea is that all of the objects and their fields
>> stored
>>>> in
>>>>>> the caches should be visible to SQL w/o extra settings. If someone
>>>> wants
>>>>> to
>>>>>> create indexes then use DDL which was designed for this.
>>>>>>
>>>>>>
>>>>>> -
>>>>>> Denis
>>>>>>
>>>>>>
>>>>>> On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <
>>> vozerov@gridgain.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Denis,
>>>>>>>
>>>>>>> SQL is a language with strict schema what was one of significant
>>>>> factors
>>>>>> of
>>>>>>> it's worldwide success. I doubt we will ever have SQL without
>>>>>>> configuration/definiton, because otherwise it will be not SQL,
>> but
>>>>>>> something else (e.g. document-oriented, JSON, whatever).
>>>>>>>
>>>>>>> On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org>
>>>> wrote:
>>>>>>>> Folks,
>>>>>>>>
>>>>>>>> Do we want to preserve the annotation-based configuration?
>> There
>>>> are
>>>>>> too
>>>>>>>> many ways to configure SQL indexes/fields.
>>>>>>>>
>>>>>>>> For instance, if our new SQL API could see and access all of
>> the
>>>>> fields
>>>>>>>> out-of-the-box (without any extra settings) and DDL will be
>> used
>>> to
>>>>>>> define
>>>>>>>> indexed fields then that would be a huge usability improvement.
>>>>>>>>
>>>>>>>> -
>>>>>>>> Denis
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <
>>> tledkov@gridgain.com
>>>>>>> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Lets discuss SQL DML (INSERT/UPDATE) current behavior
>> specific:
>>>>>>>>> Ignite doesn't check a type of input objects when hidden
>>> columns
>>>>>> _key,
>>>>>>>>> _value is used in a DML statements.
>>>>>>>>> I describe the current behavior for example:
>>>>>>>>>
>>>>>>>>> 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
>>>>>>>> Person.class))'
>>>>>>>>> 2.  PersonKey type contains 'int id' field.
>>>>>>>>> 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?,
>> ?)'
>>>>>>>>> Cases:
>>>>>>>>> 1. Invalid value object type:
>>>>>>>>> - Any value object may be passed as a query parameter
>>>>>>>>> - Query is executed without an error and returns '1' (one row
>>>>>> updated);
>>>>>>>>> - There is not inserted row at the 'SELECT * FROM test'
>>> results.
>>>>>>>>> - cache.get(key) returns inserted object;
>>>>>>>>>
>>>>>>>>> 2. Invalid key object type:
>>>>>>>>> 2.1 Non-primitive object is passed and binary representation
>>>>> doesn't
>>>>>>>>> contain 'id' field.
>>>>>>>>> - Query is executed without error and returns '1' (one row
>>>>> updated);
>>>>>>>>> - The inserted row is available by 'SELECT *' and the row
>>>> contains
>>>>>> id =
>>>>>>>>> null;
>>>>>>>>> 2.2 Non-primitive object is passed and binary representation
>>>>> contains
>>>>>>>>> 'id' field.
>>>>>>>>> - The inserted row is available by 'SELECT *' and the row
>>>> contains
>>>>>>>>> expected 'id' field;
>>>>>>>>> - The cache entry cannot be gathered by 'cache.get' operation
>>>> with
>>>>>> the
>>>>>>>>> corresponding 'PersonKey(id)' (keys differ).
>>>>>>>>>
>>>>>>>>> I propose to check type of the user's input object.
>>>>>>>>>
>>>>>>>>> I guess that using _key/_val columns works close to
>>> 'cache.put()'
>>>>> but
>>>>>>> it
>>>>>>>>> looks like significant usability issue.
>>>>>>>>> To confuse the 'PersonKey.class.getName()' and
>>>>>>>>> 'node.binary().builder("PersonKey")' is a typical mistake of
>>>> Ignite
>>>>>>>>> newcomers.
>>>>>>>>>
>>>>>>>>> One more argument for check: SQL INSERT sematic means the row
>>> is
>>>>>>>>> inserted into the specified TABLE, not into the cache.
>>>>>>>>> So, throw IgniteSQLException is expected behavior in this
>>> case, i
>>>>>>> think.
>>>>>>>>> [1]. https://issues.apache.org/jira/browse/IGNITE-5250
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Taras Ledkov
>>>>>>>>> Mail-To: tledkov@gridgain.com
>>>>>>>>>
>>>>>>>>>
-- 
Taras Ledkov
Mail-To: tledkov@gridgain.com

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Vladimir Ozerov <vo...@gridgain.com>.

I do not think this should be deferred, even though it changes default
behavior. Clean and simple semantics is much more important. In this
regards DML was created incorrectly in the first place. We will fix it,
leaving hidden fallback mode for those users who use this strange semantics.

ср, 27 февр. 2019 г. в 12:57, Ilya Kasnacheev <il...@gmail.com>:

> Hello!
>
> > UPDATE table SET _VAL=? WHERE ...   // Disallow
>
> Breaking change and as such should be deferred to 3.0.
>
> All of our tables have types, so we can disallow doing _VAL=? where
> parameter object is not of table's type, and semantics break down here -
> you INSERT object in cache, get "1" rows updated but can't select this row
> from table.
> But we probably should not disallow _VAL=? where parameter object IS of
> table's type, since there may be users whose workflow depends on that and
> it isn't fixable easily.
>
> For example, they can have objects of which only subset of fields is
> indexed, the rest is not. Then they are inserting them via SQL as shown.
>
> Regards,
> --
> Ilya Kasnacheev
>
>
> ср, 27 февр. 2019 г. в 12:10, Vladimir Ozerov <vo...@gridgain.com>:
>
> > Hi Taras,
> >
> > As far as your original question :-) I would say that user should have
> only
> > one way to update data with DML - through plain attributes. That is, if
> we
> > have a composite value with attributes "a" and "b", then we should:
> > UPDATE table SET a=?, b=? WHERE ... // Allow
> > UPDATE table SET _VAL=? WHERE ...   // Disallow
> >
> > But if the value is an attribute itself (e.g. in case of primitive), then
> > DML should be allowed on it for sure:
> > UPDATE table SET _VAL=? WHERE ...   // Allow
> >
> > What do you think?
> >
> > On Sat, Feb 23, 2019 at 6:50 PM Denis Magda <dm...@gridgain.com> wrote:
> >
> > > Vladimir,
> > >
> > > Ok, agreed, let's not boil the ocean...at least for now ;)
> > >
> > > --
> > > Denis Magda
> > >
> > >
> > > On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vozerov@gridgain.com
> >
> > > wrote:
> > >
> > > > Denis,
> > > >
> > > > Yes, this is what my answer was about - you cannot have SQL without
> > > > defining fields in advance. Because it breaks a lot of standard SQL
> > > > invariants and virtually makes the whole language unusable. For
> > instance,
> > > > think of product behavior in the following cases:
> > > > 1) User queries an empty cache with a query "SELECT a FROM table" -
> > what
> > > > should happen - exception or empty result? How would I know whether
> > field
> > > > "a" will appear in future?
> > > > 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
> > > > understand whether it is possible or not to add a column without
> strict
> > > > schema?
> > > > 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will
> > add
> > > an
> > > > object with field "c" after that?
> > > > 4) User connects to Ignite from Tableau and navigates through schema
> -
> > > what
> > > > should be shown?
> > > >
> > > > That is, you cannot have SQL without schema because it is at the very
> > > heart
> > > > of the technology. But you can have schema-less noSQL database.
> > > >
> > > > Let's do not invent a hybrid with tons of corner cases and separate
> > > > learning curve. It should be enough just to rethink and simplify our
> > > > configuration - reshape QueryEntity, deprecate all SQL annotations,
> > allow
> > > > only one table per cache, allow to define SQL script to be executed
> on
> > > > cache start or so.
> > > >
> > > > As far as schemaless - it is viable approach for sure, but should be
> > > > considered either outside of SQL (e.g. a kind of predicate/criteria
> API
> > > > which can be merged with ScanQuery) or as a special datatype in SQL
> > > > ecosystem (like is is done with JSON in many RDBMS databases).
> > > >
> > > > Vladimir.
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org>
> > wrote:
> > > >
> > > > > Vladimir,
> > > > >
> > > > > That's understood. I'm just thinking of a use case different from
> the
> > > DDL
> > > > > approach where the schema is defined initially. Let's say that
> > someone
> > > > > configured caches with CacheConfiguration and now puts an Object in
> > the
> > > > > cache. For that person, it would be helpful to skip the Annotations
> > or
> > > > > QueryEntities approaches for queryable fields definitions (not even
> > > > > indexes). For instance, the person might simply query some fields
> > with
> > > > the
> > > > > primary index in the WHERE clause and this shouldn't require any
> > extra
> > > > > settings. Yes, it's clear that it might be extremely challenging to
> > > > support
> > > > > but imagine how usable the API could become if we can get rid of
> > > > > Annotations and QueryEntities.
> > > > >
> > > > > Basically, my idea is that all of the objects and their fields
> stored
> > > in
> > > > > the caches should be visible to SQL w/o extra settings. If someone
> > > wants
> > > > to
> > > > > create indexes then use DDL which was designed for this.
> > > > >
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <
> > vozerov@gridgain.com>
> > > > > wrote:
> > > > >
> > > > > > Denis,
> > > > > >
> > > > > > SQL is a language with strict schema what was one of significant
> > > > factors
> > > > > of
> > > > > > it's worldwide success. I doubt we will ever have SQL without
> > > > > > configuration/definiton, because otherwise it will be not SQL,
> but
> > > > > > something else (e.g. document-oriented, JSON, whatever).
> > > > > >
> > > > > > On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org>
> > > wrote:
> > > > > >
> > > > > > > Folks,
> > > > > > >
> > > > > > > Do we want to preserve the annotation-based configuration?
> There
> > > are
> > > > > too
> > > > > > > many ways to configure SQL indexes/fields.
> > > > > > >
> > > > > > > For instance, if our new SQL API could see and access all of
> the
> > > > fields
> > > > > > > out-of-the-box (without any extra settings) and DDL will be
> used
> > to
> > > > > > define
> > > > > > > indexed fields then that would be a huge usability improvement.
> > > > > > >
> > > > > > > -
> > > > > > > Denis
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <
> > tledkov@gridgain.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Lets discuss SQL DML (INSERT/UPDATE) current behavior
> specific:
> > > > > > > >
> > > > > > > > Ignite doesn't check a type of input objects when hidden
> > columns
> > > > > _key,
> > > > > > > > _value is used in a DML statements.
> > > > > > > > I describe the current behavior for example:
> > > > > > > >
> > > > > > > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > > > > > > Person.class))'
> > > > > > > > 2.  PersonKey type contains 'int id' field.
> > > > > > > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?,
> ?)'
> > > > > > > >
> > > > > > > > Cases:
> > > > > > > > 1. Invalid value object type:
> > > > > > > > - Any value object may be passed as a query parameter
> > > > > > > > - Query is executed without an error and returns '1' (one row
> > > > > updated);
> > > > > > > > - There is not inserted row at the 'SELECT * FROM test'
> > results.
> > > > > > > > - cache.get(key) returns inserted object;
> > > > > > > >
> > > > > > > > 2. Invalid key object type:
> > > > > > > > 2.1 Non-primitive object is passed and binary representation
> > > > doesn't
> > > > > > > > contain 'id' field.
> > > > > > > > - Query is executed without error and returns '1' (one row
> > > > updated);
> > > > > > > > - The inserted row is available by 'SELECT *' and the row
> > > contains
> > > > > id =
> > > > > > > > null;
> > > > > > > > 2.2 Non-primitive object is passed and binary representation
> > > > contains
> > > > > > > > 'id' field.
> > > > > > > > - The inserted row is available by 'SELECT *' and the row
> > > contains
> > > > > > > > expected 'id' field;
> > > > > > > > - The cache entry cannot be gathered by 'cache.get' operation
> > > with
> > > > > the
> > > > > > > > corresponding 'PersonKey(id)' (keys differ).
> > > > > > > >
> > > > > > > > I propose to check type of the user's input object.
> > > > > > > >
> > > > > > > > I guess that using _key/_val columns works close to
> > 'cache.put()'
> > > > but
> > > > > > it
> > > > > > > > looks like significant usability issue.
> > > > > > > > To confuse the 'PersonKey.class.getName()' and
> > > > > > > > 'node.binary().builder("PersonKey")' is a typical mistake of
> > > Ignite
> > > > > > > > newcomers.
> > > > > > > >
> > > > > > > > One more argument for check: SQL INSERT sematic means the row
> > is
> > > > > > > > inserted into the specified TABLE, not into the cache.
> > > > > > > > So, throw IgniteSQLException is expected behavior in this
> > case, i
> > > > > > think.
> > > > > > > >
> > > > > > > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > > > > > > >
> > > > > > > > --
> > > > > > > > Taras Ledkov
> > > > > > > > Mail-To: tledkov@gridgain.com
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Ilya Kasnacheev <il...@gmail.com>.

Hello!

> UPDATE table SET _VAL=? WHERE ...   // Disallow

Breaking change and as such should be deferred to 3.0.

All of our tables have types, so we can disallow doing _VAL=? where
parameter object is not of table's type, and semantics break down here -
you INSERT object in cache, get "1" rows updated but can't select this row
from table.
But we probably should not disallow _VAL=? where parameter object IS of
table's type, since there may be users whose workflow depends on that and
it isn't fixable easily.

For example, they can have objects of which only subset of fields is
indexed, the rest is not. Then they are inserting them via SQL as shown.

Regards,
-- 
Ilya Kasnacheev


ср, 27 февр. 2019 г. в 12:10, Vladimir Ozerov <vo...@gridgain.com>:

> Hi Taras,
>
> As far as your original question :-) I would say that user should have only
> one way to update data with DML - through plain attributes. That is, if we
> have a composite value with attributes "a" and "b", then we should:
> UPDATE table SET a=?, b=? WHERE ... // Allow
> UPDATE table SET _VAL=? WHERE ...   // Disallow
>
> But if the value is an attribute itself (e.g. in case of primitive), then
> DML should be allowed on it for sure:
> UPDATE table SET _VAL=? WHERE ...   // Allow
>
> What do you think?
>
> On Sat, Feb 23, 2019 at 6:50 PM Denis Magda <dm...@gridgain.com> wrote:
>
> > Vladimir,
> >
> > Ok, agreed, let's not boil the ocean...at least for now ;)
> >
> > --
> > Denis Magda
> >
> >
> > On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > Yes, this is what my answer was about - you cannot have SQL without
> > > defining fields in advance. Because it breaks a lot of standard SQL
> > > invariants and virtually makes the whole language unusable. For
> instance,
> > > think of product behavior in the following cases:
> > > 1) User queries an empty cache with a query "SELECT a FROM table" -
> what
> > > should happen - exception or empty result? How would I know whether
> field
> > > "a" will appear in future?
> > > 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
> > > understand whether it is possible or not to add a column without strict
> > > schema?
> > > 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will
> add
> > an
> > > object with field "c" after that?
> > > 4) User connects to Ignite from Tableau and navigates through schema -
> > what
> > > should be shown?
> > >
> > > That is, you cannot have SQL without schema because it is at the very
> > heart
> > > of the technology. But you can have schema-less noSQL database.
> > >
> > > Let's do not invent a hybrid with tons of corner cases and separate
> > > learning curve. It should be enough just to rethink and simplify our
> > > configuration - reshape QueryEntity, deprecate all SQL annotations,
> allow
> > > only one table per cache, allow to define SQL script to be executed on
> > > cache start or so.
> > >
> > > As far as schemaless - it is viable approach for sure, but should be
> > > considered either outside of SQL (e.g. a kind of predicate/criteria API
> > > which can be merged with ScanQuery) or as a special datatype in SQL
> > > ecosystem (like is is done with JSON in many RDBMS databases).
> > >
> > > Vladimir.
> > >
> > >
> > >
> > >
> > > On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org>
> wrote:
> > >
> > > > Vladimir,
> > > >
> > > > That's understood. I'm just thinking of a use case different from the
> > DDL
> > > > approach where the schema is defined initially. Let's say that
> someone
> > > > configured caches with CacheConfiguration and now puts an Object in
> the
> > > > cache. For that person, it would be helpful to skip the Annotations
> or
> > > > QueryEntities approaches for queryable fields definitions (not even
> > > > indexes). For instance, the person might simply query some fields
> with
> > > the
> > > > primary index in the WHERE clause and this shouldn't require any
> extra
> > > > settings. Yes, it's clear that it might be extremely challenging to
> > > support
> > > > but imagine how usable the API could become if we can get rid of
> > > > Annotations and QueryEntities.
> > > >
> > > > Basically, my idea is that all of the objects and their fields stored
> > in
> > > > the caches should be visible to SQL w/o extra settings. If someone
> > wants
> > > to
> > > > create indexes then use DDL which was designed for this.
> > > >
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <
> vozerov@gridgain.com>
> > > > wrote:
> > > >
> > > > > Denis,
> > > > >
> > > > > SQL is a language with strict schema what was one of significant
> > > factors
> > > > of
> > > > > it's worldwide success. I doubt we will ever have SQL without
> > > > > configuration/definiton, because otherwise it will be not SQL, but
> > > > > something else (e.g. document-oriented, JSON, whatever).
> > > > >
> > > > > On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org>
> > wrote:
> > > > >
> > > > > > Folks,
> > > > > >
> > > > > > Do we want to preserve the annotation-based configuration? There
> > are
> > > > too
> > > > > > many ways to configure SQL indexes/fields.
> > > > > >
> > > > > > For instance, if our new SQL API could see and access all of the
> > > fields
> > > > > > out-of-the-box (without any extra settings) and DDL will be used
> to
> > > > > define
> > > > > > indexed fields then that would be a huge usability improvement.
> > > > > >
> > > > > > -
> > > > > > Denis
> > > > > >
> > > > > >
> > > > > > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <
> tledkov@gridgain.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> > > > > > >
> > > > > > > Ignite doesn't check a type of input objects when hidden
> columns
> > > > _key,
> > > > > > > _value is used in a DML statements.
> > > > > > > I describe the current behavior for example:
> > > > > > >
> > > > > > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > > > > > Person.class))'
> > > > > > > 2.  PersonKey type contains 'int id' field.
> > > > > > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> > > > > > >
> > > > > > > Cases:
> > > > > > > 1. Invalid value object type:
> > > > > > > - Any value object may be passed as a query parameter
> > > > > > > - Query is executed without an error and returns '1' (one row
> > > > updated);
> > > > > > > - There is not inserted row at the 'SELECT * FROM test'
> results.
> > > > > > > - cache.get(key) returns inserted object;
> > > > > > >
> > > > > > > 2. Invalid key object type:
> > > > > > > 2.1 Non-primitive object is passed and binary representation
> > > doesn't
> > > > > > > contain 'id' field.
> > > > > > > - Query is executed without error and returns '1' (one row
> > > updated);
> > > > > > > - The inserted row is available by 'SELECT *' and the row
> > contains
> > > > id =
> > > > > > > null;
> > > > > > > 2.2 Non-primitive object is passed and binary representation
> > > contains
> > > > > > > 'id' field.
> > > > > > > - The inserted row is available by 'SELECT *' and the row
> > contains
> > > > > > > expected 'id' field;
> > > > > > > - The cache entry cannot be gathered by 'cache.get' operation
> > with
> > > > the
> > > > > > > corresponding 'PersonKey(id)' (keys differ).
> > > > > > >
> > > > > > > I propose to check type of the user's input object.
> > > > > > >
> > > > > > > I guess that using _key/_val columns works close to
> 'cache.put()'
> > > but
> > > > > it
> > > > > > > looks like significant usability issue.
> > > > > > > To confuse the 'PersonKey.class.getName()' and
> > > > > > > 'node.binary().builder("PersonKey")' is a typical mistake of
> > Ignite
> > > > > > > newcomers.
> > > > > > >
> > > > > > > One more argument for check: SQL INSERT sematic means the row
> is
> > > > > > > inserted into the specified TABLE, not into the cache.
> > > > > > > So, throw IgniteSQLException is expected behavior in this
> case, i
> > > > > think.
> > > > > > >
> > > > > > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > > > > > >
> > > > > > > --
> > > > > > > Taras Ledkov
> > > > > > > Mail-To: tledkov@gridgain.com
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Hi Taras,

As far as your original question :-) I would say that user should have only
one way to update data with DML - through plain attributes. That is, if we
have a composite value with attributes "a" and "b", then we should:
UPDATE table SET a=?, b=? WHERE ... // Allow
UPDATE table SET _VAL=? WHERE ...   // Disallow

But if the value is an attribute itself (e.g. in case of primitive), then
DML should be allowed on it for sure:
UPDATE table SET _VAL=? WHERE ...   // Allow

What do you think?

On Sat, Feb 23, 2019 at 6:50 PM Denis Magda <dm...@gridgain.com> wrote:

> Vladimir,
>
> Ok, agreed, let's not boil the ocean...at least for now ;)
>
> --
> Denis Magda
>
>
> On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Denis,
> >
> > Yes, this is what my answer was about - you cannot have SQL without
> > defining fields in advance. Because it breaks a lot of standard SQL
> > invariants and virtually makes the whole language unusable. For instance,
> > think of product behavior in the following cases:
> > 1) User queries an empty cache with a query "SELECT a FROM table" - what
> > should happen - exception or empty result? How would I know whether field
> > "a" will appear in future?
> > 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
> > understand whether it is possible or not to add a column without strict
> > schema?
> > 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will add
> an
> > object with field "c" after that?
> > 4) User connects to Ignite from Tableau and navigates through schema -
> what
> > should be shown?
> >
> > That is, you cannot have SQL without schema because it is at the very
> heart
> > of the technology. But you can have schema-less noSQL database.
> >
> > Let's do not invent a hybrid with tons of corner cases and separate
> > learning curve. It should be enough just to rethink and simplify our
> > configuration - reshape QueryEntity, deprecate all SQL annotations, allow
> > only one table per cache, allow to define SQL script to be executed on
> > cache start or so.
> >
> > As far as schemaless - it is viable approach for sure, but should be
> > considered either outside of SQL (e.g. a kind of predicate/criteria API
> > which can be merged with ScanQuery) or as a special datatype in SQL
> > ecosystem (like is is done with JSON in many RDBMS databases).
> >
> > Vladimir.
> >
> >
> >
> >
> > On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org> wrote:
> >
> > > Vladimir,
> > >
> > > That's understood. I'm just thinking of a use case different from the
> DDL
> > > approach where the schema is defined initially. Let's say that someone
> > > configured caches with CacheConfiguration and now puts an Object in the
> > > cache. For that person, it would be helpful to skip the Annotations or
> > > QueryEntities approaches for queryable fields definitions (not even
> > > indexes). For instance, the person might simply query some fields with
> > the
> > > primary index in the WHERE clause and this shouldn't require any extra
> > > settings. Yes, it's clear that it might be extremely challenging to
> > support
> > > but imagine how usable the API could become if we can get rid of
> > > Annotations and QueryEntities.
> > >
> > > Basically, my idea is that all of the objects and their fields stored
> in
> > > the caches should be visible to SQL w/o extra settings. If someone
> wants
> > to
> > > create indexes then use DDL which was designed for this.
> > >
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <vo...@gridgain.com>
> > > wrote:
> > >
> > > > Denis,
> > > >
> > > > SQL is a language with strict schema what was one of significant
> > factors
> > > of
> > > > it's worldwide success. I doubt we will ever have SQL without
> > > > configuration/definiton, because otherwise it will be not SQL, but
> > > > something else (e.g. document-oriented, JSON, whatever).
> > > >
> > > > On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org>
> wrote:
> > > >
> > > > > Folks,
> > > > >
> > > > > Do we want to preserve the annotation-based configuration? There
> are
> > > too
> > > > > many ways to configure SQL indexes/fields.
> > > > >
> > > > > For instance, if our new SQL API could see and access all of the
> > fields
> > > > > out-of-the-box (without any extra settings) and DDL will be used to
> > > > define
> > > > > indexed fields then that would be a huge usability improvement.
> > > > >
> > > > > -
> > > > > Denis
> > > > >
> > > > >
> > > > > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tledkov@gridgain.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> > > > > >
> > > > > > Ignite doesn't check a type of input objects when hidden columns
> > > _key,
> > > > > > _value is used in a DML statements.
> > > > > > I describe the current behavior for example:
> > > > > >
> > > > > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > > > > Person.class))'
> > > > > > 2.  PersonKey type contains 'int id' field.
> > > > > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> > > > > >
> > > > > > Cases:
> > > > > > 1. Invalid value object type:
> > > > > > - Any value object may be passed as a query parameter
> > > > > > - Query is executed without an error and returns '1' (one row
> > > updated);
> > > > > > - There is not inserted row at the 'SELECT * FROM test' results.
> > > > > > - cache.get(key) returns inserted object;
> > > > > >
> > > > > > 2. Invalid key object type:
> > > > > > 2.1 Non-primitive object is passed and binary representation
> > doesn't
> > > > > > contain 'id' field.
> > > > > > - Query is executed without error and returns '1' (one row
> > updated);
> > > > > > - The inserted row is available by 'SELECT *' and the row
> contains
> > > id =
> > > > > > null;
> > > > > > 2.2 Non-primitive object is passed and binary representation
> > contains
> > > > > > 'id' field.
> > > > > > - The inserted row is available by 'SELECT *' and the row
> contains
> > > > > > expected 'id' field;
> > > > > > - The cache entry cannot be gathered by 'cache.get' operation
> with
> > > the
> > > > > > corresponding 'PersonKey(id)' (keys differ).
> > > > > >
> > > > > > I propose to check type of the user's input object.
> > > > > >
> > > > > > I guess that using _key/_val columns works close to 'cache.put()'
> > but
> > > > it
> > > > > > looks like significant usability issue.
> > > > > > To confuse the 'PersonKey.class.getName()' and
> > > > > > 'node.binary().builder("PersonKey")' is a typical mistake of
> Ignite
> > > > > > newcomers.
> > > > > >
> > > > > > One more argument for check: SQL INSERT sematic means the row is
> > > > > > inserted into the specified TABLE, not into the cache.
> > > > > > So, throw IgniteSQLException is expected behavior in this case, i
> > > > think.
> > > > > >
> > > > > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > > > > >
> > > > > > --
> > > > > > Taras Ledkov
> > > > > > Mail-To: tledkov@gridgain.com
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Denis Magda <dm...@gridgain.com>.

Vladimir,

Ok, agreed, let's not boil the ocean...at least for now ;)

--
Denis Magda


On Sat, Feb 23, 2019 at 12:50 AM Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Denis,
>
> Yes, this is what my answer was about - you cannot have SQL without
> defining fields in advance. Because it breaks a lot of standard SQL
> invariants and virtually makes the whole language unusable. For instance,
> think of product behavior in the following cases:
> 1) User queries an empty cache with a query "SELECT a FROM table" - what
> should happen - exception or empty result? How would I know whether field
> "a" will appear in future?
> 2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
> understand whether it is possible or not to add a column without strict
> schema?
> 3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will add an
> object with field "c" after that?
> 4) User connects to Ignite from Tableau and navigates through schema - what
> should be shown?
>
> That is, you cannot have SQL without schema because it is at the very heart
> of the technology. But you can have schema-less noSQL database.
>
> Let's do not invent a hybrid with tons of corner cases and separate
> learning curve. It should be enough just to rethink and simplify our
> configuration - reshape QueryEntity, deprecate all SQL annotations, allow
> only one table per cache, allow to define SQL script to be executed on
> cache start or so.
>
> As far as schemaless - it is viable approach for sure, but should be
> considered either outside of SQL (e.g. a kind of predicate/criteria API
> which can be merged with ScanQuery) or as a special datatype in SQL
> ecosystem (like is is done with JSON in many RDBMS databases).
>
> Vladimir.
>
>
>
>
> On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org> wrote:
>
> > Vladimir,
> >
> > That's understood. I'm just thinking of a use case different from the DDL
> > approach where the schema is defined initially. Let's say that someone
> > configured caches with CacheConfiguration and now puts an Object in the
> > cache. For that person, it would be helpful to skip the Annotations or
> > QueryEntities approaches for queryable fields definitions (not even
> > indexes). For instance, the person might simply query some fields with
> the
> > primary index in the WHERE clause and this shouldn't require any extra
> > settings. Yes, it's clear that it might be extremely challenging to
> support
> > but imagine how usable the API could become if we can get rid of
> > Annotations and QueryEntities.
> >
> > Basically, my idea is that all of the objects and their fields stored in
> > the caches should be visible to SQL w/o extra settings. If someone wants
> to
> > create indexes then use DDL which was designed for this.
> >
> >
> > -
> > Denis
> >
> >
> > On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <vo...@gridgain.com>
> > wrote:
> >
> > > Denis,
> > >
> > > SQL is a language with strict schema what was one of significant
> factors
> > of
> > > it's worldwide success. I doubt we will ever have SQL without
> > > configuration/definiton, because otherwise it will be not SQL, but
> > > something else (e.g. document-oriented, JSON, whatever).
> > >
> > > On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org> wrote:
> > >
> > > > Folks,
> > > >
> > > > Do we want to preserve the annotation-based configuration? There are
> > too
> > > > many ways to configure SQL indexes/fields.
> > > >
> > > > For instance, if our new SQL API could see and access all of the
> fields
> > > > out-of-the-box (without any extra settings) and DDL will be used to
> > > define
> > > > indexed fields then that would be a huge usability improvement.
> > > >
> > > > -
> > > > Denis
> > > >
> > > >
> > > > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tl...@gridgain.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> > > > >
> > > > > Ignite doesn't check a type of input objects when hidden columns
> > _key,
> > > > > _value is used in a DML statements.
> > > > > I describe the current behavior for example:
> > > > >
> > > > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > > > Person.class))'
> > > > > 2.  PersonKey type contains 'int id' field.
> > > > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> > > > >
> > > > > Cases:
> > > > > 1. Invalid value object type:
> > > > > - Any value object may be passed as a query parameter
> > > > > - Query is executed without an error and returns '1' (one row
> > updated);
> > > > > - There is not inserted row at the 'SELECT * FROM test' results.
> > > > > - cache.get(key) returns inserted object;
> > > > >
> > > > > 2. Invalid key object type:
> > > > > 2.1 Non-primitive object is passed and binary representation
> doesn't
> > > > > contain 'id' field.
> > > > > - Query is executed without error and returns '1' (one row
> updated);
> > > > > - The inserted row is available by 'SELECT *' and the row contains
> > id =
> > > > > null;
> > > > > 2.2 Non-primitive object is passed and binary representation
> contains
> > > > > 'id' field.
> > > > > - The inserted row is available by 'SELECT *' and the row contains
> > > > > expected 'id' field;
> > > > > - The cache entry cannot be gathered by 'cache.get' operation with
> > the
> > > > > corresponding 'PersonKey(id)' (keys differ).
> > > > >
> > > > > I propose to check type of the user's input object.
> > > > >
> > > > > I guess that using _key/_val columns works close to 'cache.put()'
> but
> > > it
> > > > > looks like significant usability issue.
> > > > > To confuse the 'PersonKey.class.getName()' and
> > > > > 'node.binary().builder("PersonKey")' is a typical mistake of Ignite
> > > > > newcomers.
> > > > >
> > > > > One more argument for check: SQL INSERT sematic means the row is
> > > > > inserted into the specified TABLE, not into the cache.
> > > > > So, throw IgniteSQLException is expected behavior in this case, i
> > > think.
> > > > >
> > > > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > > > >
> > > > > --
> > > > > Taras Ledkov
> > > > > Mail-To: tledkov@gridgain.com
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Denis,

Yes, this is what my answer was about - you cannot have SQL without
defining fields in advance. Because it breaks a lot of standard SQL
invariants and virtually makes the whole language unusable. For instance,
think of product behavior in the following cases:
1) User queries an empty cache with a query "SELECT a FROM table" - what
should happen - exception or empty result? How would I know whether field
"a" will appear in future?
2) User executed a command "ALTER TABLE ... ADD COLUMN b" - how can I
understand whether it is possible or not to add a column without strict
schema?
3) "ALTER TABLE ... DROP COLUMN c" - what should happen if user will add an
object with field "c" after that?
4) User connects to Ignite from Tableau and navigates through schema - what
should be shown?

That is, you cannot have SQL without schema because it is at the very heart
of the technology. But you can have schema-less noSQL database.

Let's do not invent a hybrid with tons of corner cases and separate
learning curve. It should be enough just to rethink and simplify our
configuration - reshape QueryEntity, deprecate all SQL annotations, allow
only one table per cache, allow to define SQL script to be executed on
cache start or so.

As far as schemaless - it is viable approach for sure, but should be
considered either outside of SQL (e.g. a kind of predicate/criteria API
which can be merged with ScanQuery) or as a special datatype in SQL
ecosystem (like is is done with JSON in many RDBMS databases).

Vladimir.




On Fri, Feb 22, 2019 at 11:01 PM Denis Magda <dm...@apache.org> wrote:

> Vladimir,
>
> That's understood. I'm just thinking of a use case different from the DDL
> approach where the schema is defined initially. Let's say that someone
> configured caches with CacheConfiguration and now puts an Object in the
> cache. For that person, it would be helpful to skip the Annotations or
> QueryEntities approaches for queryable fields definitions (not even
> indexes). For instance, the person might simply query some fields with the
> primary index in the WHERE clause and this shouldn't require any extra
> settings. Yes, it's clear that it might be extremely challenging to support
> but imagine how usable the API could become if we can get rid of
> Annotations and QueryEntities.
>
> Basically, my idea is that all of the objects and their fields stored in
> the caches should be visible to SQL w/o extra settings. If someone wants to
> create indexes then use DDL which was designed for this.
>
>
> -
> Denis
>
>
> On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <vo...@gridgain.com>
> wrote:
>
> > Denis,
> >
> > SQL is a language with strict schema what was one of significant factors
> of
> > it's worldwide success. I doubt we will ever have SQL without
> > configuration/definiton, because otherwise it will be not SQL, but
> > something else (e.g. document-oriented, JSON, whatever).
> >
> > On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org> wrote:
> >
> > > Folks,
> > >
> > > Do we want to preserve the annotation-based configuration? There are
> too
> > > many ways to configure SQL indexes/fields.
> > >
> > > For instance, if our new SQL API could see and access all of the fields
> > > out-of-the-box (without any extra settings) and DDL will be used to
> > define
> > > indexed fields then that would be a huge usability improvement.
> > >
> > > -
> > > Denis
> > >
> > >
> > > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tl...@gridgain.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> > > >
> > > > Ignite doesn't check a type of input objects when hidden columns
> _key,
> > > > _value is used in a DML statements.
> > > > I describe the current behavior for example:
> > > >
> > > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > > Person.class))'
> > > > 2.  PersonKey type contains 'int id' field.
> > > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> > > >
> > > > Cases:
> > > > 1. Invalid value object type:
> > > > - Any value object may be passed as a query parameter
> > > > - Query is executed without an error and returns '1' (one row
> updated);
> > > > - There is not inserted row at the 'SELECT * FROM test' results.
> > > > - cache.get(key) returns inserted object;
> > > >
> > > > 2. Invalid key object type:
> > > > 2.1 Non-primitive object is passed and binary representation doesn't
> > > > contain 'id' field.
> > > > - Query is executed without error and returns '1' (one row updated);
> > > > - The inserted row is available by 'SELECT *' and the row contains
> id =
> > > > null;
> > > > 2.2 Non-primitive object is passed and binary representation contains
> > > > 'id' field.
> > > > - The inserted row is available by 'SELECT *' and the row contains
> > > > expected 'id' field;
> > > > - The cache entry cannot be gathered by 'cache.get' operation with
> the
> > > > corresponding 'PersonKey(id)' (keys differ).
> > > >
> > > > I propose to check type of the user's input object.
> > > >
> > > > I guess that using _key/_val columns works close to 'cache.put()' but
> > it
> > > > looks like significant usability issue.
> > > > To confuse the 'PersonKey.class.getName()' and
> > > > 'node.binary().builder("PersonKey")' is a typical mistake of Ignite
> > > > newcomers.
> > > >
> > > > One more argument for check: SQL INSERT sematic means the row is
> > > > inserted into the specified TABLE, not into the cache.
> > > > So, throw IgniteSQLException is expected behavior in this case, i
> > think.
> > > >
> > > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > > >
> > > > --
> > > > Taras Ledkov
> > > > Mail-To: tledkov@gridgain.com
> > > >
> > > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Denis Magda <dm...@apache.org>.

Vladimir,

That's understood. I'm just thinking of a use case different from the DDL
approach where the schema is defined initially. Let's say that someone
configured caches with CacheConfiguration and now puts an Object in the
cache. For that person, it would be helpful to skip the Annotations or
QueryEntities approaches for queryable fields definitions (not even
indexes). For instance, the person might simply query some fields with the
primary index in the WHERE clause and this shouldn't require any extra
settings. Yes, it's clear that it might be extremely challenging to support
but imagine how usable the API could become if we can get rid of
Annotations and QueryEntities.

Basically, my idea is that all of the objects and their fields stored in
the caches should be visible to SQL w/o extra settings. If someone wants to
create indexes then use DDL which was designed for this.


-
Denis


On Fri, Feb 22, 2019 at 2:27 AM Vladimir Ozerov <vo...@gridgain.com>
wrote:

> Denis,
>
> SQL is a language with strict schema what was one of significant factors of
> it's worldwide success. I doubt we will ever have SQL without
> configuration/definiton, because otherwise it will be not SQL, but
> something else (e.g. document-oriented, JSON, whatever).
>
> On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org> wrote:
>
> > Folks,
> >
> > Do we want to preserve the annotation-based configuration? There are too
> > many ways to configure SQL indexes/fields.
> >
> > For instance, if our new SQL API could see and access all of the fields
> > out-of-the-box (without any extra settings) and DDL will be used to
> define
> > indexed fields then that would be a huge usability improvement.
> >
> > -
> > Denis
> >
> >
> > On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tl...@gridgain.com>
> wrote:
> >
> > > Hi,
> > >
> > > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> > >
> > > Ignite doesn't check a type of input objects when hidden columns _key,
> > > _value is used in a DML statements.
> > > I describe the current behavior for example:
> > >
> > > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> > Person.class))'
> > > 2.  PersonKey type contains 'int id' field.
> > > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> > >
> > > Cases:
> > > 1. Invalid value object type:
> > > - Any value object may be passed as a query parameter
> > > - Query is executed without an error and returns '1' (one row updated);
> > > - There is not inserted row at the 'SELECT * FROM test' results.
> > > - cache.get(key) returns inserted object;
> > >
> > > 2. Invalid key object type:
> > > 2.1 Non-primitive object is passed and binary representation doesn't
> > > contain 'id' field.
> > > - Query is executed without error and returns '1' (one row updated);
> > > - The inserted row is available by 'SELECT *' and the row contains id =
> > > null;
> > > 2.2 Non-primitive object is passed and binary representation contains
> > > 'id' field.
> > > - The inserted row is available by 'SELECT *' and the row contains
> > > expected 'id' field;
> > > - The cache entry cannot be gathered by 'cache.get' operation with the
> > > corresponding 'PersonKey(id)' (keys differ).
> > >
> > > I propose to check type of the user's input object.
> > >
> > > I guess that using _key/_val columns works close to 'cache.put()' but
> it
> > > looks like significant usability issue.
> > > To confuse the 'PersonKey.class.getName()' and
> > > 'node.binary().builder("PersonKey")' is a typical mistake of Ignite
> > > newcomers.
> > >
> > > One more argument for check: SQL INSERT sematic means the row is
> > > inserted into the specified TABLE, not into the cache.
> > > So, throw IgniteSQLException is expected behavior in this case, i
> think.
> > >
> > > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> > >
> > > --
> > > Taras Ledkov
> > > Mail-To: tledkov@gridgain.com
> > >
> > >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Vladimir Ozerov <vo...@gridgain.com>.

Denis,

SQL is a language with strict schema what was one of significant factors of
it's worldwide success. I doubt we will ever have SQL without
configuration/definiton, because otherwise it will be not SQL, but
something else (e.g. document-oriented, JSON, whatever).

On Fri, Feb 22, 2019 at 1:52 AM Denis Magda <dm...@apache.org> wrote:

> Folks,
>
> Do we want to preserve the annotation-based configuration? There are too
> many ways to configure SQL indexes/fields.
>
> For instance, if our new SQL API could see and access all of the fields
> out-of-the-box (without any extra settings) and DDL will be used to define
> indexed fields then that would be a huge usability improvement.
>
> -
> Denis
>
>
> On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tl...@gridgain.com> wrote:
>
> > Hi,
> >
> > Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
> >
> > Ignite doesn't check a type of input objects when hidden columns _key,
> > _value is used in a DML statements.
> > I describe the current behavior for example:
> >
> > 1. Cache configuration:  'setIndexedTypes(PersonKey.class,
> Person.class))'
> > 2.  PersonKey type contains 'int id' field.
> > 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
> >
> > Cases:
> > 1. Invalid value object type:
> > - Any value object may be passed as a query parameter
> > - Query is executed without an error and returns '1' (one row updated);
> > - There is not inserted row at the 'SELECT * FROM test' results.
> > - cache.get(key) returns inserted object;
> >
> > 2. Invalid key object type:
> > 2.1 Non-primitive object is passed and binary representation doesn't
> > contain 'id' field.
> > - Query is executed without error and returns '1' (one row updated);
> > - The inserted row is available by 'SELECT *' and the row contains id =
> > null;
> > 2.2 Non-primitive object is passed and binary representation contains
> > 'id' field.
> > - The inserted row is available by 'SELECT *' and the row contains
> > expected 'id' field;
> > - The cache entry cannot be gathered by 'cache.get' operation with the
> > corresponding 'PersonKey(id)' (keys differ).
> >
> > I propose to check type of the user's input object.
> >
> > I guess that using _key/_val columns works close to 'cache.put()' but it
> > looks like significant usability issue.
> > To confuse the 'PersonKey.class.getName()' and
> > 'node.binary().builder("PersonKey")' is a typical mistake of Ignite
> > newcomers.
> >
> > One more argument for check: SQL INSERT sematic means the row is
> > inserted into the specified TABLE, not into the cache.
> > So, throw IgniteSQLException is expected behavior in this case, i think.
> >
> > [1]. https://issues.apache.org/jira/browse/IGNITE-5250
> >
> > --
> > Taras Ledkov
> > Mail-To: tledkov@gridgain.com
> >
> >
>

Re: SQL: INSERT with hidden columns _key, _val and check the type of input objects

Posted by Denis Magda <dm...@apache.org>.

Folks,

Do we want to preserve the annotation-based configuration? There are too
many ways to configure SQL indexes/fields.

For instance, if our new SQL API could see and access all of the fields
out-of-the-box (without any extra settings) and DDL will be used to define
indexed fields then that would be a huge usability improvement.

-
Denis


On Thu, Feb 21, 2019 at 5:27 AM Taras Ledkov <tl...@gridgain.com> wrote:

> Hi,
>
> Lets discuss SQL DML (INSERT/UPDATE) current behavior specific:
>
> Ignite doesn't check a type of input objects when hidden columns _key,
> _value is used in a DML statements.
> I describe the current behavior for example:
>
> 1. Cache configuration:  'setIndexedTypes(PersonKey.class, Person.class))'
> 2.  PersonKey type contains 'int id' field.
> 3. SQL statement: 'INSERT INTO test (_val, _key) VALUES (?, ?)'
>
> Cases:
> 1. Invalid value object type:
> - Any value object may be passed as a query parameter
> - Query is executed without an error and returns '1' (one row updated);
> - There is not inserted row at the 'SELECT * FROM test' results.
> - cache.get(key) returns inserted object;
>
> 2. Invalid key object type:
> 2.1 Non-primitive object is passed and binary representation doesn't
> contain 'id' field.
> - Query is executed without error and returns '1' (one row updated);
> - The inserted row is available by 'SELECT *' and the row contains id =
> null;
> 2.2 Non-primitive object is passed and binary representation contains
> 'id' field.
> - The inserted row is available by 'SELECT *' and the row contains
> expected 'id' field;
> - The cache entry cannot be gathered by 'cache.get' operation with the
> corresponding 'PersonKey(id)' (keys differ).
>
> I propose to check type of the user's input object.
>
> I guess that using _key/_val columns works close to 'cache.put()' but it
> looks like significant usability issue.
> To confuse the 'PersonKey.class.getName()' and
> 'node.binary().builder("PersonKey")' is a typical mistake of Ignite
> newcomers.
>
> One more argument for check: SQL INSERT sematic means the row is
> inserted into the specified TABLE, not into the cache.
> So, throw IgniteSQLException is expected behavior in this case, i think.
>
> [1]. https://issues.apache.org/jira/browse/IGNITE-5250
>
> --
> Taras Ledkov
> Mail-To: tledkov@gridgain.com
>
>