You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Alexander Paschenko <al...@gmail.com> on 2017/06/01 00:57:14 UTC

Re: ALTER TABLE command support

Hi guys,

To my knowledge, our binary format is currently resilient to
adding/removing fields of individual objects while having those
objects logically belonging to the same value type - thanks to all
efforts with binary resolvers and stable field sorting. Thus, ALTER
TABLE implemented as suggested (descriptors manipulation) indeed will
not need actual data modification, amirite?

That said, in my view at a first glance the task boils down to following:

1. Make type descriptors and table descriptors mutable in principle
2. Add Ignite ALTER TABLE command (at first approach let's limit
ourselves with columns add/remove)
3. Implement its propagation and distributed execution (CREATE INDEX
style - this mechanism of distributed schema changes has already been
implemented by Vlad)

4. On local stage of distributed execution:

4.1. Issue on local H2 instance combination of DROP TABLE and then
CREATE TABLE with new columns list.
This is a lot like what H2 does when you ask it to ALTER TABLE, but H2
actually also does data copying via SELECT which we don't need to do
as far as I can see due to binary format resiliency I wrote above -
our combination of local CREATE and DROP will modify local H2 metadata
to look the way we ultimately need.

4.2. After we're finished with H2 metadata, we're safe to modify type
and table descriptors.

Of course some amount of additional work will be needed to make sure
that such operations are mutually exclusive with cache gets/puts, or
index creation, or cache creation (also implies manipulations with
type descriptors and may also cause a CREATE TABLE in its own right).
But still approach in general could be as described above.

I also have some ideas about support for constraints which is in fact
is also concerned with descriptors mutation, but I believe this is not
the topic of this conversation.

Thoughts?

– Alex

2017-05-31 23:36 GMT+03:00 Denis Magda <dm...@apache.org>:
> Agree. The gradual approach is the way to go for us.
>
> —
> Denis
>
>> On May 31, 2017, at 1:20 PM, Dmitriy Setrakyan <ds...@apache.org> wrote:
>>
>> I think a fully functional ALTER TABLE command may be hard to implement, as
>> it includes changes of columns, types, constraints, etc... We should take a
>> gradual approach here and implement this command by phases.
>>
>> I would propose that in the first phase we simply add the capability to add
>> and remove columns to a table. This way, given that we already support
>> CREATE INDEX command, the new column can be immediately indexed within the
>> cluster and used for query execution.
>>
>> Does this sound like a plan?
>>
>> D.
>>
>> On Wed, May 31, 2017 at 11:52 AM, Denis Magda <dm...@apache.org> wrote:
>>
>>> Sergi, Vovan, Alexander P.,
>>>
>>> It’s great that we added CREATE/DROP index commands support to Ignite. As
>>> the next step, I think we need to move forward and plan to add ALTER TABLE
>>> command to the list.
>>>
>>> The reason we should have this command is simple. If a user adds a new
>>> field to Person class (while the cluster is up and running) then there
>>> should be a way to access the field from SQL and index it later if needed.
>>> Presently, this is not supported.
>>>
>>> What will be our efforts to support this?
>>>
>>> —
>>> Denis
>

Re: ALTER TABLE command support

Posted by Vladimir Ozerov <vo...@gridgain.com>.
Alex,

There are several complex things here:

1) DROP TABLE will require modification of data. Effectively, we have to
rebuild all binary objects in our cache. And the key question - how to make
this process efficient. Without it the following will fail:

INSERT INTO Person(id, salary) VALUES (1, 10_000);
ALTER TABLE Person DROP COLUMN salary;
ALTER TABLE Person ADD COLUMN salary;
SELECT salary FROM Person WHERE id = 1 // Must be NULL, but will be 10_000

2) DROP COLUMN must take in count indexes on the column (if any)

3) ALTER TABLE might IgniteCache.put() interfere with IgniteCache.put() in
subtle ways. E.g.:
INSERT INTO Person(id, salary) VALUES (1, 10_000);
ALTER TABLE Person DROP COLUMN salary;
IgniteCache.put(1, Builder["salary"=10_000]); // Should we allow or fail
this?

4) We have to rework DML as well. Otherwise current processors will work
incorrectly. Consider we have two threads T1 and T2:
T1: INSERT INTO Person(id, salary) VALUES (1, 10_000);
T1: Prepared entry processor
T2: ALTER TABLE Person DROP COLUMN salary
T1: Sent entry processor which completed successfully
T2: ALTER TABLE Person ADD COLUMN salary;
T2: SELECT salary FROM Person WHERE id = 1 // Must be NULL, but will be
10_000

Probably we will have to maintain a kind of schema versions or so here.

5) Last, but not least - design of this feature must take in count upcoming
MVCC, as their designs might be interrelated.

Vladimir.

On Thu, Jun 1, 2017 at 3:57 AM, Alexander Paschenko <
alexander.a.paschenko@gmail.com> wrote:

> Hi guys,
>
> To my knowledge, our binary format is currently resilient to
> adding/removing fields of individual objects while having those
> objects logically belonging to the same value type - thanks to all
> efforts with binary resolvers and stable field sorting. Thus, ALTER
> TABLE implemented as suggested (descriptors manipulation) indeed will
> not need actual data modification, amirite?
>
> That said, in my view at a first glance the task boils down to following:
>
> 1. Make type descriptors and table descriptors mutable in principle
> 2. Add Ignite ALTER TABLE command (at first approach let's limit
> ourselves with columns add/remove)
> 3. Implement its propagation and distributed execution (CREATE INDEX
> style - this mechanism of distributed schema changes has already been
> implemented by Vlad)
>
> 4. On local stage of distributed execution:
>
> 4.1. Issue on local H2 instance combination of DROP TABLE and then
> CREATE TABLE with new columns list.
> This is a lot like what H2 does when you ask it to ALTER TABLE, but H2
> actually also does data copying via SELECT which we don't need to do
> as far as I can see due to binary format resiliency I wrote above -
> our combination of local CREATE and DROP will modify local H2 metadata
> to look the way we ultimately need.
>
> 4.2. After we're finished with H2 metadata, we're safe to modify type
> and table descriptors.
>
> Of course some amount of additional work will be needed to make sure
> that such operations are mutually exclusive with cache gets/puts, or
> index creation, or cache creation (also implies manipulations with
> type descriptors and may also cause a CREATE TABLE in its own right).
> But still approach in general could be as described above.
>
> I also have some ideas about support for constraints which is in fact
> is also concerned with descriptors mutation, but I believe this is not
> the topic of this conversation.
>
> Thoughts?
>
> – Alex
>
> 2017-05-31 23:36 GMT+03:00 Denis Magda <dm...@apache.org>:
> > Agree. The gradual approach is the way to go for us.
> >
> > —
> > Denis
> >
> >> On May 31, 2017, at 1:20 PM, Dmitriy Setrakyan <ds...@apache.org>
> wrote:
> >>
> >> I think a fully functional ALTER TABLE command may be hard to
> implement, as
> >> it includes changes of columns, types, constraints, etc... We should
> take a
> >> gradual approach here and implement this command by phases.
> >>
> >> I would propose that in the first phase we simply add the capability to
> add
> >> and remove columns to a table. This way, given that we already support
> >> CREATE INDEX command, the new column can be immediately indexed within
> the
> >> cluster and used for query execution.
> >>
> >> Does this sound like a plan?
> >>
> >> D.
> >>
> >> On Wed, May 31, 2017 at 11:52 AM, Denis Magda <dm...@apache.org>
> wrote:
> >>
> >>> Sergi, Vovan, Alexander P.,
> >>>
> >>> It’s great that we added CREATE/DROP index commands support to Ignite.
> As
> >>> the next step, I think we need to move forward and plan to add ALTER
> TABLE
> >>> command to the list.
> >>>
> >>> The reason we should have this command is simple. If a user adds a new
> >>> field to Person class (while the cluster is up and running) then there
> >>> should be a way to access the field from SQL and index it later if
> needed.
> >>> Presently, this is not supported.
> >>>
> >>> What will be our efforts to support this?
> >>>
> >>> —
> >>> Denis
> >
>