You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Shengkai Fang <fs...@gmail.com> on 2022/11/18 03:55:01 UTC

[DISCUSS] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

Hi, dev.


I want to start a discussion about the FLIP-273: Improve Catalog API to
Support ALTER TABLE syntax[1]. The motivation of the FLIP is the current
Catalog API is difficult for some Catalog to alter the table. For example,
the Catalog only exposes

   void alterTable(ObjectPath tablePath, CatalogBaseTable newTable, boolean
ignoreIfNotExists)

now. The main problem of this API is it requires the Catalog,
e.g.`JDBCCatalog` to compare the difference between the new
`CatalogTable`and the original CatalogTable, then generate the SQL to alter
the table in the external system. However, the `ALTER TABLE` syntax already
told the Catalog the differences. With the proposal, the Catalog developer
can easily support the `ALTER TABLE` syntax.

Looking forward to feedback on that proposal.

Best,
Shengkai


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-273%3A+Improve+the+Catalog+API+to+Support+ALTER+TABLE+syntax

Re: [DISCUSS] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

Posted by Shengkai Fang <fs...@gmail.com>.
Hi,

Considering there has been no response for a long time, I want to start the
vote on Thursday.

Best,
Shengkai

Jingsong Li <ji...@gmail.com> 于2022年11月24日周四 15:20写道:

> Thanks for the update!
>
> Looks good to me.
>
> Best,
> Jingsong
>
> On Thu, Nov 24, 2022 at 3:07 PM Shengkai Fang <fs...@gmail.com> wrote:
> >
> > Hi, Jingsong.
> >
> > Thanks for your feedback!
> >
> > > Can we define classes at once so that the connector can be fully
> > implemented, but we will not pass changes of the nested column.
> >
> > It's hard to achieve this. If a new need comes, we will add a new kind of
> > TableChange. But I think the current proposal aligns with the Flink
> syntax.
> >
> > > In addition, can we define the modified class of the nested column as
> the
> > parent class? The top-level column modification is just a special case
> of a
> > subclass.
> >
> > I think the modification of the composite type is much more complicated.
> It
> > also modifies the MAP, ARRAY type. For example, Spark supports to modify
> > the element of ARRAY type with the following syntax:
> >
> >
> > ```
> > -- add a field to the struct within an array. Using keyword 'element' to
> > access the array's element column.
> > ALTER TABLE prod.db.sample
> > ADD COLUMN points.element.z double
> > ```
> >
> > For the traditional database, they use the ALTER TYPE syntax to modify
> the
> > composite type.
> >
> > ```
> > CREATE TYPE compfoo AS (f1 int, f2 text);
> >
> > -- add column
> > ALTER TYPE compfoo ADD ATTRIBUTE f3 int;
> > -- rename column
> > ALTER TYPE compfoo RENAME ATTRIBUTE f3 TO f4;
> > -- drop column
> > ALTER TYPE compfoo DROP ATTRIBUTE f4;
> > -- modify type
> > ALTER TYPE compfoo ALTER ATTRIBUTE f1 TYPE text;
> >
> > ```
> >
> > I think the modification of the top-level column is different from the
> > modification of the nested field, and we don't need to introduce a base
> > class. BTW, the RexFieldAccess is also not the parent class of the
> > RexInputRef in Calcite.
> >
> > > ModifyColumn VS fine-grained Change
> >
> > The syntax in Flink also supports modifying the physical column to a
> > metadata column, metadata column definition, etc. The main problem here
> is
> > we need to expose what kind of changes happen when modifying the metadata
> > column, modifying the computed column, and changing the column kind. With
> > these possibilities, it may include:
> >
> > base: ModifyColumnComment, MoidfyColumnPosition;
> > physical column: ModifyPhysicalColumnDataType,
> > ModifyPhysicalColumnNullability;
> > metadata column: ModifyMetadataColumnMetadataKey,
> > ModifyMetadataColumnVirtual, ModifyMetdataColumnDataType;
> > computed column: ModifyComputedColumnExpression
> > column type change: ModifyPhysicalColumnToMetadataColumn,
> > ModifyPhysicalColumnToComputedColumn,
> ModifyComputedColumnToPhysicalColumn,
> > ModifyComputedColumnToMetadataColumn,
> > ModifyMetadataColumnToPhysicalColumn,
> ModifyMetadataColumnToComputedColumn.
> >
> > I just wonder whether it's better we still keep the ModifyColumn and
> > introduce some fine-grained TableChanges because the most cases above are
> > not useful to external catalog, e.g. JDBCCatalog. So I think we can
> > introduce the ModifyColumn that represents column modification and
> >
> > ```
> > /** Modify the column comment */
> > public class ModifyColumnComment extends ModifyColumn {
> >
> >     String getComment();
> >
> > }
> >
> > /** Modify the column position. */
> > public class ModifyColumnPosition extends ModifyColumn {}
> >
> > /** Modify the physical column data type. */
> > public class ModifyPhysicalColumnType extends ModifyColumn {
> >
> >     DataType getNewType();
> >
> > }
> >
> > ```
> >
> > When altering the physical column, the statement will produce the
> > fine-grained changes. In other cases, we will still produce the base
> class
> > ModifyColumn.
> >
> >
> > > ModifyColumn can also rename the column.
> >
> > Yes. I have modified the FLIP and moved the RenameColumn and added a
> class
> > named ModifyColumnName extends ModifyColumn.
> >
> > Best,
> > Shengkai
>

Re: [DISCUSS] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

Posted by Jingsong Li <ji...@gmail.com>.
Thanks for the update!

Looks good to me.

Best,
Jingsong

On Thu, Nov 24, 2022 at 3:07 PM Shengkai Fang <fs...@gmail.com> wrote:
>
> Hi, Jingsong.
>
> Thanks for your feedback!
>
> > Can we define classes at once so that the connector can be fully
> implemented, but we will not pass changes of the nested column.
>
> It's hard to achieve this. If a new need comes, we will add a new kind of
> TableChange. But I think the current proposal aligns with the Flink syntax.
>
> > In addition, can we define the modified class of the nested column as the
> parent class? The top-level column modification is just a special case of a
> subclass.
>
> I think the modification of the composite type is much more complicated. It
> also modifies the MAP, ARRAY type. For example, Spark supports to modify
> the element of ARRAY type with the following syntax:
>
>
> ```
> -- add a field to the struct within an array. Using keyword 'element' to
> access the array's element column.
> ALTER TABLE prod.db.sample
> ADD COLUMN points.element.z double
> ```
>
> For the traditional database, they use the ALTER TYPE syntax to modify the
> composite type.
>
> ```
> CREATE TYPE compfoo AS (f1 int, f2 text);
>
> -- add column
> ALTER TYPE compfoo ADD ATTRIBUTE f3 int;
> -- rename column
> ALTER TYPE compfoo RENAME ATTRIBUTE f3 TO f4;
> -- drop column
> ALTER TYPE compfoo DROP ATTRIBUTE f4;
> -- modify type
> ALTER TYPE compfoo ALTER ATTRIBUTE f1 TYPE text;
>
> ```
>
> I think the modification of the top-level column is different from the
> modification of the nested field, and we don't need to introduce a base
> class. BTW, the RexFieldAccess is also not the parent class of the
> RexInputRef in Calcite.
>
> > ModifyColumn VS fine-grained Change
>
> The syntax in Flink also supports modifying the physical column to a
> metadata column, metadata column definition, etc. The main problem here is
> we need to expose what kind of changes happen when modifying the metadata
> column, modifying the computed column, and changing the column kind. With
> these possibilities, it may include:
>
> base: ModifyColumnComment, MoidfyColumnPosition;
> physical column: ModifyPhysicalColumnDataType,
> ModifyPhysicalColumnNullability;
> metadata column: ModifyMetadataColumnMetadataKey,
> ModifyMetadataColumnVirtual, ModifyMetdataColumnDataType;
> computed column: ModifyComputedColumnExpression
> column type change: ModifyPhysicalColumnToMetadataColumn,
> ModifyPhysicalColumnToComputedColumn, ModifyComputedColumnToPhysicalColumn,
> ModifyComputedColumnToMetadataColumn,
> ModifyMetadataColumnToPhysicalColumn, ModifyMetadataColumnToComputedColumn.
>
> I just wonder whether it's better we still keep the ModifyColumn and
> introduce some fine-grained TableChanges because the most cases above are
> not useful to external catalog, e.g. JDBCCatalog. So I think we can
> introduce the ModifyColumn that represents column modification and
>
> ```
> /** Modify the column comment */
> public class ModifyColumnComment extends ModifyColumn {
>
>     String getComment();
>
> }
>
> /** Modify the column position. */
> public class ModifyColumnPosition extends ModifyColumn {}
>
> /** Modify the physical column data type. */
> public class ModifyPhysicalColumnType extends ModifyColumn {
>
>     DataType getNewType();
>
> }
>
> ```
>
> When altering the physical column, the statement will produce the
> fine-grained changes. In other cases, we will still produce the base class
> ModifyColumn.
>
>
> > ModifyColumn can also rename the column.
>
> Yes. I have modified the FLIP and moved the RenameColumn and added a class
> named ModifyColumnName extends ModifyColumn.
>
> Best,
> Shengkai

Re: [DISCUSS] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

Posted by Shengkai Fang <fs...@gmail.com>.
Hi, Jingsong.

Thanks for your feedback!

> Can we define classes at once so that the connector can be fully
implemented, but we will not pass changes of the nested column.

It's hard to achieve this. If a new need comes, we will add a new kind of
TableChange. But I think the current proposal aligns with the Flink syntax.

> In addition, can we define the modified class of the nested column as the
parent class? The top-level column modification is just a special case of a
subclass.

I think the modification of the composite type is much more complicated. It
also modifies the MAP, ARRAY type. For example, Spark supports to modify
the element of ARRAY type with the following syntax:


```
-- add a field to the struct within an array. Using keyword 'element' to
access the array's element column.
ALTER TABLE prod.db.sample
ADD COLUMN points.element.z double
```

For the traditional database, they use the ALTER TYPE syntax to modify the
composite type.

```
CREATE TYPE compfoo AS (f1 int, f2 text);

-- add column
ALTER TYPE compfoo ADD ATTRIBUTE f3 int;
-- rename column
ALTER TYPE compfoo RENAME ATTRIBUTE f3 TO f4;
-- drop column
ALTER TYPE compfoo DROP ATTRIBUTE f4;
-- modify type
ALTER TYPE compfoo ALTER ATTRIBUTE f1 TYPE text;

```

I think the modification of the top-level column is different from the
modification of the nested field, and we don't need to introduce a base
class. BTW, the RexFieldAccess is also not the parent class of the
RexInputRef in Calcite.

> ModifyColumn VS fine-grained Change

The syntax in Flink also supports modifying the physical column to a
metadata column, metadata column definition, etc. The main problem here is
we need to expose what kind of changes happen when modifying the metadata
column, modifying the computed column, and changing the column kind. With
these possibilities, it may include:

base: ModifyColumnComment, MoidfyColumnPosition;
physical column: ModifyPhysicalColumnDataType,
ModifyPhysicalColumnNullability;
metadata column: ModifyMetadataColumnMetadataKey,
ModifyMetadataColumnVirtual, ModifyMetdataColumnDataType;
computed column: ModifyComputedColumnExpression
column type change: ModifyPhysicalColumnToMetadataColumn,
ModifyPhysicalColumnToComputedColumn, ModifyComputedColumnToPhysicalColumn,
ModifyComputedColumnToMetadataColumn,
ModifyMetadataColumnToPhysicalColumn, ModifyMetadataColumnToComputedColumn.

I just wonder whether it's better we still keep the ModifyColumn and
introduce some fine-grained TableChanges because the most cases above are
not useful to external catalog, e.g. JDBCCatalog. So I think we can
introduce the ModifyColumn that represents column modification and

```
/** Modify the column comment */
public class ModifyColumnComment extends ModifyColumn {

    String getComment();

}

/** Modify the column position. */
public class ModifyColumnPosition extends ModifyColumn {}

/** Modify the physical column data type. */
public class ModifyPhysicalColumnType extends ModifyColumn {

    DataType getNewType();

}

```

When altering the physical column, the statement will produce the
fine-grained changes. In other cases, we will still produce the base class
ModifyColumn.


> ModifyColumn can also rename the column.

Yes. I have modified the FLIP and moved the RenameColumn and added a class
named ModifyColumnName extends ModifyColumn.

Best,
Shengkai

Re: [DISCUSS] FLIP-273: Improve Catalog API to Support ALTER TABLE syntax

Posted by Jingsong Li <ji...@gmail.com>.
Thanks Shengkai for the proposal!

This should be very useful!

> In some cases, users might need to modify the nested column. Our proposal is able to express the idea by introducing a new class named.

My concern is that the connector needs to be implemented multiple
times. Can we define classes at once so that the connector can be
fully implemented, but we will not pass changes of the nested column.

In addition, can we define the modified class of the nested column as
the parent class? The top-level column modification is just a special
case of a subclass.

> ModifyColumn & RenameColumn

ModifyColumn seems to be able to do RenameColumn's things?

I prefer not to have classes such as ModifyColumn, and directly throw
the most fine-grained changes to the Catalog to avoid any unsatisfied
requirements in the future.
Like ModifyColumnType ModifyColumnComment....

Can we add the corresponding SQL in the method comments?

Best,
Jingsong

On Fri, Nov 18, 2022 at 11:55 AM Shengkai Fang <fs...@gmail.com> wrote:
>
> Hi, dev.
>
>
> I want to start a discussion about the FLIP-273: Improve Catalog API to
> Support ALTER TABLE syntax[1]. The motivation of the FLIP is the current
> Catalog API is difficult for some Catalog to alter the table. For example,
> the Catalog only exposes
>
>    void alterTable(ObjectPath tablePath, CatalogBaseTable newTable, boolean
> ignoreIfNotExists)
>
> now. The main problem of this API is it requires the Catalog,
> e.g.`JDBCCatalog` to compare the difference between the new
> `CatalogTable`and the original CatalogTable, then generate the SQL to alter
> the table in the external system. However, the `ALTER TABLE` syntax already
> told the Catalog the differences. With the proposal, the Catalog developer
> can easily support the `ALTER TABLE` syntax.
>
> Looking forward to feedback on that proposal.
>
> Best,
> Shengkai
>
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-273%3A+Improve+the+Catalog+API+to+Support+ALTER+TABLE+syntax