
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/06/08 06:55:43 UTC

[GitHub] [iceberg] hililiwei opened a new pull request, #4994: API: Support computed comlumns

hililiwei opened a new pull request, #4994:
URL: https://github.com/apache/iceberg/pull/4994

   We have put forward a proposal to support computed columns in Iceberg.
   The doc:
   https://docs.google.com/document/d/1PICTSKK2yHgGtxgKAjChrKiF1GFDRVZtvcH9zW8gGcQ
   
   Computed columns are widely supported by databases such as SQL Server, Oracle, and PostgreSQL.
   
   Building on this, we can make Flink support time attributes, such as `user_action_time AS PROCTIME()`.
   
   We have used this proposal internally. If the community discusses it and agrees to adopt it, we are more than happy to continue the engine integration work.
   
   We have seen some PRs doing similar things, but they seem to have stalled for a long time. We are raising the topic again here and hope to take it further.
   
   cc @rdblue @kbendick @chenjunjiedada 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] hililiwei commented on pull request #4994: API: Support computed comlumns

Posted by GitBox <gi...@apache.org>.

hililiwei commented on PR #4994:
URL: https://github.com/apache/iceberg/pull/4994#issuecomment-1150578252

> I don't think it is a good idea for Iceberg tables to store engine-specific expressions.

Flink aside, computed columns are not an engine-specific expression. Many relational databases support this.

First, we assume that computed columns should be part of the table schema. For example:
```sql
CREATE TABLE t1 (
  id INT,
  name STRING,
  -- computed column: derived from id, not supplied by the writer
  id2 AS id * 2
)
```

`id2` should be visible to every engine that reads the table, and each engine should apply the same computation: `id*2`.
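
As a rough illustration of what "the same logic for every engine" could mean, here is a minimal Python sketch. It assumes the computed column is kept as a small engine-neutral expression tree rather than as SQL text. The names `Ref`, `Lit`, `Mul`, `evaluate`, `COMPUTED`, and `with_computed` are hypothetical and are not part of the Iceberg API or of this PR.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


# Hypothetical engine-neutral expression nodes (not an Iceberg API).
@dataclass(frozen=True)
class Ref:
    name: str  # reference to a stored column


@dataclass(frozen=True)
class Lit:
    value: Any  # literal constant


@dataclass(frozen=True)
class Mul:
    left: "Expr"
    right: "Expr"


Expr = Union[Ref, Lit, Mul]


def evaluate(expr: Expr, row: Dict[str, Any]) -> Any:
    """Evaluate an expression tree against one row of stored columns."""
    if isinstance(expr, Ref):
        return row[expr.name]
    if isinstance(expr, Lit):
        return expr.value
    if isinstance(expr, Mul):
        return evaluate(expr.left, row) * evaluate(expr.right, row)
    raise TypeError(f"unsupported expression: {expr!r}")


# Computed columns of t1, keyed by column name: id2 AS id * 2
COMPUTED: Dict[str, Expr] = {"id2": Mul(Ref("id"), Lit(2))}


def with_computed(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row extended with every computed column."""
    out = dict(row)
    for name, expr in COMPUTED.items():
        out[name] = evaluate(expr, row)
    return out
```

Any reader that interprets the same tree, whether it sits in Flink, Spark, or elsewhere, would derive the same `id2` value from the stored `id`, without having to parse another engine's SQL dialect.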

Second, we need to decide whether Iceberg itself should support this DDL syntax. This is a critical decision, and it does not depend much on any particular engine. If the answer is no, we need to look for a solution at a higher layer and, as you say, it might even need to be addressed outside of the Iceberg project. In that case, if a computed column is defined in Flink, how can that column be visible when the table is read with Spark? A compromise would be to support only time-attribute columns in Flink; those columns are meaningless to Spark, so their visibility matters less. However, existing SQL that uses computed columns (with expressions like `id2` above) could then not be migrated to Iceberg smoothly and would need a lot of modification.


[GitHub] [iceberg] hililiwei commented on pull request #4994: API: Support computed comlumns

Posted by GitBox <gi...@apache.org>.

hililiwei commented on PR #4994:
URL: https://github.com/apache/iceberg/pull/4994#issuecomment-1150582861

   >  -- why not have a generic way to store those expressions and add them to a table?
   
   That's a good idea. We'll try it.



[GitHub] [iceberg] rdblue commented on pull request #4994: API: Support computed comlumns

Posted by GitBox <gi...@apache.org>.

rdblue commented on PR #4994:
URL: https://github.com/apache/iceberg/pull/4994#issuecomment-1150522177

   I don't think it is a good idea for Iceberg tables to store engine-specific expressions. I would possibly support expressions that are supported natively with Iceberg, like partition transforms. But beyond that I think this would create a big compatibility problem.
   
   What you're trying to solve is probably best solved with a higher layer, possibly outside of the Iceberg project. There is nothing specific to Iceberg about computed columns -- why not have a generic way to store those expressions and add them to a table?



[GitHub] [iceberg] rdblue commented on pull request #4994: API: Support computed comlumns

Posted by GitBox <gi...@apache.org>.

rdblue commented on PR #4994:
URL: https://github.com/apache/iceberg/pull/4994#issuecomment-1153321007

   > Flink aside, computed columns are not an engine-specific expression. Many relational databases support this.
   
   While other engines support computed columns, SQL text is generally specific to an engine. That is what I meant by saying it is specific to Flink.

