Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/25 00:23:49 UTC

[GitHub] [iceberg] karuppayya opened a new issue #2270: Spark: Session level Iceberg table config defaults

karuppayya opened a new issue #2270:
URL: https://github.com/apache/iceberg/issues/2270


   An organization may want all Iceberg tables it creates to have certain TBLPROPERTIES.
   Currently, such properties have to be specified in every CREATE command, so there is a high possibility of them being missed.
   There is no way to specify such properties once so that they apply to every table creation.
   A couple of questions:
   1. Should we introduce a mechanism to add default properties that affect all tables? (Say `spark.iceberg.defaults.prop1=value1`: every property with the prefix `spark.iceberg.defaults` would go into TBLPROPERTIES, as sketched below.)
   2. Should such functionality be introduced in Spark (it could be useful to other data sources as well) or in Iceberg?
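   A minimal sketch of the mechanism in question 1; the `spark.iceberg.defaults` prefix is the one proposed above, while the helper itself is hypothetical:
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   public class SessionDefaults {
     // Pull session-level defaults such as spark.iceberg.defaults.prop1=value1
     // out of the session conf and turn them into TBLPROPERTIES entries
     // (prop1=value1) that every CREATE TABLE would start from.
     static Map<String, String> defaultTableProperties(Map<String, String> sessionConf) {
       String prefix = "spark.iceberg.defaults.";
       Map<String, String> props = new HashMap<>();
       sessionConf.forEach((key, value) -> {
         if (key.startsWith(prefix)) {
           props.put(key.substring(prefix.length()), value);
         }
       });
       return props;
     }
   }
   ```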
   Thoughts? @aokolnychyi @RussellSpitzer @rdblue 



[GitHub] [iceberg] aokolnychyi commented on issue #2270: Spark: Session level Iceberg table config defaults

aokolnychyi commented on issue #2270:
URL: https://github.com/apache/iceberg/issues/2270#issuecomment-786261856


   @holdenk @dongjoon-hyun @viirya @rdblue @sunchao, do you think it is worth trying in Spark? Seems all catalog implementations would benefit from this.



[GitHub] [iceberg] aokolnychyi commented on issue #2270: Spark: Session level Iceberg table config defaults

aokolnychyi commented on issue #2270:
URL: https://github.com/apache/iceberg/issues/2270#issuecomment-786344598


   In the [example](https://github.com/apache/iceberg/issues/2270#issuecomment-786255694) above, I'd imagine Spark would load defaults from the session config, combine them with the properties provided in the CREATE TABLE statement, and pass the result to the catalog implementation. In the considered example, we would get a map with `k1 -> v1` (comes from the session conf) and `k2 -> custom_value` (comes from the statement directly), which would be persisted in the metadata.
   
   The idea of not persisting props and simply overriding them at runtime seems interesting, but it will be way harder to implement, I guess.
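   For concreteness, a sketch of that merge (names as in the referenced example; everything else is illustrative):
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   
   public class MergeExample {
     public static void main(String[] args) {
       // Defaults loaded from the session conf.
       Map<String, String> fromSessionConf = Map.of("k1", "v1", "k2", "v2");
       // Properties given directly in the CREATE TABLE statement.
       Map<String, String> fromStatement = Map.of("k2", "custom_value");
   
       Map<String, String> persisted = new HashMap<>(fromSessionConf);
       persisted.putAll(fromStatement);  // statement-level values win
   
       // persisted maps k1 -> v1 and k2 -> custom_value; this is the map
       // that would be passed to the catalog and stored in the metadata.
       System.out.println(persisted);
     }
   }
   ```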



[GitHub] [iceberg] rdblue commented on issue #2270: Spark: Session level Iceberg table config defaults

rdblue commented on issue #2270:
URL: https://github.com/apache/iceberg/issues/2270#issuecomment-786338505


   This sounds useful, kind of like the `site` configuration in Hadoop. I like Russell's idea to do this at the catalog level, but I'm already a little worried that catalog configuration is going to end up being quite large and difficult to maintain across Flink, Spark, Hive, and others since each has a separate way to configure catalogs.
   
   Would we actually set these values in tables or would we just change the default so that precedence is table config, then administrator config, then hard-coded defaults?
   
   Where should we put the administrator config?
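   The second option would keep the administrator values out of the table metadata entirely and apply them at lookup time; a sketch of that precedence chain (all names hypothetical):
   
   ```
   import java.util.Map;
   
   public class PropertyResolver {
     // Table config wins, then administrator config, then the hard-coded default.
     static String resolve(String key,
                           Map<String, String> tableProps,
                           Map<String, String> adminDefaults,
                           String hardCodedDefault) {
       String value = tableProps.get(key);
       if (value != null) {
         return value;
       }
       return adminDefaults.getOrDefault(key, hardCodedDefault);
     }
   }
   ```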



[GitHub] [iceberg] aokolnychyi commented on issue #2270: Spark: Session level Iceberg table config defaults

aokolnychyi commented on issue #2270:
URL: https://github.com/apache/iceberg/issues/2270#issuecomment-786255694


   I think it would be useful. For example, I've frequently seen projects where folks create a number of tables and want to use more or less the same properties across the tables within that project. Instead of specifying the same props over and over again, they could set them in their session config template. Another project may have a different set of props.
   
   I believe Spark already does something like this in some cases. For instance, properties like `spark.datasource.ds_name.key` are propagated into sources during some operations. I'd vote for a catalog-native solution in Spark 3; that way, it would become the standard way of representing this in Spark.
   
   The session conf can look like this:
   
   ```
   spark.sql.catalog.cat_name.table_defaults.k1 = v1
   spark.sql.catalog.cat_name.table_defaults.k2 = v2
   ```
   
   Then individual property values in CREATE statements will override what is in the session conf:
   
   ```
   CREATE TABLE ...
   TBLPROPERTIES (
     'k2' = 'custom_value'
   )
   ```
   
   In the future, we can also support read and write options.
   
   ```
   // the name is arbitrary
   interface SupportsSessionConfigDefaults extends TableCatalog {
       String tablePropertiesPrefix();
   }
   ```
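   How Spark might consume such an interface; a hedged sketch, where only `SupportsSessionConfigDefaults` and `tablePropertiesPrefix` come from the proposal above:
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.spark.sql.connector.catalog.TableCatalog;
   
   public class CreatePropertyResolution {
     // If the catalog opts in, collect everything under its declared prefix
     // from the session conf, then let explicit TBLPROPERTIES override it.
     static Map<String, String> resolveCreateProperties(TableCatalog catalog,
                                                        Map<String, String> sessionConf,
                                                        Map<String, String> tblProperties) {
       Map<String, String> resolved = new HashMap<>();
       if (catalog instanceof SupportsSessionConfigDefaults) {
         String prefix = ((SupportsSessionConfigDefaults) catalog).tablePropertiesPrefix();
         sessionConf.forEach((key, value) -> {
           if (key.startsWith(prefix)) {
             resolved.put(key.substring(prefix.length()), value);
           }
         });
       }
       resolved.putAll(tblProperties);  // statement-level properties win
       return resolved;
     }
   }
   ```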



[GitHub] [iceberg] RussellSpitzer commented on issue #2270: Spark: Session level Iceberg table config defaults

RussellSpitzer commented on issue #2270:
URL: https://github.com/apache/iceberg/issues/2270#issuecomment-786256901


   I know we implemented something like this in the Spark Cassandra Connector, where we allowed a user to set any property that the underlying SCC could use at the catalog level, thereby setting it for all operations carried out by that catalog.
   
   In our use case I didn't have any hope of doing this in Spark itself, so we just implemented it in the `initialize` method of the catalog, but it might be nice if Spark supported this as a first-class feature as well.
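   A sketch of that catalog-side approach using Spark's `CatalogPlugin.initialize` hook, which receives all `spark.sql.catalog.<name>.*` entries as options (the `table_defaults.` prefix and the class itself are hypothetical):
   
   ```
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.spark.sql.util.CaseInsensitiveStringMap;
   
   public class DefaultsAwareCatalog /* implements TableCatalog */ {
     private final Map<String, String> tableDefaults = new HashMap<>();
   
     public void initialize(String name, CaseInsensitiveStringMap options) {
       String prefix = "table_defaults.";  // hypothetical option prefix
       options.forEach((key, value) -> {
         if (key.startsWith(prefix)) {
           tableDefaults.put(key.substring(prefix.length()), value);
         }
       });
     }
   
     // createTable(...) would then merge tableDefaults under the
     // statement-provided properties before writing the table metadata.
   }
   ```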

