You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/23 21:26:19 UTC

[GitHub] [iceberg] danielcweeks opened a new pull request #2855: Add TableLocationSupplier for more flexible table placement

danielcweeks opened a new pull request #2855:
URL: https://github.com/apache/iceberg/pull/2855


   Currently table location placement is somewhat restrictive because the base location is defined by the catalog, but doesn't have all relevant information to construct paths for certain scenarios.  The only information available to the catalog is `TableIdentifier`.
   
   Adding a TableLocationSupplier interface allows for more information to be used in defining the location of the table and allow for more flexibly layouts that may want to take advantage of other fields that are only populated later in the table creation lifecycle (like uuid, partition spec, sort order, or table properties).
   
   Due to the way the table/metadata creation is performed, this is a little difficult to integrate both cleanly and in a backwards compatible way.
   
   Posting this as a possible approach to allow for more flexible table layouts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2855: Add TableLocationSupplier for more flexible table placement

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2855:
URL: https://github.com/apache/iceberg/pull/2855#issuecomment-917673480


   It seems like this is required because the table UUID is generated by `TableMetadata`. I think that I would rather simplify this and pass the UUID into `TableMetadata` when it is created instead and customize the location entirely within the catalog. So an alternative to this is adding the `TableMetadata` builder, #2957.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2855: Add TableLocationSupplier for more flexible table placement

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2855:
URL: https://github.com/apache/iceberg/pull/2855#discussion_r687083141



##########
File path: core/src/main/java/org/apache/iceberg/BaseMetastoreCatalog.java
##########
@@ -154,6 +155,25 @@ public String toString() {
 
   protected abstract String defaultWarehouseLocation(TableIdentifier tableIdentifier);
 
+  protected TableLocationSupplier tableLocationSupplier(TableIdentifier tableIdentifier) {

Review comment:
       Why did you choose to make this a supplier rather than passing the incoming options to a method like `defaultWarehouseLocation`? Is that because the table UUID is created in the `TableMetadata` constructor?
   
   Are there cases where you think the location would be based on the schema or sort order? I can't think of valid cases and I wonder if this could actually be a bad idea. For example, maybe this could look for a column `pii struct<email string, ...>` and choose to place the table in a "sensitive" bucket, but I don't think that would be a good pattern for people to follow because it is error-prone: using `PII` for the column name could break it.
   
   I'm also skeptical that location would be based on the partition spec, especially because the spec may change.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks closed pull request #2855: Add TableLocationSupplier for more flexible table placement

Posted by GitBox <gi...@apache.org>.
danielcweeks closed pull request #2855:
URL: https://github.com/apache/iceberg/pull/2855


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #2855: Add TableLocationSupplier for more flexible table placement

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #2855:
URL: https://github.com/apache/iceberg/pull/2855#issuecomment-918574306


   My understanding is that you are trying to make the default table location a function of all the other table states, and make this function a potentially pluggable interface. 
   
   The biggest use case I see is for tables renamed to have the same name as an old table, they will still have a different path and not clobber the data if UUID is a part of the table path. Please let me know if there is any additional important use case that I missed.
   
   So to achieve this feature, I think another way is to have a different method in parallel to table builder's `withLocation(String newLoaction)` method called `withTableRootLocationProvider(String providerImpl)` (or this can be from table properties) to supply a root location provider. This provider is basically `Function<TableMetadataBuilder, String>`, and we will fill the null location value based on location provider in the table metadata builder.
   
   In the PR that Ryan referenced, I have changed to implement the `TableMetadataUpdateBuilder` that works specifically for the update metadata use case. Given this case, I think it now makes more sense to have a separated `TableMetadataBuilder` to create new table metadata, and we can initialize the location provider and resolve the location in the builder.
   
   To make this backwards compatible with existing implementations, we can remove the `defaultWarehouseLocation` method, and instead provide 2 implementations of the provider, which are `HiveTableRootLocationProvider` and `HadoopTableRootLocationProvider`, that should be able to cover all the existing use cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org