You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "munendrasn (via GitHub)" <gi...@apache.org> on 2023/02/07 14:17:26 UTC

[GitHub] [iceberg] munendrasn opened a new issue, #6763: ACL when using DynamoDb based Catalog

munendrasn opened a new issue, #6763:
URL: https://github.com/apache/iceberg/issues/6763

   ### Query engine
   
   _No response_
   
   ### Question
   
   We are using the DynamoDb based Catalog implementation, and working on providing namespace level access control.
   
   DynamoDb table per namespace is not the option due to limitation on number of tables in DynamoDb per region.
   Hence, we are relying on Row-level access control provided by [DynamoDb](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/specifying-conditions.html).
   
   While providing access to Namespace entries we are hitting limitation, as DynamoDb provides row-access control on PartitionKey. For all the namespace entries, PartitionKey is [NAMESPACE](https://github.com/apache/iceberg/pull/2688) . So, we would end giving access to all the Namespace entries.
   One option is to store the namespace value in identifier which would enable Access control. With this, listNamespace would need to performed on GSI (existing one) instead of Primary Index.
   
   Is there any reason for storing the actual value in namespace attribute rather than identifier attribute for namespace?
   Also, Please share if there are any concerns with swapping the values for  Namespace entry in the DynamoDb Catalog Implementation
   cc @SreeramGarlapati @mohitgargk @ChaladiMohanVamsi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] munendrasn commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "munendrasn (via GitHub)" <gi...@apache.org>.
munendrasn commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1424167414

   ### Hybrid Mode
   For the Hybrid mode, we evaluated different ways of handling existing namespaces
   * Any new namespace update would swap the value during updateItem
      * Impacts only the updated namespaces
   * Both formats being present in DynamoDb table for the same Namespace entry
       * Need to have additional migration, transactional writes only covers the Namespace entries that are being updated
   * Allowing the existing Namespace entries to be current format, and new ones to be in new format
   
   Every approach has additional overhead of handling of certain cases, sharing few of them.
   * For namespace operations like setProperties, removeProperties which does loadNamespace, and then update
        * Need to maintain context how the namespace was read for update, or need to transactional update
        * Moreover, ACL permission issue would still exists.
   * Similarly for createNamespace, with new config, we might end up creating namespace for existing namespace. Creation of Namespace with existing name would fail currently. With Hybrid one, it would be tricker to handle - not sure DynamoDb supports conditional expression on different item when updating/creating a new item
   
   Moreover, Hybrid mode would have to be maintained which might impact future feature dev in DynamoDb catalog.  ALL Namespace access need to be provided for Hybrid mode, which again impact ACL permissions. Roll-forward won't complete work if some of the Namespace are never updated.
   
   ### Single config for read & write
   
    We are letting the user to adopt it based on their use-case. If they don't have requirement for ACL then they don't need to move to new format. For a single Catalog instance, all writes, and read would use one format.
   The idea is to let user choose based on use-case, and make an informed choice. As even with Hybrid mode, for namespace not updated, need separate migration
   
   As for migration, I was thinking something similar to `registerTable` , maybe we can have `registerNamespace(Namespace namespace, SupportsNamespaces originalMetastore)`.
   Same can be achieved without registerNamespace functionality too
   * Essentially, read Namespace from one catalog, createNamespace in another
   * Users will move to new format when required, and they have executed the migration for all existing namespace
   New DynamoDb Catalog implementation
   
   ### Add new implementation, and deprecate the existing one.
   
   This has an additional advantage that eventually one would be removed, and future feature development need to handle to single format
   
   ---------------
   
   Due to cases that can arise with Hybrid mode, inclined towards giving the adoption control to the user.  Supporting both formats may have possible impact on future developments.
   Eventually, one format need to be deprecated & removed. Hence, inclined towards having single config, or different catalog implementations. Having multiple implementation would ease the removal process but config based approach should solve the problem too. 
   
   Also, to simplify should we can rename config to `dynamodb.catalog.schema-format` which default to `v1`, and new changes can be `v2`
   
   Kindly, let me know if I missed something. Based on your feedback & response, will move forward with the implementation
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] munendrasn commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "munendrasn (via GitHub)" <gi...@apache.org>.
munendrasn commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1422834606

   @jackye1995 
   Pinging you as it relates to PR https://github.com/apache/iceberg/pull/2688. Please share your thoughts/opinion on swapping Values for Namespace entries


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1422854166

   The plan sounds good to me, limitation on row level security was overlooked during the initial implementation. But how do we ensure backwards compatibility? Feels like if we update the schema of the table, it requires a migration?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1424684806

   > Due to cases that can arise with Hybrid mode, inclined towards giving the adoption control to the user. Supporting both formats may have possible impact on future developments.
   
   Thanks for listing these pros and cons, i agree that it will be increasingly difficult in the hybrid mode. We can start simple to let user handle migration and see how that goes.
   
   > Also, to simplify should we can rename config to dynamodb.catalog.schema-format which default to v1, and new changes can be v2
   
   I like the idea of having a schema version, and we can build tools around migration. We should be clear about how the migration should take place.
   
   My thought is that a new schema format must create a new table in DynamoDB, and the migration tool should be built using DynamoDB stream so it supports incremental updates to forward what was in the previous schema version to the new schema version.
   
   During the migration process, for each table, all the writer must switch at the same time, otherwise 1 writer will commit to the new schema and one will commit to old, causing diverged table version history. The writer library should be upgraded to check for a table property to see what is the catalog schema version it is committing against, and if it cannot support it, it should abort.
   
   Just some thoughts, we can start with a v2 schema implementation and release that first, and then work on the migration part if necessary. Maybe it is not needed and people will just halt all operations and do a full migration and that is sufficient.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] munendrasn commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "munendrasn (via GitHub)" <gi...@apache.org>.
munendrasn commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1690273116

   PR related to the the issue #6862 (commenting to mark as not-stale)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] munendrasn commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "munendrasn (via GitHub)" <gi...@apache.org>.
munendrasn commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1422946779

   With the above plan, schema would remain same for the table, while writing Namespace entries we would swap values (identifier, and namespace columns)
   
   As for backward compatibility, I had couple of ideas
   * Deprecate current Impl, and introduce new DynamoDbCatalog impl
   * Give option to user where users can specify the how to store the Namespace entries using config `dynamodb.namespace-as-identifier` which can default to `false` at least in the beginning. This flag/config can be used to decide the namespace entry storage. Essentially, two approaches fit into single implementation
      * `listNamespace` also need to consume this config, and fetch entries differently
   
   I agree, eventually migration might be required for the existing users. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] github-actions[bot] commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1670471964

   This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on issue #6763: ACL when using DynamoDb based Catalog

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on issue #6763:
URL: https://github.com/apache/iceberg/issues/6763#issuecomment-1423100498

   > schema would remain same for the DynamoDb table
   
   Cool then we are good here
   
   Using a config options sounds like a good idea, but I think that only handles the write side. It's more like `dynamodb.write.namespace-as-identifier`. 
   
   For read, we need to handle a hybrid mode of 2 cases existing at the same time. If that is achievable, we can proceed forward with the implementation 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] ACL when using DynamoDb based Catalog [iceberg]

Posted by "munendrasn (via GitHub)" <gi...@apache.org>.
munendrasn closed issue #6763: ACL when using DynamoDb based Catalog 
URL: https://github.com/apache/iceberg/issues/6763


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org