Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/07/22 00:09:17 UTC

[GitHub] [iceberg] daksha121 opened a new pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

daksha121 opened a new pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848


   We have some scenarios in which we need to support opinionated writes to Iceberg tables using Spark.
   To support them, we plan to extend the SparkWriteBuilder and SparkWrite classes. One example of what we are trying to achieve:
   
   - We want to allow **only** incremental writes to an Iceberg table. If the above classes were made public, we could extend them and block all actions except incremental writes, either based on a custom table property or by default.
   
   Please let us know if there are alternative ways for us to support such functionality using Iceberg.
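
For illustration, the property-based guard described above might look roughly like the following. This is a minimal sketch under assumptions, not Iceberg code: the property name `write.incremental-only`, the `IncrementalOnlyGuard` class, and the `WriteKind` enum are all hypothetical, and the table properties are modelled as a plain `Map` rather than a real Iceberg table.

```java
import java.util.Map;

// Hypothetical sketch: neither this class nor the property name exists in Iceberg.
final class IncrementalOnlyGuard {
  // Assumed property name, used for illustration only.
  static final String INCREMENTAL_ONLY = "write.incremental-only";

  // Simplified stand-in for the kinds of write a Spark job may request.
  enum WriteKind { APPEND, OVERWRITE_BY_FILTER, DYNAMIC_OVERWRITE, TRUNCATE }

  private IncrementalOnlyGuard() {}

  // Throws if the table is marked incremental-only and the write is not an append.
  static void check(Map<String, String> tableProperties, WriteKind requested) {
    boolean incrementalOnly =
        Boolean.parseBoolean(tableProperties.getOrDefault(INCREMENTAL_ONLY, "false"));
    if (incrementalOnly && requested != WriteKind.APPEND) {
      throw new UnsupportedOperationException(
          "Table allows only incremental writes, rejected: " + requested);
    }
  }
}
```

A write builder would call `check` before building any write, so tables without the property behave as they do today.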
   
   cc: @SreeramGarlapati 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899699795


   I was just thinking: if you are already making your own SparkTable and Catalog, why not just declare your classes in the Iceberg packages within your own code base? Then you wouldn't need to make the classes public.
   
   My worry about this feature in general is that, if we were serious about it, we would want it at the core level. If we only changed the catalog and table implementation, another framework (or a non-custom Iceberg build) could still write to the table in whatever mode it liked.




[GitHub] [iceberg] daksha121 commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
daksha121 commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899685786


   @rdblue @RussellSpitzer would love your feedback on this




[GitHub] [iceberg] daksha121 commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
daksha121 commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899715663


   > I was just thinking if you are already making your own SparkTable and Catalog, why not just build inside the Iceberg Packages in your own code? Then you won't need to change the classes to public.
   
   This makes sense, thanks @RussellSpitzer 
   
   I agree with your concerns @RussellSpitzer and @rdblue 
   It would be best if we could contribute this to Iceberg: a table property that allows only incremental writes and blocks other operations. We just weren't sure whether this is something Iceberg would want to support.




[GitHub] [iceberg] daksha121 edited a comment on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
daksha121 edited a comment on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899698187


   > Wouldn't this require other modifications to Iceberg as well? or are you making your own Iceberg source?
   
   @RussellSpitzer We were thinking of creating a layer over Iceberg:
   - Our own versions of SparkCatalog and SparkTable, say a CustomSparkTable that holds an instance of [SparkTable](https://github.com/apache/iceberg/blob/master/spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java)
   - We can then use this layer to enforce opinionated writes by extending the Spark write classes
   
   Is there another way to achieve this? We are looking for extension points in the read/write path.
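
The wrapping idea above can be sketched with plain delegation. This is only an illustration under assumptions: `WritableTable`, `WriteBuilderLike`, and `AppendOnlyTable` are hypothetical stand-ins, not Spark's or Iceberg's real types. In Spark 3 the corresponding hook would presumably be the DataSource V2 `WriteBuilder` returned from `SupportsWrite.newWriteBuilder`, whose details are omitted here.

```java
// Stand-in types for illustration; these are NOT Spark's or Iceberg's real interfaces.
interface WriteBuilderLike {
  String buildAppend();

  String buildOverwrite(String filter);
}

interface WritableTable {
  String name();

  WriteBuilderLike newWriteBuilder();
}

// Hypothetical wrapper that holds a delegate table, as in the CustomSparkTable idea.
final class AppendOnlyTable implements WritableTable {
  private final WritableTable delegate;

  AppendOnlyTable(WritableTable delegate) {
    this.delegate = delegate;
  }

  @Override
  public String name() {
    return delegate.name();
  }

  @Override
  public WriteBuilderLike newWriteBuilder() {
    WriteBuilderLike inner = delegate.newWriteBuilder();
    // Pass appends through to the wrapped builder and reject everything else.
    return new WriteBuilderLike() {
      @Override
      public String buildAppend() {
        return inner.buildAppend();
      }

      @Override
      public String buildOverwrite(String filter) {
        throw new UnsupportedOperationException(
            "Overwrite is disabled for table " + delegate.name());
      }
    };
  }
}
```

Delegation like this needs only the public surface of the wrapped table, but it guards just the path that goes through the wrapper; a writer using the underlying table directly is not restricted.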
   




[GitHub] [iceberg] daksha121 commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
daksha121 commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899698187


   > Wouldn't this require other modifications to Iceberg as well? or are you making your own Iceberg source?
   
   @RussellSpitzer We were thinking of creating a layer over Iceberg:
   - Our own versions of SparkCatalog and SparkTable, say a CustomSparkTable that holds an instance of [SparkTable](https://github.com/apache/iceberg/blob/master/spark3/src/main/java/org/apache/iceberg/spark/source/SparkTable.java)
   - We can then use this layer to enforce opinionated writes by extending the Spark write classes
   
   Is there another way to achieve this? We are looking for extension points in the read/write path.
   
   
   
   This class can then hold the 




[GitHub] [iceberg] RussellSpitzer commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899687498


   Wouldn't this require other modifications to Iceberg as well? Or are you making your own Iceberg source?




[GitHub] [iceberg] rdblue commented on pull request #2848: Make SparkWriteBuilder and SparkWrite classes public

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2848:
URL: https://github.com/apache/iceberg/pull/2848#issuecomment-899702801


   I think it would be better to add a feature to Iceberg rather than to expose classes. As is, this PR would make it possible to extend these classes in other, unsupported ways. We don't want to create issues by making them public, or to break you later because we consider them private even though they are accessible.
   
   It's reasonable to add some ability to mark a table for a use case, like incremental appends or CDC, and to disable some operations for that use case. I would support that more than just opening up classes.
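
One possible shape for such a marker is a use-case mode that maps to a set of permitted operations. The sketch below is an assumption-laden illustration, not an Iceberg API: `TableUseCase`, its `Mode` values, and the choice of which operations each mode permits are hypothetical. The operation names mirror the snapshot operations Iceberg records (append, replace, overwrite, delete). A CDC mode is left out because the thread does not say which operations it would permit.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a per-table use-case marker; not part of Iceberg.
final class TableUseCase {
  // Names mirror Iceberg's snapshot operations.
  enum Operation { APPEND, REPLACE, OVERWRITE, DELETE }

  enum Mode {
    // No restriction: every operation is permitted.
    GENERAL(EnumSet.allOf(Operation.class)),
    // Assumption: an incremental-append table permits appends only.
    INCREMENTAL_APPEND(EnumSet.of(Operation.APPEND));

    private final Set<Operation> allowed;

    Mode(Set<Operation> allowed) {
      this.allowed = allowed;
    }

    boolean allows(Operation op) {
      return allowed.contains(op);
    }
  }

  private TableUseCase() {}

  // Parses a property value such as "incremental-append"; a missing value means GENERAL.
  static Mode parse(String value) {
    if (value == null || value.trim().isEmpty()) {
      return Mode.GENERAL;
    }
    return Mode.valueOf(value.trim().toUpperCase(Locale.ROOT).replace('-', '_'));
  }
}
```

Whether maintenance operations such as compaction (a replace) should stay permitted on an append-only table is a design question this sketch does not settle.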

