Posted to issues@iceberg.apache.org by "karim-ramadan (via GitHub)" <gi...@apache.org> on 2023/04/07 14:03:57 UTC

[GitHub] [iceberg] karim-ramadan opened a new pull request, #7295: Spark3 structured streaming enable updates

karim-ramadan opened a new pull request, #7295:
URL: https://github.com/apache/iceberg/pull/7295

   ### Context
   
   As brought up in issue #2788, when a Spark streaming read of an Iceberg table hits an overwrite snapshot, the only two possible actions are to skip the snapshot or to fail. A third option would be to consider only the added files and ignore the deleted files.
   
   ### Proposal 
   
   In this PR I propose a new Spark read option,
   `streaming-overwrite-snapshots-read-mode`,
   with three possible values: SKIP, BREAK, and ADDED_FILES_ONLY.
   It would replace the existing
   `streaming-skip-overwrite-snapshots` (true|false) option.
   
   The new ADDED_FILES_ONLY value would read only the files added by an overwrite snapshot and ignore the files it deleted.
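   
   To make the three values concrete, here is a minimal sketch of how each one could treat a single snapshot. It is not the code in this PR: `Snapshot`, `OverwriteReadMode`, and `files_to_stream` are hypothetical names, and the snapshot model is simplified to appends and overwrites only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class OverwriteReadMode(Enum):
    """The three proposed values of streaming-overwrite-snapshots-read-mode."""
    SKIP = "SKIP"
    BREAK = "BREAK"
    ADDED_FILES_ONLY = "ADDED_FILES_ONLY"


@dataclass
class Snapshot:
    """Hypothetical, simplified stand-in for an Iceberg snapshot."""
    operation: str  # only "append" and "overwrite" are modelled here
    added_files: List[str] = field(default_factory=list)
    deleted_files: List[str] = field(default_factory=list)


def files_to_stream(snapshot: Snapshot, mode: OverwriteReadMode) -> List[str]:
    """Return the data files one micro-batch would read from this snapshot."""
    if snapshot.operation != "overwrite":
        # Simplification: anything that is not an overwrite is read in full.
        return list(snapshot.added_files)
    if mode is OverwriteReadMode.SKIP:
        # Ignore the overwrite snapshot entirely.
        return []
    if mode is OverwriteReadMode.BREAK:
        # Fail the stream, as happens today when skipping is disabled.
        raise RuntimeError("cannot stream from an overwrite snapshot")
    # ADDED_FILES_ONLY: read what the overwrite added, ignore what it deleted.
    return list(snapshot.added_files)
```

   With a Spark session available, the option would presumably be set like any other read option, for example `.option("streaming-overwrite-snapshots-read-mode", "ADDED_FILES_ONLY")` on `spark.readStream.format("iceberg")`.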
   
   ### Notes
   
   - The old option `streaming-skip-overwrite-snapshots` has been kept and is integrated with the new one (the new one takes precedence).
   - Some unit tests have been fixed so that they run on Windows. I could revert those changes and address them in another PR if needed.
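   
   The precedence rule in the first note can be sketched as follows. This is only an illustration under assumptions, not the PR's implementation: `resolve_overwrite_mode` is a hypothetical helper, the mapping of the old flag (`true` to SKIP, `false` or unset to BREAK) is assumed from the old option's skip-or-fail behaviour, and case-insensitive parsing is an assumption too.

```python
from typing import Mapping

NEW_OPTION = "streaming-overwrite-snapshots-read-mode"
OLD_OPTION = "streaming-skip-overwrite-snapshots"
VALID_MODES = ("SKIP", "BREAK", "ADDED_FILES_ONLY")


def resolve_overwrite_mode(options: Mapping[str, str]) -> str:
    """Pick the effective read mode; the new option wins when both are set."""
    if NEW_OPTION in options:
        mode = options[NEW_OPTION].strip().upper()
        if mode not in VALID_MODES:
            raise ValueError(f"unknown value for {NEW_OPTION}: {options[NEW_OPTION]!r}")
        return mode
    # Fall back to the old boolean flag (assumed mapping, see lead-in).
    legacy = options.get(OLD_OPTION, "false").strip().lower()
    return "SKIP" if legacy == "true" else "BREAK"
```

   For example, passing both `streaming-skip-overwrite-snapshots=true` and `streaming-overwrite-snapshots-read-mode=ADDED_FILES_ONLY` would resolve to ADDED_FILES_ONLY in this sketch.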


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1634971827

   I think this is a good idea. I'll put it in my queue to review.


[GitHub] [iceberg] tmnd1991 commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "tmnd1991 (via GitHub)" <gi...@apache.org>.
tmnd1991 commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1522975613

   Any update on this? Is there something blocking the review, or is it a matter of maintainer capacity/priority?


[GitHub] [iceberg] jhchee commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "jhchee (via GitHub)" <gi...@apache.org>.
jhchee commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1532425051

   @karim-ramadan Also, if I understood correctly, this would also stream unmodified rows to the destination.


[GitHub] [iceberg] karim-ramadan commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "karim-ramadan (via GitHub)" <gi...@apache.org>.
karim-ramadan commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1511535844

   Hi @SreeramGarlapati,
   this PR addresses your issue https://github.com/apache/iceberg/issues/2788.
   Could you or @cwsteinbach @RussellSpitzer @kbendick @rdblue have a look at it?
   Thank you very much.


[GitHub] [iceberg] jhchee commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "jhchee (via GitHub)" <gi...@apache.org>.
jhchee commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1531668958

   @karim-ramadan IIUC, your PR would allow users to stream the inserts/updates (from a MERGE INTO command) to downstream consumers.
   


[GitHub] [iceberg] karim-ramadan commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "karim-ramadan (via GitHub)" <gi...@apache.org>.
karim-ramadan commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1723277177

   > I think this is a good idea. I'll put it in my queue to review.
   
   Hi @rdblue, any news on this?


[GitHub] [iceberg] jhchee commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "jhchee (via GitHub)" <gi...@apache.org>.
jhchee commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1541617848

   Hi @karim-ramadan, I didn't get a reply on this. I was told that snapshot-level CDC will eventually unblock overwrite streaming, but I have a limited understanding of this.
   Ref: https://github.com/apache/iceberg/issues/3941#issuecomment-1531522049


[GitHub] [iceberg] jhchee commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "jhchee (via GitHub)" <gi...@apache.org>.
jhchee commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1532616042

   Unfortunately, I'm not a committer of this project. However, I'm raising this in the Iceberg Slack channel so that someone might look into it.


[GitHub] [iceberg] tmnd1991 commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "tmnd1991 (via GitHub)" <gi...@apache.org>.
tmnd1991 commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1505335988

   @SreeramGarlapati @rdblue @davseitsev Could anyone have a look at this, or point to someone who will? Thank you very much.


[GitHub] [iceberg] karim-ramadan commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "karim-ramadan (via GitHub)" <gi...@apache.org>.
karim-ramadan commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1532593793

   > @karim-ramadan Also, if I understood correctly, this would also stream unmodified rows to the destination.
   
   Hi @jhchee, yes, but only for V1 tables; V2 tables would stream only modified rows. I can add a configuration to differentiate between the two behaviours by also checking the table's format version, if you think it is needed.
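   
   One way to picture the difference, as a rough sketch only. It assumes that the V1 table rewrites whole data files on update (copy-on-write) and that the V2 table records row-level deletes and writes only the changed rows (merge-on-read). The function names and the `(id, value)` row layout are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (id, value)


def copy_on_write_update(data_file: List[Row], updates: Dict[int, str]) -> List[Row]:
    """Rewrite the whole file: the added file holds updated AND untouched rows."""
    return [(rid, updates.get(rid, value)) for rid, value in data_file]


def merge_on_read_update(data_file: List[Row], updates: Dict[int, str]) -> List[Row]:
    """Old rows are marked deleted elsewhere; the added file holds only updated rows."""
    return [(rid, updates[rid]) for rid, _ in data_file if rid in updates]


existing = [(1, "a"), (2, "b"), (3, "c")]
updates = {2: "B"}

# Rows a consumer would see when only the added file is streamed.
rows_from_rewritten_file = copy_on_write_update(existing, updates)
rows_from_changed_only_file = merge_on_read_update(existing, updates)
```

   In this sketch `rows_from_rewritten_file` contains all three rows, including the two that did not change, while `rows_from_changed_only_file` contains only the updated row with id 2.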
   
   I've also rebased on top of master and hopefully fixed the problems encountered in the first run of the CI. Could you run it again, please? 
   
   


[GitHub] [iceberg] karim-ramadan commented on pull request #7295: Spark3 structured streaming enable updates

Posted by "karim-ramadan (via GitHub)" <gi...@apache.org>.
karim-ramadan commented on PR #7295:
URL: https://github.com/apache/iceberg/pull/7295#issuecomment-1541589097

   Hi @jhchee, any news on this?

