Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/21 14:07:01 UTC

[GitHub] [iceberg] hililiwei opened a new pull request, #6470: Spark: Allow specifying file format in RewriteDataFiles

hililiwei opened a new pull request, #6470:
URL: https://github.com/apache/iceberg/pull/6470

   Close #6464 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1373852898

   Is this really needed as an option? It seems to me that a user can just change the table default. I don't see a big use case for changing the format only for an optimize invocation without also changing it for all new files.




[GitHub] [iceberg] RussellSpitzer commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1405061598

   Does the streaming write have the ability to set the file format? Or does that only let you use the table default as well?




[GitHub] [iceberg] peay commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by GitBox <gi...@apache.org>.
peay commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1361392095

   Not yet very familiar with the codebase, but this looks great to me, thanks a lot @hililiwei!




[GitHub] [iceberg] peay commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by GitBox <gi...@apache.org>.
peay commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1376904416

   The motivation in https://github.com/apache/iceberg/issues/6464 is to allow writing Avro from a streaming pipeline, where a row-based format can make sense for small but frequent micro-batches, and then compacting to Parquet for longer-term batch analytics. This can be done today by configuring the table as Parquet and explicitly setting the streaming writer to write Avro, but that is a bit less flexible, and I think it'd make sense to keep such details in the compaction settings instead.
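
   [Illustration, not part of the original comment] The workaround described above relies on a per-write setting taking precedence over the table default. The sketch below models that precedence in plain Python. It is a toy model, not Iceberg code: the function `effective_format` is hypothetical, the keys `write.format.default` (table property) and `write-format` (Spark write option) are used as Iceberg documents them, and the Parquet fallback is an assumption of this sketch.

```python
# Toy model of file-format resolution; this is NOT Iceberg code.
TABLE_DEFAULT_KEY = "write.format.default"  # table-level property
WRITE_OPTION_KEY = "write-format"           # per-write option
FALLBACK_FORMAT = "parquet"                 # assumed fallback for this sketch


def effective_format(table_props, write_options=None):
    """Return the format a write would use: write option first, then table default."""
    write_options = write_options or {}
    fmt = (
        write_options.get(WRITE_OPTION_KEY)
        or table_props.get(TABLE_DEFAULT_KEY)
        or FALLBACK_FORMAT
    )
    return fmt.lower()


# Table configured as Parquet, streaming writer explicitly set to Avro.
table_props = {TABLE_DEFAULT_KEY: "parquet"}
streaming_format = effective_format(table_props, {WRITE_OPTION_KEY: "avro"})

# A compaction that passes no format option falls back to the table default.
compaction_format = effective_format(table_props)

print(streaming_format, compaction_format)
```

   In this model the streaming writer produces Avro while compaction, which passes no override, rewrites to the table's Parquet default. The PR under discussion would instead let the rewrite action carry its own format setting.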




[GitHub] [iceberg] 0xNacho commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by "0xNacho (via GitHub)" <gi...@apache.org>.
0xNacho commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1405009473

   +1 @peay . I have the same problem!




[GitHub] [iceberg] hililiwei commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

Posted by "hililiwei (via GitHub)" <gi...@apache.org>.
hililiwei commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1407585933

   > Does the streaming write have the ability to set the file format? Or does that only let you use the table default as well?
   
   Yes, streaming writes can set file format.
   
   > Other than that I think we should probably minimize the number of tests we are adding for this. This is one of the slower test suites and it looks like we have a full test for every strategy; we should be able to confirm this by just rewriting a single file in each test, correct?
   
   I've tried to reduce the tests. PTAL, thanks.
   
   

