Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/16 15:09:53 UTC

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5584: Introduce file writer strategies for Parquet writer

alamb commented on code in PR #5584:
URL: https://github.com/apache/arrow-datafusion/pull/5584#discussion_r1138849539


##########
datafusion/core/src/dataframe.rs:
##########
@@ -930,10 +932,11 @@ impl DataFrame {
         self,
         path: &str,
         writer_properties: Option<WriterProperties>,
+        save_mode: FileWriterSaveMode,

Review Comment:
   > I saw that option; however, it would be better to have a general ***WriterOptions, like we have ParquetReadOptions, CsvReadOptions, AvroReadOptions, etc.
   
   I agree that making a more future-proof API here would be very helpful -- for example, I can imagine adding other DataFusion-specific properties for writing parquet.
   
   For example, I would love to have a place to put "sort expressions" to sort the output parquet file by some set of expressions. 
   
   I love @metesynnada's suggestion for `ParquetWriterOptions`
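   
   As a rough sketch of what such an options struct might look like: everything below is hypothetical and is not DataFusion's actual API. `WriterProperties`, `FileWriterSaveMode` (including its variants), and `SortExpr` are simplified stand-ins so the example compiles on its own, and the `with_*` builder methods are assumed names.

```rust
// Hypothetical sketch only: these are simplified stand-ins, not the real
// parquet / DataFusion types.

/// Stand-in for parquet's `WriterProperties` (the real type has many more knobs).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct WriterProperties {
    pub max_row_group_size: Option<usize>,
}

/// Stand-in for the `FileWriterSaveMode` added in the diff; the variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileWriterSaveMode {
    ErrorIfExists,
    Overwrite,
    Append,
}

impl Default for FileWriterSaveMode {
    fn default() -> Self {
        FileWriterSaveMode::ErrorIfExists
    }
}

/// Stand-in for a sort expression: a column name plus a direction.
#[derive(Debug, Clone, PartialEq)]
pub struct SortExpr {
    pub column: String,
    pub ascending: bool,
}

/// One struct that bundles everything the parquet writer needs, so new
/// options can be added later without changing the method signature.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ParquetWriterOptions {
    pub writer_properties: Option<WriterProperties>,
    pub save_mode: FileWriterSaveMode,
    pub sort_exprs: Vec<SortExpr>,
}

impl ParquetWriterOptions {
    /// Set the low-level parquet writer properties.
    pub fn with_writer_properties(mut self, props: WriterProperties) -> Self {
        self.writer_properties = Some(props);
        self
    }

    /// Set how existing output is handled.
    pub fn with_save_mode(mut self, mode: FileWriterSaveMode) -> Self {
        self.save_mode = mode;
        self
    }

    /// Append one sort expression; output would be sorted by these in order.
    pub fn with_sort_expr(mut self, column: &str, ascending: bool) -> Self {
        self.sort_exprs.push(SortExpr {
            column: column.to_string(),
            ascending,
        });
        self
    }
}
```

   With something like this, the method in the diff could take a single `options: ParquetWriterOptions` argument instead of separate `writer_properties` and `save_mode` parameters, and a caller would write `ParquetWriterOptions::default().with_save_mode(FileWriterSaveMode::Overwrite).with_sort_expr("ts", false)`.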
   
   
   


