Posted to issues@arrow.apache.org by "comicfans (via GitHub)" <gi...@apache.org> on 2023/04/13 09:03:33 UTC

[GitHub] [arrow] comicfans opened a new issue, #35102: [R] write_parquet expose similar options as python parquet.write_table ?

comicfans opened a new issue, #35102:
URL: https://github.com/apache/arrow/issues/35102

   ### Describe the enhancement requested
   
   I've found that R's `write_parquet()`, whose signature is
   
   ```R
   write_parquet(
     x,
     sink,
     chunk_size = NULL,
     version = "2.4",
     compression = default_parquet_compression(),
     compression_level = NULL,
     use_dictionary = NULL,
     write_statistics = NULL,
     data_page_size = NULL,
     use_deprecated_int96_timestamps = FALSE,
     coerce_timestamps = NULL,
     allow_truncated_timestamps = FALSE
   )
   ```
   
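   is missing several options that Python's `pyarrow.parquet.write_table()` offers, such as `use_byte_stream_split`, `column_encoding`, `data_page_version`, `encryption_properties` and `store_schema`: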
   
   ```python
   pyarrow.parquet.write_table(table, where, row_group_size=None, version='2.4', use_dictionary=True, compression='snappy', write_statistics=True, use_deprecated_int96_timestamps=None, coerce_timestamps=None, allow_truncated_timestamps=False, data_page_size=None, flavor=None, filesystem=None, compression_level=None, use_byte_stream_split=False, column_encoding=None, data_page_version='1.0', use_compliant_nested_type=False, encryption_properties=None, write_batch_size=None, dictionary_pagesize_limit=None, store_schema=True, **kwargs)
   ```
   
   Maybe the R interface should expose similar options? The most important ones for me are `use_byte_stream_split` and `column_encoding`. I use R to save Parquet files, but because I can't choose the best encoding for each column, the files end up larger than necessary, and I have to rewrite them with pyarrow afterwards, which is not ideal.
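   The pyarrow round trip described above might look roughly like the following minimal sketch. It assumes a pyarrow version whose `write_table()` accepts `column_encoding` (as in the signature quoted earlier) and a floating-point column, for which `BYTE_STREAM_SPLIT` is valid. The helper `reencode_parquet`, the column name `price` and the file names are hypothetical. Per the pyarrow documentation, `column_encoding` can only be used with dictionary encoding disabled, so the sketch sets `use_dictionary=False`.

   ```python
   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.parquet as pq


   def reencode_parquet(src, dst, encodings):
       """Hypothetical helper: rewrite Parquet file `src` to `dst` with per-column encodings.

       `encodings` maps column names to Parquet encoding names,
       e.g. {"price": "BYTE_STREAM_SPLIT"}. Columns not listed keep
       the writer's default encoding.
       """
       table = pq.read_table(src)
       pq.write_table(
           table,
           dst,
           # column_encoding cannot be combined with dictionary encoding.
           use_dictionary=False,
           column_encoding=encodings,
       )


   # Hypothetical stand-in for a file produced by R's write_parquet().
   workdir = tempfile.mkdtemp()
   src_path = os.path.join(workdir, "from_r.parquet")
   dst_path = os.path.join(workdir, "reencoded.parquet")
   pq.write_table(pa.table({"price": pa.array([1.5, 2.25, 3.0])}), src_path)

   reencode_parquet(src_path, dst_path, {"price": "BYTE_STREAM_SPLIT"})

   # Inspect which encodings the rewritten column chunk uses.
   encodings = pq.ParquetFile(dst_path).metadata.row_group(0).column(0).encodings
   print(encodings)
   ```

   Because this sketch turns dictionary encoding off for the whole file, low-cardinality string columns may grow; whether the extra pass pays off depends on the data.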
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on issue #35102: [R] write_parquet expose similar options as python parquet.write_table ?

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on issue #35102:
URL: https://github.com/apache/arrow/issues/35102#issuecomment-1515454027

   This is a similar kind of enhancement to https://github.com/apache/arrow/issues/34577 and might even be done in the same bit of work.

