Posted to github@arrow.apache.org by "zenithyr (via GitHub)" <gi...@apache.org> on 2023/07/24 03:36:26 UTC

[GitHub] [arrow] zenithyr commented on issue #36834: [C++] Does the dataset API support compression & appending to existing parquet files?

zenithyr commented on issue #36834:
URL: https://github.com/apache/arrow/issues/36834#issuecomment-1647149055

   > Just answering the second question, about compression:
   > 
   > ```c++
   > class ARROW_DS_EXPORT ParquetFileWriteOptions : public FileWriteOptions {
   >  public:
   >   /// \brief Parquet writer properties.
   >   std::shared_ptr<parquet::WriterProperties> writer_properties;
   > 
   >   /// \brief Parquet Arrow writer properties.
   >   std::shared_ptr<parquet::ArrowWriterProperties> arrow_writer_properties;
   > 
   >  protected:
   >   explicit ParquetFileWriteOptions(std::shared_ptr<FileFormat> format)
   >       : FileWriteOptions(std::move(format)) {}
   > 
   >   friend class ParquetFileFormat;
   > };
   > ```
   > 
   > Maybe you can configure compression in `parquet::WriterProperties`.
   
   Much appreciated!
   Configuring compression as shown below seems to work.
   ```cpp
   ds::FileSystemDatasetWriteOptions write_options;
   
   auto format = std::make_shared<ds::ParquetFileFormat>();
   auto pq_options = std::dynamic_pointer_cast<arrow::dataset::ParquetFileWriteOptions>(format->DefaultWriteOptions());
   pq_options->writer_properties = parquet::WriterProperties::Builder()
           .created_by("1.0")
           ->compression(arrow::Compression::SNAPPY)
           ->build();
   
   // Assign the customized options once. Reassigning format->DefaultWriteOptions()
   // afterwards would replace them and discard the compression setting.
   write_options.file_write_options = pq_options;
   write_options.filesystem = filesystem;
   write_options.base_dir = base_dir;
   write_options.partitioning = partitioning;
   write_options.basename_template = "part{i}.parquet";
   ```
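   
   For anyone unsure what `basename_template` does: as far as I understand, the dataset writer replaces `{i}` with an auto-incremented integer for each file it writes. The sketch below only illustrates that substitution with the standard library. `ExpandBasename` is a hypothetical helper of mine, not an Arrow function, and Arrow's actual implementation may differ.
   
   ```cpp
   #include <cstddef>
   #include <string>
   
   // Hypothetical helper (not part of Arrow): returns a copy of
   // basename_template with the first "{i}" replaced by the given index.
   // If the template contains no "{i}", it is returned unchanged.
   std::string ExpandBasename(const std::string& basename_template, int index) {
     const std::string token = "{i}";
     std::string result = basename_template;
     const std::size_t pos = result.find(token);
     if (pos != std::string::npos) {
       result.replace(pos, token.size(), std::to_string(index));
     }
     return result;
   }
   ```
   
   With the template above, `ExpandBasename("part{i}.parquet", 0)` gives `part0.parquet`, and index 1 gives `part1.parquet`.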
   
   I still don't know whether Q1 (appending to existing Parquet files) is possible.

