Posted to user@arrow.apache.org by David Lahn <da...@forwardpmx.com> on 2020/04/23 20:13:25 UTC
Snappy Compression with red-parquet Ruby Gem
Hi,
Does anyone have any examples of how to output a Parquet file with Snappy compression using the Ruby gem?
We tried setting compression to “snappy” on the TableSaver, but we get the following error:
[compressed-output-stream][new]: NotImplemented: Streaming compression unsupported with Snappy (Arrow::Error::NotImplemented)
Example:
Arrow::TableSaver.new(table, 'test.parquet', {compression: 'snappy'}).save
Or are we completely turned around on how to accomplish this?
Dave
David Lahn
DevOps Lead
Development
ForwardPMX
Privacy Policy
This e-mail is confidential to ForwardPMX intended for use by the recipient. If you received this in error or are not the intended recipient, you are hereby notified that any review, retransmission, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited.
Re: Snappy Compression with red-parquet Ruby Gem
Posted by Sutou Kouhei <ko...@clear-code.com>.
Hi,
Oh, we forgot to integrate the saver interface with the Parquet
compression option.
With 0.17.0, you can use the feature with the following code:
--
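require "parquet"

table = Arrow::Table.new({"count" => [1, 2, 3]})
# Open test.parquet for writing; false means "truncate, don't append".
Arrow::FileOutputStream.open("test.parquet", false) do |output|
  # Compression is a Parquet writer property, not an output stream option.
  properties = Parquet::WriterProperties.new
  properties.set_compression(:snappy)
  Parquet::ArrowFileWriter.open(table.schema, output, properties) do |writer|
    # Maximum number of rows per row group.
    chunk_size = 1024
    writer.write_table(table, chunk_size)
  end
end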
--
You'll be able to write the following code with the next release:
--
require "parquet"
table = Arrow::Table.new({"count" => [1, 2, 3]})
table.save("test.parquet", compression: :snappy)
--
Thanks,
--
kou
In <78...@contoso.com>
"Snappy Compression with red-parquet Ruby Gem" on Thu, 23 Apr 2020 20:13:25 +0000,
David Lahn <da...@forwardpmx.com> wrote:
Re: Snappy Compression with red-parquet Ruby Gem
Posted by Wes McKinney <we...@gmail.com>.
hi David,
You don't want to pass the compression option to TableSaver.new --
compression is something that's configured in the Parquet writer. This
would need to be an option on save_as_parquet, but it doesn't look
like it is exposed right now:
https://github.com/apache/arrow/blob/master/ruby/red-parquet/lib/parquet/arrow-table-savable.rb#L21
It's available in GLib, though, so this could be added to the Ruby library:
https://github.com/apache/arrow/blob/master/c_glib/parquet-glib/arrow-file-writer.h
- Wes
On Thu, Apr 23, 2020 at 3:13 PM David Lahn <da...@forwardpmx.com> wrote: