Posted to user@arrow.apache.org by Jialin Liu <va...@gmail.com> on 2022/02/01 00:55:03 UTC

pyarrow inputstream snappy

Hi, I'm trying to read a Snappy-compressed file on HDFS using an input
stream, but I get the following error:
>
>     with fs.open_input_stream(read_path, **open_stream_args) as f:
>   File "pyarrow/_fs.pyx", line 627, in
> pyarrow._fs.FileSystem.open_input_stream
>   File "pyarrow/_fs.pyx", line 557, in
> pyarrow._fs.FileSystem._wrap_input_stream
>   File "pyarrow/io.pxi", line 1283, in
> pyarrow.lib.CompressedInputStream.__init__
>   File "pyarrow/error.pxi", line 143, in
> pyarrow.lib.pyarrow_internal_check_status
>   File "pyarrow/error.pxi", line 120, in pyarrow.lib.check_status
> pyarrow.lib.ArrowNotImplementedError: Streaming decompression unsupported
> with Snappy
>

Can anyone please help me with this?

Thanks,
Jialin

Re: pyarrow inputstream snappy

Posted by Micah Kornfield <em...@gmail.com>.
The Arrow compression libraries have two modes: batch and streaming
decompression.  For Snappy, streaming hasn't been implemented.  I don't
recall off the top of my head whether Snappy supports streaming
decompression generally.  If it does, a contribution would be welcome.

To just get the raw Snappy-compressed bytes, passing "compression=None" [1]
to open_input_stream should work.

[1]
https://arrow.apache.org/docs/python/generated/pyarrow.fs.FileSystem.html#pyarrow.fs.FileSystem.open_input_stream
