Posted to issues@beam.apache.org by "Daniel Ecer (Jira)" <ji...@apache.org> on 2019/09/06 22:31:00 UTC
[jira] [Created] (BEAM-8168) Python GCSFileSystem failing with gzip content encoding
Daniel Ecer created BEAM-8168:
---------------------------------
Summary: Python GCSFileSystem failing with gzip content encoding
Key: BEAM-8168
URL: https://issues.apache.org/jira/browse/BEAM-8168
Project: Beam
Issue Type: Bug
Components: io-py-gcp
Affects Versions: 2.15.0
Reporter: Daniel Ecer
Google Cloud Storage supports gzip content encoding.
Apache Beam (Python) correctly handles .gz files that were uploaded without content encoding.
However, it fails to handle .gz files that have gzip content encoding applied.
For example (the following can be run in a Jupyter notebook):
{code:python}
file_url_1 = 'gs://some-bucket/test1.gz'
file_url_2 = 'gs://some-bucket/test2.gz'
!echo 'my content' > /tmp/test
# file 1 without content encoding
!cat /tmp/test | gzip | gsutil cp - "{file_url_1}"
# file 2 with content encoding
!gsutil cp -Z /tmp/test "{file_url_2}"
!gsutil cat "{file_url_1}" | zcat -
# output: my content
!gsutil cat "{file_url_2}" | zcat -
# output: my content
import apache_beam as beam
from apache_beam.io.filesystem import CompressionTypes
from apache_beam.io.filesystems import FileSystems
print(beam.__version__)
# output: 2.15.0
with FileSystems.open(file_url_1, compression_type=CompressionTypes.UNCOMPRESSED) as fp:
    print(fp.read(10))
# output: b'\x1f\x8b\x08\x00\x10\xd6r]\x00\x03'
with FileSystems.open(file_url_1) as fp:
    print(fp.read(10))
# output: b'my content'
with FileSystems.open(file_url_2, compression_type=CompressionTypes.UNCOMPRESSED) as fp:
    print(fp.read(10))
# output: b'my content'
# (here I would expect the raw gzipped bytes)
with FileSystems.open(file_url_2) as fp:
    print(fp.read(10))
# exception: FailedToDecompressContent: Content purported to be compressed with gzip but failed to decompress.
{code}
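A possible interim workaround, offered only as a minimal sketch and not as a fix in Beam itself: read the object with compression disabled and decompress only when the bytes actually start with the gzip magic number (0x1f 0x8b, as seen in the first output above). The helper name `read_maybe_gzipped` is hypothetical, and the sketch reads the whole stream into memory, so it is only suitable for small files. In-memory streams stand in for the GCS objects here.

```python
import gzip
import io

# First two bytes of every gzip stream (see the b'\x1f\x8b...' output above).
GZIP_MAGIC = b'\x1f\x8b'


def read_maybe_gzipped(raw_stream):
    """Hypothetical helper: return the decompressed payload of a binary stream.

    If the stream starts with the gzip magic number, the content is
    decompressed; otherwise the bytes are assumed to be already decoded
    and are returned unchanged.
    """
    data = raw_stream.read()
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


# Stand-ins for the two cases in the report:
plain = b'my content\n'
still_gzipped = io.BytesIO(gzip.compress(plain))  # like file 1 (raw gzip bytes)
already_decoded = io.BytesIO(plain)               # like file 2 (decoded on read)

print(read_maybe_gzipped(still_gzipped))
print(read_maybe_gzipped(already_decoded))
```

With Beam, the stream passed in would be the object returned by FileSystems.open(url, compression_type=CompressionTypes.UNCOMPRESSED), which per the outputs above yields raw gzip bytes for file 1 and already-decoded bytes for file 2.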