Posted to issues-all@impala.apache.org by "Ethan (JIRA)" <ji...@apache.org> on 2019/08/14 16:50:00 UTC

[jira] [Resolved] (IMPALA-8549) Add support for scanning DEFLATE text files

     [ https://issues.apache.org/jira/browse/IMPALA-8549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan resolved IMPALA-8549.
---------------------------
       Resolution: Fixed
    Fix Version/s: Impala 3.3.0

> Add support for scanning DEFLATE text files
> -------------------------------------------
>
>                 Key: IMPALA-8549
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8549
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Sahil Takiar
>            Assignee: Ethan
>            Priority: Minor
>              Labels: ramp-up
>             Fix For: Impala 3.3.0
>
>
> Several Hadoop tools (e.g. Hive and MapReduce) support reading and writing text files compressed with zlib / deflate (which produces files such as {{000000_0.deflate}}). Impala currently does not support reading {{.deflate}} text files and returns errors such as: {{ERROR: Scanner plugin 'DEFLATE' is not one of the enabled plugins: 'LZO'}}.
> Moreover, the default compression codec in Hadoop is zlib / deflate (see {{o.a.h.io.compress.DefaultCodec}}). So when writing to a text table in Hive, if users set {{hive.exec.compress.output}} to true, then {{.deflate}} files will be written by default.
> Impala does support zlib / deflate with other file formats though: Avro, RCFiles, SequenceFiles (see [https://impala.apache.org/docs/build/html/topics/impala_file_formats.html]).
> Currently, the frontend assigns a compression type to a file depending on its extension. For instance, the functional_text_def database is stored as a file with a .deflate extension and is assigned the compression type DEFLATE. The HdfsTextScanner class receives this value and uses it directly to create a decompressor. The functional_\{avro,seq,rc}_databases are stored as files without extensions, so the frontend interprets their compression type as NONE. However, in the backend, each of their corresponding scanners implements its own custom logic to read the file header and override the NONE compression type with a new value, such as DEFAULT or DEFLATE, so that the appropriate decompressor can be instantiated.
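
The two-stage behaviour described above (an extension-based guess, optionally overridden by a codec read from the file header) can be illustrated with a minimal Python sketch. This is not Impala code: the extension table, the function names, and the `header_codec` parameter are all hypothetical, and the sketch assumes a {{.deflate}} file holds a zlib-wrapped stream that Python's `zlib` module can read.

```python
import os
import zlib

# Hypothetical extension-to-codec table for illustration only;
# it is not Impala's actual mapping.
EXTENSION_TO_COMPRESSION = {
    ".deflate": "DEFLATE",
    ".gz": "GZIP",
    ".bz2": "BZIP2",
}


def compression_from_extension(path):
    """Frontend-style guess: derive the compression type from the extension."""
    _, ext = os.path.splitext(path)
    return EXTENSION_TO_COMPRESSION.get(ext.lower(), "NONE")


def resolve_compression(path, header_codec=None):
    """Scanner-style override: a codec found in the file header wins.

    header_codec is a hypothetical stand-in for whatever a format-specific
    scanner (Avro, SequenceFile, RCFile) would read from its file header.
    """
    if header_codec is not None:
        return header_codec
    return compression_from_extension(path)


def decompress_text(data, compression):
    """Pick a decompressor based on the resolved compression type."""
    if compression == "NONE":
        return data
    if compression == "DEFLATE":
        # Assumption: the payload is a zlib-wrapped deflate stream.
        return zlib.decompress(data)
    raise ValueError("no decompressor sketched for " + compression)


if __name__ == "__main__":
    payload = zlib.compress(b"1,alice\n2,bob\n")
    codec = resolve_compression("000000_0.deflate")
    print(codec, decompress_text(payload, codec))
```

In this sketch a text file relies entirely on its extension, while an extensionless file only gets a usable codec if the caller supplies one from the header, which mirrors the split between the text scanner and the Avro / SequenceFile / RCFile scanners described in the issue.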



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
