Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/10/30 17:20:00 UTC

[jira] [Resolved] (IMPALA-580) Inconsistent or blank fileFormats values passed to CM

     [ https://issues.apache.org/jira/browse/IMPALA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-580.
----------------------------------
    Resolution: Cannot Reproduce

> Inconsistent or blank fileFormats values passed to CM
> -----------------------------------------------------
>
>                 Key: IMPALA-580
>                 URL: https://issues.apache.org/jira/browse/IMPALA-580
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 1.1
>         Environment: Impala 1.1.0 and CM 4.6.2.
>            Reporter: John Russell
>            Priority: Minor
>
> In the CM "Query Details" page, one of the fields is "File Formats". If I query a table created with STORED AS SEQFILE with the BZip2 compression codec, CM shows a line like:
> File Formats: SEQUENCE_FILE/BZIP2
> That seems intuitive. However, for other combinations of file format and compression codec, the "File Formats" value is blank or seems misleading:
> select * from seqfile_snappy limit 5 -> file formats in CM is blank
> select * from rcfile_snappy limit 5 -> file formats in CM is blank
> select count(*) from seqfile_deflate -> file formats in CM = SEQUENCE_FILE/DEFAULT
> select count(*) from rcfile_deflate -> file formats in CM = RC_FILE/DEFAULT (is DEFAULT a typo for DEFLATE since this happens for both SEQFILE and RCFILE tables?)
> select count(*) from parquet_snappy -> file formats in CM = PARQUET/NONE
> I also see PARQUET/NONE for a Parquet table compressed with GZip.
> I also see PARQUET/NONE for a Parquet table where the Impala data directory contains data files compressed with different codecs. I understand CM could in some cases display multiple values in this "File Formats" field, and that's what I'd expect to happen in this case. (The same way I'd expect multiple "File Formats" values for a join of tables with different file formats, or a query against a partitioned table where partitions had different file formats.)
> I did not have an LZO-compressed text table, so I didn't check if that case would produce TEXT/LZO as expected.
> I did not have an Avro table, so I didn't check those combinations.
> I did not check Avro, SEQFILE, or RCFILE with data files from more than one compression codec in the same directory.
> Other than the above cases, I think I checked every combination of file format and codec, and the only issues I saw were those I listed.
> The impala-shell PROFILE output or the CM profile text is available if desired.
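
The report expects the "File Formats" field to list every distinct format/codec pair a query touches, for example when one Parquet directory holds files written with different codecs. The sketch below illustrates that expectation only. It is not Impala or CM code: the function name summarize_file_formats, the input shape, and the comma-separated output are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def summarize_file_formats(scanned: Iterable[Tuple[str, str]]) -> str:
    """Join the distinct (file format, codec) pairs into one display string.

    Hypothetical helper: the name and the comma-separated output are
    assumptions, not how Impala or CM actually build this field.
    """
    # Deduplicate, then sort so the result does not depend on scan order.
    distinct = sorted(set(scanned))
    return ", ".join(f"{fmt}/{codec}" for fmt, codec in distinct)


# Hypothetical input: one Parquet directory with files from two codecs.
print(summarize_file_formats([
    ("PARQUET", "SNAPPY"),
    ("PARQUET", "GZIP"),
    ("PARQUET", "SNAPPY"),
]))
```

Under these assumptions the mixed-codec Parquet case would show both pairs rather than a single PARQUET/NONE, and a query that scanned no files would yield an empty string.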


