Posted to issues-all@impala.apache.org by "Csaba Ringhofer (JIRA)" <ji...@apache.org> on 2018/10/19 17:16:00 UTC

[jira] [Closed] (IMPALA-7723) Recognize int64 timestamps in CREATE TABLE LIKE PARQUET

     [ https://issues.apache.org/jira/browse/IMPALA-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Csaba Ringhofer closed IMPALA-7723.
-----------------------------------
    Resolution: Invalid

> Recognize int64 timestamps in CREATE TABLE LIKE PARQUET
> -------------------------------------------------------
>
>                 Key: IMPALA-7723
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7723
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>            Reporter: Csaba Ringhofer
>            Priority: Minor
>              Labels: parquet
>
> IMPALA-5050 adds support for reading int64-encoded Parquet timestamps. These columns have the int64 physical type, and the converted/logical type has to be used to differentiate them from BIGINTs. These columns can be read both as BIGINT and as TIMESTAMP, depending on the table's schema.
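> For illustration, a minimal sketch of the two interpretations (table and column names are hypothetical; the Parquet files are assumed to contain an int64 column annotated as TIMESTAMP_MICROS):
>
>   -- read the int64 column as a plain 64-bit integer
>   CREATE EXTERNAL TABLE events_as_bigint (event_time BIGINT)
>   STORED AS PARQUET LOCATION '/data/events';
>
>   -- the same files, with the column interpreted as a timestamp
>   CREATE EXTERNAL TABLE events_as_timestamp (event_time TIMESTAMP)
>   STORED AS PARQUET LOCATION '/data/events';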
> CREATE TABLE LIKE PARQUET could also convert these columns to TIMESTAMP instead of BIGINT, but I decided to postpone adding this feature for two reasons:
> 1. It could break the following possible workflow:
> - generate Parquet files (that contain int64 timestamps) with some tool
> - use Impala's CREATE TABLE LIKE PARQUET + LOAD DATA to make these files accessible as a table
> - run some queries that rely on interpreting these columns as integers
> Adding CAST(col AS BIGINT) to the query would make this even worse, as it would convert the timestamp to Unix time in seconds instead of micros/millis, without any warning.
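> As an illustration (column name and literal are hypothetical): a query such as
>
>   SELECT count(*) FROM events WHERE created_at > 1539966000000000;  -- created_at stored as int64 micros
>
> relies on created_at being exposed as a BIGINT. If the generated schema declared it as TIMESTAMP instead, the query would need to be rewritten, and the seemingly obvious rewrite CAST(created_at AS BIGINT) > 1539966000000000 would silently compare Unix seconds against a microsecond literal.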
> 2. Adding support for int64 timestamps with nanosecond precision will require bumping Impala's parquet-hadoop-bundle dependency to a new major version, which may contain incompatible API changes.
> Note that parquet-hadoop-bundle is only used in CREATE TABLE LIKE PARQUET. The C++ parts of Impala only rely on parquet.thrift, which can be updated more easily.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
