Posted to issues-all@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2019/08/14 19:36:00 UTC

[jira] [Comment Edited] (IMPALA-1903) Add support for partitions by timestamp

    [ https://issues.apache.org/jira/browse/IMPALA-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16907565#comment-16907565 ] 

Tim Armstrong edited comment on IMPALA-1903 at 8/14/19 7:35 PM:
----------------------------------------------------------------

[~h-vetinari] the context is all on this JIRA. Allowing partitioning by nanosecond timestamps is just too much of a footgun for us to be inclined to support it out of the box without any guardrails: it's far too easy to explode the number of partitions and run into various operational issues as a result. It might be possible to get some consensus on a change if someone came along with a patch that added the support together with some guardrails to prevent that.

We recently added full DATE support, including partitioning by date, which helps with a lot of use cases.

I can see there are probably cases where power users might find partitioning by timestamp to be a useful tool if they're careful with truncation.
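
To make the partition-count concern concrete, here is a minimal Python sketch (not Impala code, and not a proposed implementation). It assumes synthetic data, one event every 7 seconds over two days, and a hypothetical `truncate` helper invented for this illustration. Python's `datetime` only has microsecond resolution, so the nanosecond case would be at least as bad as the "raw" count here.

```python
from datetime import datetime, timedelta


def truncate(ts: datetime, unit: str) -> datetime:
    """Hypothetical helper: truncate a timestamp to the start of its hour or day."""
    if unit == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if unit == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported unit: {unit}")


# Assumed synthetic workload: one event every 7 seconds for two days.
start = datetime(2019, 8, 14)
events = [start + timedelta(seconds=7 * i) for i in range(2 * 24 * 3600 // 7)]

# Each distinct partition-key value would become its own partition.
raw_partitions = len(set(events))
hourly_partitions = len({truncate(ts, "hour") for ts in events})
daily_partitions = len({truncate(ts, "day") for ts in events})

print("raw timestamp partitions:", raw_partitions)
print("hour-truncated partitions:", hourly_partitions)
print("day-truncated partitions:", daily_partitions)
```

With this assumed workload, partitioning on the raw timestamp yields one partition per event (24685), while truncating to the hour or day first yields 48 and 2 partitions respectively.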


was (Author: tarmstrong):
[~h-vetinari] the context is all on this JIRA. Allowing partitioning by nanosecond timestamps is just too much of a footgun for us to be inclined to support out-of-the-box without any guard-rails - it's just way too easy to explode the number of partitions and have various operational issues as a result. It might be possible to get some consensus on a change if someone came along with a patch that added the support with some guardrails to prevent that.

We recently added full DATE support, including partitioning by date, which helps with a lot of use cases.

I can see there are probably cases where power users might find partitioning by timestamp to be a useful tool if they're careful with truncation, but 

> Add support for partitions by timestamp
> ---------------------------------------
>
>                 Key: IMPALA-1903
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1903
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>    Affects Versions: Impala 2.2
>            Reporter: Grant Sohn
>            Assignee: Jim Apple
>            Priority: Critical
>              Labels: ramp-up, sql-language
>
> Timestamps, or some other time-related parameter, are a very common way to partition data. At Yahoo almost all the Hadoop ETL data was partitioned this way. This should be on par with nested data types as an important feature for Impala to have.


