You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Zach Amsden (JIRA)" <ji...@apache.org> on 2017/05/19 19:10:04 UTC

[jira] [Resolved] (IMPALA-5003) Add 'constant propagation' for Views with a partition filter

     [ https://issues.apache.org/jira/browse/IMPALA-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach Amsden resolved IMPALA-5003.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.9.0

Also want follow-up change:

https://gerrit.cloudera.org/#/c/6575/

> Add 'constant propagation' for Views with a partition filter
> ------------------------------------------------------------
>
>                 Key: IMPALA-5003
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5003
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.9.0
>            Reporter: Peter Ebert
>            Assignee: Zach Amsden
>              Labels: ramp-up
>             Fix For: Impala 2.9.0
>
>
> There are scenarios when you (1) want to reduce the number of partitions by hash&mod or part of a date, etc and (2) do not want to expose this complexity to end users who do not know how the table might be partitioned.
> As an example: Say I have a column that has 1000 values and I want to partition on customer_id, but the data is too small for 1000 partitions so I want to mod the customer_id to put it into 100 'buckets', to give me a 100x faster scan when selecting a single customer_id.
> When using 3rd party tools or exposing this table to end users who may not be educated on the partitioning scheme, those tools will not properly filter based on simply selecting a specific customer_id.
> If this is my partitioned table:
> CREATE TABLE default.lesspartitions (   customer_id INT,    customer_name STRING,    some_data STRING ) PARTITIONED BY (   partition_id INT ) 
> ...
> select customer_id, customer_name, some_data, customer_id % 100 as partition_id from source
> It would be nice to create a view where the partition is hidden from users and filtered on automatically:
> CREATE VIEW default.hiddenpartitions AS SELECT customer_id, customer_name, some_data FROM default.lesspartitions WHERE partition_id = customer_id % 100
> So then a user/3rd party tool can write the following:
> select * from hiddenpartitions where customer_id = 4
> This jira would adding the 'constant propagation' of the filter into the view so that only 1 partition is read.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)