Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/10 21:20:00 UTC

[jira] [Commented] (IMPALA-6533) Support DECIMAL for min-max runtime filters

    [ https://issues.apache.org/jira/browse/IMPALA-6533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739788#comment-16739788 ] 

ASF subversion and git services commented on IMPALA-6533:
---------------------------------------------------------

Commit aacd5c35d3134870b9a55658011cf08e60275459 in impala's branch refs/heads/master from Janaki Lahorani
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=aacd5c3 ]

IMPALA-6533: Add min-max filter for decimal types on kudu tables.

The code mimics the code written for the other min-max filters.  Decimal data
is stored in 4, 8, or 16 bytes, and the code handles each of these three
storage sizes.  The column definition states the precision, and the precision
determines the storage size.
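
As a rough illustration of the precision-to-size mapping described above, here
is a minimal sketch.  The function name DecimalByteSize is hypothetical and is
not taken from the commit.  The thresholds (9 and 18 digits) are an assumption
based on how many decimal digits fit in a 32-bit and a 64-bit signed integer;
they agree with the column groupings listed later in this message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the commit's code): returns the number of bytes
// used to store a decimal of the given precision.
// Assumption: up to 9 digits fit in an int32_t, up to 18 digits fit in an
// int64_t, and anything larger (up to 38 digits) needs a 128-bit integer.
inline int DecimalByteSize(int precision) {
  assert(precision >= 1 && precision <= 38);
  if (precision <= 9) return 4;
  if (precision <= 18) return 8;
  return 16;
}
```

Under these assumptions, a DECIMAL(9,5) column would take 4 bytes and a
DECIMAL(28,14) column would take 16 bytes; the scale does not affect the size.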

The minimum and maximum values are stored in a union.  The precision of the
column is passed in as an input.  The storage size is derived from the
precision, and the union member matching that size is used.
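
The union-based layout described above might look roughly like the following
sketch.  It is not the code from the commit: the class name
DecimalMinMaxSketch, its members, and the Insert4/Insert8/Insert16 methods are
all hypothetical, and the real filter uses macros rather than a template.  The
sketch also assumes a compiler that provides the __int128 extension (GCC or
Clang).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of a decimal min-max filter; not Impala's actual class.
class DecimalMinMaxSketch {
 public:
  // The storage size is derived once from the column precision.
  // Assumption: precision <= 9 uses 4 bytes, <= 18 uses 8 bytes, else 16.
  explicit DecimalMinMaxSketch(int precision)
    : size_(precision <= 9 ? 4 : (precision <= 18 ? 8 : 16)) {}

  // One insert method per storage size; each touches only its union member.
  void Insert4(int32_t v) { assert(size_ == 4); Update(&min_.v4, &max_.v4, v); }
  void Insert8(int64_t v) { assert(size_ == 8); Update(&min_.v8, &max_.v8, v); }
  void Insert16(__int128 v) { assert(size_ == 16); Update(&min_.v16, &max_.v16, v); }

  int size() const { return size_; }
  bool empty() const { return empty_; }
  int32_t Min4() const { assert(size_ == 4 && !empty_); return min_.v4; }
  int32_t Max4() const { assert(size_ == 4 && !empty_); return max_.v4; }
  int64_t Min8() const { assert(size_ == 8 && !empty_); return min_.v8; }
  int64_t Max8() const { assert(size_ == 8 && !empty_); return max_.v8; }
  __int128 Min16() const { assert(size_ == 16 && !empty_); return min_.v16; }
  __int128 Max16() const { assert(size_ == 16 && !empty_); return max_.v16; }

 private:
  // Only the member that matches size_ is ever written or read.
  union Value {
    int32_t v4;
    int64_t v8;
    __int128 v16;
  };

  // The first value initializes both bounds; later values widen them.
  template <typename T>
  void Update(T* min, T* max, T v) {
    if (empty_) {
      *min = v;
      *max = v;
      empty_ = false;
      return;
    }
    if (v < *min) *min = v;
    if (v > *max) *max = v;
  }

  int size_;
  bool empty_ = true;
  Value min_;
  Value max_;
};
```

In this sketch the values are the unscaled integer representations of the
decimals, so comparing them directly is only meaningful when all inserted
values share the same scale, as they do within a single column.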

The code in min-max-filter* follows the general convention of the file and
hence uses macros.

The test includes 24 decimal columns (as listed below) with the following joins:
1.  Inner Join with broadcast (2 tables)
  1a. 1 predicate
  1b. 4 predicates - all result in decimal min-max filters
  1c. 4 predicates - 3 result in decimal min-max filters; 1 doesn't
2.  Inner Join with Shuffle (3 tables)
3.  Right outer join (2 tables)
4.  Left Semi join (2 tables)
5.  Right Semi join (2 tables)

Decimal Columns:
4 bytes:
(5,0), (5,1), (5,3), (5,5)
(9,0), (9,1), (9,5), (9,9)
8 bytes:
(14,0), (14,1), (14,7), (14,14)
(18,0), (18,1), (18,9), (18,18)
16 bytes:
(28,0), (28,1), (28,14), (28,28)
(38,0), (38,1), (38,19), (38,38)

The test aggregates the count of probe rows.  A probe row count lower than the
total number of rows in the probe-side table shows that the min-max filter is
exercised.  The probe row count is considered deterministic, but changes in
Kudu that alter the way data is partitioned could change it, in which case the
test will have to be updated.

impala_test_suite.py and test_result_verifier.py are enhanced to support saving
aggregation results using update_results.

Change-Id: Ib7e7278e902160d7060f8097290bc172d9031f94
Reviewed-on: http://gerrit.cloudera.org:8080/12113
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Support DECIMAL for min-max runtime filters
> -------------------------------------------
>
>                 Key: IMPALA-6533
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6533
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.12.0
>            Reporter: Thomas Tauber-Marshall
>            Assignee: Janaki Lahorani
>            Priority: Critical
>              Labels: kudu
>
> Since min-max runtime filters are only supported for Kudu, and since at the time they were implemented Kudu didn't support DECIMAL, we didn't implement DECIMAL min-max runtime filters.
> There's ongoing work to add support for DECIMAL on Kudu (IMPALA-5752). To start, min-max runtime filters will just be disabled for DECIMAL targets, but we should extend MinMaxFilter to support it.


