You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/03/26 10:10:00 UTC

[jira] [Commented] (IMPALA-12007) wrong result with Pushdown runtime min-max filters on bitlong and decimal

    [ https://issues.apache.org/jira/browse/IMPALA-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705051#comment-17705051 ] 

Quanlong Huang commented on IMPALA-12007:
-----------------------------------------

[~dachn] Thanks for reporting this! I can reproduce the issue using a simplified example:
{code:sql}
create table left_tbl (
    id decimal(18, 0),
    project decimal(18, 0), 
    primary key (id)
)
partition by hash (id) partitions 2
stored as kudu;

create table right_tbl (
    id int,
    child bigint,
    primary key (id)
)
partition by hash (id) partitions 2
stored as kudu;

insert into left_tbl values (100, 200);
insert into right_tbl values (0, 100);

select straight_join
a.id,
b.child
from left_tbl a
join right_tbl b on a.id=b.child;{code}
The SELECT query returns 0 rows. If setting query option enabled_runtime_filter_types=bloom, it returns 1 row.

In the query plan, I can see the MinMax runtime filter:
{noformat}
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
|  Per-Host Resources: mem-estimate=10.05MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
|  output exprs: a.id, b.child
|  mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
|  mem-estimate=47.99KB mem-reservation=0B thread-reservation=0
|  tuple-ids=0,1 row-size=16B cardinality=unavailable
|  in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=2.00GB mem-reservation=34.00MB thread-reservation=2
02:HASH JOIN [INNER JOIN, BROADCAST]
|  hash predicates: a.id = b.child
|  fk/pk conjuncts: assumed fk/pk
|  runtime filters: RF001[min_max] <- b.child
|  mem-estimate=2.00GB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=0,1 row-size=16B cardinality=unavailable
|  in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
|  |  mem-estimate=23.99KB mem-reservation=0B thread-reservation=0
|  |  tuple-ids=1 row-size=8B cardinality=unavailable
|  |  in pipelines: 01(GETNEXT)
|  |
|  F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
|  Per-Host Resources: mem-estimate=816.00KB mem-reservation=0B thread-reservation=2
|  01:SCAN KUDU [default.right_tbl b]
|     mem-estimate=768.00KB mem-reservation=0B thread-reservation=1
|     tuple-ids=1 row-size=8B cardinality=unavailable
|     in pipelines: 01(GETNEXT)
|
00:SCAN KUDU [default.left_tbl a]
   runtime filters: RF001[min_max] -> a.id
   mem-estimate=768.00KB mem-reservation=0B thread-reservation=1
   tuple-ids=0 row-size=8B cardinality=unavailable
   in pipelines: 00(GETNEXT) {noformat}
When using the DEBUG build, there is a DCHECK assertion failure:
{noformat}
F0326 18:07:35.611291 17304 kudu-scanner.cc:275] 254cf96c51bf73b2:3affec3300000003] Check failed: col_type.IsIntegerType() 
*** Check failure stack trace: *** 
    @          0x37ba83d  google::LogMessage::Fail()
    @          0x37bc774  google::LogMessage::SendToLog()
    @          0x37ba21c  google::LogMessage::Flush()
    @          0x37bcc99  google::LogMessageFatal::~LogMessageFatal()
    @          0x1aba79d  impala::KuduScanner::OpenNextScanToken()
    @          0x1aa994e  impala::KuduScanNode::ProcessScanToken()
    @          0x1aaaac8  impala::KuduScanNode::RunScannerThread()
    @          0x1aaae1d  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12KuduScanNode17ThreadAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
    @          0x18c08b4  impala::Thread::SuperviseThread()
    @          0x18c16bd  boost::detail::thread_data<>::run()
    @          0x23b9e67  thread_proxy
    @     0x7f67cbbd46db  start_thread
    @     0x7f67c894c61f  clone
Minidump in thread [17304]kudu-scanner-thread (finst:254cf96c51bf73b2:3affec3300000003, plan-node-id:0, thread-idx:0) running query 254cf96c51bf73b2:3affec3300000000, fragment instance 254cf96c51bf73b2:3affec3300000003
Wrote minidump to /home/quanlong/workspace/Impala/logs/cluster/minidumps/impalad/966697e9-16de-4eee-dded2db6-82b05a8a.dmp{noformat}
Related codes:
{code:cpp}
268         // If the type of the filter is not the same as the type of the target column,
269         // there must be an implicit integer cast and we need to ensure the min/max we
270         // pass to Kudu are within the range of the target column.
271         int64_t int_min;
272         int64_t int_max;
273         const ColumnType& col_type = ColumnType::FromThrift(target_desc.kudu_col_type);
274         if (col_type.type != filter->type()) {
275           DCHECK(col_type.IsIntegerType());
276 
277           if (!filter->GetCastIntMinMax(col_type, &int_min, &int_max)) {
278             // The min/max for this filter is outside the range for the target column,
279             // so all rows are filtered out and we can skip the scan.
280             CloseCurrentClientScanner();
281             *eos = true;
282             return Status::OK();
283           }
284           min = &int_min;
285           max = &int_max;
286         } {code}

> wrong result with Pushdown runtime min-max filters on bitlong and decimal
> -------------------------------------------------------------------------
>
>                 Key: IMPALA-12007
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12007
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.1.1
>         Environment: centos
>            Reporter: daicheng
>            Priority: Major
>         Attachments: image-2023-03-16-18-03-55-375.png, image-2023-03-16-18-05-03-201.png, image-2023-03-16-18-08-55-259.png, image-2023-03-16-18-11-28-239.png, image-2023-03-16-18-12-30-062.png, image-2023-03-16-18-17-57-745.png, image-2023-03-17-15-09-40-463.png, image-2023-03-17-15-13-49-139.png, image-2023-03-17-15-17-23-669.png
>
>
> h1. wrong result with Pushdown runtime min-max filters
> h2. table description
> I have two tables in kudu,prod_{_}tm_jira{_}_AO_69E499_REQ_RELATION is like below,and i create external table in impala:
> {code:java}
> TABLE prod_tm_jira_AO_69E499_REQ_RELATION (
>     ID INT32 NOT NULL,
>     kudu_change_time UNIXTIME_MICROS NOT NULL,
>     kudu_update_time UNIXTIME_MICROS NOT NULL,
>     kudu_is_deleted BOOL NOT NULL,
>     CHILD INT64 NULLABLE,
>     CHILD_PROJECT INT64 NULLABLE,
>     PARENT INT64 NULLABLE,
>     PARENT_PROJECT INT64 NULLABLE,
>     SEQUENCE INT32 NULLABLE,
>     SEQ_BALANCER INT32 NULLABLE,
>     PRIMARY KEY (ID)
> )
> HASH (ID) PARTITIONS 2,
> RANGE (ID) (
>     PARTITION UNBOUNDED
> )
> OWNER yarn
> REPLICAS 3
> COMMENT {code}
> {code:java}
> CREATE EXTERNAL TABLE prod__tm_jira.ao_69e499_req_relation STORED AS KUDU TBLPROPERTIES ('STATS_GENERATED'='TASK', 'accessType'='8', 'impala.lastComputeStatsTime'='1678903202', 'kudu.master_addresses'='kudu-host', 'kudu.table_name'='prod__tm_jira__AO_69E499_REQ_RELATION', 'numFiles'='0', 'numRows'='62204', 'totalSize'='0') {code}
> ‘child’ is bigint in impala,and table AO_69E499_REQ_RELATION table has 6273 rows:
> !image-2023-03-17-15-17-23-669.png!    !image-2023-03-16-18-05-03-201.png|width=395,height=295!
> and another table describe below:
> {code:java}
> TABLE prod_tm_jira_jiraissue (
>     ID DECIMAL(18, 0) NOT NULL,
>     kudu_change_time UNIXTIME_MICROS NOT NULL,
>     kudu_update_time UNIXTIME_MICROS NOT NULL,
>     kudu_is_deleted BOOL NOT NULL,
>     pkey STRING NULLABLE,
>     issuenum DECIMAL(18, 0) NULLABLE,
>     PROJECT DECIMAL(18, 0) NULLABLE,
>     REPORTER STRING NULLABLE,
>     ASSIGNEE STRING NULLABLE,
>     CREATOR STRING NULLABLE,
>     issuetype STRING NULLABLE,
>     SUMMARY STRING NULLABLE,
>     DESCRIPTION STRING NULLABLE,
>     ENVIRONMENT STRING NULLABLE,
>     PRIORITY STRING NULLABLE,
>     RESOLUTION STRING NULLABLE,
>     issuestatus STRING NULLABLE,
>     CREATED UNIXTIME_MICROS NULLABLE,
>     UPDATED UNIXTIME_MICROS NULLABLE,
>     DUEDATE UNIXTIME_MICROS NULLABLE,
>     RESOLUTIONDATE UNIXTIME_MICROS NULLABLE,
>     VOTES DECIMAL(18, 0) NULLABLE,
>     WATCHES DECIMAL(18, 0) NULLABLE,
>     TIMEORIGINALESTIMATE DECIMAL(18, 0) NULLABLE,
>     TIMEESTIMATE DECIMAL(18, 0) NULLABLE,
>     TIMESPENT DECIMAL(18, 0) NULLABLE,
>     WORKFLOW_ID DECIMAL(18, 0) NULLABLE,
>     SECURITY DECIMAL(18, 0) NULLABLE,
>     FIXFOR DECIMAL(18, 0) NULLABLE,
>     COMPONENT DECIMAL(18, 0) NULLABLE,
>     ARCHIVEDBY STRING NULLABLE,
>     ARCHIVEDDATE UNIXTIME_MICROS NULLABLE,
>     ARCHIVED STRING NULLABLE,
>     PRIMARY KEY (ID)
> )
> HASH (ID) PARTITIONS 2,
> RANGE (ID) (
>     PARTITION UNBOUNDED
> )
> OWNER yarn
> REPLICAS 3
> COMMEN {code}
> this is the external table in impala:
> {code:java}
> CREATE EXTERNAL TABLE prod__tm_jira.jiraissue STORED AS KUDU TBLPROPERTIES ('STATS_GENERATED'='TASK', 'accessType'='8', 'impala.lastComputeStatsTime'='1678903206', 'kudu.master_addresses'='kudu-host', 'kudu.table_name'='prod__tm_jira__jiraissue', 'numFiles'='0', 'numRows'='295165', 'totalSize'='0') {code}
> 'id' is decimal in impala,and table jiraissue has 295336 rows
> !image-2023-03-17-15-13-49-139.png!!image-2023-03-16-18-03-55-375.png|width=514,height=452!
> h2. sql description
>  
> {code:java}
> select
> c1.id,
> c3.child,
> c1.project,
> c1.summary,
> c1.description,
> c1.created
>  from prod__tm_jira.AO_69E499_REQ_RELATION c3
> left join prod__tm_jira.jiraissue c1
>  on c1.id=c3.child 
> where c1.issuetype IN ('11801','10400') and c1.id=324482 {code}
> !image-2023-03-16-18-08-55-259.png|width=765,height=379!
> when i execute the sql,i got the right result ,and from execute plan i found:
> !image-2023-03-16-18-11-28-239.png|width=724,height=526!
> after i execute follow sql,i expect more rows result but i got 0 result: 
> {code:java}
> select
> c1.id,
> c3.child,
> c1.project,
> c1.summary,
> c1.description,
> c1.created
>  from prod__tm_jira.AO_69E499_REQ_RELATION c3
> left join prod__tm_jira.jiraissue c1
>  on c1.id=c3.child 
> where c1.issuetype IN ('11801','10400')
> -- and c1.id=324482 {code}
> !image-2023-03-16-18-12-30-062.png|width=545,height=316!
> and the sql plan like below,i found c3.child didn't cast to decimal,and runtime filter works to id.
> !image-2023-03-16-18-17-57-745.png|width=696,height=692!
> *I got 0 results,and i found impala runtime filter is worked,so is it possible that  impala didn't identify data type with runtime filter?in this case 'id' is decimal but 'child' is bigint.*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org