You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/03/26 10:10:00 UTC
[jira] [Commented] (IMPALA-12007) wrong result with Pushdown runtime min-max filters on bitlong and decimal
[ https://issues.apache.org/jira/browse/IMPALA-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17705051#comment-17705051 ]
Quanlong Huang commented on IMPALA-12007:
-----------------------------------------
[~dachn] Thanks for reporting this! I can reproduce the issue using a simplified example:
{code:sql}
create table left_tbl (
id decimal(18, 0),
project decimal(18, 0),
primary key (id)
)
partition by hash (id) partitions 2
stored as kudu;
create table right_tbl (
id int,
child bigint,
primary key (id)
)
partition by hash (id) partitions 2
stored as kudu;
insert into left_tbl values (100, 200);
insert into right_tbl values (0, 100);
select straight_join
a.id,
b.child
from left_tbl a
join right_tbl b on a.id=b.child;{code}
The SELECT query returns 0 rows. If setting query option enabled_runtime_filter_types=bloom, it returns 1 row.
In the query plan, I can see the MinMax runtime filter:
{noformat}
F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=10.05MB mem-reservation=4.00MB thread-reservation=1
PLAN-ROOT SINK
| output exprs: a.id, b.child
| mem-estimate=10.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
04:EXCHANGE [UNPARTITIONED]
| mem-estimate=47.99KB mem-reservation=0B thread-reservation=0
| tuple-ids=0,1 row-size=16B cardinality=unavailable
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
Per-Host Resources: mem-estimate=2.00GB mem-reservation=34.00MB thread-reservation=2
02:HASH JOIN [INNER JOIN, BROADCAST]
| hash predicates: a.id = b.child
| fk/pk conjuncts: assumed fk/pk
| runtime filters: RF001[min_max] <- b.child
| mem-estimate=2.00GB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,1 row-size=16B cardinality=unavailable
| in pipelines: 00(GETNEXT), 01(OPEN)
|
|--03:EXCHANGE [BROADCAST]
| | mem-estimate=23.99KB mem-reservation=0B thread-reservation=0
| | tuple-ids=1 row-size=8B cardinality=unavailable
| | in pipelines: 01(GETNEXT)
| |
| F01:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
| Per-Host Resources: mem-estimate=816.00KB mem-reservation=0B thread-reservation=2
| 01:SCAN KUDU [default.right_tbl b]
| mem-estimate=768.00KB mem-reservation=0B thread-reservation=1
| tuple-ids=1 row-size=8B cardinality=unavailable
| in pipelines: 01(GETNEXT)
|
00:SCAN KUDU [default.left_tbl a]
runtime filters: RF001[min_max] -> a.id
mem-estimate=768.00KB mem-reservation=0B thread-reservation=1
tuple-ids=0 row-size=8B cardinality=unavailable
in pipelines: 00(GETNEXT) {noformat}
When using the DEBUG build, there is a DCHECK assertion failure:
{noformat}
F0326 18:07:35.611291 17304 kudu-scanner.cc:275] 254cf96c51bf73b2:3affec3300000003] Check failed: col_type.IsIntegerType()
*** Check failure stack trace: ***
@ 0x37ba83d google::LogMessage::Fail()
@ 0x37bc774 google::LogMessage::SendToLog()
@ 0x37ba21c google::LogMessage::Flush()
@ 0x37bcc99 google::LogMessageFatal::~LogMessageFatal()
@ 0x1aba79d impala::KuduScanner::OpenNextScanToken()
@ 0x1aa994e impala::KuduScanNode::ProcessScanToken()
@ 0x1aaaac8 impala::KuduScanNode::RunScannerThread()
@ 0x1aaae1d _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12KuduScanNode17ThreadAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@ 0x18c08b4 impala::Thread::SuperviseThread()
@ 0x18c16bd boost::detail::thread_data<>::run()
@ 0x23b9e67 thread_proxy
@ 0x7f67cbbd46db start_thread
@ 0x7f67c894c61f clone
Minidump in thread [17304]kudu-scanner-thread (finst:254cf96c51bf73b2:3affec3300000003, plan-node-id:0, thread-idx:0) running query 254cf96c51bf73b2:3affec3300000000, fragment instance 254cf96c51bf73b2:3affec3300000003
Wrote minidump to /home/quanlong/workspace/Impala/logs/cluster/minidumps/impalad/966697e9-16de-4eee-dded2db6-82b05a8a.dmp{noformat}
Related codes:
{code:cpp}
268 // If the type of the filter is not the same as the type of the target column,
269 // there must be an implicit integer cast and we need to ensure the min/max we
270 // pass to Kudu are within the range of the target column.
271 int64_t int_min;
272 int64_t int_max;
273 const ColumnType& col_type = ColumnType::FromThrift(target_desc.kudu_col_type);
274 if (col_type.type != filter->type()) {
275 DCHECK(col_type.IsIntegerType());
276
277 if (!filter->GetCastIntMinMax(col_type, &int_min, &int_max)) {
278 // The min/max for this filter is outside the range for the target column,
279 // so all rows are filtered out and we can skip the scan.
280 CloseCurrentClientScanner();
281 *eos = true;
282 return Status::OK();
283 }
284 min = &int_min;
285 max = &int_max;
286 } {code}
> wrong result with Pushdown runtime min-max filters on bitlong and decimal
> -------------------------------------------------------------------------
>
> Key: IMPALA-12007
> URL: https://issues.apache.org/jira/browse/IMPALA-12007
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 4.1.1
> Environment: centos
> Reporter: daicheng
> Priority: Major
> Attachments: image-2023-03-16-18-03-55-375.png, image-2023-03-16-18-05-03-201.png, image-2023-03-16-18-08-55-259.png, image-2023-03-16-18-11-28-239.png, image-2023-03-16-18-12-30-062.png, image-2023-03-16-18-17-57-745.png, image-2023-03-17-15-09-40-463.png, image-2023-03-17-15-13-49-139.png, image-2023-03-17-15-17-23-669.png
>
>
> h1. wrong result with Pushdown runtime min-max filters
> h2. table description
> I have two tables in kudu,prod_{_}tm_jira{_}_AO_69E499_REQ_RELATION is like below,and i create external table in impala:
> {code:java}
> TABLE prod_tm_jira_AO_69E499_REQ_RELATION (
> ID INT32 NOT NULL,
> kudu_change_time UNIXTIME_MICROS NOT NULL,
> kudu_update_time UNIXTIME_MICROS NOT NULL,
> kudu_is_deleted BOOL NOT NULL,
> CHILD INT64 NULLABLE,
> CHILD_PROJECT INT64 NULLABLE,
> PARENT INT64 NULLABLE,
> PARENT_PROJECT INT64 NULLABLE,
> SEQUENCE INT32 NULLABLE,
> SEQ_BALANCER INT32 NULLABLE,
> PRIMARY KEY (ID)
> )
> HASH (ID) PARTITIONS 2,
> RANGE (ID) (
> PARTITION UNBOUNDED
> )
> OWNER yarn
> REPLICAS 3
> COMMENT {code}
> {code:java}
> CREATE EXTERNAL TABLE prod__tm_jira.ao_69e499_req_relation STORED AS KUDU TBLPROPERTIES ('STATS_GENERATED'='TASK', 'accessType'='8', 'impala.lastComputeStatsTime'='1678903202', 'kudu.master_addresses'='kudu-host', 'kudu.table_name'='prod__tm_jira__AO_69E499_REQ_RELATION', 'numFiles'='0', 'numRows'='62204', 'totalSize'='0') {code}
> ‘child’ is bigint in impala,and table AO_69E499_REQ_RELATION table has 6273 rows:
> !image-2023-03-17-15-17-23-669.png! !image-2023-03-16-18-05-03-201.png|width=395,height=295!
> and another table describe below:
> {code:java}
> TABLE prod_tm_jira_jiraissue (
> ID DECIMAL(18, 0) NOT NULL,
> kudu_change_time UNIXTIME_MICROS NOT NULL,
> kudu_update_time UNIXTIME_MICROS NOT NULL,
> kudu_is_deleted BOOL NOT NULL,
> pkey STRING NULLABLE,
> issuenum DECIMAL(18, 0) NULLABLE,
> PROJECT DECIMAL(18, 0) NULLABLE,
> REPORTER STRING NULLABLE,
> ASSIGNEE STRING NULLABLE,
> CREATOR STRING NULLABLE,
> issuetype STRING NULLABLE,
> SUMMARY STRING NULLABLE,
> DESCRIPTION STRING NULLABLE,
> ENVIRONMENT STRING NULLABLE,
> PRIORITY STRING NULLABLE,
> RESOLUTION STRING NULLABLE,
> issuestatus STRING NULLABLE,
> CREATED UNIXTIME_MICROS NULLABLE,
> UPDATED UNIXTIME_MICROS NULLABLE,
> DUEDATE UNIXTIME_MICROS NULLABLE,
> RESOLUTIONDATE UNIXTIME_MICROS NULLABLE,
> VOTES DECIMAL(18, 0) NULLABLE,
> WATCHES DECIMAL(18, 0) NULLABLE,
> TIMEORIGINALESTIMATE DECIMAL(18, 0) NULLABLE,
> TIMEESTIMATE DECIMAL(18, 0) NULLABLE,
> TIMESPENT DECIMAL(18, 0) NULLABLE,
> WORKFLOW_ID DECIMAL(18, 0) NULLABLE,
> SECURITY DECIMAL(18, 0) NULLABLE,
> FIXFOR DECIMAL(18, 0) NULLABLE,
> COMPONENT DECIMAL(18, 0) NULLABLE,
> ARCHIVEDBY STRING NULLABLE,
> ARCHIVEDDATE UNIXTIME_MICROS NULLABLE,
> ARCHIVED STRING NULLABLE,
> PRIMARY KEY (ID)
> )
> HASH (ID) PARTITIONS 2,
> RANGE (ID) (
> PARTITION UNBOUNDED
> )
> OWNER yarn
> REPLICAS 3
> COMMEN {code}
> this is the external table in impala:
> {code:java}
> CREATE EXTERNAL TABLE prod__tm_jira.jiraissue STORED AS KUDU TBLPROPERTIES ('STATS_GENERATED'='TASK', 'accessType'='8', 'impala.lastComputeStatsTime'='1678903206', 'kudu.master_addresses'='kudu-host', 'kudu.table_name'='prod__tm_jira__jiraissue', 'numFiles'='0', 'numRows'='295165', 'totalSize'='0') {code}
> 'id' is decimal in impala,and table jiraissue has 295336 rows
> !image-2023-03-17-15-13-49-139.png!!image-2023-03-16-18-03-55-375.png|width=514,height=452!
> h2. sql description
>
> {code:java}
> select
> c1.id,
> c3.child,
> c1.project,
> c1.summary,
> c1.description,
> c1.created
> from prod__tm_jira.AO_69E499_REQ_RELATION c3
> left join prod__tm_jira.jiraissue c1
> on c1.id=c3.child
> where c1.issuetype IN ('11801','10400') and c1.id=324482 {code}
> !image-2023-03-16-18-08-55-259.png|width=765,height=379!
> when i execute the sql,i got the right result ,and from execute plan i found:
> !image-2023-03-16-18-11-28-239.png|width=724,height=526!
> after i execute follow sql,i expect more rows result but i got 0 result:
> {code:java}
> select
> c1.id,
> c3.child,
> c1.project,
> c1.summary,
> c1.description,
> c1.created
> from prod__tm_jira.AO_69E499_REQ_RELATION c3
> left join prod__tm_jira.jiraissue c1
> on c1.id=c3.child
> where c1.issuetype IN ('11801','10400')
> -- and c1.id=324482 {code}
> !image-2023-03-16-18-12-30-062.png|width=545,height=316!
> and the sql plan like below,i found c3.child didn't cast to decimal,and runtime filter works to id.
> !image-2023-03-16-18-17-57-745.png|width=696,height=692!
> *I got 0 results,and i found impala runtime filter is worked,so is it possible that impala didn't identify data type with runtime filter?in this case 'id' is decimal but 'child' is bigint.*
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org