Posted to issues-all@impala.apache.org by "Gabor Kaszab (Jira)" <ji...@apache.org> on 2022/06/10 12:59:00 UTC
[jira] [Commented] (IMPALA-11280) Zipping unnest hits DCHECK when querying from a view that has an IN operator
[ https://issues.apache.org/jira/browse/IMPALA-11280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17552735#comment-17552735 ]
Gabor Kaszab commented on IMPALA-11280:
---------------------------------------
The issue is most probably related to the unnest predicate being assigned to the JOIN node instead of the UNNEST node. Here is the EXPLAIN output of the problematic query:
{code:java}
+-------------------------------------------------------------------------------------------------------------------+
| Explain String |
+-------------------------------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=6.98MB Threads=6 |
| Per-Host Resource Estimates: Memory=88MB |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| functional_parquet.alltypestiny, functional_parquet.complextypes_arrays |
| |
| PLAN-ROOT SINK |
| | |
| 08:EXCHANGE [UNPARTITIONED] |
| | |
| 06:SUBPLAN |
| | row-size=44B cardinality=13.51K |
| | |
| |--04:NESTED LOOP JOIN [CROSS JOIN] |
| | | row-size=44B cardinality=10 |
| | | |
| | |--02:SINGULAR ROW SRC |
| | | row-size=28B cardinality=1 |
| | | |
| | 03:UNNEST [functional_parquet.complextypes_arrays.arr1 arr1, functional_parquet.complextypes_arrays.arr2 arr2] |
| | row-size=0B cardinality=10 |
| | |
| 05:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
| | hash predicates: id = id |
| | other join predicates: UNNEST(arr1) < 5 |
| | runtime filters: RF000 <- id |
| | row-size=28B cardinality=1.35K |
| | |
| |--07:EXCHANGE [BROADCAST] |
| | | |
| | 01:SCAN HDFS [functional_parquet.alltypestiny] |
| | HDFS partitions=4/4 files=4 size=11.92KB |
| | row-size=4B cardinality=758 |
| | |
| 00:SCAN HDFS [functional_parquet.complextypes_arrays] |
| HDFS partitions=1/1 files=1 size=1.06KB |
| runtime filters: RF000 -> id |
| row-size=28B cardinality=1.35K |
+-------------------------------------------------------------------------------------------------------------------+
{code}
Note the predicate UNNEST(arr1) < 5 listed under "other join predicates" in the 05:HASH JOIN node.
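To illustrate why a misplaced predicate could trip this particular check, here is a toy Python model of a tuple-id lookup. It is not Impala code: the class, its method, and the assumption that the lookup map is sized by the largest tuple id plus one are all hypothetical, chosen only to be consistent with the "(3 vs. 2)" message and the two tuples (id=0 and id=1) printed in the DCHECK output. The idea is that the join's row only carries tuples 0 and 1, while UNNEST(arr1) refers to the array's item tuple (children_tuple_id=3).

```python
# Toy model, NOT Impala code: maps tuple ids to their position in a row.
class RowDescriptor:
    def __init__(self, tuple_ids):
        # Assumption: the map is sized by the largest tuple id + 1.
        self.tuple_idx_map = [-1] * (max(tuple_ids) + 1)
        for idx, tuple_id in enumerate(tuple_ids):
            self.tuple_idx_map[tuple_id] = idx

    def get_tuple_idx(self, tuple_id):
        size = len(self.tuple_idx_map)
        if tuple_id >= size:
            # Mirrors the shape of the failed check in the log.
            raise LookupError(
                f"Check failed: id < tuple_idx_map_.size() ({tuple_id} vs. {size})")
        return self.tuple_idx_map[tuple_id]


# Row seen by the hash join in the failing plan: tuples 0 and 1 only.
join_row = RowDescriptor([0, 1])

# Hypothetical row that also carries tuple 3 (the item tuple of arr1).
wider_row = RowDescriptor([0, 1, 3])

if __name__ == "__main__":
    print(wider_row.get_tuple_idx(3))   # resolves, tuple 3 is part of the row
    try:
        join_row.get_tuple_idx(3)       # predicate evaluated at the join
    except LookupError as err:
        print(err)
```

In this model the lookup only succeeds where the row actually contains the array's item tuple, which is why the predicate would need to be evaluated at (or above) the UNNEST node rather than at the join.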
> Zipping unnest hits DCHECK when querying from a view that has an IN operator
> ----------------------------------------------------------------------------
>
> Key: IMPALA-11280
> URL: https://issues.apache.org/jira/browse/IMPALA-11280
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.1.0
> Reporter: Gabor Kaszab
> Assignee: Daniel Becker
> Priority: Major
> Labels: complextype
>
> *Repro steps:*
> 1) Create a view that returns arrays and has an IN operator in the WHERE clause:
> {code:java}
> drop view if exists unnest_bug_view;
> create view unnest_bug_view as (
> select id, arr1, arr2
> from functional_parquet.complextypes_arrays
> where id % 2 = 1 and id in (select id from functional_parquet.alltypestiny)
> ); {code}
> 2) Unnest the arrays and filter by the unnested values in an outer SELECT:
> {code:java}
> select
> id,
> unnested_arr1,
> unnested_arr2
> from
> (select
> id,
> unnest(arr1) as unnested_arr1,
> unnest(arr2) as unnested_arr2
> from unnest_bug_view) a
> where a.unnested_arr1 < 5; {code}
> This hits a DCHECK in RowDescriptor::GetTupleIdx():
>
> {code:java}
> descriptors.cc:467] 5643fd6cdd5cece3:77942ead00000000] Check failed: id < tuple_idx_map_.size() (3 vs. 2) RowDescriptor: Tuple(id=0 size=29 slots=[Slot(id=2 type=INT col_path=[0] offset=24 null=(offset=28 mask=4) slot_idx=2 field_idx=2), Slot(id=3 type=ARRAY col_path=[1] children_tuple_id=3 offset=0 null=(offset=28 mask=1) slot_idx=0 field_idx=0), Slot(id=5 type=ARRAY col_path=[2] children_tuple_id=4 offset=12 null=(offset=28 mask=2) slot_idx=1 field_idx=1)] tuple_path=[])
> Tuple(id=1 size=5 slots=[Slot(id=0 type=INT col_path=[2] offset=0 null=(offset=4 mask=1) slot_idx=0 field_idx=0)] tuple_path=[])
> *** Check failure stack trace: ***
> @ 0x36fe72c google::LogMessage::Fail()
> @ 0x36fffdc google::LogMessage::SendToLog()
> @ 0x36fe08a google::LogMessage::Flush()
> @ 0x3701c48 google::LogMessageFatal::~LogMessageFatal()
> @ 0x12e47ab impala::RowDescriptor::GetTupleIdx()
> @ 0x1b378f5 impala::SlotRef::Init()
> @ 0x1b25fea impala::ScalarExpr::Init()
> @ 0x1b665b2 impala::ScalarFnCall::Init()
> @ 0x1b2c44e impala::ScalarExpr::Create()
> @ 0x1b2c5df impala::ScalarExpr::Create()
> @ 0x1b2c6a0 impala::ScalarExpr::Create()
> @ 0x19ad286 impala::PartitionedHashJoinPlanNode::Init()
> @ 0x18b5d8d impala::PlanNode::CreateTreeHelper()
> @ 0x18b5cd9 impala::PlanNode::CreateTreeHelper()
> @ 0x18b5e48 impala::PlanNode::CreateTree()
> @ 0x12f4ca7 impala::FragmentState::Init()
> @ 0x12f839c impala::FragmentState::CreateFragmentStateMap()
> @ 0x126cedb impala::QueryState::StartFInstances()
> @ 0x125c4df impala::QueryExecMgr::ExecuteQueryHelper()
> {code}
>
>
> Some notes about the repro:
> - The inner select on its own (without the filter on the unnested value) runs fine.
> - If I unnest only one array, the query runs fine.
> - If I remove the IN clause from the view’s DDL, the query runs fine.
>
> {*}Update{*}:
> I managed to reproduce the issue without creating an actual view. This might make the tuple/slot IDs easier to follow during the investigation.
> {code:java}
> select id, unnested_arr1, unnested_arr2 from (
> select id, unnest(arr1) as unnested_arr1, unnest(arr2) as unnested_arr2
> from functional_parquet.complextypes_arrays
> where id in (select id from functional_parquet.alltypestiny)) a
> where a.unnested_arr1 < 5 {code}