Posted to issues@flink.apache.org by "luoyuxia (Jira)" <ji...@apache.org> on 2022/06/23 06:10:00 UTC

[jira] [Comment Edited] (FLINK-28212) IndexOutOfBoundsException is thrown when project contains window which doesn't refer to all fields of input when using Hive dialect

    [ https://issues.apache.org/jira/browse/FLINK-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17557839#comment-17557839 ] 

luoyuxia edited comment on FLINK-28212 at 6/23/22 6:09 AM:
-----------------------------------------------------------

To fix it, the idea is straightforward: make sure the project node containing the window that HiveParser produces only contains the required fields. The logical plan produced by the Hive parser should then look like:
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, EXPR$2])
  LogicalProject(ctinyint=[$0], cint=[$2], EXPR$2=[COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
    LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}
The project node then only contains the needed fields.
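A rough, plain-Java sketch of that pruning idea (illustration only, not the actual HiveParser change; the field indexes are the ones from the example plan above): collect the input fields referenced by the select list and the window, then remap the window expression's input refs against the pruned project.
{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProjectPruneSketch {

    public static void main(String[] args) {
        // Input fields actually referenced by the query: ctinyint ($0), cint ($2), cdouble ($5)
        // out of the 12 fields produced by the scan of alltypesorc.
        List<Integer> referencedInputFields = List.of(0, 2, 5);

        // Old field index -> new field index after the unused fields are dropped.
        Map<Integer, Integer> remap = new LinkedHashMap<>();
        for (int oldIndex : referencedInputFields) {
            remap.put(oldIndex, remap.size());
        }

        // The window COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 ...) is then rewritten
        // against the pruned project: $0 -> $0, $2 -> $1, $5 -> $2.
        List<Integer> windowInputRefs = List.of(5, 0, 2);
        List<Integer> rewrittenRefs = new ArrayList<>();
        for (int ref : windowInputRefs) {
            rewrittenRefs.add(remap.get(ref));
        }

        System.out.println("kept fields (old -> new): " + remap);
        System.out.println("window refs after remapping: " + rewrittenRefs);
    }
}
{code}
In the actual fix this remapping would presumably be expressed with Calcite's RexInputRef/RexShuttle machinery inside HiveParser rather than with plain integers.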

> IndexOutOfBoundsException is thrown when project contains window which doesn't refer to all fields of input when using Hive dialect
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28212
>                 URL: https://issues.apache.org/jira/browse/FLINK-28212
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Can be reproduced by the following SQL when using Hive dialect:
> {code:java}
> CREATE TABLE alltypesorc(
>                             ctinyint TINYINT,
>                             csmallint SMALLINT,
>                             cint INT,
>                             cbigint BIGINT,
>                             cfloat FLOAT,
>                             cdouble DOUBLE,
>                             cstring1 STRING,
>                             cstring2 STRING,
>                             ctimestamp1 TIMESTAMP,
>                             ctimestamp2 TIMESTAMP,
>                             cboolean1 BOOLEAN,
>                             cboolean2 BOOLEAN);
> select a.ctinyint, a.cint, count(a.cdouble)
>   over(partition by a.ctinyint order by a.cint desc
>     rows between 1 preceding and 1 following)
> from alltypesorc {code}
> Then it will throw the exception "caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)".
>  
> The reason is that for such SQL, the Hive dialect will generate a RelNode like:
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, _o__c2])
>   LogicalProject(ctinyint=[$0], cint=[$2], _o__c2=[$12])
>     LogicalProject(ctinyint=[$0], csmallint=[$1], cint=[$2], cbigint=[$3], cfloat=[$4], cdouble=[$5], cstring1=[$6], cstring2=[$7], ctimestamp1=[$8], ctimestamp2=[$9], cboolean1=[$10], cboolean2=[$11], _o__col13=[COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
>       LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}
> Note: the first ProjectNode from the bottom contains all fields.
> And the *1* in "*1* PRECEDING AND *1* FOLLOWING" of the window, whose input also contains all the fields of that project node, will be converted to a RexInputRef in Calcite. So the window will look like:
> {code:java}
> COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN $11 PRECEDING AND $11 FOLLOWING{code}
> Note: `$11` is a special field for the window; it is actually recorded in the window's constants.
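> To illustrate that convention, a small self-contained sketch (the numbers are made up, not taken from the plan above): a bound ref whose index lies just past the window input's fields resolves into the constants list by subtracting the input field count.
> {code:java}
> import java.util.List;
>
> // Illustrative only: toy model of how a bound offset recorded as an input ref
> // resolves into the window's constants list.
> public class WindowConstantRefSketch {
>     public static void main(String[] args) {
>         int windowInputFieldCount = 12;              // fields coming from the window's input
>         List<String> windowConstants = List.of("1"); // the literal 1 from "1 PRECEDING"
>
>         int boundRefIndex = 12;                      // first index past the input fields
>         int constantIndex = boundRefIndex - windowInputFieldCount;
>         // Resolves to constants.get(0) == "1", i.e. the original "1 PRECEDING AND 1 FOLLOWING".
>         System.out.println("bound offset = " + windowConstants.get(constantIndex));
>     }
> }
> {code}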
>  
> But in the rule "ProjectWindowTransposeRule", the unnecessary fields (those not referred to by the top project or the window) will be removed,
> so the input of the window will only contain 4 fields (ctinyint, cint, cdouble, count(cdouble)).
> Finally, in RelExplainUtil, when explaining the bound string, it cannot find *$11*, so the exception "Caused by: java.lang.IndexOutOfBoundsException: index (8) must be less than size (1)" is thrown.
> {code:java}
> val ref = bound.getOffset.asInstanceOf[RexInputRef]
> // ref.getIndex will be 11, but the original input size of the window is 3
> val boundIndex = ref.getIndex - calcOriginInputRows(window)
> // boundIndex = 8, but the window's constants contain only a single element: "1"
> val offset = window.constants.get(boundIndex).getValue2
> val offsetKind = if (bound.isPreceding) "PRECEDING" else "FOLLOWING"
> s"$offset $offsetKind" {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)