Posted to issues@flink.apache.org by "luoyuxia (Jira)" <ji...@apache.org> on 2022/06/23 04:29:00 UTC
[jira] [Updated] (FLINK-28212) IndexOutOfBoundsException is thrown when project contains window which dosen't refer all fields of input when using Hive dialect

     [ https://issues.apache.org/jira/browse/FLINK-28212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

luoyuxia updated FLINK-28212:
-----------------------------
    Description: 
Can be reproduced with the following SQL:
{code:java}
CREATE TABLE alltypesorc(
                            ctinyint TINYINT,
                            csmallint SMALLINT,
                            cint INT,
                            cbigint BIGINT,
                            cfloat FLOAT,
                            cdouble DOUBLE,
                            cstring1 STRING,
                            cstring2 STRING,
                            ctimestamp1 TIMESTAMP,
                            ctimestamp2 TIMESTAMP,
                            cboolean1 BOOLEAN,
                            cboolean2 BOOLEAN);

select a.ctinyint, a.cint, count(a.cdouble)
  over(partition by a.ctinyint order by a.cint desc
    rows between 1 preceding and 1 following)
from alltypesorc a {code}
Then it will throw the exception "caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)".

 

The reason is that, for such SQL, the Hive dialect generates the following RelNode tree:

 
{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[ctinyint, cint, _o__c2])
  LogicalProject(ctinyint=[$0], cint=[$2], _o__c2=[$12])
    LogicalProject(ctinyint=[$0], csmallint=[$1], cint=[$2], cbigint=[$3], cfloat=[$4], cdouble=[$5], cstring1=[$6], cstring2=[$7], ctimestamp1=[$8], ctimestamp2=[$9], cboolean1=[$10], cboolean2=[$11], _o__col13=[COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)])
      LogicalTableScan(table=[[test-catalog, default, alltypesorc]]) {code}
 Note: the first LogicalProject, counting from the bottom up, contains all fields.

 

The constant offsets in "{*}1{*} PRECEDING AND {*}1{*} FOLLOWING" are converted to field accesses, so the window becomes:

 
{code:java}
COUNT($5) OVER (PARTITION BY $0 ORDER BY $2 DESC NULLS LAST ROWS BETWEEN $12 PRECEDING AND $12 FOLLOWING){code}
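
A minimal sketch of why both bounds end up as $12. It assumes that a literal offset is stored in the window's constant list and referenced by an index equal to the input field count plus the constant's position, and that equal literals share one entry. The class and method names are hypothetical and are not Flink or Calcite API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a window's constant pool; not Flink or Calcite API. */
final class WindowConstantPool {
    private final int inputFieldCount;
    private final List<Long> constants = new ArrayList<>();

    WindowConstantPool(int inputFieldCount) {
        this.inputFieldCount = inputFieldCount;
    }

    /** Returns the input-ref index that stands for the given literal offset. */
    int refFor(long literal) {
        int pos = constants.indexOf(literal);
        if (pos < 0) {
            // First time this literal is seen: append it to the pool.
            constants.add(literal);
            pos = constants.size() - 1;
        }
        // Constants are addressed right after the regular input fields.
        return inputFieldCount + pos;
    }

    List<Long> constants() {
        return constants;
    }

    public static void main(String[] args) {
        // The scan in the report exposes 12 fields ($0..$11).
        WindowConstantPool pool = new WindowConstantPool(12);
        System.out.println("$" + pool.refFor(1L)); // prints $12 (1 PRECEDING)
        System.out.println("$" + pool.refFor(1L)); // prints $12 (1 FOLLOWING)
        System.out.println(pool.constants());      // prints [1]
    }
}
```

Under these assumptions, the index $12 is only meaningful while the window's input still has 12 fields.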
 

But in the rule "ProjectWindowTransposeRule", the unnecessary fields (those not referenced by the top project or the window) are removed, so the input of the window contains only 3 fields (ctinyint, cint, cdouble).
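
To illustrate the mismatch, here is a sketch of how references could be remapped consistently when fields are pruned. It assumes constant references sit directly after the input fields, so they must shift when the field count shrinks from 12 to 3. `PrunedRefRemapper` is a hypothetical helper, not the rule's actual code and not necessarily the fix applied in Flink.

```java
import java.util.List;

/** Hypothetical helper showing a consistent index remapping after pruning. */
final class PrunedRefRemapper {

    /**
     * Maps an index that was valid before pruning to the index valid afterwards.
     *
     * @param index              reference index before pruning
     * @param originalFieldCount number of input fields before pruning
     * @param keptFields         original indices of the surviving fields, in order
     */
    static int remap(int index, int originalFieldCount, List<Integer> keptFields) {
        if (index >= originalFieldCount) {
            // Constant reference: keep its position in the constant list,
            // but rebase it on the new (smaller) field count.
            return keptFields.size() + (index - originalFieldCount);
        }
        int newIndex = keptFields.indexOf(index);
        if (newIndex < 0) {
            throw new IllegalArgumentException("field $" + index + " was pruned");
        }
        return newIndex;
    }

    public static void main(String[] args) {
        // ctinyint ($0), cint ($2) and cdouble ($5) survive the pruning.
        List<Integer> kept = List.of(0, 2, 5);
        System.out.println(remap(5, 12, kept));  // prints 2 (cdouble)
        System.out.println(remap(12, 12, kept)); // prints 3 (the constant "1")
    }
}
```

In this model the bound would have to become $3 after pruning; leaving it at $12 is what makes the later lookup fail.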

Finally, when RelExplainUtil builds the bound string for the explain output, it cannot find {*}$12{*}, so the exception is thrown.

 
{code:java}
val ref = bound.getOffset.asInstanceOf[RexInputRef]
// ref.getIndex will be 12, but the window's input now has only 3 fields
val boundIndex = ref.getIndex - calcOriginInputRows(window)
// the window's constants contain only a single element, "1"
val offset = window.constants.get(boundIndex).getValue2
val offsetKind = if (bound.isPreceding) "PRECEDING" else "FOLLOWING"
s"$offset $offsetKind" {code}
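
The lookup above can be modelled in a few lines of Java. This is a sketch under the numbers given in the comments (reference index 12, 3 input fields, one constant), with a bounds check added only to make the failure explicit. `BoundOffsetLookup` is a hypothetical name, and the sketch does not model what `calcOriginInputRows` actually returns, so it does not reproduce the exact index (7) from the reported message.

```java
import java.util.List;

/** Hypothetical model of the constant lookup done when printing a window bound. */
final class BoundOffsetLookup {

    /** Resolves a bound's reference index to its constant offset. */
    static long offsetFor(int refIndex, int inputFieldCount, List<Long> constants) {
        // Same arithmetic as the snippet above: subtract the input field count.
        int boundIndex = refIndex - inputFieldCount;
        if (boundIndex < 0 || boundIndex >= constants.size()) {
            throw new IllegalStateException(
                    "bound $" + refIndex + " resolves to constant index " + boundIndex
                            + ", but only " + constants.size() + " constant(s) exist");
        }
        return constants.get(boundIndex);
    }

    public static void main(String[] args) {
        List<Long> constants = List.of(1L);
        // Before pruning: 12 input fields, so $12 is constant 0.
        System.out.println(offsetFor(12, 12, constants)); // prints 1
        try {
            // After pruning: 3 input fields, but the bound still says $12.
            offsetFor(12, 3, constants);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```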
 

 

  was:
Can be reproduced by following sql
{code:java}
CREATE TABLE alltypesorc(
                            ctinyint TINYINT,
                            csmallint SMALLINT,
                            cint INT,
                            cbigint BIGINT,
                            cfloat FLOAT,
                            cdouble DOUBLE,
                            cstring1 STRING,
                            cstring2 STRING,
                            ctimestamp1 TIMESTAMP,
                            ctimestamp2 TIMESTAMP,
                            cboolean1 BOOLEAN,
                            cboolean2 BOOLEAN);

select a.ctinyint, a.cint, count(a.cdouble)
  over(partition by a.ctinyint order by a.cint desc
    rows between 1 preceding and 1 following)
from alltypesorc {code}
Then it will throw Caused by: java.lang.IndexOutOfBoundsException: index (7) must be less than size (1)


> IndexOutOfBoundsException is thrown when project contains window which dosen't refer all fields of input when using Hive dialect
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-28212
>                 URL: https://issues.apache.org/jira/browse/FLINK-28212
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / Hive
>            Reporter: luoyuxia
>            Priority: Major
>             Fix For: 1.16.0
>


