Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2023/11/15 17:15:00 UTC

[jira] [Updated] (SPARK-43393) Sequence expression can overflow

     [ https://issues.apache.org/jira/browse/SPARK-43393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-43393:
----------------------------------
    Fix Version/s:     (was: 3.5.1)

> Sequence expression can overflow
> --------------------------------
>
>                 Key: SPARK-43393
>                 URL: https://issues.apache.org/jira/browse/SPARK-43393
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Deepayan Patra
>            Assignee: Deepayan Patra
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Spark has a (long-standing) overflow bug in the {{sequence}} expression.
>  
> Consider the following operations:
> {{spark.sql("CREATE TABLE foo (l LONG);")}}
> {{spark.sql(s"INSERT INTO foo VALUES (${Long.MaxValue});")}}
> {{spark.sql("SELECT sequence(0, l) FROM foo;").collect()}}
>  
> The result of these operations will be:
> {{Array[org.apache.spark.sql.Row] = Array([WrappedArray()])}}
> an unintended consequence of overflow.
>  
> The sequence is applied to values {{0}} and {{Long.MaxValue}} with a step size of {{1}}, which uses the length computation defined [here|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3451]. In this calculation, with {{start = 0}}, {{stop = Long.MaxValue}}, and {{step = 1}}, the calculated {{len}} overflows to {{Long.MinValue}}. The computation, in binary, looks like:
> {{  0111111111111111111111111111111111111111111111111111111111111111 -}}
> {{  0000000000000000000000000000000000000000000000000000000000000000}}
> {{  ----------------------------------------------------------------}}
> {{= 0111111111111111111111111111111111111111111111111111111111111111 /}}
> {{  0000000000000000000000000000000000000000000000000000000000000001}}
> {{  ----------------------------------------------------------------}}
> {{= 0111111111111111111111111111111111111111111111111111111111111111 +}}
> {{  0000000000000000000000000000000000000000000000000000000000000001}}
> {{  ----------------------------------------------------------------}}
> {{= 1000000000000000000000000000000000000000000000000000000000000000}}
> The following [check|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3454] passes because the negative {{Long.MinValue}} is still {{<= MAX_ROUNDED_ARRAY_LENGTH}}. The subsequent cast via {{toInt}} then [truncates the upper bits|https://github.com/apache/spark/blob/16411188c7ba6cb19c46a2bd512b2485a4c03e2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L3457], yielding a length of 0 and therefore an empty array.
> Other start/stop/step combinations whose length computation overflows are similarly problematic.
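The arithmetic described in the issue can be reproduced in plain Scala, independent of Spark. This is a minimal sketch of the overflow, not Spark's actual code path; the {{MAX_ROUNDED_ARRAY_LENGTH}} value below mirrors Spark's {{Int.MaxValue - 15}} limit and is an assumption for illustration:

```scala
object SequenceOverflowDemo {
  def main(args: Array[String]): Unit = {
    val start = 0L
    val stop  = Long.MaxValue
    val step  = 1L

    // (stop - start) / step + 1 wraps past Long.MaxValue to Long.MinValue.
    val len = (stop - start) / step + 1
    println(len) // -9223372036854775808, i.e. Long.MinValue

    // Assumed stand-in for Spark's MAX_ROUNDED_ARRAY_LENGTH (Int.MaxValue - 15).
    val MAX_ROUNDED_ARRAY_LENGTH = Int.MaxValue - 15

    // The overflowed, negative value still passes the <= bound check.
    println(len <= MAX_ROUNDED_ARRAY_LENGTH) // true

    // Casting to Int keeps only the lower 32 bits, which are all zero here,
    // so the computed length becomes 0 and the resulting array is empty.
    println(len.toInt) // 0
  }
}
```

Using {{Math.addExact}} / {{Math.subtractExact}} for the intermediate arithmetic would instead raise an {{ArithmeticException}} at the point of overflow rather than silently producing an empty array.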



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org