Posted to issues@spark.apache.org by "Gera Shegalov (Jira)" <ji...@apache.org> on 2022/12/31 01:07:00 UTC

[jira] [Created] (SPARK-41793) Incorrect result for window frames defined as ranges on large decimals

Gera Shegalov created SPARK-41793:
-------------------------------------

             Summary: Incorrect result for window frames defined as ranges on large decimals 
                 Key: SPARK-41793
                 URL: https://issues.apache.org/jira/browse/SPARK-41793
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Gera Shegalov


Context: https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686

The following windowing query on a simple two-row input should produce two non-empty windows. Each row always lies within its own RANGE frame, so both counts should be 1.

{code}
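from pprint import pprint

from pyspark.sql import SparkSession

# Reuses the existing session when run inside the pyspark shell
spark = SparkSession.builder.getOrCreate()

# Both rows share the partition key a; b holds large decimal values
data = [
  ('9223372036854775807', '11342371013783243717493546650944543.47'),
  ('9223372036854775807', '999999999999999999999999999999999999.99')
]
df1 = spark.createDataFrame(data, 'a STRING, b STRING')
df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)'))
df2.createOrReplaceTempView('test_table')

# Each row's frame covers the rows whose b lies in [b - 10.2345, b + 6.7890]
df = spark.sql('''
  SELECT
    COUNT(1) OVER (
      PARTITION BY a
      ORDER BY b ASC
      RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING
    ) AS CNT_1
  FROM
    test_table
  ''')
res = df.collect()
df.explain(True)
pprint(res)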
{code}

Spark 3.4.0-SNAPSHOT output:
{code}
[Row(CNT_1=1), Row(CNT_1=0)]
{code}

Spark 3.3.1 output as expected:
{code}
[Row(CNT_1=1), Row(CNT_1=1)]
{code}
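As a cross-check of the expected result, the following is a minimal sketch that evaluates the same frame outside Spark using Python's exact `decimal` arithmetic. It assumes standard RANGE semantics: a row's frame contains every row in the same partition whose `b` lies in [b - 10.2345, b + 6.7890]. The helper name `range_frame_counts` is hypothetical and is not part of any Spark API.

{code}
from decimal import Decimal, getcontext

# 50 significant digits is enough to add the offsets to 38-digit values exactly
getcontext().prec = 50

# Same rows as the repro: (a, b), with b kept as an exact decimal
ROWS = [
    (9223372036854775807, Decimal('11342371013783243717493546650944543.47')),
    (9223372036854775807, Decimal('999999999999999999999999999999999999.99')),
]
PRECEDING = Decimal('10.2345')
FOLLOWING = Decimal('6.7890')


def range_frame_counts(rows, preceding, following):
    """Brute-force COUNT(1) OVER (PARTITION BY a ORDER BY b
    RANGE BETWEEN preceding PRECEDING AND following FOLLOWING)."""
    counts = []
    for a, b in rows:
        # Frame bounds for the current row, computed without overflow
        lower, upper = b - preceding, b + following
        counts.append(sum(1 for a2, b2 in rows
                          if a2 == a and lower <= b2 <= upper))
    return counts


print(range_frame_counts(ROWS, PRECEDING, FOLLOWING))
{code}

The quadratic scan is deliberate: it is only meant as an oracle for tiny inputs like this one. It reports one row per frame for both inputs, matching the Spark 3.3.1 output.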

--
This message was sent by Atlassian Jira
(v8.20.10#820010)