You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Zhizhen Hou (Jira)" <ji...@apache.org> on 2023/10/10 07:19:00 UTC
[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhizhen Hou updated SPARK-45478:
--------------------------------
Description:
*The SQL to reproduce*
{code:java}
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec; {code}
*Reasons may cause the result:*
Function sum use If expression in updateExpressions:
`If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), true]falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
case _: CodegenFallback => Nil
case i: If => i.predicate :: Nil
case c: CaseWhen => c.children.head :: Nil
case c: Coalesce => c.children.head :: Nil
case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
But the following comment in `childrenToRecurse` makes me not sure it will cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
was:
*The SQL to reproduce*
create table t_dec (c1 decimal(6,2));
insert into t_dec values(1.0),(2.0),(null),(3.0);
explain codegen select sum(c1/2) from t_dec;
*Reasons may cause the result:*
Function sum use If expression in updateExpressions:
`If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
The three variables in if expression like this.
{code:java}
predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), true]falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
In sub expression elimination, only predicate is evaluated in EquivalentExpressions# childrenToRecurse
{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
case _: CodegenFallback => Nil
case i: If => i.predicate :: Nil
case c: CaseWhen => c.children.head :: Nil
case c: Coalesce => c.children.head :: Nil
case other => other.children
} {code}
I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
But the following comment in `childrenToRecurse` makes me not sure it will cause any other problems.
{code:java}
// 2. If: common subexpressions will always be evaluated at the beginning, but the true and
// false expressions in `If` may not get accessed, according to the predicate
// expression. We should only recurse into the predicate expression. {code}
> codegen sum(decimal_column / 2) computes div twice
> --------------------------------------------------
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Zhizhen Hou
> Priority: Minor
>
> *The SQL to reproduce*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>
> *Reasons may cause the result:*
>
> Function sum use If expression in updateExpressions:
> `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>
> The three variables in if expression like this.
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))trueValue: input[0, decimal(26,6), true]falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In sub expression elimination, only predicate is evaluated in EquivalentExpressions# childrenToRecurse
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
> case _: CodegenFallback => Nil
> case i: If => i.predicate :: Nil
> case c: CaseWhen => c.children.head :: Nil
> case c: Coalesce => c.children.head :: Nil
> case other => other.children
> } {code}
> I tried to replace `case i: If => i.predicate :: Nil` with 'case i: If => i.predicate :: trueValue :: falseValue :: Nil', and it produce correct result.
>
> But the following comment in `childrenToRecurse` makes me not sure it will cause any other problems.
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org