Posted to issues@spark.apache.org by "Linhong Liu (Jira)" <ji...@apache.org> on 2020/09/08 06:35:00 UTC

[jira] [Created] (SPARK-32816) Planner error when aggregating multiple distinct DECIMAL columns

Linhong Liu created SPARK-32816:
-----------------------------------

             Summary: Planner error when aggregating multiple distinct DECIMAL columns
                 Key: SPARK-32816
                 URL: https://issues.apache.org/jira/browse/SPARK-32816
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Linhong Liu


Running multiple different DISTINCT aggregations over a DECIMAL column causes a query planner error:
{code:java}
java.lang.RuntimeException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
	at scala.sys.package$.error(package.scala:30)
	at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:473)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:67)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:97)
	at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74)
	at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
	at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
{code}
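From the stack trace, the error comes from a sanity check in the Aggregation strategy (SparkStrategies.scala:473): physical planning only supports multiple DISTINCT aggregates when they all share the same set of child expressions. A simplified sketch of that check, with illustrative names rather than the exact Spark source:
{code:java}
import org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression

// Simplified sketch of the sanity check behind the error above; names are
// illustrative, not the exact Spark source. Planning bails out when the
// DISTINCT aggregate functions do not all share one set of child expressions.
def checkDistinctChildren(functionsWithDistinct: Seq[AggregateExpression]): Unit = {
  val distinctChildrenSets =
    functionsWithDistinct.map(_.aggregateFunction.children.toSet).distinct
  if (distinctChildrenSets.length > 1) {
    sys.error("You hit a query analyzer bug. Please report your query to " +
      "Spark user mailing list.")
  }
}
{code}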
Example failing query:
{code:java}
import org.apache.spark.util.Utils

// Changing decimal(9, 0) to decimal(8, 0) fixes the problem. Root cause seems to have to do with
// UnscaledValue being used in one of the expressions but not the other.
val df = spark.range(0, 50000, 1, 1).selectExpr(
      "id",
      "cast(id as decimal(9, 0)) as ss_ext_list_price")
val cacheDir = Utils.createTempDir().getCanonicalPath
df.write.parquet(cacheDir)

spark.read.parquet(cacheDir).createOrReplaceTempView("test_table")

spark.sql("""
select
avg(distinct ss_ext_list_price), sum(distinct ss_ext_list_price)
from test_table""").explain
{code}
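If I read the DecimalAggregates optimizer rule correctly, Sum over a decimal is rewritten to aggregate UnscaledValue(e) only when the unscaled sum fits in a Long (precision + 10 <= 18), while Average is rewritten when it fits precisely in a Double (precision + 4 <= 15). With decimal(9, 0) only the avg side gets the UnscaledValue rewrite, so the two DISTINCT aggregates no longer share the same child expression and the check above fires. The asymmetry can be observed by optimizing each aggregate separately (assuming test_table is registered as in the repro above):
{code:java}
// With decimal(9, 0) the avg plan aggregates UnscaledValue(ss_ext_list_price)
// while the sum plan keeps the Decimal child.
spark.sql("select sum(distinct ss_ext_list_price) from test_table")
  .queryExecution.optimizedPlan
spark.sql("select avg(distinct ss_ext_list_price) from test_table")
  .queryExecution.optimizedPlan
{code}
A possible workaround (untested) is to exclude the rewrite via spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.DecimalAggregates, or to cast the column so that both aggregates are rewritten the same way.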



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org