Posted to issues@spark.apache.org by "Linhong Liu (Jira)" <ji...@apache.org> on 2020/09/08 06:35:00 UTC
[jira] [Created] (SPARK-32816) Planner error when aggregating multiple distinct DECIMAL columns
Linhong Liu created SPARK-32816:
-----------------------------------
Summary: Planner error when aggregating multiple distinct DECIMAL columns
Key: SPARK-32816
URL: https://issues.apache.org/jira/browse/SPARK-32816
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Reporter: Linhong Liu
Running multiple different DISTINCT aggregations over a DECIMAL column causes a query planner error:
{code:java}
java.lang.RuntimeException: You hit a query analyzer bug. Please report your query to Spark user mailing list.
at scala.sys.package$.error(package.scala:30)
at org.apache.spark.sql.execution.SparkStrategies$Aggregation$.apply(SparkStrategies.scala:473)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$1(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:489)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:97)
at org.apache.spark.sql.execution.SparkStrategies.plan(SparkStrategies.scala:74)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.$anonfun$plan$3(QueryPlanner.scala:82)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:162)
at scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:162)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
{code}
Example failing query:
{code:java}
import org.apache.spark.util.Utils
// Changing decimal(9, 0) to decimal(8, 0) fixes the problem. Root cause seems to have to do with
// UnscaledValue being used in one of the expressions but not the other.
val df = spark.range(0, 50000, 1, 1).selectExpr(
  "id",
  "cast(id as decimal(9, 0)) as ss_ext_list_price")
val cacheDir = Utils.createTempDir().getCanonicalPath
df.write.parquet(cacheDir)
spark.read.parquet(cacheDir).createOrReplaceTempView("test_table")
spark.sql("""
  select
    avg(distinct ss_ext_list_price), sum(distinct ss_ext_list_price)
  from test_table""").explain
{code}
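The root-cause comment above can be sketched as a toy model (the object, class, and helper names below are hypothetical, not Spark internals): the planner effectively groups DISTINCT aggregates by their argument expressions, and if the decimal optimization rewrites only one of sum/avg to operate on UnscaledValue(...), two distinct argument sets survive and planning fails with the error shown earlier.

```scala
// Toy model of the distinct-aggregate grouping check. All names here are
// illustrative; Spark's real logic lives in RewriteDistinctAggregates and
// SparkStrategies.Aggregation.
object DistinctGroupSketch {
  // One DISTINCT aggregate and the argument expressions it operates on,
  // after any decimal optimization has (or has not) been applied to it.
  case class DistinctAgg(func: String, args: Set[String])

  // The planner groups DISTINCT aggregates by argument set; more than one
  // surviving group is what triggers the "query analyzer bug" error.
  def distinctGroupCount(aggs: Seq[DistinctAgg]): Int =
    aggs.map(_.args).distinct.size

  def main(argv: Array[String]): Unit = {
    // For decimal(9, 0), only one of the two aggregates is rewritten to
    // UnscaledValue(...), so the argument sets no longer match:
    val aggs = Seq(
      DistinctAgg("avg", Set("UnscaledValue(ss_ext_list_price)")),
      DistinctAgg("sum", Set("ss_ext_list_price")))
    println(distinctGroupCount(aggs)) // prints 2 -> planner bails out
  }
}
```

With decimal(8, 0) both aggregates get the same rewrite, the argument sets collapse into a single group, and the query plans normally.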
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org