Posted to issues@spark.apache.org by "Ravindra Bajpai (JIRA)" <ji...@apache.org> on 2017/03/18 01:47:41 UTC
[jira] [Created] (SPARK-20008) hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns 1
Ravindra Bajpai created SPARK-20008:
---------------------------------------
Summary: hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() returns 1
Key: SPARK-20008
URL: https://issues.apache.org/jira/browse/SPARK-20008
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.0.2
Reporter: Ravindra Bajpai
hiveContext.emptyDataFrame.except(hiveContext.emptyDataFrame).count() yields 1 instead of the expected 0.
This was not the case with Spark 1.5.2. From a usage point of view this is a change in API behavior, and hence I consider it a bug. It may be a boundary case; I am not sure.
Workaround: for now I check that the counts != 0 before this operation. That is not good for performance, hence I am creating a JIRA to track it.
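For illustration, the workaround described above might look roughly like the following. This is only a sketch: the helper name safeExceptCount and the local SparkSession setup are assumptions made for the example and are not taken from the reporter's code.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ExceptWorkaround {
  // Hypothetical helper: skips except() entirely when the left side is empty,
  // since "empty except anything" should contain no rows.
  def safeExceptCount(left: DataFrame, right: DataFrame): Long =
    if (left.count() == 0L) 0L          // extra count() job: this is the performance cost
    else left.except(right).count()

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the helper out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-20008-workaround")
      .getOrCreate()

    val empty = spark.emptyDataFrame
    // Prints 0: the guard short-circuits before except() is called.
    println(safeExceptCount(empty, empty))

    spark.stop()
  }
}
```

The guard costs one additional Spark job per call. If that matters, left.head(1).isEmpty may be a cheaper emptiness check than a full count(), though that has not been measured here.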
As Young Zhang explained in reply to my mail:
Starting from Spark 2, these kinds of operations are implemented as a left anti join instead of using RDD operations directly.
The same issue also occurs with sqlContext.
scala> spark.version
res25: String = 2.0.2
scala> spark.sqlContext.emptyDataFrame.except(spark.sqlContext.emptyDataFrame).explain(true)
== Physical Plan ==
*HashAggregate(keys=[], functions=[], output=[])
+- Exchange SinglePartition
   +- *HashAggregate(keys=[], functions=[], output=[])
      +- BroadcastNestedLoopJoin BuildRight, LeftAnti, false
         :- Scan ExistingRDD[]
         +- BroadcastExchange IdentityBroadcastMode
            +- Scan ExistingRDD[]
This is arguably a bug. My guess is that it is similar to the logic of comparing NULL = NULL: whether that should return true or false is what causes this kind of confusion.
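To make the shape of the plan above easier to follow, here is a rough hand-written equivalent of except() as a left anti join followed by a distinct. It is a sketch, not Spark's internal implementation: the helper name exceptViaAntiJoin is hypothetical, it assumes both DataFrames have identical column names, and it uses the null-safe equality operator <=> so that NULL matches NULL. With zero columns there is nothing to compare, so the sketch falls back to a constant true condition.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ExceptAsAntiJoin {
  // Hypothetical helper: except() written by hand as left anti join + distinct.
  // Assumes left and right share the same column names.
  def exceptViaAntiJoin(left: DataFrame, right: DataFrame): DataFrame = {
    val l = left.as("l")
    val r = right.as("r")
    // Null-safe equality on every column; lit(true) when there are no columns.
    val cond: Column = left.columns
      .map(c => col(s"l.$c") <=> col(s"r.$c"))
      .reduceOption(_ && _)
      .getOrElse(lit(true))
    // Keep left rows with no match on the right, then de-duplicate.
    l.join(r, cond, "left_anti").distinct()
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, only for trying the helper out.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("except-as-anti-join")
      .getOrCreate()
    import spark.implicits._

    val a = Seq(1, 2, 2, 3).toDF("x")
    val b = Seq(2).toDF("x")
    // Expected to contain the rows 1 and 3 (order not guaranteed).
    exceptViaAntiJoin(a, b).show()

    spark.stop()
  }
}
```

In this sketch the distinct step corresponds to the HashAggregate nodes in the plan and the join corresponds to the LeftAnti node. What it returns for two zero-column empty DataFrames depends on the Spark version in use.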