Posted to issues@spark.apache.org by "Ben Moran (JIRA)" <ji...@apache.org> on 2015/10/02 19:49:26 UTC

[jira] [Created] (SPARK-10914) Incorrect empty join sets

Ben Moran created SPARK-10914:
---------------------------------

             Summary: Incorrect empty join sets
                 Key: SPARK-10914
                 URL: https://issues.apache.org/jira/browse/SPARK-10914
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.1, 1.5.0
         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)

            Reporter: Ben Moran


Using an inner join to match two integer columns, I generally get no results when there should be matches.  However, the results vary depending on whether the DataFrames come from SQL, from JSON, or from the cache, and on the order in which I cache and query them.

This minimal example reproduces it consistently for me in the spark-shell, on fresh installs of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6, from http://spark.apache.org/downloads.html).

/* x is {"xx":1}{"xx":2} and y is {"yy":1}{"yy":2} */
val x = sql("select 1 xx union all select 2")
val y = sql("select 1 yy union all select 2")

x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
/* If I cache both tables it works: */
x.cache()
y.cache()
x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */

/* but this still doesn't work: */
x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */
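
For reference, the expected counts above can be checked without Spark. The sketch below is a plain-Scala model of the inner join on the same two columns; it is not part of the original reproduction, and the names ExpectedJoin and innerJoin are hypothetical, introduced only for illustration.

```scala
// Plain-Scala model of the expected inner-join semantics (no Spark involved).
// ExpectedJoin and innerJoin are hypothetical names, not Spark APIs.
object ExpectedJoin {
  // Same values as the x and y DataFrames in the report.
  val xx: Seq[Int] = Seq(1, 2)
  val yy: Seq[Int] = Seq(1, 2)

  // Inner join on equality: keep every (left, right) pair whose values match.
  def innerJoin(left: Seq[Int], right: Seq[Int]): Seq[(Int, Int)] =
    for {
      a <- left
      b <- right
      if a == b
    } yield (a, b)

  def main(args: Array[String]): Unit = {
    val joined = innerJoin(xx, yy)
    println(joined.size)             // prints 2: pairs (1,1) and (2,2)
    println(joined.count(_._2 == 1)) // prints 1: only (1,1) has yy = 1
  }
}
```

This only restates what the join should return (2 rows, and 1 row after filtering on yy = 1); it says nothing about why Spark returns 0.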



