Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2015/10/30 23:00:28 UTC

[jira] [Reopened] (SPARK-11430) DataFrame's except method does not work, returns 0

     [ https://issues.apache.org/jira/browse/SPARK-11430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reopened SPARK-11430:
-------------------------------

[~ramk256] The right resolution is Duplicate.

> DataFrame's except method does not work, returns 0
> --------------------------------------------------
>
>                 Key: SPARK-11430
>                 URL: https://issues.apache.org/jira/browse/SPARK-11430
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Ram Kandasamy
>
> This may or may not be related to SPARK-11427: https://issues.apache.org/jira/browse/SPARK-11427
> The except method on DataFrames should mirror the subtract method on RDDs, but it does not.
> Here is an example:
> scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
> firstFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> val secondFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-10-23/*").select("id").distinct
> secondFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> firstFile.count
> res1: Long = 1072046
> scala> secondFile.count
> res2: Long = 3569941
> scala> firstFile.except(secondFile).count
> res3: Long = 0
> scala> firstFile.rdd.subtract(secondFile.rdd).count
> res4: Long = 1072046
> Can anyone help out here? Thanks!
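
The comparison in the report depends on local Parquet files, so a smaller check may be easier to reproduce. The following is a minimal sketch, assuming a spark-shell session on Spark 1.5.x where `sqlContext` is predefined. The names `first`, `second`, `viaExcept`, and `viaSubtract` and the sample ids are hypothetical, not taken from the report. Only "a" is in `first` and not in `second`, so if the two methods agree, both counts should be 1.

```scala
// Sketch for spark-shell on Spark 1.5.x, where `sqlContext` is predefined.
// The ids below are made-up sample data, not taken from the report.
import sqlContext.implicits._

val first  = Seq(Tuple1("a"), Tuple1("b"), Tuple1("c")).toDF("id")
val second = Seq(Tuple1("b"), Tuple1("c"), Tuple1("d")).toDF("id")

// Rows of `first` that do not appear in `second`, computed two ways.
val viaExcept   = first.except(second).count
val viaSubtract = first.rdd.subtract(second.rdd).count

println(s"except: $viaExcept, rdd.subtract: $viaSubtract")
```

If the two counts differ on in-memory data like this, the discrepancy is independent of how the Parquet files were read.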



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
