Posted to issues@spark.apache.org by "Ram Kandasamy (JIRA)" <ji...@apache.org> on 2015/10/31 23:27:27 UTC

[jira] [Resolved] (SPARK-11427) DataFrame's intersect method does not work, returns 1

     [ https://issues.apache.org/jira/browse/SPARK-11427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ram Kandasamy resolved SPARK-11427.
-----------------------------------
    Resolution: Duplicate

> DataFrame's intersect method does not work, returns 1
> -----------------------------------------------------
>
>                 Key: SPARK-11427
>                 URL: https://issues.apache.org/jira/browse/SPARK-11427
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Ram Kandasamy
>
> Hello,
>     While working with DataFrames, I found that intersect() always seems to produce a result whose count is 1. The RDD intersection() method works correctly on the same data.
> Consider this example:
> scala> val firstFile = sqlContext.read.parquet("/Users/ramkandasamy/sparkData/2015-07-25/*").select("id").distinct
> firstFile: org.apache.spark.sql.DataFrame = [id: string]
> scala> firstFile.count
> res4: Long = 1072046
> scala> firstFile.intersect(firstFile).count
> res5: Long = 1
> scala> firstFile.rdd.intersection(firstFile.rdd).count
> res6: Long = 1072046
> I have tried several different cases, and in each one the count of the DataFrame's intersect() result is 1.
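
The check described in the report can be sketched without the reporter's Parquet files. The snippet below is an illustrative reproduction, not part of the original issue: it assumes a Spark 1.5.x spark-shell session (where `sqlContext` is predefined), and the three ids are hypothetical stand-in data. Intersecting a set of distinct rows with itself should keep every row, so both counts would be expected to match `total`.

```scala
// Assumption: run inside spark-shell (Spark 1.5.x), where `sqlContext` already exists.
import sqlContext.implicits._

// Hypothetical stand-in for the Parquet data in the report: a few distinct ids.
val ids = Seq(Tuple1("a"), Tuple1("b"), Tuple1("c")).toDF("id").distinct

val total        = ids.count
val viaDataFrame = ids.intersect(ids).count              // DataFrame API under test
val viaRdd       = ids.rdd.intersection(ids.rdd).count   // RDD API, reported as working

// Print all three counts side by side; any difference from `total` is the symptom.
println(s"total=$total dataFrame=$viaDataFrame rdd=$viaRdd")
```

If `dataFrame` differs from `total` while `rdd` matches it, the session shows the same symptom as the transcript above.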


