Posted to issues@spark.apache.org by "Narine Kokhlikyan (JIRA)" <ji...@apache.org> on 2015/10/05 20:17:26 UTC

[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join

    [ https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943784#comment-14943784 ] 

Narine Kokhlikyan commented on SPARK-9318:
------------------------------------------

Hi all,

[~shivaram], [~falaki], 
I am working on the new signature for merge and have noticed that join in general has serious issues.
I took one of the examples from the base R merge documentation: https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html

x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)

I want to join these two data frames (xdf and ydf being the SparkR DataFrames for x and y): res <- join(xdf, ydf)

res has the following structure:
DataFrame[k1:double, k2:double, data:int, k1:double, k2:double, data:int]

but when I do head(res) I get the following:
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5
6 NA NA    1

This is not what I was expecting. The schema lists six columns, but head shows only three, so the structure is inconsistent with the data I see.
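For comparison, here is a minimal sketch of how base R's merge handles the same name clash on these two data frames. It is plain base R, not SparkR, and the variable names cross and byk1 are only illustrative. With by = NULL, merge returns the Cartesian product, and by default it appends the suffixes .x and .y to column names that occur in both inputs.

```r
# Same example data as above (taken from the base R merge help page).
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# No join key: Cartesian product of x and y.
# Clashing column names are disambiguated with the default suffixes.
cross <- merge(x, y, by = NULL)
names(cross)
dim(cross)

# Join on k1 only: the key column appears once, and the remaining
# shared names (k2, data) get the .x / .y suffixes.
byk1 <- merge(x, y, by = "k1")
names(byk1)
```

Something along these lines is what I would expect head(res) to show for the SparkR join as well: six distinguishable columns rather than three.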

I tried aliasing the columns that have the same names in both data frames:

ydfsel <- select(ydf, alias(ydf$k1,"k1.y"), alias(ydf$k2,"k2.y"), alias(ydf$data,"data.y"))
xdfsel <- select(xdf, alias(xdf$k1,"k1.x"), alias(xdf$k2,"k2.x"), alias(xdf$data,"data.x"))

This works, and a plain join(xdfsel, ydfsel) on the aliased data frames works as well,

but the following fails:
join(xdfsel,ydfsel,xdfsel$k1.x==ydfsel$k1.y)

Does this mean that I cannot refer to an aliased column?

Do you know what the issue is here?

Thanks,
Narine



 

> Add `merge` as synonym for join
> -------------------------------
>
>                 Key: SPARK-9318
>                 URL: https://issues.apache.org/jira/browse/SPARK-9318
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>            Assignee: Hossein Falaki
>             Fix For: 1.5.0
>
>



