Posted to issues@spark.apache.org by "Narine Kokhlikyan (JIRA)" <ji...@apache.org> on 2015/10/05 20:17:26 UTC
[jira] [Commented] (SPARK-9318) Add `merge` as synonym for join
[ https://issues.apache.org/jira/browse/SPARK-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943784#comment-14943784 ]
Narine Kokhlikyan commented on SPARK-9318:
------------------------------------------
Hi all,
[~shivaram], [~falaki],
I am working on the new signature for merge and have noticed that join in general has serious issues.
I took one of the examples from the R base::merge documentation - https://stat.ethz.ch/R-manual/R-devel/library/base/html/merge.html
x <- data.frame(k1 = c(NA,NA,3,4,5), k2 = c(1,NA,NA,4,5), data = 1:5)
y <- data.frame(k1 = c(NA,2,NA,4,5), k2 = c(NA,NA,3,4,5), data = 1:5)
I want to join these two data frames (xdf and ydf are the SparkR DataFrames created from x and y): res <- join(xdf,ydf)
res has the following structure:
DataFrame[k1:double, k2:double, data:int, k1:double, k2:double, data:int]
but when I do head(res) I get the following:
  k1 k2 data
1 NA NA    1
2  2 NA    2
3 NA  3    3
4  4  4    4
5  5  5    5
6 NA NA    1
This is not what I was expecting. The structure of res lists six columns, but head shows only three, so the schema is inconsistent with the data.
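For comparison, here is a minimal sketch in plain base R (no Spark involved) of how merge handles the same name collision on these two data frames. It assumes that the new SparkR merge is meant to mirror this behavior, which is not confirmed here; the variable names cart and keyed are only illustrative.

```r
# Same data frames as in the example above (from the base R merge documentation).
x <- data.frame(k1 = c(NA, NA, 3, 4, 5), k2 = c(1, NA, NA, 4, 5), data = 1:5)
y <- data.frame(k1 = c(NA, 2, NA, 4, 5), k2 = c(NA, NA, 3, 4, 5), data = 1:5)

# No join keys: base R returns the Cartesian product of x and y.
cart <- merge(x, y, by = NULL)
dim(cart)    # 25 rows, 6 columns
names(cart)  # shared names get the default ".x" / ".y" suffixes

# Join on k1 only: the key appears once, the other shared columns are suffixed.
keyed <- merge(x, y, by = "k1")
names(keyed)
```

Because base R renames the clashing columns, every column in the result can still be addressed by a unique name, which is what the alias approach below tries to reproduce by hand.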
I tried to alias the columns that have the same name in both data frames:
ydfsel <- select(ydf, alias(ydf$k1,"k1.y"), alias(ydf$k2,"k2.y"), alias(ydf$data,"data.y"))
xdfsel <- select(xdf, alias(xdf$k1,"k1.x"), alias(xdf$k2,"k2.x"), alias(xdf$data,"data.x"))
Both selects work, and so does the join without a condition: join(xdfsel, ydfsel)
But the following fails:
join(xdfsel,ydfsel,xdfsel$k1.x==ydfsel$k1.y)
Does this mean that I cannot refer to an aliased column?
Do you know what the issue is here?
Thanks,
Narine
> Add `merge` as synonym for join
> -------------------------------
>
> Key: SPARK-9318
> URL: https://issues.apache.org/jira/browse/SPARK-9318
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Shivaram Venkataraman
> Assignee: Hossein Falaki
> Fix For: 1.5.0
>
>