Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2018/10/25 02:21:00 UTC

[jira] [Comment Edited] (SPARK-25829) Duplicated map keys are not handled consistently

    [ https://issues.apache.org/jira/browse/SPARK-25829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663103#comment-16663103 ] 

Wenchen Fan edited comment on SPARK-25829 at 10/25/18 2:20 AM:
---------------------------------------------------------------

After more thought: both the map lookup behavior and the `Dataset.collect` behavior are visible to end users. Since there is no documentation, it's hard to say which one is the official semantics, and we will have to change the behavior of one of them.
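To make the inconsistency concrete, here is a minimal plain-Scala model with no Spark dependency. It assumes that a map value is stored as parallel key/value arrays, that a `GetMapValue`-style lookup returns the value of the first matching key, and that `Dataset.collect` builds a Scala `Map` by zipping keys with values, where `toMap` keeps the last value for a duplicated key. `ArrayMap`, `lookup` and `toScalaMap` are hypothetical names, not Spark APIs.

```scala
// Hypothetical model of a map value: parallel key/value arrays, duplicates allowed.
object DuplicateKeyDemo {
  final case class ArrayMap(keys: Vector[Int], values: Vector[Int])

  // Lookup that returns the value of the FIRST matching key ("earlier entry wins").
  def lookup(m: ArrayMap, key: Int): Option[Int] = {
    val i = m.keys.indexOf(key)
    if (i >= 0) Some(m.values(i)) else None
  }

  // Conversion to a Scala Map: toMap overwrites earlier pairs, so the LAST value
  // for a duplicated key survives ("later entry wins").
  def toScalaMap(m: ArrayMap): Map[Int, Int] = m.keys.zip(m.values).toMap
}
```

With the `map(1,2,1,3)` example from the description, i.e. `ArrayMap(Vector(1, 1), Vector(2, 3))`, `lookup(m, 1)` yields `Some(2)` while `toScalaMap(m)(1)` yields `3`.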

If we want to stick with the "earlier entry wins" semantics, then we need to fix the 3 sub-tasks listed here.

If we want to stick with the "later entry wins" semantics, then we need to fix the map lookup (GetMapValue) and other related functions like `map_filter`, or deduplicate map keys at all the places that may create a map. And for 2.4, we should revert these functions if they are newly added, like `map_filter`.
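Deduplicating at map-creation time could look roughly like the sketch below. It is plain Scala, not Spark's actual map-building code; `MapKeyDedup`, `dedup` and the `FirstWins`/`LastWins` policies are hypothetical names. In this sketch a key keeps the position of its first occurrence and only the retained value depends on the policy, so either semantics discussed here could be enforced in one place.

```scala
import scala.collection.mutable

// Hypothetical helper: removes duplicated keys when a map is created.
object MapKeyDedup {
  sealed trait Policy
  case object FirstWins extends Policy // keep the earliest value for a key
  case object LastWins extends Policy  // keep the latest value for a key

  def dedup[K, V](pairs: Seq[(K, V)], policy: Policy): List[(K, V)] = {
    // LinkedHashMap preserves first-insertion order; updating an existing key
    // replaces its value without moving it.
    val acc = mutable.LinkedHashMap.empty[K, V]
    pairs.foreach { case (k, v) =>
      policy match {
        case FirstWins => if (!acc.contains(k)) acc.update(k, v)
        case LastWins  => acc.update(k, v)
      }
    }
    acc.toList
  }
}
```

For example, `dedup(Seq(1 -> 2, 1 -> 3), LastWins)` returns `List(1 -> 3)`, while `FirstWins` returns `List(1 -> 2)`.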

Any ideas? cc [~rxin] [~LI,Xiao] [~dongjoon] [~viirya] [~mgaido]


was (Author: cloud_fan):
After more thought: both the map lookup behavior and the `Dataset.collect` behavior are visible to end users. Since there is no documentation, it's hard to say which one is the official semantics, and we will have to change the behavior of one of them.

If we want to stick with the "earlier entry wins" semantics, then we need to fix the 3 sub-tasks listed here.

If we want to stick with the "later entry wins" semantics, then we need to fix the map lookup (GetMapValue) and other related functions like `map_filter`. And for 2.4, we should revert these functions if they are newly added, like `map_filter`.

Any ideas? cc [~rxin] [~LI,Xiao] [~dongjoon] [~viirya] [~mgaido]

> Duplicated map keys are not handled consistently
> ------------------------------------------------
>
>                 Key: SPARK-25829
>                 URL: https://issues.apache.org/jira/browse/SPARK-25829
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Wenchen Fan
>            Priority: Major
>
> In Spark SQL, we apply the "earlier entry wins" semantics to duplicated map keys, e.g.:
> {code}
> scala> sql("SELECT map(1,2,1,3)[1]").show
> +------------------+
> |map(1, 2, 1, 3)[1]|
> +------------------+
> |                 2|
> +------------------+
> {code}
> However, this handling is not applied consistently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
