Posted to issues@spark.apache.org by "Elazar Gershuni (JIRA)" <ji...@apache.org> on 2016/01/05 09:41:40 UTC

[jira] [Comment Edited] (SPARK-12623) map key_values to values

    [ https://issues.apache.org/jira/browse/SPARK-12623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082639#comment-15082639 ] 

Elazar Gershuni edited comment on SPARK-12623 at 1/5/16 8:41 AM:
-----------------------------------------------------------------

That does not answer the question/feature request. Mapping values to values can be achieved with code similar to the one you suggested:

{code}
rdd.map { case (key, value) => (key, myFunctionOf(value)) }
{code}

Yet Spark does provide {{rdd.mapValues()}}, for performance reasons: it retains the partitioning, avoiding the need to reshuffle when the keys do not change.
I would like to enjoy similar benefits in my case too. The code you suggested does not get them, since Spark cannot know that the key does not change.
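
For the record, here is a minimal sketch of the behaviour I'm after, expressed with the existing {{mapPartitions}} and {{preservesPartitioning = true}} ({{f}} here stands for a hypothetical user function of both key and value):

{code}
// f is a hypothetical user function of both key and value.
// preservesPartitioning = true keeps the existing partitioner:
// we promise Spark that the keys are left unchanged, so no
// reshuffle is needed downstream.
val result = rdd.mapPartitions(
  iter => iter.map { case (key, value) => (key, f(key, value)) },
  preservesPartitioning = true
)
{code}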

I'm sorry if this isn't the place for the question/feature request, but it really isn't a user question.



> map key_values to values
> ------------------------
>
>                 Key: SPARK-12623
>                 URL: https://issues.apache.org/jira/browse/SPARK-12623
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Elazar Gershuni
>            Priority: Minor
>              Labels: easyfix, features, performance
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Why doesn't the function passed to mapValues() receive the key as an argument? Alternatively, can we have a "mapKeyValuesToValues" that does?
> Use case: I want to write a simple analyzer that takes the argument to map() and analyzes it to see whether it (trivially) doesn't change the key, e.g.
> {code}
> g = lambda kv: (kv[0], f(kv[0], kv[1]))
> rdd.map(g)
> {code}
> The problem is that even when I find this is the case, I can't call mapValues() with that function, as in {{rdd.mapValues(lambda kv: g(kv)[1])}}, since the function passed to mapValues receives only {{v}} as an argument.


