Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/03/12 08:15:00 UTC
[jira] [Comment Edited] (SPARK-26858) Vectorized gapplyCollect, Arrow optimization in native R function execution

    [ https://issues.apache.org/jira/browse/SPARK-26858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16790340#comment-16790340 ] 

Hyukjin Kwon edited comment on SPARK-26858 at 3/12/19 8:14 AM:
---------------------------------------------------------------

Here is my initial attempt: https://github.com/apache/spark/compare/master...HyukjinKwon:SPARK-26858-1?expand=1
This looks hacky but simple.

Here is another attempt, which I believe corresponds to approach 3): https://github.com/apache/spark/compare/master...HyukjinKwon:SPARK-26858-2?expand=1
This looks less hacky but can't reuse the existing code paths, so the code size is bigger. (I wrote this one in a bit of a rush to show the rough idea; I am sure the actual code size would be bigger.)

Approach 2) looks virtually like a mix of 1) and 3): it is both hacky and bigger in code size.

I can do it if you guys feel strongly about it.
My impression is that the workaround is easy and we could do it later when it's actually requested, but I don't feel strongly either way, given that this API already existed on the SparkR side.


> Vectorized gapplyCollect, Arrow optimization in native R function execution
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-26858
>                 URL: https://issues.apache.org/jira/browse/SPARK-26858
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR, SQL
>    Affects Versions: 3.0.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Major
>
> Unlike gapply, gapplyCollect requires additional ser/de steps because it can omit the schema, and Spark SQL doesn't know the return type before execution actually happens.
> In the original code path, this is done by using a binary schema. Once gapply is done (SPARK-26761), we can mimic this approach in vectorized gapply to support gapplyCollect.
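
For illustration only, here is a rough SparkR sketch of the user-facing difference the description refers to: gapply is given an output schema up front, while gapplyCollect is called without one and returns a local R data.frame. It assumes a local Spark installation that SparkR can find; the data, the column names (a, b, avg_b) and the grouping function are made up for this example and are not taken from the issue or the linked branches.

```r
library(SparkR)

# Assumption: Spark is installed locally and SparkR can start a session.
sparkR.session(master = "local[1]")

# Made-up illustration data: two groups keyed by column "a".
df <- createDataFrame(
  data.frame(a = c(1L, 1L, 3L), b = c(1, 2, 3), stringsAsFactors = FALSE))

# gapply: the output schema is declared before execution,
# so Spark SQL knows the return type in advance.
schema <- structType(structField("a", "integer"),
                     structField("avg_b", "double"))
applied <- gapply(df, "a", function(key, x) {
  data.frame(a = key[[1]], avg_b = mean(x$b))
}, schema)
applied_local <- collect(applied)

# gapplyCollect: no schema argument; the result comes back as a
# local R data.frame whose columns are whatever the function returned.
collected <- gapplyCollect(df, "a", function(key, x) {
  data.frame(a = key[[1]], avg_b = mean(x$b))
})
```

Both calls compute the same per-group averages; only the second leaves the return type unknown to Spark SQL until the R function has actually run, which is why the description mentions the extra ser/de steps.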


