Posted to issues@spark.apache.org by "Thomas Graves (Jira)" <ji...@apache.org> on 2020/07/16 18:00:00 UTC

[jira] [Created] (SPARK-32334) Investigate commonizing Columnar and Row data transformations

Thomas Graves created SPARK-32334:
-------------------------------------

             Summary: Investigate commonizing Columnar and Row data transformations 
                 Key: SPARK-32334
                 URL: https://issues.apache.org/jira/browse/SPARK-32334
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Thomas Graves


We introduced more columnar support with SPARK-27396.

While doing that work, we noticed that several pieces of code perform very similar transformations from ColumnarBatch or Arrow data into InternalRow, and vice versa. For instance: [https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58]

[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389]

We should investigate whether we can commonize that code.
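
As a rough illustration of what commonizing could mean, here is a self-contained toy sketch. It is not Spark code: the names RowColumnSketch, Batch, rowsToBatch, and batchToRows are hypothetical and do not exist in Spark, and rows are modeled as plain Object[] rather than InternalRow. The idea is only that both directions of the conversion live in one shared place that every caller reuses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT Spark code: none of these names exist in Spark.
public class RowColumnSketch {

    // Toy columnar batch: columns[c][r] is the value of column c in row r.
    public static final class Batch {
        public final Object[][] columns;
        public final int numRows;

        public Batch(Object[][] columns, int numRows) {
            this.columns = columns;
            this.numRows = numRows;
        }
    }

    // Row -> columnar: transpose a list of rows into per-column arrays.
    public static Batch rowsToBatch(List<Object[]> rows, int numCols) {
        Object[][] columns = new Object[numCols][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            for (int c = 0; c < numCols; c++) {
                columns[c][r] = row[c];
            }
        }
        return new Batch(columns, rows.size());
    }

    // Columnar -> row: the inverse transpose, kept next to rowsToBatch so
    // that every caller shares one implementation of each direction.
    public static List<Object[]> batchToRows(Batch batch) {
        List<Object[]> rows = new ArrayList<>();
        for (int r = 0; r < batch.numRows; r++) {
            Object[] row = new Object[batch.columns.length];
            for (int c = 0; c < batch.columns.length; c++) {
                row[c] = batch.columns[c][r];
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] {1, "a"});
        rows.add(new Object[] {2, "b"});

        Batch batch = rowsToBatch(rows, 2);
        List<Object[]> roundTrip = batchToRows(batch);

        // Print the first column and the second row after a round trip.
        System.out.println(Arrays.toString(batch.columns[0]));
        System.out.println(Arrays.toString(roundTrip.get(1)));
    }
}
```

In Spark itself the shared code would presumably have to deal with typed column vectors and schemas rather than Object arrays; the sketch leaves that out on purpose.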

We are also looking at making the internal caching serialization pluggable to allow for different cache implementations ([https://github.com/apache/spark/pull/29067]).

It was recently suggested that we investigate whether using the DataSource V2 API makes sense and is feasible for some of these transformations, so that they can be easily extended.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
