Posted to issues@spark.apache.org by "Thomas Graves (Jira)" <ji...@apache.org> on 2020/07/16 18:00:00 UTC
[jira] [Created] (SPARK-32334) Investigate commonizing Columnar and Row data transformations
Thomas Graves created SPARK-32334:
-------------------------------------
Summary: Investigate commonizing Columnar and Row data transformations
Key: SPARK-32334
URL: https://issues.apache.org/jira/browse/SPARK-32334
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Thomas Graves
We introduced more Columnar Support with SPARK-27396.
While doing that work, we noticed code that performs very similar transformations from ColumnarBatch or Arrow into InternalRow and vice versa. For instance:
[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L56-L58]
[https://github.com/apache/spark/blob/a4382f7fe1c36a51c64f460c6cb91e93470e0825/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L389]
We should investigate if we can commonize that code.
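To illustrate what commonizing could mean, here is a minimal, language-neutral sketch in Python. It is not Spark code: the batch and row representations, the CONVERTERS table, and the function names are all hypothetical stand-ins for ColumnarBatch, InternalRow, and the per-type accessors. The only point is that both directions share one per-type conversion table instead of each carrying its own type switch.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical stand-ins: a "batch" is a list of columns, a "row" is a tuple.
Batch = List[List[Any]]
Row = Tuple[Any, ...]

# One shared table of per-type converters, used by BOTH directions,
# rather than a separate type switch inside each transformation.
CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "double": float,
    "string": str,
}


def _convert(type_name: str, value: Any) -> Any:
    """Convert a single value; nulls (None) pass through unchanged."""
    if value is None:
        return None
    return CONVERTERS[type_name](value)


def columnar_to_rows(batch: Batch, schema: Sequence[str]) -> Iterator[Row]:
    """Yield one row tuple per row index of the columnar batch."""
    num_rows = len(batch[0]) if batch else 0
    for i in range(num_rows):
        yield tuple(_convert(t, col[i]) for t, col in zip(schema, batch))


def rows_to_columnar(rows: Sequence[Row], schema: Sequence[str]) -> Batch:
    """Build one column list per schema field from the given rows."""
    columns: Batch = [[] for _ in schema]
    for row in rows:
        for col, t, value in zip(columns, schema, row):
            col.append(_convert(t, value))
    return columns
```

With this shape, adding a new data type means touching only the shared table, and a round trip through both functions returns the original batch.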
We are also looking at making the internal caching serialization pluggable to allow for different cache implementations ([https://github.com/apache/spark/pull/29067]).
It was recently suggested that we also investigate whether using the DataSource V2 API makes sense and is feasible for some of these transformations, so that they can be easily extended.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)