Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/06/01 05:25:13 UTC

[jira] [Updated] (SPARK-15687) Columnar execution engine

     [ https://issues.apache.org/jira/browse/SPARK-15687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-15687:
--------------------------------
    Priority: Critical  (was: Major)

> Columnar execution engine
> -------------------------
>
>                 Key: SPARK-15687
>                 URL: https://issues.apache.org/jira/browse/SPARK-15687
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>            Priority: Critical
>
> This ticket tracks progress in making the entire engine columnar, especially in the context of nested data type support.
> In Spark 2.0, we have used the internal column batch interface in Parquet reading (via a vectorized Parquet decoder) and in low-cardinality aggregation. Other parts of the engine already use whole-stage code generation, which is in many ways more efficient than a columnar execution engine for flat data types.
> The goal here is to work out a plan for making the column batch the common data exchange format between operators outside whole-stage code generation, as well as with external systems (e.g. Pandas).
> Some of the important questions to answer are:
> - What is the end-state architecture?
> - Should aggregation be columnar?
> - Should sorting be columnar?
> - How do we handle nested data types?
> - What is the transition plan towards the end state?
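
To make the row-versus-column distinction in the description above concrete, here is a minimal sketch of pivoting row records into a column batch and aggregating over a single column. It is an illustration only: the function names (rows_to_column_batch, sum_column) and the dict-of-lists layout are hypothetical and are not Spark's internal column batch interface.

```python
from typing import Dict, List

# Hypothetical illustration only; this is NOT Spark's internal column batch API.
Row = Dict[str, object]


def rows_to_column_batch(rows: List[Row]) -> Dict[str, list]:
    """Pivot a list of row records into one list of values per column."""
    if not rows:
        return {}
    # Column names are taken from the first row; all rows are assumed
    # to share the same flat schema.
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_column(batch: Dict[str, list], name: str) -> int:
    """Aggregate by scanning one column, without touching the others."""
    return sum(batch[name])


rows = [
    {"id": 1, "qty": 3},
    {"id": 2, "qty": 5},
    {"id": 3, "qty": 7},
]
batch = rows_to_column_batch(rows)
total = sum_column(batch, "qty")
```

In this layout an operator that needs only "qty" reads one list and skips "id" entirely, which is the property that makes a column batch attractive as an exchange format. Nested data types, which the ticket singles out, do not fit this flat sketch and would need a richer representation.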



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
