Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/06/01 05:25:13 UTC

[jira] [Created] (SPARK-15687) Columnar execution engine

Reynold Xin created SPARK-15687:
-----------------------------------

             Summary: Columnar execution engine
                 Key: SPARK-15687
                 URL: https://issues.apache.org/jira/browse/SPARK-15687
             Project: Spark
          Issue Type: New Feature
          Components: SQL
            Reporter: Reynold Xin


This ticket tracks progress in making the entire engine columnar, especially in the context of nested data type support.

In Spark 2.0, we used the internal column batch interface in Parquet reading (via a vectorized Parquet decoder) and in low-cardinality aggregation. Other parts of the engine already use whole-stage code generation, which is in many ways more efficient than a columnar execution engine for flat data types.

The goal here is to define a plan for making column batches the common data exchange format between operators outside whole-stage code generation, as well as with external systems (e.g. Pandas).
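To make the idea of a column batch concrete, here is a minimal, illustrative sketch of the layout a vectorized operator would consume: each column is a contiguous array, and an operator processes a whole batch per call instead of one row at a time. The names (IntColumn, ColumnBatch, sumColumn) are hypothetical, not Spark's actual API.

```java
// Hypothetical column-batch layout; names are illustrative, not Spark's API.
final class IntColumn {
    final int[] values; // contiguous columnar storage for one column
    IntColumn(int[] values) { this.values = values; }
}

final class ColumnBatch {
    final int numRows;
    final IntColumn[] columns;
    ColumnBatch(int numRows, IntColumn[] columns) {
        this.numRows = numRows;
        this.columns = columns;
    }
}

public class ColumnBatchDemo {
    // A columnar operator scans one contiguous array in a tight loop,
    // rather than extracting a field from each row object.
    static long sumColumn(ColumnBatch batch, int colIndex) {
        long sum = 0;
        for (int v : batch.columns[colIndex].values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        ColumnBatch batch = new ColumnBatch(4, new IntColumn[] {
            new IntColumn(new int[] {1, 2, 3, 4}),    // e.g. an "id" column
            new IntColumn(new int[] {10, 20, 30, 40}) // e.g. a "value" column
        });
        System.out.println(sumColumn(batch, 1)); // prints 100
    }
}
```

The same batch object could be handed to an external consumer such as Pandas without per-row conversion, which is the exchange-format benefit the ticket is after.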

Some of the important questions to answer are:

- What is the end state architecture?
- Should aggregation be columnar?
- Should sorting be columnar?
- How do we handle nested data types?
- What is the transition plan towards the end state?
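On the nested-data question above, one common columnar answer (the approach taken by Apache Arrow and by Parquet's repetition encoding) is to flatten nested values into a contiguous buffer plus an offsets buffer delimiting each row's slice. A minimal sketch, with hypothetical names, for an array<int> column:

```java
// Illustrative Arrow-style layout for a nested array<int> column:
// a flat values buffer plus an offsets buffer. Names are hypothetical.
public class NestedColumnDemo {
    // offsets[i]..offsets[i+1] delimit row i's list within the values buffer.
    static int[] sliceRow(int[] values, int[] offsets, int row) {
        return java.util.Arrays.copyOfRange(values, offsets[row], offsets[row + 1]);
    }

    public static void main(String[] args) {
        // Three rows: [1, 2], [], [3, 4, 5]
        int[] values  = {1, 2, 3, 4, 5};
        int[] offsets = {0, 2, 2, 5};
        System.out.println(java.util.Arrays.toString(sliceRow(values, offsets, 2))); // prints [3, 4, 5]
    }
}
```

The appeal is that the nested values themselves stay in one contiguous, scannable buffer, so the flat-data vectorization benefits carry over to nested types.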




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
