Posted to issues@spark.apache.org by "Takuya Ueshin (JIRA)" <ji...@apache.org> on 2017/08/18 05:35:00 UTC

[jira] [Created] (SPARK-21781) Modify DataSourceScanExec to use concrete ColumnVector type.

Takuya Ueshin created SPARK-21781:
-------------------------------------

             Summary: Modify DataSourceScanExec to use concrete ColumnVector type.
                 Key: SPARK-21781
                 URL: https://issues.apache.org/jira/browse/SPARK-21781
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.0
            Reporter: Takuya Ueshin


As mentioned at https://github.com/apache/spark/pull/18680#issuecomment-316820409, adding more {{ColumnVector}} implementations might (or might not) have a large performance impact, because it can prevent inlining or force virtual dispatch.

On the read path, one of the major code paths is the one generated by {{ColumnBatchScan}}. Currently it refers to {{ColumnVector}}, so the penalty will grow as more classes are added. However, the concrete type is known from usage, e.g. the vectorized Parquet reader uses {{OnHeapColumnVector}}. We can use the concrete type directly in the generated code to avoid the penalty.
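
The difference between the two kinds of call site can be sketched roughly as follows. This is a minimal illustration, not Spark's code: {{SimpleColumnVector}}, {{SimpleOnHeapVector}}, {{SimpleConstantVector}} and {{ConcreteTypeSketch}} are hypothetical stand-ins invented for this example, and whether the JIT actually inlines either loop depends on the JVM and on the types it observes at runtime.

```java
// Hypothetical, simplified stand-ins for illustration; not Spark's actual classes.
abstract class SimpleColumnVector {
    abstract int getInt(int rowId);
}

// Array-backed implementation, loosely analogous to an on-heap vector.
final class SimpleOnHeapVector extends SimpleColumnVector {
    private final int[] data;

    SimpleOnHeapVector(int[] data) {
        this.data = data;
    }

    @Override
    int getInt(int rowId) {
        return data[rowId];
    }
}

// A second implementation, so the abstract call site can see more than one type.
final class SimpleConstantVector extends SimpleColumnVector {
    private final int value;

    SimpleConstantVector(int value) {
        this.value = value;
    }

    @Override
    int getInt(int rowId) {
        return value;
    }
}

final class ConcreteTypeSketch {
    // Call site typed against the abstract class. If several implementations
    // reach this loop, getInt may have to be resolved by virtual dispatch.
    static long sumViaAbstract(SimpleColumnVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    // Call site typed against a final concrete class. The target of getInt
    // is known statically, which makes it a straightforward inlining candidate.
    static long sumViaConcrete(SimpleOnHeapVector col, int numRows) {
        long sum = 0;
        for (int i = 0; i < numRows; i++) {
            sum += col.getInt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        SimpleOnHeapVector onHeap = new SimpleOnHeapVector(new int[] {1, 2, 3, 4});
        // Both calls compute the same value; only the static type at the call site differs.
        System.out.println(sumViaAbstract(onHeap, 4));
        System.out.println(sumViaConcrete(onHeap, 4));
        System.out.println(sumViaAbstract(new SimpleConstantVector(7), 4));
    }
}
```

In this sketch the two methods return the same result for the same input. The proposal corresponds to having the generated code declare the column with the concrete type, as in {{sumViaConcrete}}, whenever that type is known from usage.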


