Posted to issues@spark.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2023/03/13 18:34:00 UTC

[jira] [Created] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans

Micah Kornfield created SPARK-42774:
---------------------------------------

             Summary: Expose VectorTypes API for DataSourceV2 Batch Scans
                 Key: SPARK-42774
                 URL: https://issues.apache.org/jira/browse/SPARK-42774
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.2
            Reporter: Micah Kornfield


SparkPlan's vectorTypes attribute can be used to [specialize codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]; however, [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] does not override it, so DSv2 sources get no benefit from concrete class dispatch.
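A minimal, hypothetical Java sketch of the idea behind the linked codegen: when a plan reports vector types, generated code can declare each column variable with its concrete class name; when it reports none, it falls back to the abstract ColumnVector type. The names VectorTypeCodegenSketch, declareColumns and colInstanceN are illustrative assumptions, not Spark's actual generator, and the sketch assumes the reported list has one class name per column.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch (not Spark's real code generator): picks the declared
// type of each generated column-vector variable from a plan's vectorTypes.
class VectorTypeCodegenSketch {
  // Fallback type used when the plan reports no vector types.
  static final String GENERIC = "org.apache.spark.sql.vectorized.ColumnVector";

  // Emits one Java declaration per column. With concrete class names, the
  // cast targets that class, so calls on the variable are dispatched against
  // the concrete type rather than the abstract ColumnVector.
  // Assumes vectorTypes (if present) has exactly numColumns entries.
  static List<String> declareColumns(Optional<List<String>> vectorTypes, int numColumns) {
    List<String> classes = vectorTypes.orElse(Collections.nCopies(numColumns, GENERIC));
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < numColumns; i++) {
      String clz = classes.get(i);
      lines.add(String.format("%s colInstance%d = (%s) batch.column(%d);", clz, i, clz, i));
    }
    return lines;
  }
}
```

With no vector types every declaration uses ColumnVector, which is the situation the issue describes for DSv2 scans today.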

This proposes adding an override to BatchScanExecBase that delegates to a new default method on [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java] to expose vectorTypes:

{code:java}
// Proposed default: factories that do not override this report no vector types.
default Optional<Iterable<String>> getVectorTypes() {
  return Optional.empty();
}
{code}
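A minimal sketch of how the proposal might be used, assuming the default method lands as written. Because the method does not exist in Spark today, the sketch uses hypothetical stand-ins: VectorTypesAware plays the role of PartitionReaderFactory (carrying only the proposed method), ExampleFactory is an imaginary source that always produces OnHeapColumnVector batches, and ScanNodeSketch mimics the scan node copying the result into a list, the Java analogue of SparkPlan.vectorTypes (Option[Seq[String]]).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for PartitionReaderFactory with only the proposed
// default method; the real interface declares other methods too.
interface VectorTypesAware {
  default Optional<Iterable<String>> getVectorTypes() {
    return Optional.empty();
  }
}

// Hypothetical source whose batches always use one concrete vector class.
class ExampleFactory implements VectorTypesAware {
  private final int numColumns;

  ExampleFactory(int numColumns) {
    this.numColumns = numColumns;
  }

  @Override
  public Optional<Iterable<String>> getVectorTypes() {
    List<String> types = new ArrayList<>();
    for (int i = 0; i < numColumns; i++) {
      types.add("org.apache.spark.sql.execution.vectorized.OnHeapColumnVector");
    }
    return Optional.of(types);
  }
}

// Sketch of the scan-node side of the delegation: copy whatever the factory
// reports into a list, or report nothing if the factory reports nothing.
class ScanNodeSketch {
  static Optional<List<String>> vectorTypes(VectorTypesAware factory) {
    return factory.getVectorTypes().map(iterable -> {
      List<String> out = new ArrayList<>();
      iterable.forEach(out::add);
      return out;
    });
  }
}
```

Since the method has a default, existing sources keep compiling and simply keep today's behavior (no vector types reported).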

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
