Posted to issues@spark.apache.org by "Micah Kornfield (Jira)" <ji...@apache.org> on 2023/03/13 18:34:00 UTC
[jira] [Created] (SPARK-42774) Expose VectorTypes API for DataSourceV2 Batch Scans
Micah Kornfield created SPARK-42774:
---------------------------------------
Summary: Expose VectorTypes API for DataSourceV2 Batch Scans
Key: SPARK-42774
URL: https://issues.apache.org/jira/browse/SPARK-42774
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.2
Reporter: Micah Kornfield
SparkPlan's vectorTypes attribute can be used to [specialize codegen|https://github.com/apache/spark/blob/5556cfc59aa97a3ad4ea0baacebe19859ec0bcb7/sql/core/src/main/scala/org/apache/spark/sql/execution/Columnar.scala#L151]; however, [BatchScanExecBase|https://github.com/apache/spark/blob/6b6bb6fa20f40aeedea2fb87008e9cce76c54e28/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2ScanExecBase.scala] does not override it, so DSv2 sources do not get any benefit from concrete class dispatch.
This proposes adding an override to BatchScanExecBase that delegates to a new default method on [PartitionReaderFactory|https://github.com/apache/spark/blob/f1d42bb68d6d69d9a32f91a390270f9ec33c3207/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReaderFactory.java] to expose vectorTypes:
{{
default Optional<Iterable<String>> getVectorTypes() {
  return Optional.empty();
}
}}
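To illustrate the delegation, here is a minimal, self-contained sketch. It does not use Spark's classes: ReaderFactory and BatchScan are hypothetical stand-ins for PartitionReaderFactory and the scan node, and com.example.ExampleColumnVector is a made-up class name. Only the shape of the proposed default method is taken from this issue.

```java
import java.util.List;
import java.util.Optional;

final class VectorTypesSketch {

    // Hypothetical stand-in for PartitionReaderFactory; only the proposed
    // default method is modeled here.
    interface ReaderFactory {
        default Optional<Iterable<String>> getVectorTypes() {
            // Existing sources inherit this and keep today's behavior.
            return Optional.empty();
        }
    }

    // A source that always emits the same concrete column vector class
    // can opt in by naming that class once per output column.
    static final class ExampleReaderFactory implements ReaderFactory {
        @Override
        public Optional<Iterable<String>> getVectorTypes() {
            // Assumed class name, for illustration only.
            return Optional.of(List.of(
                "com.example.ExampleColumnVector",
                "com.example.ExampleColumnVector"));
        }
    }

    // Hypothetical stand-in for the scan node: its vectorTypes override
    // simply forwards to the reader factory.
    static final class BatchScan {
        private final ReaderFactory factory;

        BatchScan(ReaderFactory factory) {
            this.factory = factory;
        }

        Optional<Iterable<String>> vectorTypes() {
            return factory.getVectorTypes();
        }
    }

    public static void main(String[] args) {
        BatchScan legacy = new BatchScan(new ReaderFactory() {});
        BatchScan specialized = new BatchScan(new ExampleReaderFactory());

        System.out.println(legacy.vectorTypes().isPresent());      // false
        System.out.println(specialized.vectorTypes().isPresent()); // true
    }
}
```

Because the method is a default method returning Optional.empty(), sources that do not implement it would be unaffected under this sketch; only sources that override it would report concrete vector classes.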