Posted to commits@spark.apache.org by do...@apache.org on 2022/07/12 16:20:43 UTC
[spark] branch master updated: [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new e0d4ef4b0bd [SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
e0d4ef4b0bd is described below
commit e0d4ef4b0bd2c8641b830106b0cb6063351ad5da
Author: yangjie01 <ya...@baidu.com>
AuthorDate: Tue Jul 12 09:20:24 2022 -0700
[SPARK-39706][SQL] Set missing column with defaultValue as constant in `ParquetColumnVector`
### What changes were proposed in this pull request?
This PR calls `vector.setIsConstant()` during `ParquetColumnVector` initialization when a column is missing, has a default value, and `vector.appendObjects(capacity, defaultValue).isPresent()` is true.
### Why are the changes needed?
This is a minor improvement. For a missing column with a default value, setting `isConstant` to true prevents the `reset()` method from restoring the internal state of `WritableColumnVector`. `OrcColumnarBatchReader` already does something similar for missing columns:
https://github.com/apache/spark/blob/bb4c4778713c7ba1ee92d0bb0763d7d3ce54374f/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java#L178-L191
Without this change there is no bug: a missing column is initialized only once and its corresponding `columnReader` is null, so `reset()` only resets `WritableColumnVector#elementsAppended` to 0, which does not affect anything.
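The effect of the flag on `reset()` can be illustrated with a minimal, self-contained sketch. `SketchVector` below is a hypothetical, heavily simplified stand-in written only for this illustration; it is not Spark's `WritableColumnVector`, and it assumes that `reset()` returns early when the vector is marked constant, as the description above implies.

```java
import java.util.Arrays;

// Hypothetical stand-in for a writable column vector (NOT the real Spark class).
final class SketchVector {
  private final int[] data;
  private int elementsAppended = 0;
  private boolean isConstant = false;

  SketchVector(int capacity) {
    this.data = new int[capacity];
  }

  // Appends `count` copies of `value`, roughly what filling a missing
  // column with an integer default value amounts to.
  void appendInts(int count, int value) {
    Arrays.fill(data, elementsAppended, elementsAppended + count, value);
    elementsAppended += count;
  }

  void setIsConstant() {
    isConstant = true;
  }

  // Assumed behavior: a constant vector ignores reset() entirely;
  // otherwise only the append counter is rewound.
  void reset() {
    if (isConstant) {
      return;
    }
    elementsAppended = 0;
  }

  int getElementsAppended() {
    return elementsAppended;
  }

  int getInt(int rowId) {
    return data[rowId];
  }
}

class SetIsConstantSketch {
  public static void main(String[] args) {
    int capacity = 4;
    int defaultValue = 42;

    // Before the change: the counter is rewound, but the stored values survive.
    SketchVector withoutFlag = new SketchVector(capacity);
    withoutFlag.appendInts(capacity, defaultValue);
    withoutFlag.reset();
    System.out.println(withoutFlag.getElementsAppended()); // prints 0
    System.out.println(withoutFlag.getInt(0));             // prints 42

    // After the change: reset() is a no-op for the constant vector.
    SketchVector withFlag = new SketchVector(capacity);
    withFlag.appendInts(capacity, defaultValue);
    withFlag.setIsConstant();
    withFlag.reset();
    System.out.println(withFlag.getElementsAppended());    // prints 4
    System.out.println(withFlag.getInt(0));                // prints 42
  }
}
```

In both cases the default values remain readable, which matches the note that the change is an improvement rather than a bug fix.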
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Pass GitHub Actions
Closes #37115 from LuciferYang/setIsConstant.
Lead-authored-by: yangjie01 <ya...@baidu.com>
Co-authored-by: YangJie <ya...@baidu.com>
Signed-off-by: Dongjoon Hyun <do...@apache.org>
---
.../spark/sql/execution/datasources/parquet/ParquetColumnVector.java | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
index 2ad8cdfcca6..47774e0a397 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/ParquetColumnVector.java
@@ -89,6 +89,8 @@ final class ParquetColumnVector {
           throw new IllegalArgumentException("Cannot assign default column value to result " +
             "column batch in vectorized Parquet reader because the data type is not supported: " +
             defaultValue);
+        } else {
+          vector.setIsConstant();
         }
       }