Posted to commits@spark.apache.org by su...@apache.org on 2022/06/29 16:22:27 UTC
[spark] branch master updated: [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader`
This is an automated email from the ASF dual-hosted git repository.
sunchao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new c0a12cfbb82 [SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader`
c0a12cfbb82 is described below
commit c0a12cfbb82a710d786158bcd70a7f7bd531751b
Author: yangjie01 <ya...@baidu.com>
AuthorDate: Wed Jun 29 09:22:14 2022 -0700
[SPARK-39231][SQL] Use `ConstantColumnVector` instead of `On/OffHeapColumnVector` to store partition columns in `VectorizedParquetRecordReader`
### What changes were proposed in this pull request?
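This PR changes `VectorizedParquetRecordReader` to store partition columns in `ConstantColumnVector` instead of `OnHeapColumnVector`/`OffHeapColumnVector`, because a partition column vector is always a constant vector: every row in a batch holds the same partition value.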
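To illustrate why a constant vector fits partition columns, here is a minimal Java sketch. `IntColumn`, `ArrayIntColumn` and `ConstantIntColumn` are hypothetical, simplified stand-ins, not Spark's `ColumnVector` API: the array-backed column has to write the value into every row slot of the batch, while the constant column stores a single value and ignores `rowId` on reads.

```java
import java.util.Arrays;

/** Hypothetical read interface shared by both storage strategies (not Spark's API). */
interface IntColumn {
    boolean isNullAt(int rowId);
    int getInt(int rowId);
}

/** Array-backed column: one slot per row, similar in spirit to an on-heap vector. */
final class ArrayIntColumn implements IntColumn {
    private final int[] values;
    private final boolean[] nulls;

    ArrayIntColumn(int capacity) {
        values = new int[capacity];
        nulls = new boolean[capacity];
    }

    /** Writing a partition value touches every row of the batch: O(numRows). */
    void fill(int value, int numRows) {
        Arrays.fill(values, 0, numRows, value);
        Arrays.fill(nulls, 0, numRows, false);
    }

    @Override public boolean isNullAt(int rowId) { return nulls[rowId]; }
    @Override public int getInt(int rowId) { return values[rowId]; }
}

/** Constant column: a single value shared by all rows in the batch. */
final class ConstantIntColumn implements IntColumn {
    private boolean isNull;
    private int value;

    /** Writing is O(1), independent of the batch size. */
    void setInt(int value) {
        this.value = value;
        this.isNull = false;
    }

    void setNull() { this.isNull = true; }

    @Override public boolean isNullAt(int rowId) { return isNull; } // rowId ignored
    @Override public int getInt(int rowId) { return value; }        // rowId ignored
}
```

Both classes return the same values for any valid `rowId`, so callers reading through `IntColumn` cannot tell them apart; only the write cost and the memory footprint differ.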
### Why are the changes needed?
1. A partition column vector is always a constant vector.
2. **Performance improvement**: `ConstantColumnVector` reads and writes faster than `OnHeapColumnVector` and `OffHeapColumnVector`. According to the micro-benchmark results, the improvement is most pronounced for `StringType`: read throughput increases by about 2 times, and write throughput increases by more than 100 times.
3. **Memory saving**: `ConstantColumnVector` uses less memory than `OnHeapColumnVector` and `OffHeapColumnVector`. For a `UTF8String` vector of length 4096 (the default `batchSize`), `ConstantColumnVector` saves more than 90% of the memory used by `OnHeapColumnVector`:
   - `ConstantColumnVector` stores only a single `UTF8String`
   - `OnHeapColumnVector` needs `arrayOffsets(int[4096])` + `arrayLengths(int[4096])` + `(UTF8String * 4096)`
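The memory breakdown above can be turned into a rough estimate. The helper below is a hypothetical sketch, not Spark code: it follows the breakdown literally (two `int[batchSize]` arrays plus one copy of the string bytes per row for the array-backed layout, one copy for the constant layout) and ignores JVM object headers, array padding and null-tracking storage.

```java
/** Hypothetical back-of-the-envelope memory estimate; not part of Spark. */
final class PartitionStringMemory {
    static final int INT_BYTES = 4;

    /** Array-backed layout: offsets[n] + lengths[n] + n copies of the string bytes. */
    static long arrayBackedBytes(int batchSize, int stringBytes) {
        long offsets = (long) batchSize * INT_BYTES;
        long lengths = (long) batchSize * INT_BYTES;
        long data = (long) batchSize * stringBytes;
        return offsets + lengths + data;
    }

    /** Constant layout: a single copy of the string bytes, whatever the batch size. */
    static long constantBytes(int stringBytes) {
        return stringBytes;
    }

    /** Fraction of the array-backed footprint that the constant layout avoids. */
    static double savedFraction(int batchSize, int stringBytes) {
        return 1.0 - (double) constantBytes(stringBytes) / arrayBackedBytes(batchSize, stringBytes);
    }

    public static void main(String[] args) {
        int batchSize = 4096; // default batch size cited in the commit message
        for (int len : new int[] {1, 10, 30}) {
            System.out.printf("len=%d array=%d bytes constant=%d bytes saved=%.4f%n",
                len, arrayBackedBytes(batchSize, len), constantBytes(len),
                savedFraction(batchSize, len));
        }
    }
}
```

For a 10-byte string, the array-backed estimate is 4096 * 4 + 4096 * 4 + 4096 * 10 = 73728 bytes versus 10 bytes for the constant layout, which is consistent with the "more than 90%" figure even before object overheads are counted.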
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- Passed GitHub Actions
- Added new unit tests for the new method introduced by this PR: `ColumnVectorUtils.fill(ConstantColumnVector col, InternalRow row, int fieldIdx)`
- Added a new micro-benchmark comparing the read and write performance of constant vectors (simulating the partition column scenario) across `OnHeapColumnVector`, `OffHeapColumnVector` and `ConstantColumnVector`
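The new `ColumnVectorUtils.fill(ConstantColumnVector col, InternalRow row, int fieldIdx)` copies one partition value from a row into a constant vector. The sketch below is a hypothetical simplification of that idea, not the Spark implementation: `SimpleType`, `SimpleConstantVector` and `SimpleFill` are invented names, a partition row is modeled as an `Object[]`, a `null` field becomes a null constant, and the type check against the vector's declared type is an assumption added for illustration.

```java
import java.util.Objects;

/** Hypothetical subset of column types; Spark uses its own DataType hierarchy. */
enum SimpleType { BOOLEAN, INT, LONG, DOUBLE, STRING }

/** Hypothetical constant vector: one typed value (or null) for the whole batch. */
final class SimpleConstantVector {
    final SimpleType type;
    private Object value;
    private boolean isNull;

    SimpleConstantVector(SimpleType type) { this.type = Objects.requireNonNull(type); }

    void setNull() { isNull = true; value = null; }
    void setValue(Object v) { isNull = false; value = v; }
    boolean isNullAt(int rowId) { return isNull; } // same answer for every row
    Object get(int rowId) { return value; }        // same value for every row
}

final class SimpleFill {
    /** Maps each hypothetical column type to the Java class expected in the row. */
    private static Class<?> javaClassOf(SimpleType t) {
        switch (t) {
            case BOOLEAN: return Boolean.class;
            case INT:     return Integer.class;
            case LONG:    return Long.class;
            case DOUBLE:  return Double.class;
            case STRING:  return String.class;
            default: throw new IllegalArgumentException("Unsupported type: " + t);
        }
    }

    /** Copies row[fieldIdx] into col once; every row of the batch then sees it. */
    static void fill(SimpleConstantVector col, Object[] row, int fieldIdx) {
        Object v = row[fieldIdx];
        if (v == null) {
            col.setNull();
            return;
        }
        if (!javaClassOf(col.type).isInstance(v)) {
            throw new IllegalArgumentException("Field " + fieldIdx + " holds "
                + v.getClass().getSimpleName() + ", expected " + col.type);
        }
        col.setValue(v);
    }
}
```

The key property is that `fill` runs once per batch and does constant work, whereas filling an array-backed vector costs one write per row.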
Closes #36616 from LuciferYang/SPARK-39231.
Authored-by: yangjie01 <ya...@baidu.com>
Signed-off-by: Chao Sun <su...@apple.com>
---
...ConstantColumnVectorBenchmark-jdk11-results.txt | 280 +++++++++++++++++++
...ConstantColumnVectorBenchmark-jdk17-results.txt | 280 +++++++++++++++++++
.../ConstantColumnVectorBenchmark-results.txt | 280 +++++++++++++++++++
.../parquet/VectorizedParquetRecordReader.java | 21 +-
.../execution/vectorized/ColumnVectorUtils.java | 89 +++++++
.../execution/vectorized/ConstantColumnVector.java | 13 +
.../datasources/parquet/ParquetFileFormat.scala | 6 +-
.../benchmark/ConstantColumnVectorBenchmark.scala | 296 +++++++++++++++++++++
.../vectorized/ColumnVectorUtilsSuite.scala | 163 ++++++++++++
9 files changed, 1414 insertions(+), 14 deletions(-)
diff --git a/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk11-results.txt b/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk11-results.txt
new file mode 100644
index 00000000000..775f5499bf8
--- /dev/null
+++ b/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk11-results.txt
@@ -0,0 +1,280 @@
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 583637.9 0.0 1.0X
+OnHeapColumnVector 3638 3638 0 112.6 8.9 0.0X
+OffHeapColumnVector 4601 4602 0 89.0 11.2 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2 2 0 266508.4 0.0 1.0X
+OnHeapColumnVector 4721 4721 0 86.8 11.5 0.0X
+OffHeapColumnVector 6553 6553 0 62.5 16.0 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2 2 0 266508.9 0.0 1.0X
+OnHeapColumnVector 5220 5224 6 78.5 12.7 0.0X
+OffHeapColumnVector 6510 6516 8 62.9 15.9 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 583804.3 0.0 1.0X
+OnHeapColumnVector 4747 4747 0 86.3 11.6 0.0X
+OffHeapColumnVector 7055 7057 3 58.1 17.2 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2 2 0 266508.6 0.0 1.0X
+OnHeapColumnVector 4929 4930 0 83.1 12.0 0.0X
+OffHeapColumnVector 6588 6589 1 62.2 16.1 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2 2 0 266508.4 0.0 1.0X
+OnHeapColumnVector 5300 5301 2 77.3 12.9 0.0X
+OffHeapColumnVector 6788 6790 2 60.3 16.6 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 943120.4 0.0 1.0X
+OnHeapColumnVector 10 10 0 39537.6 0.0 0.0X
+OffHeapColumnVector 139 139 0 2947.3 0.3 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 766174.6 0.0 1.0X
+OnHeapColumnVector 43 45 1 9504.2 0.1 0.0X
+OffHeapColumnVector 156 158 1 2622.8 0.4 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 817233.7 0.0 1.0X
+OnHeapColumnVector 10 10 0 40738.5 0.0 0.0X
+OffHeapColumnVector 139 139 0 2944.1 0.3 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681073.0 0.0 1.0X
+OnHeapColumnVector 39 42 1 10369.7 0.1 0.0X
+OffHeapColumnVector 156 158 1 2633.1 0.4 0.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1190 1192 2 344.1 2.9 1.0X
+OnHeapColumnVector 2265 2268 4 180.8 5.5 0.5X
+OffHeapColumnVector 4599 4605 8 89.1 11.2 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1192 1192 0 343.5 2.9 1.0X
+OnHeapColumnVector 5648 5650 2 72.5 13.8 0.2X
+OffHeapColumnVector 4608 4609 1 88.9 11.3 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1192 1194 3 343.7 2.9 1.0X
+OnHeapColumnVector 5840 5844 6 70.1 14.3 0.2X
+OffHeapColumnVector 4602 4610 11 89.0 11.2 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1199 1202 5 341.6 2.9 1.0X
+OnHeapColumnVector 5634 5636 4 72.7 13.8 0.2X
+OffHeapColumnVector 4600 4603 4 89.0 11.2 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1193 1194 1 343.2 2.9 1.0X
+OnHeapColumnVector 5624 5629 6 72.8 13.7 0.2X
+OffHeapColumnVector 4599 4601 3 89.1 11.2 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1192 1193 1 343.7 2.9 1.0X
+OnHeapColumnVector 5623 5625 3 72.8 13.7 0.2X
+OffHeapColumnVector 4611 4617 8 88.8 11.3 0.3X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1161 1161 0 352.8 2.8 1.0X
+OnHeapColumnVector 1722 1728 8 237.9 4.2 0.7X
+OffHeapColumnVector 2839 2842 4 144.3 6.9 0.4X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1351 1353 2 303.1 3.3 1.0X
+OnHeapColumnVector 1598 1599 0 256.2 3.9 0.8X
+OffHeapColumnVector 2842 2845 4 144.1 6.9 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1163 1164 2 352.2 2.8 1.0X
+OnHeapColumnVector 1361 1363 2 300.9 3.3 0.9X
+OffHeapColumnVector 2522 2525 4 162.4 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1214 1214 0 337.5 3.0 1.0X
+OnHeapColumnVector 1478 1482 6 277.1 3.6 0.8X
+OffHeapColumnVector 2021 2021 0 202.7 4.9 0.6X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2418 2420 3 169.4 5.9 1.0X
+OnHeapColumnVector 5865 5867 2 69.8 14.3 0.4X
+OffHeapColumnVector 4640 4645 7 88.3 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2365 2366 1 173.2 5.8 1.0X
+OnHeapColumnVector 5852 5852 1 70.0 14.3 0.4X
+OffHeapColumnVector 4639 4641 2 88.3 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2365 2370 7 173.2 5.8 1.0X
+OnHeapColumnVector 5863 5864 0 69.9 14.3 0.4X
+OffHeapColumnVector 4639 4642 4 88.3 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2364 2365 1 173.3 5.8 1.0X
+OnHeapColumnVector 5842 5844 2 70.1 14.3 0.4X
+OffHeapColumnVector 4636 4642 9 88.4 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2363 2363 1 173.4 5.8 1.0X
+OnHeapColumnVector 5860 5864 6 69.9 14.3 0.4X
+OffHeapColumnVector 4646 4646 0 88.2 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2365 2367 3 173.2 5.8 1.0X
+OnHeapColumnVector 5850 5851 1 70.0 14.3 0.4X
+OffHeapColumnVector 4640 4643 4 88.3 11.3 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1338 1338 0 306.2 3.3 1.0X
+OnHeapColumnVector 2916 2917 0 140.4 7.1 0.5X
+OffHeapColumnVector 2917 2917 0 140.4 7.1 0.5X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2807 2809 2 145.9 6.9 1.0X
+OnHeapColumnVector 3230 3232 2 126.8 7.9 0.9X
+OffHeapColumnVector 2715 2715 0 150.9 6.6 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1752 1753 1 233.7 4.3 1.0X
+OnHeapColumnVector 2272 2284 18 180.3 5.5 0.8X
+OffHeapColumnVector 2012 2012 0 203.6 4.9 0.9X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2015 2024 12 203.2 4.9 1.0X
+OnHeapColumnVector 2414 2415 1 169.7 5.9 0.8X
+OffHeapColumnVector 2023 2024 1 202.4 4.9 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with StringType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1188 1188 0 344.7 2.9 1.0X
+OnHeapColumnVector 1456 1456 0 281.3 3.6 0.8X
+OffHeapColumnVector 1243 1244 1 329.5 3.0 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1188 1195 9 344.7 2.9 1.0X
+OnHeapColumnVector 1451 1453 3 282.4 3.5 0.8X
+OffHeapColumnVector 1243 1244 1 329.4 3.0 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1195 1197 4 342.8 2.9 1.0X
+OnHeapColumnVector 1456 1457 1 281.3 3.6 0.8X
+OffHeapColumnVector 1239 1244 7 330.6 3.0 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1192 1192 1 343.7 2.9 1.0X
+OnHeapColumnVector 1458 1461 4 280.9 3.6 0.8X
+OffHeapColumnVector 1242 1245 5 329.9 3.0 1.0X
+
+OpenJDK 64-Bit Server VM 11.0.15+10-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1190 1191 2 344.2 2.9 1.0X
+OnHeapColumnVector 1454 1458 5 281.7 3.5 0.8X
+OffHeapColumnVector 1237 1238 1 331.0 3.0 1.0X
+
diff --git a/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk17-results.txt b/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk17-results.txt
new file mode 100644
index 00000000000..2c74954fbe1
--- /dev/null
+++ b/sql/core/benchmarks/ConstantColumnVectorBenchmark-jdk17-results.txt
@@ -0,0 +1,280 @@
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 567396.3 0.0 1.0X
+OnHeapColumnVector 2388 2389 2 171.6 5.8 0.0X
+OffHeapColumnVector 3665 3682 24 111.7 8.9 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 638508.5 0.0 1.0X
+OnHeapColumnVector 3545 3552 10 115.6 8.7 0.0X
+OffHeapColumnVector 5482 5483 1 74.7 13.4 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 638508.5 0.0 1.0X
+OnHeapColumnVector 3586 3586 0 114.2 8.8 0.0X
+OffHeapColumnVector 6296 6312 22 65.1 15.4 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 638508.5 0.0 1.0X
+OnHeapColumnVector 4251 4252 2 96.4 10.4 0.0X
+OffHeapColumnVector 6586 6589 4 62.2 16.1 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 638508.5 0.0 1.0X
+OnHeapColumnVector 4270 4274 4 95.9 10.4 0.0X
+OffHeapColumnVector 6698 6698 0 61.2 16.4 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 638508.5 0.0 1.0X
+OnHeapColumnVector 4327 4328 2 94.7 10.6 0.0X
+OffHeapColumnVector 6746 6753 10 60.7 16.5 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 1135264.4 0.0 1.0X
+OnHeapColumnVector 12 12 0 34578.8 0.0 0.0X
+OffHeapColumnVector 85 85 0 4798.8 0.2 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 851389.3 0.0 1.0X
+OnHeapColumnVector 22 22 0 18511.9 0.1 0.0X
+OffHeapColumnVector 85 85 0 4804.0 0.2 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 1 0 928806.6 0.0 1.0X
+OnHeapColumnVector 12 12 0 34709.8 0.0 0.0X
+OffHeapColumnVector 85 85 0 4814.5 0.2 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 1 0 851389.3 0.0 1.0X
+OnHeapColumnVector 22 23 0 18516.6 0.1 0.0X
+OffHeapColumnVector 85 85 0 4804.2 0.2 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3028 3035 10 135.3 7.4 0.0X
+OffHeapColumnVector 2359 2365 8 173.6 5.8 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3038 3042 5 134.8 7.4 0.0X
+OffHeapColumnVector 2368 2536 238 172.9 5.8 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3028 3030 3 135.3 7.4 0.0X
+OffHeapColumnVector 2358 2365 9 173.7 5.8 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3023 3149 177 135.5 7.4 0.0X
+OffHeapColumnVector 2349 2352 3 174.3 5.7 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3058 3169 157 133.9 7.5 0.0X
+OffHeapColumnVector 2358 2358 0 173.7 5.8 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 3064 3172 151 133.7 7.5 0.0X
+OffHeapColumnVector 2358 2361 5 173.7 5.8 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 2553647.8 0.0 1.0X
+OnHeapColumnVector 1 1 0 510791.3 0.0 0.2X
+OffHeapColumnVector 1277 1277 0 320.9 3.1 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 515808.6 0.0 1.0X
+OnHeapColumnVector 1 1 0 785886.3 0.0 1.5X
+OffHeapColumnVector 1194 1195 1 343.0 2.9 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 1135264.4 0.0 1.0X
+OnHeapColumnVector 1 1 0 408622.6 0.0 0.4X
+OffHeapColumnVector 1186 1186 0 345.3 2.9 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 464350.1 0.0 1.0X
+OnHeapColumnVector 3007 3009 4 136.2 7.3 0.0X
+OffHeapColumnVector 1273 1274 1 321.7 3.1 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 510790.7 0.0 1.0X
+OnHeapColumnVector 3510 3512 4 116.7 8.6 0.0X
+OffHeapColumnVector 2528 2528 0 162.0 6.2 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1248 1249 2 328.2 3.0 1.0X
+OnHeapColumnVector 3515 3518 4 116.5 8.6 0.4X
+OffHeapColumnVector 2528 2531 3 162.0 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1244 1250 8 329.3 3.0 1.0X
+OnHeapColumnVector 3515 3515 0 116.5 8.6 0.4X
+OffHeapColumnVector 2525 2528 3 162.2 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1244 1245 1 329.2 3.0 1.0X
+OnHeapColumnVector 3507 3510 5 116.8 8.6 0.4X
+OffHeapColumnVector 2528 2528 1 162.0 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1240 1240 1 330.4 3.0 1.0X
+OnHeapColumnVector 3506 3513 11 116.8 8.6 0.4X
+OffHeapColumnVector 2529 2530 2 162.0 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1243 1243 0 329.4 3.0 1.0X
+OnHeapColumnVector 3504 3505 2 116.9 8.6 0.4X
+OffHeapColumnVector 2524 2530 8 162.3 6.2 0.5X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 1459739.6 0.0 1.0X
+OnHeapColumnVector 1180 1181 0 347.0 2.9 0.0X
+OffHeapColumnVector 1277 1278 1 320.6 3.1 0.0X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 965 971 7 424.4 2.4 1.0X
+OnHeapColumnVector 1254 1255 1 326.6 3.1 0.8X
+OffHeapColumnVector 1185 1186 0 345.6 2.9 0.8X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 954 958 4 429.3 2.3 1.0X
+OnHeapColumnVector 1264 1266 2 323.9 3.1 0.8X
+OffHeapColumnVector 1183 1186 3 346.1 2.9 0.8X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test write and read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 3282 3283 2 124.8 8.0 1.0X
+OnHeapColumnVector 1189 1191 2 344.4 2.9 2.8X
+OffHeapColumnVector 1274 1274 0 321.5 3.1 2.6X
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test isNull with StringType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 1 1 0 552174.9 0.0 0.0X
+OffHeapColumnVector 0 0 0 Infinity 0.0 NaNX
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test isNull with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 0 0 0 2078154.0 0.0 0.0X
+OffHeapColumnVector 0 0 0 Infinity 0.0 NaNX
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test isNull with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 0 0 0 2079198.4 0.0 0.0X
+OffHeapColumnVector 0 0 0 Infinity 0.0 NaNX
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test isNull with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 0 0 0 2079208.9 0.0 0.0X
+OffHeapColumnVector 0 0 0 Infinity 0.0 NaNX
+
+OpenJDK 64-Bit Server VM 17.0.3+7-LTS on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
+Test isNull with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 Infinity 0.0 NaNX
+OnHeapColumnVector 0 0 0 2079208.9 0.0 0.0X
+OffHeapColumnVector 0 0 0 Infinity 0.0 NaNX
+
diff --git a/sql/core/benchmarks/ConstantColumnVectorBenchmark-results.txt b/sql/core/benchmarks/ConstantColumnVectorBenchmark-results.txt
new file mode 100644
index 00000000000..5101ed7aa1b
--- /dev/null
+++ b/sql/core/benchmarks/ConstantColumnVectorBenchmark-results.txt
@@ -0,0 +1,280 @@
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681069.6 0.0 1.0X
+OnHeapColumnVector 3073 3073 0 133.3 7.5 0.0X
+OffHeapColumnVector 3502 3503 2 117.0 8.6 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 680956.3 0.0 1.0X
+OnHeapColumnVector 3773 3773 0 108.6 9.2 0.0X
+OffHeapColumnVector 4834 4839 7 84.7 11.8 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681069.6 0.0 1.0X
+OnHeapColumnVector 3980 3980 0 102.9 9.7 0.0X
+OffHeapColumnVector 4706 4706 1 87.0 11.5 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681069.6 0.0 1.0X
+OnHeapColumnVector 4000 4001 1 102.4 9.8 0.0X
+OffHeapColumnVector 4834 4835 1 84.7 11.8 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681069.6 0.0 1.0X
+OnHeapColumnVector 4400 4400 0 93.1 10.7 0.0X
+OffHeapColumnVector 4593 4594 2 89.2 11.2 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+---------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 681073.0 0.0 1.0X
+OnHeapColumnVector 4781 4781 0 85.7 11.7 0.0X
+OffHeapColumnVector 5185 5187 3 79.0 12.7 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 1021433.6 0.0 1.0X
+OnHeapColumnVector 70 70 0 5845.6 0.2 0.0X
+OffHeapColumnVector 139 139 0 2937.7 0.3 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 0 0 0 942911.9 0.0 1.0X
+OnHeapColumnVector 61 63 1 6759.3 0.1 0.0X
+OffHeapColumnVector 157 158 1 2612.7 0.4 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 817232.1 0.0 1.0X
+OnHeapColumnVector 70 70 0 5819.5 0.2 0.0X
+OffHeapColumnVector 139 139 0 2944.1 0.3 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1 1 0 766171.7 0.0 1.0X
+OnHeapColumnVector 55 59 1 7504.8 0.1 0.0X
+OffHeapColumnVector 154 157 1 2662.4 0.4 0.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1188 1190 3 344.7 2.9 1.0X
+OnHeapColumnVector 2907 2907 1 140.9 7.1 0.4X
+OffHeapColumnVector 4290 4295 7 95.5 10.5 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1168 1172 6 350.6 2.9 1.0X
+OnHeapColumnVector 5766 5770 6 71.0 14.1 0.2X
+OffHeapColumnVector 4278 4281 5 95.7 10.4 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1163 1165 4 352.3 2.8 1.0X
+OnHeapColumnVector 5761 5762 2 71.1 14.1 0.2X
+OffHeapColumnVector 4233 4249 23 96.8 10.3 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1158 1163 6 353.7 2.8 1.0X
+OnHeapColumnVector 5750 5754 5 71.2 14.0 0.2X
+OffHeapColumnVector 4231 4232 2 96.8 10.3 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1162 1164 2 352.4 2.8 1.0X
+OnHeapColumnVector 6277 6278 2 65.3 15.3 0.2X
+OffHeapColumnVector 4220 4236 22 97.1 10.3 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+--------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1159 1160 1 353.4 2.8 1.0X
+OnHeapColumnVector 6281 6284 4 65.2 15.3 0.2X
+OffHeapColumnVector 4233 4245 17 96.8 10.3 0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1361 1362 1 300.9 3.3 1.0X
+OnHeapColumnVector 1807 1808 3 226.7 4.4 0.8X
+OffHeapColumnVector 2808 2811 4 145.8 6.9 0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1315 1317 3 311.5 3.2 1.0X
+OnHeapColumnVector 2318 2318 1 176.7 5.7 0.6X
+OffHeapColumnVector 2512 2513 1 163.0 6.1 0.5X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1177 1179 2 347.9 2.9 1.0X
+OnHeapColumnVector 1334 1335 1 307.0 3.3 0.9X
+OffHeapColumnVector 1837 1838 1 222.9 4.5 0.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1178 1178 0 347.8 2.9 1.0X
+OnHeapColumnVector 1330 1331 1 307.9 3.2 0.9X
+OffHeapColumnVector 1835 1837 2 223.2 4.5 0.6X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 1: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1164 1166 2 351.8 2.8 1.0X
+OnHeapColumnVector 5884 5884 0 69.6 14.4 0.2X
+OffHeapColumnVector 4744 4746 4 86.3 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 5: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+-----------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1162 1164 4 352.6 2.8 1.0X
+OnHeapColumnVector 5545 5546 2 73.9 13.5 0.2X
+OffHeapColumnVector 4754 4756 4 86.2 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 10: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1164 1165 2 352.0 2.8 1.0X
+OnHeapColumnVector 5539 5541 2 73.9 13.5 0.2X
+OffHeapColumnVector 4745 4749 6 86.3 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 15: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1168 1168 1 350.8 2.9 1.0X
+OnHeapColumnVector 5541 5543 2 73.9 13.5 0.2X
+OffHeapColumnVector 4755 4756 2 86.1 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 20: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1163 1163 1 352.3 2.8 1.0X
+OnHeapColumnVector 5541 5542 2 73.9 13.5 0.2X
+OffHeapColumnVector 4755 4758 4 86.1 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with StringType, row length = 30: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1168 1168 0 350.8 2.9 1.0X
+OnHeapColumnVector 5538 5539 3 74.0 13.5 0.2X
+OffHeapColumnVector 4757 4758 0 86.1 11.6 0.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1099 1102 4 372.6 2.7 1.0X
+OnHeapColumnVector 3025 3026 1 135.4 7.4 0.4X
+OffHeapColumnVector 2800 2804 5 146.3 6.8 0.4X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 2466 2469 3 166.1 6.0 1.0X
+OnHeapColumnVector 2778 2779 1 147.4 6.8 0.9X
+OffHeapColumnVector 2511 2514 5 163.1 6.1 1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1841 1843 3 222.5 4.5 1.0X
+OnHeapColumnVector 2237 2238 2 183.1 5.5 0.8X
+OffHeapColumnVector 1791 1793 4 228.8 4.4 1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test write and read with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1645 1647 3 249.0 4.0 1.0X
+OnHeapColumnVector 2223 2225 3 184.2 5.4 0.7X
+OffHeapColumnVector 1767 1769 3 231.8 4.3 0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with StringType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1098 1100 2 373.0 2.7 1.0X
+OnHeapColumnVector 1437 1438 2 285.0 3.5 0.8X
+OffHeapColumnVector 1166 1168 3 351.2 2.8 0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with IntegerType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1097 1101 5 373.2 2.7 1.0X
+OnHeapColumnVector 1432 1433 1 286.1 3.5 0.8X
+OffHeapColumnVector 1171 1173 2 349.7 2.9 0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with LongType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1103 1109 8 371.2 2.7 1.0X
+OnHeapColumnVector 1440 1443 4 284.5 3.5 0.8X
+OffHeapColumnVector 1177 1178 3 348.1 2.9 0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with FloatType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1112 1113 1 368.3 2.7 1.0X
+OnHeapColumnVector 1449 1451 2 282.6 3.5 0.8X
+OffHeapColumnVector 1179 1180 1 347.3 2.9 0.9X
+
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Linux 5.13.0-1023-azure
+Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz
+Test isNull with DoubleType: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+------------------------------------------------------------------------------------------------------------------------
+ConstantColumnVector 1109 1110 1 369.4 2.7 1.0X
+OnHeapColumnVector 1451 1451 1 282.4 3.5 0.8X
+OffHeapColumnVector 1175 1179 4 348.5 2.9 0.9X
+
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
index 6a30876a3bc..4d71f3f6689 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java
@@ -37,9 +37,9 @@ import org.apache.parquet.schema.Type;
import org.apache.spark.memory.MemoryMode;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.execution.vectorized.ColumnVectorUtils;
+import org.apache.spark.sql.execution.vectorized.ConstantColumnVector;
import org.apache.spark.sql.execution.vectorized.WritableColumnVector;
-import org.apache.spark.sql.execution.vectorized.OffHeapColumnVector;
-import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector;
+import org.apache.spark.sql.vectorized.ColumnVector;
import org.apache.spark.sql.vectorized.ColumnarBatch;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
@@ -243,18 +243,17 @@ public class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBa
for (StructField f: sparkSchema.fields()) {
batchSchema = batchSchema.add(f);
}
+ int constantColumnLength = 0;
if (partitionColumns != null) {
for (StructField f : partitionColumns.fields()) {
batchSchema = batchSchema.add(f);
}
+ constantColumnLength = partitionColumns.fields().length;
}
- WritableColumnVector[] vectors;
- if (memMode == MemoryMode.OFF_HEAP) {
- vectors = OffHeapColumnVector.allocateColumns(capacity, batchSchema);
- } else {
- vectors = OnHeapColumnVector.allocateColumns(capacity, batchSchema);
- }
+ ColumnVector[] vectors = ColumnVectorUtils.allocateColumns(
+ capacity, batchSchema, memMode == MemoryMode.OFF_HEAP, constantColumnLength);
+
columnarBatch = new ColumnarBatch(vectors);
columnVectors = new ParquetColumnVector[sparkSchema.fields().length];
@@ -264,14 +263,14 @@ public class VectorizedParquetRecordReader extends SpecificParquetRecordReaderBa
defaultValue = sparkRequestedSchema.existenceDefaultValues()[i];
}
columnVectors[i] = new ParquetColumnVector(parquetColumn.children().apply(i),
- vectors[i], capacity, memMode, missingColumns, true, defaultValue);
+ (WritableColumnVector) vectors[i], capacity, memMode, missingColumns, true, defaultValue);
}
if (partitionColumns != null) {
int partitionIdx = sparkSchema.fields().length;
for (int i = 0; i < partitionColumns.fields().length; i++) {
- ColumnVectorUtils.populate(vectors[i + partitionIdx], partitionValues, i);
- vectors[i + partitionIdx].setIsConstant();
+ ColumnVectorUtils.populate(
+ (ConstantColumnVector) vectors[i + partitionIdx], partitionValues, i);
}
}
}
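The reader change above relies on a layout contract: `batchSchema` lists the data fields first and appends the partition fields, so the last `constantColumnLength` vectors are constant columns and the casts to `WritableColumnVector` and `ConstantColumnVector` are safe. The following is a minimal, self-contained Java sketch of that contract, not Spark code: `IntColumn`, `ArrayIntColumn`, `ConstantIntColumn` and `BatchLayout` are hypothetical stand-ins that only model integer columns.

```java
// Hypothetical stand-in for Spark's read-only ColumnVector; not Spark's API.
interface IntColumn {
  int getInt(int rowId);
}

// Stands in for On/OffHeapColumnVector: one slot per row.
class ArrayIntColumn implements IntColumn {
  private final int[] data;

  ArrayIntColumn(int capacity) {
    data = new int[capacity];
  }

  void putInt(int rowId, int value) {
    data[rowId] = value;
  }

  public int getInt(int rowId) {
    return data[rowId];
  }
}

// Stands in for ConstantColumnVector: one value shared by every row.
class ConstantIntColumn implements IntColumn {
  private int value;

  void setInt(int value) {
    this.value = value;
  }

  public int getInt(int rowId) {
    return value; // rowId is ignored: every row sees the same value
  }
}

class BatchLayout {
  // Mirrors the allocateColumns contract: the last `constantColumnLength`
  // fields get constant columns, every earlier field gets a data column.
  static IntColumn[] allocate(int capacity, int numFields, int constantColumnLength) {
    IntColumn[] vectors = new IntColumn[numFields];
    int firstConstant = numFields - constantColumnLength;
    for (int i = 0; i < firstConstant; i++) {
      vectors[i] = new ArrayIntColumn(capacity);
    }
    for (int i = firstConstant; i < numFields; i++) {
      vectors[i] = new ConstantIntColumn();
    }
    return vectors;
  }

  // Like initBatch: partition columns sit after the data columns, so the
  // downcast to the constant type is safe for indices >= numDataFields.
  static IntColumn[] initBatch(int capacity, int numDataFields, int[] partitionValues) {
    IntColumn[] vectors =
        allocate(capacity, numDataFields + partitionValues.length, partitionValues.length);
    for (int i = 0; i < partitionValues.length; i++) {
      ((ConstantIntColumn) vectors[numDataFields + i]).setInt(partitionValues[i]);
    }
    return vectors;
  }
}
```

If a caller got the boundary between data and constant columns wrong, the sketch (like the real reader) would fail fast with a `ClassCastException` at the downcast instead of silently writing into the wrong kind of vector.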
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java
index 353a1282544..c55c59032e6 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java
@@ -30,6 +30,7 @@ import org.apache.spark.sql.Row;
import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.util.DateTimeUtils;
import org.apache.spark.sql.types.*;
+import org.apache.spark.sql.vectorized.ColumnVector;
import org.apache.spark.sql.vectorized.ColumnarArray;
import org.apache.spark.sql.vectorized.ColumnarBatch;
import org.apache.spark.sql.vectorized.ColumnarMap;
@@ -105,6 +106,61 @@ public class ColumnVectorUtils {
}
}
+ /**
+ * Populates the value of `row[fieldIdx]` into `ConstantColumnVector`.
+ */
+ public static void populate(ConstantColumnVector col, InternalRow row, int fieldIdx) {
+ DataType t = col.dataType();
+
+ if (row.isNullAt(fieldIdx)) {
+ col.setNull();
+ } else {
+ if (t == DataTypes.BooleanType) {
+ col.setBoolean(row.getBoolean(fieldIdx));
+ } else if (t == DataTypes.BinaryType) {
+ col.setBinary(row.getBinary(fieldIdx));
+ } else if (t == DataTypes.ByteType) {
+ col.setByte(row.getByte(fieldIdx));
+ } else if (t == DataTypes.ShortType) {
+ col.setShort(row.getShort(fieldIdx));
+ } else if (t == DataTypes.IntegerType) {
+ col.setInt(row.getInt(fieldIdx));
+ } else if (t == DataTypes.LongType) {
+ col.setLong(row.getLong(fieldIdx));
+ } else if (t == DataTypes.FloatType) {
+ col.setFloat(row.getFloat(fieldIdx));
+ } else if (t == DataTypes.DoubleType) {
+ col.setDouble(row.getDouble(fieldIdx));
+ } else if (t == DataTypes.StringType) {
+ UTF8String v = row.getUTF8String(fieldIdx);
+ col.setUtf8String(v);
+ } else if (t instanceof DecimalType) {
+ DecimalType dt = (DecimalType) t;
+ Decimal d = row.getDecimal(fieldIdx, dt.precision(), dt.scale());
+ if (dt.precision() <= Decimal.MAX_INT_DIGITS()) {
+ col.setInt((int)d.toUnscaledLong());
+ } else if (dt.precision() <= Decimal.MAX_LONG_DIGITS()) {
+ col.setLong(d.toUnscaledLong());
+ } else {
+ final BigInteger integer = d.toJavaBigDecimal().unscaledValue();
+ byte[] bytes = integer.toByteArray();
+ col.setBinary(bytes);
+ }
+ } else if (t instanceof CalendarIntervalType) {
+ // Stored as three child vectors: months, days and microseconds.
+ col.setCalendarInterval((CalendarInterval) row.get(fieldIdx, t));
+ } else if (t instanceof DateType || t instanceof YearMonthIntervalType) {
+ col.setInt(row.getInt(fieldIdx));
+ } else if (t instanceof TimestampType || t instanceof TimestampNTZType ||
+ t instanceof DayTimeIntervalType) {
+ col.setLong(row.getLong(fieldIdx));
+ } else {
+ throw new RuntimeException(String.format("DataType %s is not supported" +
+ " in column vectorized reader.", t.sql()));
+ }
+ }
+ }
+
/**
* Returns the array data as the java primitive array.
* For example, an array of IntegerType will return an int[].
@@ -235,4 +291,37 @@ public class ColumnVectorUtils {
batch.setNumRows(n);
return batch;
}
+
+ /**
+ * <b>This method assumes that all constant columns are at the end of the schema
+ * and that `constantColumnLength` is the number of constant columns.</b>
+ *
+ * This method allocates a column for each field of the schema. The data columns
+ * use `OffHeapColumnVector` when `useOffHeap` is true and `OnHeapColumnVector`
+ * when `useOffHeap` is false; the constant columns always use
+ * `ConstantColumnVector`.
+ *
+ * Capacity is the initial capacity of the vector, and it will grow as necessary.
+ * Capacity is in number of elements, not number of bytes.
+ */
+ public static ColumnVector[] allocateColumns(
+ int capacity, StructType schema, boolean useOffHeap, int constantColumnLength) {
+ StructField[] fields = schema.fields();
+ int fieldsLength = fields.length;
+ ColumnVector[] vectors = new ColumnVector[fieldsLength];
+ if (useOffHeap) {
+ for (int i = 0; i < fieldsLength - constantColumnLength; i++) {
+ vectors[i] = new OffHeapColumnVector(capacity, fields[i].dataType());
+ }
+ } else {
+ for (int i = 0; i < fieldsLength - constantColumnLength; i++) {
+ vectors[i] = new OnHeapColumnVector(capacity, fields[i].dataType());
+ }
+ }
+ for (int i = fieldsLength - constantColumnLength; i < fieldsLength; i++) {
+ vectors[i] = new ConstantColumnVector(capacity, fields[i].dataType());
+ }
+ return vectors;
+ }
+
}
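For a `DecimalType`, the new `populate` picks the physical setter from the precision: up to `Decimal.MAX_INT_DIGITS()` digits the unscaled value is stored with `setInt`, up to `Decimal.MAX_LONG_DIGITS()` digits with `setLong`, and otherwise as the byte array of the unscaled `BigInteger` via `setBinary`. The sketch below reproduces only that decision using `java.math.BigDecimal`. It assumes the two limits are 9 and 18, and `DecimalEncoding` is a hypothetical helper, not part of Spark.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper mirroring the DecimalType branch of populate().
final class DecimalEncoding {
  // Assumed values of Spark's Decimal.MAX_INT_DIGITS() and Decimal.MAX_LONG_DIGITS().
  static final int MAX_INT_DIGITS = 9;
  static final int MAX_LONG_DIGITS = 18;

  // Name of the ConstantColumnVector setter that populate() would call.
  static String setterFor(int precision) {
    if (precision <= MAX_INT_DIGITS) {
      return "setInt";
    } else if (precision <= MAX_LONG_DIGITS) {
      return "setLong";
    }
    return "setBinary";
  }

  // Unscaled value at the column's scale; UNNECESSARY throws if rounding would be needed.
  static BigInteger unscaled(BigDecimal value, int scale) {
    return value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue();
  }

  // Payload for the compact cases (setInt after an int cast, or setLong).
  static long compactValue(BigDecimal value, int scale) {
    return unscaled(value, scale).longValueExact();
  }

  // Payload for the wide case: big-endian two's-complement bytes,
  // as produced by BigInteger.toByteArray().
  static byte[] wideValue(BigDecimal value, int scale) {
    return unscaled(value, scale).toByteArray();
  }
}
```

For example, a `DECIMAL(5, 2)` partition value of `123.45` would take the `setInt` path with the unscaled value `12345`, while a `DECIMAL(38, 1)` value would be stored as bytes that `new BigInteger(bytes)` turns back into the unscaled integer.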
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java
index 3a5dea479ca..5095e6b0c9c 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ConstantColumnVector.java
@@ -23,6 +23,7 @@ import org.apache.spark.sql.types.*;
import org.apache.spark.sql.vectorized.ColumnVector;
import org.apache.spark.sql.vectorized.ColumnarArray;
import org.apache.spark.sql.vectorized.ColumnarMap;
+import org.apache.spark.unsafe.types.CalendarInterval;
import org.apache.spark.unsafe.types.UTF8String;
/**
@@ -63,6 +64,9 @@ public class ConstantColumnVector extends ColumnVector {
} else if (type instanceof CalendarIntervalType) {
// Three columns. Months as int. Days as Int. Microseconds as Long.
this.childData = new ConstantColumnVector[3];
+ this.childData[0] = new ConstantColumnVector(1, DataTypes.IntegerType);
+ this.childData[1] = new ConstantColumnVector(1, DataTypes.IntegerType);
+ this.childData[2] = new ConstantColumnVector(1, DataTypes.LongType);
} else {
this.childData = null;
}
@@ -294,4 +298,13 @@ public class ConstantColumnVector extends ColumnVector {
public void setChild(int ordinal, ConstantColumnVector value) {
childData[ordinal] = value;
}
+
+ /**
+ * Sets the CalendarInterval `value` for all rows
+ */
+ public void setCalendarInterval(CalendarInterval value) {
+ this.childData[0].setInt(value.months);
+ this.childData[1].setInt(value.days);
+ this.childData[2].setLong(value.microseconds);
+ }
}
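The `CalendarIntervalType` change fills the three child slots that were previously left null, so that `setCalendarInterval` has somewhere to write. The sketch below models that layout in plain Java under stated assumptions: `SketchInterval`, `ConstantIntChild`, `ConstantLongChild` and `ConstantIntervalSketch` are hypothetical names, not Spark classes. Each child stores one value, and reading the interval at any row id reassembles the same months, days and microseconds.

```java
import java.util.Objects;

// Hypothetical value class mirroring the three fields of Spark's
// CalendarInterval (months, days, microseconds); not the Spark class itself.
final class SketchInterval {
  final int months;
  final int days;
  final long microseconds;

  SketchInterval(int months, int days, long microseconds) {
    this.months = months;
    this.days = days;
    this.microseconds = microseconds;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SketchInterval)) {
      return false;
    }
    SketchInterval other = (SketchInterval) o;
    return months == other.months && days == other.days
        && microseconds == other.microseconds;
  }

  @Override
  public int hashCode() {
    return Objects.hash(months, days, microseconds);
  }
}

// A constant "child column" stores one value and returns it for every row id.
final class ConstantIntChild {
  private int value;
  void setInt(int v) { value = v; }
  int getInt(int rowId) { return value; }
}

final class ConstantLongChild {
  private long value;
  void setLong(long v) { value = v; }
  long getLong(int rowId) { return value; }
}

final class ConstantIntervalSketch {
  // Same child layout as the patch: [0] months, [1] days, [2] microseconds.
  // The children are created eagerly, as the patched constructor now does.
  private final ConstantIntChild months = new ConstantIntChild();
  private final ConstantIntChild days = new ConstantIntChild();
  private final ConstantLongChild microseconds = new ConstantLongChild();

  // Analogue of setCalendarInterval: three stores cover the whole batch.
  void setCalendarInterval(SketchInterval value) {
    months.setInt(value.months);
    days.setInt(value.days);
    microseconds.setLong(value.microseconds);
  }

  // Reassembles the interval from the children; rowId does not matter.
  SketchInterval getInterval(int rowId) {
    return new SketchInterval(
        months.getInt(rowId), days.getInt(rowId), microseconds.getLong(rowId));
  }

  public static void main(String[] args) {
    ConstantIntervalSketch vector = new ConstantIntervalSketch();
    vector.setCalendarInterval(new SketchInterval(3, 5, 1_000_000L));
    // prints true: the first and last rows of a 4096-row batch agree
    System.out.println(vector.getInterval(0).equals(vector.getInterval(4095)));
  }
}
```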
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
index 9534be81928..3349f335841 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
@@ -47,7 +47,7 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.errors.QueryExecutionErrors
import org.apache.spark.sql.execution.WholeStageCodegenExec
import org.apache.spark.sql.execution.datasources._
-import org.apache.spark.sql.execution.vectorized.{OffHeapColumnVector, OnHeapColumnVector}
+import org.apache.spark.sql.execution.vectorized.{ConstantColumnVector, OffHeapColumnVector, OnHeapColumnVector}
import org.apache.spark.sql.internal.SQLConf
import org.apache.spark.sql.sources._
import org.apache.spark.sql.types._
@@ -176,13 +176,13 @@ class ParquetFileFormat
requiredSchema: StructType,
partitionSchema: StructType,
sqlConf: SQLConf): Option[Seq[String]] = {
- Option(Seq.fill(requiredSchema.fields.length + partitionSchema.fields.length)(
+ Option(Seq.fill(requiredSchema.fields.length)(
if (!sqlConf.offHeapColumnVectorEnabled) {
classOf[OnHeapColumnVector].getName
} else {
classOf[OffHeapColumnVector].getName
}
- ))
+ ) ++ Seq.fill(partitionSchema.fields.length)(classOf[ConstantColumnVector].getName))
}
override def isSplitable(
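The revised `vectorTypes` now reports `ConstantColumnVector` for each partition field instead of repeating the on/off-heap class for every column. A Java analogue of the new Scala expression is sketched below. `VectorTypesSketch` is a hypothetical helper, and it takes field counts rather than `StructType`s to stay self-contained. The class names are the fully qualified names that `getName` yields for the classes imported in `ParquetFileFormat.scala`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class VectorTypesSketch {
  // Package of the vector classes, as imported in ParquetFileFormat.scala.
  private static final String PKG = "org.apache.spark.sql.execution.vectorized.";

  // Java analogue of the new vectorTypes body: one entry per required (data)
  // field, then one ConstantColumnVector entry per partition field. The order
  // has to agree with allocateColumns, which places constant columns last.
  static List<String> vectorTypes(
      int requiredFieldCount, int partitionFieldCount, boolean offHeapEnabled) {
    String dataVector =
        PKG + (offHeapEnabled ? "OffHeapColumnVector" : "OnHeapColumnVector");
    List<String> types = new ArrayList<>(Collections.nCopies(requiredFieldCount, dataVector));
    types.addAll(Collections.nCopies(partitionFieldCount, PKG + "ConstantColumnVector"));
    return types;
  }

  public static void main(String[] args) {
    // Two data columns and one partition column, on-heap.
    for (String name : vectorTypes(2, 1, false)) {
      System.out.println(name);
    }
  }
}
```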
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ConstantColumnVectorBenchmark.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ConstantColumnVectorBenchmark.scala
new file mode 100644
index 00000000000..9e4902f2fb5
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/ConstantColumnVectorBenchmark.scala
@@ -0,0 +1,296 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.commons.lang3.RandomStringUtils
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.benchmark.BenchmarkBase
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.vectorized.{ColumnVectorUtils, ConstantColumnVector, OffHeapColumnVector, OnHeapColumnVector}
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.vectorized.ColumnVector
+import org.apache.spark.unsafe.UTF8StringBuilder
+
+/**
+ * Benchmark for reading and writing constant column vectors, comparing
+ * `ConstantColumnVector`, `OnHeapColumnVector` and `OffHeapColumnVector`.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt: bin/spark-submit --class <this class>
+ * --jars <spark core test jar>,<spark catalyst test jar> <spark sql test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/ConstantColumnVectorBenchmark-results.txt".
+ * }}}
+ */
+object ConstantColumnVectorBenchmark extends BenchmarkBase {
+
+ private def readValues(dataType: DataType, batchSize: Int, vector: ColumnVector): Unit = {
+ dataType match {
+ case IntegerType =>
+ (0 until batchSize).foreach(i => vector.getInt(i))
+ case LongType =>
+ (0 until batchSize).foreach(i => vector.getLong(i))
+ case FloatType =>
+ (0 until batchSize).foreach(i => vector.getFloat(i))
+ case DoubleType =>
+ (0 until batchSize).foreach(i => vector.getDouble(i))
+ case StringType =>
+ (0 until batchSize).foreach(i => vector.getUTF8String(i))
+ }
+ }
+
+ def testWrite(
+ valuesPerIteration: Int,
+ batchSize: Int,
+ dataType: DataType,
+ row: InternalRow): Unit = {
+
+ val onHeapColumnVector = new OnHeapColumnVector(batchSize, dataType)
+ val offHeapColumnVector = new OffHeapColumnVector(batchSize, dataType)
+ val constantColumnVector = new ConstantColumnVector(batchSize, dataType)
+
+ val other = if (dataType == StringType) {
+ s", row length = ${row.getUTF8String(0).toString.length}"
+ } else {
+ ""
+ }
+
+ val benchmark = new Benchmark(
+ s"Test write with $dataType$other",
+ valuesPerIteration * batchSize,
+ output = output)
+
+ benchmark.addCase("ConstantColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ ColumnVectorUtils.populate(constantColumnVector, row, 0)
+ }
+ }
+
+ benchmark.addCase("OnHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ onHeapColumnVector.reset()
+ ColumnVectorUtils.populate(onHeapColumnVector, row, 0)
+ }
+ }
+
+ benchmark.addCase("OffHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ offHeapColumnVector.reset()
+ ColumnVectorUtils.populate(offHeapColumnVector, row, 0)
+ }
+ }
+
+ benchmark.run()
+ onHeapColumnVector.close()
+ offHeapColumnVector.close()
+ constantColumnVector.close()
+ }
+
+ def testRead(
+ valuesPerIteration: Int,
+ batchSize: Int,
+ dataType: DataType,
+ row: InternalRow): Unit = {
+
+ val onHeapColumnVector = new OnHeapColumnVector(batchSize, dataType)
+ val offHeapColumnVector = new OffHeapColumnVector(batchSize, dataType)
+ val constantColumnVector = new ConstantColumnVector(batchSize, dataType)
+
+ onHeapColumnVector.reset()
+ ColumnVectorUtils.populate(onHeapColumnVector, row, 0)
+ offHeapColumnVector.reset()
+ ColumnVectorUtils.populate(offHeapColumnVector, row, 0)
+ ColumnVectorUtils.populate(constantColumnVector, row, 0)
+
+ val other = if (dataType == StringType) {
+ s", row length = ${row.getUTF8String(0).toString.length}"
+ } else {
+ ""
+ }
+
+ val benchmark = new Benchmark(
+ s"Test read with $dataType$other",
+ valuesPerIteration * batchSize,
+ output = output)
+
+ benchmark.addCase("ConstantColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, constantColumnVector)
+ }
+ }
+
+ benchmark.addCase("OnHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, onHeapColumnVector)
+ }
+ }
+
+ benchmark.addCase("OffHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, offHeapColumnVector)
+ }
+ }
+
+ benchmark.run()
+ onHeapColumnVector.close()
+ offHeapColumnVector.close()
+ constantColumnVector.close()
+ }
+
+ def testWriteAndRead(
+ valuesPerIteration: Int,
+ batchSize: Int,
+ dataType: DataType,
+ row: InternalRow): Unit = {
+
+ val onHeapColumnVector = new OnHeapColumnVector(batchSize, dataType)
+ val offHeapColumnVector = new OffHeapColumnVector(batchSize, dataType)
+ val constantColumnVector = new ConstantColumnVector(batchSize, dataType)
+
+ val other = if (dataType == StringType) {
+ s", row length = ${row.getUTF8String(0).toString.length}"
+ } else {
+ ""
+ }
+
+ val benchmark = new Benchmark(
+ s"Test write and read with $dataType$other",
+ valuesPerIteration * batchSize,
+ output = output)
+
+ benchmark.addCase("ConstantColumnVector") { _: Int =>
+ ColumnVectorUtils.populate(constantColumnVector, row, 0)
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, constantColumnVector)
+ }
+ }
+
+ benchmark.addCase("OnHeapColumnVector") { _: Int =>
+ onHeapColumnVector.reset()
+ ColumnVectorUtils.populate(onHeapColumnVector, row, 0)
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, onHeapColumnVector)
+ }
+ }
+
+ benchmark.addCase("OffHeapColumnVector") { _: Int =>
+ offHeapColumnVector.reset()
+ ColumnVectorUtils.populate(offHeapColumnVector, row, 0)
+ for (_ <- 0 until valuesPerIteration) {
+ readValues(dataType, batchSize, offHeapColumnVector)
+ }
+ }
+
+ benchmark.run()
+ onHeapColumnVector.close()
+ offHeapColumnVector.close()
+ constantColumnVector.close()
+ }
+
+ def testIsNull(
+ valuesPerIteration: Int,
+ batchSize: Int,
+ dataType: DataType): Unit = {
+
+ val onHeapColumnVector = new OnHeapColumnVector(batchSize, dataType)
+ val offHeapColumnVector = new OffHeapColumnVector(batchSize, dataType)
+ val constantColumnVector = new ConstantColumnVector(batchSize, dataType)
+
+ onHeapColumnVector.putNulls(0, batchSize)
+ offHeapColumnVector.putNulls(0, batchSize)
+ constantColumnVector.setNull()
+
+ val benchmark = new Benchmark(
+ s"Test isNull with $dataType",
+ valuesPerIteration * batchSize,
+ output = output)
+
+ benchmark.addCase("ConstantColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ (0 until batchSize).foreach(constantColumnVector.isNullAt)
+ }
+ }
+
+ benchmark.addCase("OnHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ (0 until batchSize).foreach(onHeapColumnVector.isNullAt)
+ }
+ }
+
+ benchmark.addCase("OffHeapColumnVector") { _: Int =>
+ for (_ <- 0 until valuesPerIteration) {
+ (0 until batchSize).foreach(offHeapColumnVector.isNullAt)
+ }
+ }
+
+ benchmark.run()
+ onHeapColumnVector.close()
+ offHeapColumnVector.close()
+ constantColumnVector.close()
+ }
+
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ val valuesPerIteration = 100000
+ val batchSize = 4096
+
+ Seq(1, 5, 10, 15, 20, 30).foreach { length =>
+ val builder = new UTF8StringBuilder()
+ builder.append(RandomStringUtils.random(length))
+ val row = InternalRow(builder.build())
+ testWrite(valuesPerIteration, batchSize, StringType, row)
+ }
+
+ testWrite(valuesPerIteration, batchSize, IntegerType, InternalRow(100))
+ testWrite(valuesPerIteration, batchSize, LongType, InternalRow(100L))
+ testWrite(valuesPerIteration, batchSize, FloatType, InternalRow(100F))
+ testWrite(valuesPerIteration, batchSize, DoubleType, InternalRow(100D))
+
+
+ Seq(1, 5, 10, 15, 20, 30).foreach { length =>
+ val builder = new UTF8StringBuilder()
+ builder.append(RandomStringUtils.random(length))
+ val row = InternalRow(builder.build())
+ testRead(valuesPerIteration, batchSize, StringType, row)
+ }
+
+ testRead(valuesPerIteration, batchSize, IntegerType, InternalRow(100))
+ testRead(valuesPerIteration, batchSize, LongType, InternalRow(100L))
+ testRead(valuesPerIteration, batchSize, FloatType, InternalRow(100F))
+ testRead(valuesPerIteration, batchSize, DoubleType, InternalRow(100D))
+
+ Seq(1, 5, 10, 15, 20, 30).foreach { length =>
+ val builder = new UTF8StringBuilder()
+ builder.append(RandomStringUtils.random(length))
+ val row = InternalRow(builder.build())
+ testWriteAndRead(valuesPerIteration, batchSize, StringType, row)
+ }
+
+ testWriteAndRead(valuesPerIteration, batchSize, IntegerType, InternalRow(100))
+ testWriteAndRead(valuesPerIteration, batchSize, LongType, InternalRow(100L))
+ testWriteAndRead(valuesPerIteration, batchSize, FloatType, InternalRow(100F))
+ testWriteAndRead(valuesPerIteration, batchSize, DoubleType, InternalRow(100D))
+
+ testIsNull(valuesPerIteration, batchSize, StringType)
+ testIsNull(valuesPerIteration, batchSize, IntegerType)
+ testIsNull(valuesPerIteration, batchSize, LongType)
+ testIsNull(valuesPerIteration, batchSize, FloatType)
+ testIsNull(valuesPerIteration, batchSize, DoubleType)
+ }
+}
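The write cases in the benchmark differ mainly in how many stores a populate needs: an on/off-heap vector writes every row of the batch, while a constant vector writes one value. The standalone Java sketch below models just that difference for `int` columns. `SketchIntVector`, `ArrayBackedIntVector`, `ConstantIntVector` and `WriteReadSketch` are hypothetical; the code does not use Spark's `Benchmark` class, has no warm-up, and its timings are not comparable to the committed results. A harness such as JMH would be needed for trustworthy numbers.

```java
import java.util.Arrays;

// Hypothetical int-only vectors modelling the two storage strategies the
// benchmark compares; they are not Spark classes.
interface SketchIntVector {
  void populate(int value);
  int getInt(int rowId);
}

// Like an On/OffHeapColumnVector after populate: one store per row.
final class ArrayBackedIntVector implements SketchIntVector {
  private final int[] data;
  ArrayBackedIntVector(int capacity) { data = new int[capacity]; }
  @Override public void populate(int value) { Arrays.fill(data, value); }
  @Override public int getInt(int rowId) { return data[rowId]; }
}

// Like a ConstantColumnVector: a single store, independent of batch size.
final class ConstantIntVector implements SketchIntVector {
  private int value;
  @Override public void populate(int value) { this.value = value; }
  @Override public int getInt(int rowId) { return value; }
}

final class WriteReadSketch {
  // Times `iterations` populate calls; returns elapsed nanoseconds.
  static long timeWrites(SketchIntVector vector, int iterations, int value) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      vector.populate(value);
    }
    return System.nanoTime() - start;
  }

  // Reads every row and sums the values so the reads are not dead code.
  static long sumRows(SketchIntVector vector, int batchSize) {
    long sum = 0;
    for (int i = 0; i < batchSize; i++) {
      sum += vector.getInt(i);
    }
    return sum;
  }

  public static void main(String[] args) {
    int batchSize = 4096;
    int iterations = 100_000;
    SketchIntVector[] vectors = {new ConstantIntVector(), new ArrayBackedIntVector(batchSize)};
    for (SketchIntVector vector : vectors) {
      long nanos = timeWrites(vector, iterations, 100);
      System.out.printf("%s: %d ms to write, checksum %d%n",
          vector.getClass().getSimpleName(), nanos / 1_000_000, sumRows(vector, batchSize));
    }
  }
}
```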
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorUtilsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorUtilsSuite.scala
new file mode 100644
index 00000000000..6205484d6be
--- /dev/null
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorUtilsSuite.scala
@@ -0,0 +1,163 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.vectorized
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.CalendarInterval
+import org.apache.spark.unsafe.types.UTF8String
+
+class ColumnVectorUtilsSuite extends SparkFunSuite {
+
+ private def testConstantColumnVector(name: String, size: Int, dt: DataType)
+ (f: ConstantColumnVector => Unit): Unit = {
+ test(name) {
+ val vector = new ConstantColumnVector(size, dt)
+ f(vector)
+ vector.close()
+ }
+ }
+
+ testConstantColumnVector("fill null", 10, IntegerType) { vector =>
+
+ ColumnVectorUtils.populate(vector, InternalRow(null), 0)
+
+ assert(vector.hasNull)
+ assert(vector.numNulls() == 10)
+ (0 until 10).foreach { i =>
+ assert(vector.isNullAt(i))
+ }
+
+ vector.setNotNull()
+ assert(!vector.hasNull)
+ assert(vector.numNulls() == 0)
+ (0 until 10).foreach { i =>
+ assert(!vector.isNullAt(i))
+ }
+ }
+
+ testConstantColumnVector("fill boolean", 10, BooleanType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(true), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getBoolean(i))
+ }
+ }
+
+ testConstantColumnVector("fill byte", 10, ByteType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3.toByte), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getByte(i) == 3.toByte)
+ }
+ }
+
+ testConstantColumnVector("fill short", 10, ShortType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3.toShort), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getShort(i) == 3.toShort)
+ }
+ }
+
+ testConstantColumnVector("fill int", 10, IntegerType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getInt(i) == 3)
+ }
+ }
+
+ testConstantColumnVector("fill long", 10, LongType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3L), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getLong(i) == 3L)
+ }
+ }
+
+ testConstantColumnVector("fill float", 10, FloatType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3.toFloat), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getFloat(i) == 3.toFloat)
+ }
+ }
+
+ testConstantColumnVector("fill double", 10, DoubleType) { vector =>
+ ColumnVectorUtils.populate(vector, InternalRow(3.toDouble), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getDouble(i) == 3.toDouble)
+ }
+ }
+
+ testConstantColumnVector("fill decimal", 10, DecimalType(10, 0)) { vector =>
+ val decimal = Decimal(100L)
+ ColumnVectorUtils.populate(vector, InternalRow(decimal), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getDecimal(i, 10, 0) == decimal)
+ }
+ }
+
+ testConstantColumnVector("fill utf8string", 10, StringType) { vector =>
+ val string = UTF8String.fromString("hello")
+ ColumnVectorUtils.populate(vector, InternalRow(string), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getUTF8String(i) == string)
+ }
+ }
+
+ testConstantColumnVector("fill binary", 10, BinaryType) { vector =>
+ val binary = "hello".getBytes("utf8")
+ ColumnVectorUtils.populate(vector, InternalRow(binary), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getBinary(i) === binary)
+ }
+ }
+
+ testConstantColumnVector("fill calendar interval", 10,
+ CalendarIntervalType) { vector =>
+ val interval = new CalendarInterval(3, 5, 1000000)
+ ColumnVectorUtils.populate(vector, InternalRow(interval), 0)
+ (0 until 10).foreach { i =>
+ assert(vector.getInterval(i) === interval)
+ }
+ }
+
+ testConstantColumnVector("not supported: fill map", 10,
+ MapType(IntegerType, BooleanType)) { vector =>
+ val message = intercept[RuntimeException] {
+ ColumnVectorUtils.populate(vector, InternalRow("fakeMap"), 0)
+ }.getMessage
+ assert(message == "DataType MAP<INT, BOOLEAN> is not supported in column vectorized reader.")
+ }
+
+ testConstantColumnVector("not supported: fill struct", 10,
+ new StructType()
+ .add(StructField("name", StringType))
+ .add(StructField("age", IntegerType))) { vector =>
+ val message = intercept[RuntimeException] {
+ ColumnVectorUtils.populate(vector, InternalRow("fakeStruct"), 0)
+ }.getMessage
+ assert(message ==
+ "DataType STRUCT<name: STRING, age: INT> is not supported in column vectorized reader.")
+ }
+
+ testConstantColumnVector("not supported: fill array", 10,
+ ArrayType(IntegerType)) { vector =>
+ val message = intercept[RuntimeException] {
+ ColumnVectorUtils.populate(vector, InternalRow("fakeArray"), 0)
+ }.getMessage
+ assert(message == "DataType ARRAY<INT> is not supported in column vectorized reader.")
+ }
+}
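The "fill null" test expects a size-10 constant vector to report `numNulls() == 10` after being populated with a null, and 0 after `setNotNull()`. That follows from a constant vector holding a single null flag shared by every row. The Java sketch below models only those semantics; `ConstantNullSketch` is a hypothetical class, not Spark's `ConstantColumnVector`.

```java
// Hypothetical model of the null semantics exercised by the "fill null" test:
// a constant vector holds a single null flag, so it is null at either every
// row or no row. Not Spark's ConstantColumnVector.
final class ConstantNullSketch {
  private final int capacity;
  private boolean isNull;

  ConstantNullSketch(int capacity) {
    this.capacity = capacity;
  }

  void setNull() { isNull = true; }
  void setNotNull() { isNull = false; }
  boolean hasNull() { return isNull; }
  // All `capacity` rows share the one flag, hence either capacity or 0.
  int numNulls() { return isNull ? capacity : 0; }
  boolean isNullAt(int rowId) { return isNull; }

  public static void main(String[] args) {
    ConstantNullSketch vector = new ConstantNullSketch(10);
    vector.setNull();
    System.out.println(vector.hasNull() + " " + vector.numNulls()); // true 10
    vector.setNotNull();
    System.out.println(vector.hasNull() + " " + vector.numNulls()); // false 0
  }
}
```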