Posted to commits@kyuubi.apache.org by ch...@apache.org on 2022/07/01 09:34:42 UTC
[incubator-kyuubi] branch master updated: [KYUUBI #2981] Improve TPC-DS scan performance
This is an automated email from the ASF dual-hosted git repository.
chengpan pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-kyuubi.git
The following commit(s) were added to refs/heads/master by this push:
new 3a80f33bf [KYUUBI #2981] Improve TPC-DS scan performance
3a80f33bf is described below
commit 3a80f33bf1e906066b6fdfb372041d5e31b0284f
Author: Cheng Pan <ch...@apache.org>
AuthorDate: Fri Jul 1 17:34:30 2022 +0800
[KYUUBI #2981] Improve TPC-DS scan performance
### _Why are the changes needed?_
Before
```
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Mac OS X 12.4
Apple M1 Pro
TPCDS table generates 1000000 rows benchmark:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------
catalog_returns benchmark                             13956          13975          21          0.1       13955.7       1.0X
catalog_sales benchmark                               10229          10277          42          0.1       10229.2       1.4X
customer benchmark                                     9305           9464         249          0.1        9305.0       1.5X
customer_address benchmark                             5612           5737         162          0.2        5611.7       2.5X
customer_demographics benchmark                        1108           1182          66          0.9        1107.5      12.6X
inventory benchmark                                     665            695          27          1.5         664.7      21.0X
store_returns benchmark                               11260          11409         132          0.1       11260.1       1.2X
store_sales benchmark                                  7894           7909          15          0.1        7894.1       1.8X
web_returns benchmark                                 13042          13082          38          0.1       13042.1       1.1X
web_sales benchmark                                   11182          11201          23          0.1       11182.4       1.2X
```
After
```
OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Mac OS X 12.4
Apple M1 Pro
TPCDS table generates 1000000 rows benchmark:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
----------------------------------------------------------------------------------------------------------------------------
catalog_returns benchmark                             13644          13703          52          0.1       13643.6       1.0X
catalog_sales benchmark                               10505          10553          43          0.1       10505.2       1.3X
customer benchmark                                     8571           8658         124          0.1        8570.8       1.6X
customer_address benchmark                             5230           5255          25          0.2        5229.7       2.6X
customer_demographics benchmark                         838            844           6          1.2         837.7      16.3X
inventory benchmark                                     475            489          13          2.1         475.3      28.7X
store_returns benchmark                               10808          10935         163          0.1       10807.8       1.3X
store_sales benchmark                                  7694           7723          43          0.1        7693.5       1.8X
web_returns benchmark                                 12731          12737           6          0.1       12730.8       1.1X
web_sales benchmark                                   10545          10584          41          0.1       10545.3       1.3X
```
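The `TPCDSBatchScan.scala` diff in this commit replaces a per-row `zipWithIndex.map` with a `while` loop that writes into a preallocated `reusedRow` array. The following is a minimal, Spark-free sketch of that pattern, not the connector's actual code: `RowReuseSketch`, `ColType`, `convertAllocating` and `convertReusing` are hypothetical names, and the two-column schema is made up for illustration.

```scala
// Hypothetical sketch; ColType stands in for Spark's DataType.
object RowReuseSketch {
  sealed trait ColType
  case object IntCol extends ColType
  case object StrCol extends ColType

  // Assumed two-column schema, for illustration only.
  val schema: Array[ColType] = Array(IntCol, StrCol)

  // Convert one string cell to a typed value; null stays null.
  private def convert(v: String, t: ColType): Any = (v, t) match {
    case (null, _) => null
    case (s, IntCol) => s.toInt
    case (s, StrCol) => s
  }

  // Old shape: zipWithIndex and map each build a new collection
  // (and zipWithIndex builds one tuple per cell) for every row.
  def convertAllocating(row: Seq[String]): Seq[Any] =
    row.zipWithIndex.map { case (v, i) => convert(v, schema(i)) }

  // New shape: one buffer allocated up front and overwritten for every row.
  private val reused = new Array[Any](schema.length)

  def convertReusing(row: IndexedSeq[String]): Array[Any] = {
    var i = 0
    while (i < row.length) {
      reused(i) = convert(row(i), schema(i))
      i += 1
    }
    reused // same array instance on every call
  }
}
```

In this sketch the caller must consume or copy the returned array before converting the next row, because the buffer is overwritten in place. Whether `InternalRow(reusedRow: _*)` in the patch copies the array is not stated in the commit.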
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.apache.org/docs/latest/develop_tools/testing.html#running-tests) locally before making a pull request
Closes #2981 from pan3793/tpcds-perf.
Closes #2981
0ca494ef [Cheng Pan] fix
128e6f60 [Cheng Pan] reuse array
64e2f6e2 [Cheng Pan] Improve TPC-DS scan performance
Authored-by: Cheng Pan <ch...@apache.org>
Signed-off-by: Cheng Pan <ch...@apache.org>
---
.../TPCDSTableGenerateBenchmark-results.txt | 28 +++++++++++-----------
.../spark/connector/tpcds/TPCDSBatchScan.scala | 11 +++++----
2 files changed, 21 insertions(+), 18 deletions(-)
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/benchmarks/TPCDSTableGenerateBenchmark-results.txt b/extensions/spark/kyuubi-spark-connector-tpcds/benchmarks/TPCDSTableGenerateBenchmark-results.txt
index a77f91ba7..a5798c6db 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/benchmarks/TPCDSTableGenerateBenchmark-results.txt
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/benchmarks/TPCDSTableGenerateBenchmark-results.txt
@@ -1,15 +1,15 @@
-Java HotSpot(TM) 64-Bit Server VM 1.8.0_311-b11 on Windows 10 10.0
-AMD64 Family 23 Model 96 Stepping 1, AuthenticAMD
+OpenJDK 64-Bit Server VM 1.8.0_332-b09 on Mac OS X 12.4
+Apple M1 Pro
TPCDS table generates 1000000 rows benchmark:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
-----------------------------------------------------------------------------------------------------------------------------
-catalog_returns benchmark                             13120          13143          33          0.1       13120.1       1.0X
-catalog_sales benchmark                               12008          12054          44          0.1       12008.4       1.1X
-customer benchmark                                     7499           7529          28          0.1        7499.2       1.7X
-customer_address benchmark                             4939           4958          16          0.2        4939.1       2.7X
-customer_demographics benchmark                         857            860           5          1.2         856.7      15.3X
-inventory benchmark                                     527            529           2          1.9         527.0      24.9X
-store_returns benchmark                               11643          11703          53          0.1       11643.0       1.1X
-store_sales benchmark                                  9418           9517          91          0.1        9418.2       1.4X
-web_returns benchmark                                 13560          13580          23          0.1       13559.7       1.0X
-web_sales benchmark                                   12635          12748         138          0.1       12634.9       1.0X
-
+----------------------------------------------------------------------------------------------------------------------------
+catalog_returns benchmark                             13644          13703          52          0.1       13643.6       1.0X
+catalog_sales benchmark                               10505          10553          43          0.1       10505.2       1.3X
+customer benchmark                                     8571           8658         124          0.1        8570.8       1.6X
+customer_address benchmark                             5230           5255          25          0.2        5229.7       2.6X
+customer_demographics benchmark                         838            844           6          1.2         837.7      16.3X
+inventory benchmark                                     475            489          13          2.1         475.3      28.7X
+store_returns benchmark                               10808          10935         163          0.1       10807.8       1.3X
+store_sales benchmark                                  7694           7723          43          0.1        7693.5       1.8X
+web_returns benchmark                                 12731          12737           6          0.1       12730.8       1.1X
+web_sales benchmark                                   10545          10584          41          0.1       10545.3       1.3X
+
diff --git a/extensions/spark/kyuubi-spark-connector-tpcds/src/main/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSBatchScan.scala b/extensions/spark/kyuubi-spark-connector-tpcds/src/main/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSBatchScan.scala
index b1f5698a5..4fa243271 100644
--- a/extensions/spark/kyuubi-spark-connector-tpcds/src/main/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSBatchScan.scala
+++ b/extensions/spark/kyuubi-spark-connector-tpcds/src/main/scala/org/apache/kyuubi/spark/connector/tpcds/TPCDSBatchScan.scala
@@ -92,13 +92,15 @@ class TPCDSPartitionReader(
private lazy val dateFmt: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyy-MM-dd")
+ private val reusedRow = new Array[Any](schema.length)
private val iterator = Results
.constructResults(chuckInfo.getOnlyTableToGenerate, chuckInfo)
.iterator.asScala
.map { _.get(0).asScala } // the 1st row is specific table row
- .map { row =>
- row.zipWithIndex.map { case (v, i) =>
- (v, schema(i).dataType) match {
+ .map { stringRow =>
+ var i = 0
+ while (i < stringRow.length) {
+ reusedRow(i) = (stringRow(i), schema(i).dataType) match {
case (null, _) => null
case (Options.DEFAULT_NULL_STRING, _) => null
case (v, IntegerType) => v.toInt
@@ -110,9 +112,10 @@ class TPCDSPartitionReader(
case (v, DecimalType()) => Decimal(v)
case (v, dt) => throw new IllegalArgumentException(s"value: $v, type: $dt")
}
+ i += 1
}
+ InternalRow(reusedRow: _*)
}
- .map { row => InternalRow.fromSeq(row) }
private var currentRow: InternalRow = _