Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/04/01 02:55:00 UTC
[jira] [Updated] (SPARK-34929) MapStatusesSerDeserBenchmark causes
a side effect to other benchmarks with tasks being too big (JDK 11)
[ https://issues.apache.org/jira/browse/SPARK-34929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon updated SPARK-34929:
---------------------------------
Description:
In JDK 11, MapStatusesSerDeserBenchmark (which fails to start) seems to affect other benchmark cases by growing the task size:
{code}
2021-03-31T16:46:43.1179145Z 21/03/31 16:46:43 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:47.3079315Z 21/03/31 16:46:47 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:51.5920733Z 21/03/31 16:46:51 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:57.6874541Z Stopped after 3 iterations, 12928 ms
2021-03-31T16:46:57.6875644Z
2021-03-31T16:46:57.6877153Z OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1041-azure
2021-03-31T16:46:57.7095280Z Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
2021-03-31T16:46:57.7097654Z from_json as subExpr in Project: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
2021-03-31T16:46:57.7099059Z ------------------------------------------------------------------------------------------------------------------------
2021-03-31T16:46:57.7100274Z subExprElimination false, codegen: true 38880 41246 1389 0.0 388800445.2 1.0X
2021-03-31T16:46:57.7101134Z subExprElimination false, codegen: false 35819 38141 1234 0.0 358188088.6 1.1X
2021-03-31T16:46:57.7106264Z subExprElimination true, codegen: true 3947 4157 364 0.0 39465629.1 9.9X
2021-03-31T16:46:57.7106982Z subExprElimination true, codegen: false 4191 4309 112 0.0 41908945.5 9.3X
2021-03-31T16:46:57.7107595Z
2021-03-31T16:46:57.7135178Z Preparing data for benchmarking ...
2021-03-31T16:46:58.5630584Z Running benchmark: from_json as subExpr in Filter
2021-03-31T16:46:58.5633083Z Running case: subExprElimination false, codegen: true
2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
{code}
It only happens when the benchmarks run sequentially via Benchmarks.scala.
was:
In JDK 11, MapStatusesSerDeserBenchmark (which fails to start) seems to affect other benchmark cases by growing the task size:
```
2021-03-31T16:46:43.1179145Z 21/03/31 16:46:43 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:47.3079315Z 21/03/31 16:46:47 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:51.5920733Z 21/03/31 16:46:51 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:57.6874541Z Stopped after 3 iterations, 12928 ms
2021-03-31T16:46:57.6875644Z
2021-03-31T16:46:57.6877153Z OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1041-azure
2021-03-31T16:46:57.7095280Z Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
2021-03-31T16:46:57.7097654Z from_json as subExpr in Project: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
2021-03-31T16:46:57.7099059Z ------------------------------------------------------------------------------------------------------------------------
2021-03-31T16:46:57.7100274Z subExprElimination false, codegen: true 38880 41246 1389 0.0 388800445.2 1.0X
2021-03-31T16:46:57.7101134Z subExprElimination false, codegen: false 35819 38141 1234 0.0 358188088.6 1.1X
2021-03-31T16:46:57.7106264Z subExprElimination true, codegen: true 3947 4157 364 0.0 39465629.1 9.9X
2021-03-31T16:46:57.7106982Z subExprElimination true, codegen: false 4191 4309 112 0.0 41908945.5 9.3X
2021-03-31T16:46:57.7107595Z
2021-03-31T16:46:57.7135178Z Preparing data for benchmarking ...
2021-03-31T16:46:58.5630584Z Running benchmark: from_json as subExpr in Filter
2021-03-31T16:46:58.5633083Z Running case: subExprElimination false, codegen: true
2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
```
It only happens when the benchmarks run sequentially via Benchmarks.scala.
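To check whether the task binary keeps growing from one benchmark case to the next, the DAGScheduler warnings can be pulled out of the CI log and compared. The following is only a minimal sketch in Python, assuming the warning format shown in the log above and that sizes are always reported in MiB; the function name `task_binary_sizes` is hypothetical and is not part of Spark or its benchmark tooling.

```python
import re

# Matches the DAGScheduler warning seen in the log excerpt and captures the size.
# Assumption: sizes are always reported in MiB, as in the excerpt; lines using
# any other unit are simply not matched.
_WARNING = re.compile(
    r"WARN DAGScheduler: Broadcasting large task binary with size (\d+(?:\.\d+)?) MiB"
)


def task_binary_sizes(log_text):
    """Return the reported task binary sizes (in MiB) in order of appearance."""
    return [float(m.group(1)) for m in _WARNING.finditer(log_text)]


# Three lines copied from the log excerpt above.
sample = """\
2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:57.6874541Z Stopped after 3 iterations, 12928 ms
2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
"""

sizes = task_binary_sizes(sample)
print(sizes)  # [42.2, 43.0]
```

A list that increases across benchmark cases, as it does here from 42.2 to 43.0, would be consistent with state left behind by an earlier benchmark in the same JVM, though it does not by itself identify the cause.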
> MapStatusesSerDeserBenchmark causes a side effect to other benchmarks with tasks being too big (JDK 11)
> -------------------------------------------------------------------------------------------------------
>
> Key: SPARK-34929
> URL: https://issues.apache.org/jira/browse/SPARK-34929
> Project: Spark
> Issue Type: Bug
> Components: SQL, Tests
> Affects Versions: 3.2.0
> Reporter: Hyukjin Kwon
> Priority: Minor
>
> In JDK 11, MapStatusesSerDeserBenchmark (which fails to start) seems to affect other benchmark cases by growing the task size:
> {code}
> 2021-03-31T16:46:43.1179145Z 21/03/31 16:46:43 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
> 2021-03-31T16:46:47.3079315Z 21/03/31 16:46:47 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
> 2021-03-31T16:46:51.5920733Z 21/03/31 16:46:51 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
> 2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
> 2021-03-31T16:46:57.6874541Z Stopped after 3 iterations, 12928 ms
> 2021-03-31T16:46:57.6875644Z
> 2021-03-31T16:46:57.6877153Z OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1041-azure
> 2021-03-31T16:46:57.7095280Z Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
> 2021-03-31T16:46:57.7097654Z from_json as subExpr in Project: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
> 2021-03-31T16:46:57.7099059Z ------------------------------------------------------------------------------------------------------------------------
> 2021-03-31T16:46:57.7100274Z subExprElimination false, codegen: true 38880 41246 1389 0.0 388800445.2 1.0X
> 2021-03-31T16:46:57.7101134Z subExprElimination false, codegen: false 35819 38141 1234 0.0 358188088.6 1.1X
> 2021-03-31T16:46:57.7106264Z subExprElimination true, codegen: true 3947 4157 364 0.0 39465629.1 9.9X
> 2021-03-31T16:46:57.7106982Z subExprElimination true, codegen: false 4191 4309 112 0.0 41908945.5 9.3X
> 2021-03-31T16:46:57.7107595Z
> 2021-03-31T16:46:57.7135178Z Preparing data for benchmarking ...
> 2021-03-31T16:46:58.5630584Z Running benchmark: from_json as subExpr in Filter
> 2021-03-31T16:46:58.5633083Z Running case: subExprElimination false, codegen: true
> 2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
> {code}
> It only happens when the benchmarks run sequentially via Benchmarks.scala.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)