Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2021/04/01 02:55:00 UTC
[jira] [Updated] (SPARK-34929) MapStatusesSerDeserBenchmark causes a side effect to other benchmarks with tasks being too big (JDK 11)

     [ https://issues.apache.org/jira/browse/SPARK-34929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-34929:
---------------------------------
    Description: 
On JDK 11, MapStatusesSerDeserBenchmark (which itself fails to start) appears to affect other benchmark cases by increasing the size of their tasks:

{code}
2021-03-31T16:46:43.1179145Z 21/03/31 16:46:43 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:47.3079315Z 21/03/31 16:46:47 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:51.5920733Z 21/03/31 16:46:51 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:57.6874541Z   Stopped after 3 iterations, 12928 ms
2021-03-31T16:46:57.6875644Z 
2021-03-31T16:46:57.6877153Z OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1041-azure
2021-03-31T16:46:57.7095280Z Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
2021-03-31T16:46:57.7097654Z from_json as subExpr in Project:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
2021-03-31T16:46:57.7099059Z ------------------------------------------------------------------------------------------------------------------------
2021-03-31T16:46:57.7100274Z subExprElimination false, codegen: true           38880          41246        1389          0.0   388800445.2       1.0X
2021-03-31T16:46:57.7101134Z subExprElimination false, codegen: false          35819          38141        1234          0.0   358188088.6       1.1X
2021-03-31T16:46:57.7106264Z subExprElimination true, codegen: true             3947           4157         364          0.0    39465629.1       9.9X
2021-03-31T16:46:57.7106982Z subExprElimination true, codegen: false            4191           4309         112          0.0    41908945.5       9.3X
2021-03-31T16:46:57.7107595Z 
2021-03-31T16:46:57.7135178Z Preparing data for benchmarking ...
2021-03-31T16:46:58.5630584Z Running benchmark: from_json as subExpr in Filter
2021-03-31T16:46:58.5633083Z   Running case: subExprElimination false, codegen: true
2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
{code}

This only happens when the benchmarks are run sequentially via Benchmarks.scala.
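
To compare runs, the task binary sizes can be pulled out of a benchmark log with a small script. The following is only a minimal sketch: the function name task_binary_sizes_mib is hypothetical, the regular expression is inferred from the warning lines quoted above, and the unit handling (KiB, MiB, GiB) is an assumption about how the size may be printed.

```python
import re

# Pattern inferred from the DAGScheduler warnings quoted in this issue;
# the set of units is an assumption, only MiB appears in the log above.
WARNING = re.compile(
    r"Broadcasting large task binary with size ([\d.]+) (KiB|MiB|GiB)"
)
UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def task_binary_sizes_mib(log_lines):
    """Return the size in MiB of each 'large task binary' warning, in log order."""
    sizes = []
    for line in log_lines:
        match = WARNING.search(line)
        if match:
            value, unit = match.groups()
            sizes.append(float(value) * UNIT_TO_MIB[unit])
    return sizes


# Three lines taken from the log in this issue (CI timestamps dropped).
sample = [
    "21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB",
    "  Stopped after 3 iterations, 12928 ms",
    "21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB",
]
sizes = task_binary_sizes_mib(sample)
print(sizes)  # [42.2, 43.0]
```

Running the same function over the log of a single benchmark executed on its own should make it easier to see whether the warnings appear only in the sequential run.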

  was:
On JDK 11, MapStatusesSerDeserBenchmark (which itself fails to start) appears to affect other benchmark cases by increasing the size of their tasks:

```
2021-03-31T16:46:43.1179145Z 21/03/31 16:46:43 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:47.3079315Z 21/03/31 16:46:47 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:51.5920733Z 21/03/31 16:46:51 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:55.9175194Z 21/03/31 16:46:55 WARN DAGScheduler: Broadcasting large task binary with size 42.2 MiB
2021-03-31T16:46:57.6874541Z   Stopped after 3 iterations, 12928 ms
2021-03-31T16:46:57.6875644Z 
2021-03-31T16:46:57.6877153Z OpenJDK 64-Bit Server VM 11.0.10+9-LTS on Linux 5.4.0-1041-azure
2021-03-31T16:46:57.7095280Z Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz
2021-03-31T16:46:57.7097654Z from_json as subExpr in Project:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
2021-03-31T16:46:57.7099059Z ------------------------------------------------------------------------------------------------------------------------
2021-03-31T16:46:57.7100274Z subExprElimination false, codegen: true           38880          41246        1389          0.0   388800445.2       1.0X
2021-03-31T16:46:57.7101134Z subExprElimination false, codegen: false          35819          38141        1234          0.0   358188088.6       1.1X
2021-03-31T16:46:57.7106264Z subExprElimination true, codegen: true             3947           4157         364          0.0    39465629.1       9.9X
2021-03-31T16:46:57.7106982Z subExprElimination true, codegen: false            4191           4309         112          0.0    41908945.5       9.3X
2021-03-31T16:46:57.7107595Z 
2021-03-31T16:46:57.7135178Z Preparing data for benchmarking ...
2021-03-31T16:46:58.5630584Z Running benchmark: from_json as subExpr in Filter
2021-03-31T16:46:58.5633083Z   Running case: subExprElimination false, codegen: true
2021-03-31T16:48:25.5619312Z 21/03/31 16:48:25 WARN DAGScheduler: Broadcasting large task binary with size 43.0 MiB
```

This only happens when the benchmarks are run sequentially via Benchmarks.scala.


> MapStatusesSerDeserBenchmark causes a side effect to other benchmarks with tasks being too big (JDK 11)
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34929
>                 URL: https://issues.apache.org/jira/browse/SPARK-34929
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 3.2.0
>            Reporter: Hyukjin Kwon
>            Priority: Minor


