Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2023/03/14 15:02:00 UTC

[jira] [Commented] (SPARK-42789) rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same

    [ https://issues.apache.org/jira/browse/SPARK-42789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17700241#comment-17700241 ] 

Apache Spark commented on SPARK-42789:
--------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40419
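As a rough illustration of the rewrite named in this issue's title, the sketch below models it on a toy list of expressions. `GetJsonObject`, `JsonTuple`, `top_level_field` and `rewrite` here are hypothetical Python stand-ins. They are not Spark's Catalyst classes or the code in the pull request.

The sketch makes two assumptions. First, only simple top-level paths of the form `$.name` are eligible, because `json_tuple` takes plain field names rather than JSONPath expressions. Second, merging needs at least two eligible expressions on the same JSON column. The presumed payoff is that each row's JSON string is parsed once by the merged tuple instead of once per `get_json_object` call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


# Hypothetical stand-ins for Spark's GetJsonObject and JsonTuple expressions.
@dataclass(frozen=True)
class GetJsonObject:
    json_col: str  # column holding the JSON string
    path: str      # JSONPath such as "$.a"


@dataclass(frozen=True)
class JsonTuple:
    json_col: str
    fields: Tuple[str, ...]  # top-level field names, all read from one parse


Expr = Union[GetJsonObject, JsonTuple]


def top_level_field(path: str) -> Optional[str]:
    """Return "a" for a simple "$.a" path; None for nested, indexed or wildcard paths."""
    name = path[2:] if path.startswith("$.") else ""
    if name and not any(c in name for c in ".[*"):
        return name
    return None


def rewrite(exprs: List[GetJsonObject]) -> List[Expr]:
    """Merge two or more eligible GetJsonObjects on the same JSON column into one JsonTuple."""
    groups: Dict[str, List[GetJsonObject]] = {}
    for e in exprs:
        if top_level_field(e.path) is not None:
            groups.setdefault(e.json_col, []).append(e)

    result: List[Expr] = []
    merged = set()
    for col, members in groups.items():
        if len(members) >= 2:
            # dict.fromkeys drops duplicate field names but keeps first-seen order.
            fields = tuple(dict.fromkeys(top_level_field(m.path) for m in members))
            result.append(JsonTuple(col, fields))
            merged.update(members)

    # Single uses and ineligible paths are kept as separate GetJsonObjects.
    result.extend(e for e in exprs if e not in merged)
    return result


exprs = [
    GetJsonObject("value", "$.a"),
    GetJsonObject("value", "$.b"),
    GetJsonObject("value", "$.c.d"),  # nested path: not eligible in this sketch
    GetJsonObject("other", "$.x"),    # only one use of this column: left alone
]
for e in rewrite(exprs):
    print(e)
```

For these four expressions, the sketch emits one `JsonTuple(json_col='value', fields=('a', 'b'))` plus the two untouched `GetJsonObject`s. Each row's `value` string would then be parsed twice (once for the tuple, once for `$.c.d`) instead of three times.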

> rewrites multiple GetJsonObjects to a JsonTuple if their json expression is the same
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-42789
>                 URL: https://issues.apache.org/jira/browse/SPARK-42789
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> Benchmark result:
> {noformat}
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 2
>   Stopped after 2 iterations, 80787 ms
>   Running case: Rewrite: 2
>   Stopped after 2 iterations, 48900 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 2                                        39026          40394        1935          0.2        5397.8       1.0X
> Rewrite: 2                                        24354          24450         137          0.3        3368.4       1.6X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 3
>   Stopped after 2 iterations, 115055 ms
>   Running case: Rewrite: 3
>   Stopped after 2 iterations, 62297 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 3                                        54652          57528         NaN          0.1        7559.1       1.0X
> Rewrite: 3                                        30702          31149         631          0.2        4246.6       1.8X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 4
>   Stopped after 2 iterations, 155392 ms
>   Running case: Rewrite: 4
>   Stopped after 2 iterations, 54776 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 4                                        75503          77696         NaN          0.1       10443.1       1.0X
> Rewrite: 4                                        26962          27388         602          0.3        3729.3       2.8X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 5
>   Stopped after 2 iterations, 192836 ms
>   Running case: Rewrite: 5
>   Stopped after 2 iterations, 51967 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 5                                        94923          96418        2115          0.1       13129.1       1.0X
> Rewrite: 5                                        25362          25984         880          0.3        3507.8       3.7X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 10
>   Stopped after 2 iterations, 317246 ms
>   Running case: Rewrite: 10
>   Stopped after 2 iterations, 56734 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 10                                      157458         158623        1648          0.0       21778.6       1.0X
> Rewrite: 10                                       28296          28367         100          0.3        3913.8       5.6X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 20
>   Stopped after 2 iterations, 618089 ms
>   Running case: Rewrite: 20
>   Stopped after 2 iterations, 63576 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 20                                      285338         309045         NaN          0.0       39466.2       1.0X
> Rewrite: 20                                       31682          31788         151          0.2        4382.0       9.0X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 30
> 07:25:58.851 WARN org.apache.spark.sql.catalyst.util.package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
>   Stopped after 2 iterations, 1113910 ms
>   Running case: Rewrite: 30
>   Stopped after 2 iterations, 101468 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 30                                      481691         556955        1722          0.0       66624.5       1.0X
> Rewrite: 30                                       50497          50734         335          0.1        6984.5       9.5X
> Running benchmark: Benchmark rewrite GetJsonObjects
>   Running case: Default: 36
>   Stopped after 2 iterations, 1272619 ms
>   Running case: Rewrite: 36
>   Stopped after 2 iterations, 81609 ms
> Java HotSpot(TM) 64-Bit Server VM 17.0.4.1+1-LTS-2 on Mac OS X 13.2.1
> Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
> Benchmark rewrite GetJsonObjects:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------------------------------
> Default: 36                                      576500         636310         NaN          0.0       79737.8       1.0X
> Rewrite: 36                                       40461          40805         486          0.2        5596.4      14.2X
> {noformat}
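The Relative column can be reproduced from the Best Time column, and the Per Row column shows how cost scales with the size of each case. The short script below re-enters the numbers from the tables by hand and checks both.

`N` is read from the case labels `Default: N` / `Rewrite: N` and presumably counts the `get_json_object` calls per row. The slope is a plain ordinary least-squares fit of Per Row time against `N`. That fit is a simple choice made for this sketch, not part of the benchmark.

```python
# Numbers re-entered by hand from the benchmark tables.
# Key: N from the case labels "Default: N" / "Rewrite: N".
BEST_MS = {  # N -> (Default best time ms, Rewrite best time ms)
    2: (39026, 24354), 3: (54652, 30702), 4: (75503, 26962), 5: (94923, 25362),
    10: (157458, 28296), 20: (285338, 31682), 30: (481691, 50497), 36: (576500, 40461),
}
PER_ROW_NS = {  # N -> (Default ns per row, Rewrite ns per row)
    2: (5397.8, 3368.4), 3: (7559.1, 4246.6), 4: (10443.1, 3729.3), 5: (13129.1, 3507.8),
    10: (21778.6, 3913.8), 20: (39466.2, 4382.0), 30: (66624.5, 6984.5), 36: (79737.8, 5596.4),
}
REPORTED_RELATIVE = {2: 1.6, 3: 1.8, 4: 2.8, 5: 3.7, 10: 5.6, 20: 9.0, 30: 9.5, 36: 14.2}


def relative(n: int) -> float:
    """Speed-up of Rewrite over Default: Default best time / Rewrite best time."""
    default_ms, rewrite_ms = BEST_MS[n]
    return default_ms / rewrite_ms


def ols_slope(xs, ys) -> float:
    """Ordinary least-squares slope of ys against xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


ns = sorted(BEST_MS)
for n in ns:
    # The table rounds Relative to one decimal place.
    print(f"N={n:2d}  computed {relative(n):5.2f}X  reported {REPORTED_RELATIVE[n]}X")

default_slope = ols_slope(ns, [PER_ROW_NS[n][0] for n in ns])
rewrite_slope = ols_slope(ns, [PER_ROW_NS[n][1] for n in ns])
print(f"Default: ~{default_slope:.0f} ns/row per extra expression")
print(f"Rewrite: ~{rewrite_slope:.0f} ns/row per extra expression")
```

In every case, Relative matches Default best time divided by Rewrite best time, rounded to one decimal place. The fitted slopes are about 2150 ns per additional expression for Default and about 80 ns for Rewrite. The Rewrite figures are noisy (the 30-expression case is slower per row than the 36-expression case), so its slope is only a rough indication.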



--
This message was sent by Atlassian Jira
(v8.20.10#820010)