You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/07/01 11:37:06 UTC

[GitHub] [spark] MaxGekk opened a new pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

MaxGekk opened a new pull request #28966:
URL: https://github.com/apache/spark/pull/28966


   ### What changes were proposed in this pull request?
   Set the JSON option `inferTimestamp` to `false` if an user don't pass it as datasource option.
   
   ### Why are the changes needed?
   To prevent perf regression while inferring schemas from JSON with potential timestamps fields.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes
   
   ### How was this patch tested?
   Modified existing tests in `JsonSuite` and `JsonInferSchemaSuite`.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28966:
URL: https://github.com/apache/spark/pull/28966#discussion_r448650434



##########
File path: sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
##########
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       68879          68993         116          1.5         688.8       1.0X
-UTF-8 is set                                     115270         115602         455          0.9        1152.7       0.6X
+No encoding                                       69219          69342         116          1.4         692.2       1.0X
+UTF-8 is set                                     143950         143986          55          0.7        1439.5       0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       47452          47538         113          2.1         474.5       1.0X
-UTF-8 is set                                      77330          77354          30          1.3         773.3       0.6X
+No encoding                                       57828          57913         136          1.7         578.3       1.0X
+UTF-8 is set                                      83649          83711          60          1.2         836.5       0.7X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       60470          60900         534          0.2        6047.0       1.0X
-UTF-8 is set                                     104733         104931         189          0.1       10473.3       0.6X
+No encoding                                       64560          65193        1023          0.2        6456.0       1.0X
+UTF-8 is set                                     102925         103174         216          0.1       10292.5       0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                      130302         131072         976          0.0      260604.6       1.0X
-UTF-8 is set                                     150860         151284         377          0.0      301720.1       0.9X
+No encoding                                      131002         132316        1160          0.0      262003.1       1.0X
+UTF-8 is set                                     152128         152371         332          0.0      304256.5       0.9X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                 18619          18684          99          0.5        1861.9       1.0X
-Select 1 column                                   24227          24270          38          0.4        2422.7       0.8X
+Select 10 columns                                 19376          19514         160          0.5        1937.6       1.0X
+Select 1 column                                   24089          24156          58          0.4        2408.9       0.8X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                      7947           7971          21          1.3         794.7       1.0X
-Short column with UTF-8                           12700          12753          58          0.8        1270.0       0.6X
-Wide column without encoding                      92632          92955         463          0.1        9263.2       0.1X
-Wide column with UTF-8                           147013         147170         188          0.1       14701.3       0.1X
+Short column without encoding                      8131           8219         103          1.2         813.1       1.0X
+Short column with UTF-8                           13464          13508          44          0.7        1346.4       0.6X
+Wide column without encoding                     108012         108598         914          0.1       10801.2       0.1X
+Wide column with UTF-8                           150988         151369         412          0.1       15098.8       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           713            734          19         14.0          71.3       1.0X
-from_json                                         22019          22429         456          0.5        2201.9       0.0X
-json_tuple                                        27987          28047          74          0.4        2798.7       0.0X
-get_json_object                                   21468          21870         350          0.5        2146.8       0.0X
+Text read                                           753            765          18         13.3          75.3       1.0X
+from_json                                         23182          23446         230          0.4        2318.2       0.0X
+json_tuple                                        31129          31304         181          0.3        3112.9       0.0X
+get_json_object                                   22821          23073         225          0.4        2282.1       0.0X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          2887           2910          24         17.3          57.7       1.0X
-schema inferring                                  31793          31843          43          1.6         635.9       0.1X
-parsing                                           36791          37104         294          1.4         735.8       0.1X
+Text read                                          3078           3101          26         16.2          61.6       1.0X
+schema inferring                                  30225          30434         333          1.7         604.5       0.1X
+parsing                                           32237          32308          63          1.6         644.7       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                         10570          10611          45          4.7         211.4       1.0X
-Schema inferring                                  48729          48763          41          1.0         974.6       0.2X
-Parsing without charset                           35490          35648         141          1.4         709.8       0.3X
-Parsing with UTF-8                                63853          63994         163          0.8        1277.1       0.2X
+Text read                                         10835          10900          86          4.6         216.7       1.0X
+Schema inferring                                  37720          37805         110          1.3         754.4       0.3X
+Parsing without charset                           35464          35538         100          1.4         709.3       0.3X
+Parsing with UTF-8                                67311          67738         381          0.7        1346.2       0.2X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                     2187           2190           5          4.6         218.7       1.0X
-to_json(timestamp)                                16262          16503         323          0.6        1626.2       0.1X
-write timestamps to files                         11679          11692          12          0.9        1167.9       0.2X
-Create a dataset of dates                          2297           2310          12          4.4         229.7       1.0X
-to_json(date)                                     10904          10956          46          0.9        1090.4       0.2X
-write dates to files                               6610           6645          35          1.5         661.0       0.3X
+Create a dataset of timestamps                     2208           2222          14          4.5         220.8       1.0X
+to_json(timestamp)                                14299          14570         285          0.7        1429.9       0.2X
+write timestamps to files                         12955          12969          13          0.8        1295.5       0.2X
+Create a dataset of dates                          2297           2323          30          4.4         229.7       1.0X
+to_json(date)                                      8509           8561          74          1.2         850.9       0.3X
+write dates to files                               6786           6827          45          1.5         678.6       0.3X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                     2524           2530           9          4.0         252.4       1.0X
-read timestamps from files                        41002          41052          59          0.2        4100.2       0.1X
-infer timestamps from files                       84621          84939         526          0.1        8462.1       0.0X
-read date text from files                          2292           2302           9          4.4         229.2       1.1X
-read date from files                              16954          16976          21          0.6        1695.4       0.1X
-timestamp strings                                  3067           3077          13          3.3         306.7       0.8X
-parse timestamps from Dataset[String]             48690          48971         243          0.2        4869.0       0.1X
-infer timestamps from Dataset[String]             97463          97786         338          0.1        9746.3       0.0X
-date strings                                       3952           3956           3          2.5         395.2       0.6X
-parse dates from Dataset[String]                  24210          24241          30          0.4        2421.0       0.1X
-from_json(timestamp)                              71710          72242         629          0.1        7171.0       0.0X
-from_json(date)                                   42465          42481          13          0.2        4246.5       0.1X
+read timestamp text from files                     2598           2613          18          3.8         259.8       1.0X
+read timestamps from files                        42007          42028          19          0.2        4200.7       0.1X
+infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X

Review comment:
       Never mind. Let's update the benchmark suite later as a follow-up.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun closed pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #28966:
URL: https://github.com/apache/spark/pull/28966


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652683911






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652471884


   Thank you, @MaxGekk !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652421647


   Shall we add migration guide as well?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652367972






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652531097


   @cloud-fan There are items for minor versions too `Upgrading from Spark SQL 2.4 to 2.4.1`. See at docs/sql-migration-guide.md. I am going to add a separate section `Upgrading from Spark SQL 3.0 to 3.0.1`


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652683911


   Merged build finished. Test PASSed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652597041


   **[Test build #124770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124770/testReport)** for PR 28966 at commit [`277a679`](https://github.com/apache/spark/commit/277a679aff7cbdb4c19a4d794615dc30eb387266).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652599271






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652480348


   I am going to add this to the SQL migration guide:
   ```
     - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Since version 3.1, the timestamp type inference is disabled by default. Set the JSON option `inferTimestamp` to `true` to enable such type inference.
   ```
   
   The question is about the version from which we are going to disable the inference - Spark 3.1 or Spark 3.0.1.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652481708


   I set the target version to 3.0.1. This should go to 3.0.1.
   This is a regression due to SPARK-26246, isn't it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652481708


   I set the target version to 3.0.1. This should go to 3.0.1.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652535966






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652371322


   I am running JSONBenchmark now. If it will show significantly different numbers, I will push the changes to this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652519714


   Yea this should go to branch-3.0 to fix the perf regression.
   
   AFAIK migration guide is only for major/feature releases, so there is no migration guide for 3.0.1. Ideally 3.0.1 should be Spark 3.0 when it's released, so we can simply remove that item from the 3.0 migration guide.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652445949


   > If it will show significantly different numbers ...
   
   Some numbers are different on JDK 8, I am generating benchmark results on JDK 11, and will push the changes soon.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652733899


    It takes 7 hours ..? I think we should really revisit the test and cut down. Or maybe it's time to think about splitting and migrating to Github Actions completely ..


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652367481


   **[Test build #124770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124770/testReport)** for PR 28966 at commit [`277a679`](https://github.com/apache/spark/commit/277a679aff7cbdb4c19a4d794615dc30eb387266).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652683121


   **[Test build #124800 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124800/testReport)** for PR 28966 at commit [`43fc9aa`](https://github.com/apache/spark/commit/43fc9aaa55c7d56be2035c6cf4c69af930846693).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652535432


   **[Test build #124800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124800/testReport)** for PR 28966 at commit [`43fc9aa`](https://github.com/apache/spark/commit/43fc9aaa55c7d56be2035c6cf4c69af930846693).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28966:
URL: https://github.com/apache/spark/pull/28966#discussion_r448622881



##########
File path: sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
##########
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       68879          68993         116          1.5         688.8       1.0X
-UTF-8 is set                                     115270         115602         455          0.9        1152.7       0.6X
+No encoding                                       69219          69342         116          1.4         692.2       1.0X
+UTF-8 is set                                     143950         143986          55          0.7        1439.5       0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       47452          47538         113          2.1         474.5       1.0X
-UTF-8 is set                                      77330          77354          30          1.3         773.3       0.6X
+No encoding                                       57828          57913         136          1.7         578.3       1.0X
+UTF-8 is set                                      83649          83711          60          1.2         836.5       0.7X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       60470          60900         534          0.2        6047.0       1.0X
-UTF-8 is set                                     104733         104931         189          0.1       10473.3       0.6X
+No encoding                                       64560          65193        1023          0.2        6456.0       1.0X
+UTF-8 is set                                     102925         103174         216          0.1       10292.5       0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                      130302         131072         976          0.0      260604.6       1.0X
-UTF-8 is set                                     150860         151284         377          0.0      301720.1       0.9X
+No encoding                                      131002         132316        1160          0.0      262003.1       1.0X
+UTF-8 is set                                     152128         152371         332          0.0      304256.5       0.9X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                 18619          18684          99          0.5        1861.9       1.0X
-Select 1 column                                   24227          24270          38          0.4        2422.7       0.8X
+Select 10 columns                                 19376          19514         160          0.5        1937.6       1.0X
+Select 1 column                                   24089          24156          58          0.4        2408.9       0.8X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                      7947           7971          21          1.3         794.7       1.0X
-Short column with UTF-8                           12700          12753          58          0.8        1270.0       0.6X
-Wide column without encoding                      92632          92955         463          0.1        9263.2       0.1X
-Wide column with UTF-8                           147013         147170         188          0.1       14701.3       0.1X
+Short column without encoding                      8131           8219         103          1.2         813.1       1.0X
+Short column with UTF-8                           13464          13508          44          0.7        1346.4       0.6X
+Wide column without encoding                     108012         108598         914          0.1       10801.2       0.1X
+Wide column with UTF-8                           150988         151369         412          0.1       15098.8       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           713            734          19         14.0          71.3       1.0X
-from_json                                         22019          22429         456          0.5        2201.9       0.0X
-json_tuple                                        27987          28047          74          0.4        2798.7       0.0X
-get_json_object                                   21468          21870         350          0.5        2146.8       0.0X
+Text read                                           753            765          18         13.3          75.3       1.0X
+from_json                                         23182          23446         230          0.4        2318.2       0.0X
+json_tuple                                        31129          31304         181          0.3        3112.9       0.0X
+get_json_object                                   22821          23073         225          0.4        2282.1       0.0X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          2887           2910          24         17.3          57.7       1.0X
-schema inferring                                  31793          31843          43          1.6         635.9       0.1X
-parsing                                           36791          37104         294          1.4         735.8       0.1X
+Text read                                          3078           3101          26         16.2          61.6       1.0X
+schema inferring                                  30225          30434         333          1.7         604.5       0.1X
+parsing                                           32237          32308          63          1.6         644.7       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                         10570          10611          45          4.7         211.4       1.0X
-Schema inferring                                  48729          48763          41          1.0         974.6       0.2X
-Parsing without charset                           35490          35648         141          1.4         709.8       0.3X
-Parsing with UTF-8                                63853          63994         163          0.8        1277.1       0.2X
+Text read                                         10835          10900          86          4.6         216.7       1.0X
+Schema inferring                                  37720          37805         110          1.3         754.4       0.3X
+Parsing without charset                           35464          35538         100          1.4         709.3       0.3X
+Parsing with UTF-8                                67311          67738         381          0.7        1346.2       0.2X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                     2187           2190           5          4.6         218.7       1.0X
-to_json(timestamp)                                16262          16503         323          0.6        1626.2       0.1X
-write timestamps to files                         11679          11692          12          0.9        1167.9       0.2X
-Create a dataset of dates                          2297           2310          12          4.4         229.7       1.0X
-to_json(date)                                     10904          10956          46          0.9        1090.4       0.2X
-write dates to files                               6610           6645          35          1.5         661.0       0.3X
+Create a dataset of timestamps                     2208           2222          14          4.5         220.8       1.0X
+to_json(timestamp)                                14299          14570         285          0.7        1429.9       0.2X
+write timestamps to files                         12955          12969          13          0.8        1295.5       0.2X
+Create a dataset of dates                          2297           2323          30          4.4         229.7       1.0X
+to_json(date)                                      8509           8561          74          1.2         850.9       0.3X
+write dates to files                               6786           6827          45          1.5         678.6       0.3X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                     2524           2530           9          4.0         252.4       1.0X
-read timestamps from files                        41002          41052          59          0.2        4100.2       0.1X
-infer timestamps from files                       84621          84939         526          0.1        8462.1       0.0X
-read date text from files                          2292           2302           9          4.4         229.2       1.1X
-read date from files                              16954          16976          21          0.6        1695.4       0.1X
-timestamp strings                                  3067           3077          13          3.3         306.7       0.8X
-parse timestamps from Dataset[String]             48690          48971         243          0.2        4869.0       0.1X
-infer timestamps from Dataset[String]             97463          97786         338          0.1        9746.3       0.0X
-date strings                                       3952           3956           3          2.5         395.2       0.6X
-parse dates from Dataset[String]                  24210          24241          30          0.4        2421.0       0.1X
-from_json(timestamp)                              71710          72242         629          0.1        7171.0       0.0X
-from_json(date)                                   42465          42481          13          0.2        4246.5       0.1X
+read timestamp text from files                     2598           2613          18          3.8         259.8       1.0X
+read timestamps from files                        42007          42028          19          0.2        4200.7       0.1X
+infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X

Review comment:
       It seems that we need to update `JsonBenchmark` together.
   ```
   readBench.addCase("infer timestamps from files", numIters) { _ =>
       spark.read.json(timestampDir).noop()
   }
   ```

##########
File path: sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
##########
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       68879          68993         116          1.5         688.8       1.0X
-UTF-8 is set                                     115270         115602         455          0.9        1152.7       0.6X
+No encoding                                       69219          69342         116          1.4         692.2       1.0X
+UTF-8 is set                                     143950         143986          55          0.7        1439.5       0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       47452          47538         113          2.1         474.5       1.0X
-UTF-8 is set                                      77330          77354          30          1.3         773.3       0.6X
+No encoding                                       57828          57913         136          1.7         578.3       1.0X
+UTF-8 is set                                      83649          83711          60          1.2         836.5       0.7X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       60470          60900         534          0.2        6047.0       1.0X
-UTF-8 is set                                     104733         104931         189          0.1       10473.3       0.6X
+No encoding                                       64560          65193        1023          0.2        6456.0       1.0X
+UTF-8 is set                                     102925         103174         216          0.1       10292.5       0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                      130302         131072         976          0.0      260604.6       1.0X
-UTF-8 is set                                     150860         151284         377          0.0      301720.1       0.9X
+No encoding                                      131002         132316        1160          0.0      262003.1       1.0X
+UTF-8 is set                                     152128         152371         332          0.0      304256.5       0.9X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                 18619          18684          99          0.5        1861.9       1.0X
-Select 1 column                                   24227          24270          38          0.4        2422.7       0.8X
+Select 10 columns                                 19376          19514         160          0.5        1937.6       1.0X
+Select 1 column                                   24089          24156          58          0.4        2408.9       0.8X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                      7947           7971          21          1.3         794.7       1.0X
-Short column with UTF-8                           12700          12753          58          0.8        1270.0       0.6X
-Wide column without encoding                      92632          92955         463          0.1        9263.2       0.1X
-Wide column with UTF-8                           147013         147170         188          0.1       14701.3       0.1X
+Short column without encoding                      8131           8219         103          1.2         813.1       1.0X
+Short column with UTF-8                           13464          13508          44          0.7        1346.4       0.6X
+Wide column without encoding                     108012         108598         914          0.1       10801.2       0.1X
+Wide column with UTF-8                           150988         151369         412          0.1       15098.8       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           713            734          19         14.0          71.3       1.0X
-from_json                                         22019          22429         456          0.5        2201.9       0.0X
-json_tuple                                        27987          28047          74          0.4        2798.7       0.0X
-get_json_object                                   21468          21870         350          0.5        2146.8       0.0X
+Text read                                           753            765          18         13.3          75.3       1.0X
+from_json                                         23182          23446         230          0.4        2318.2       0.0X
+json_tuple                                        31129          31304         181          0.3        3112.9       0.0X
+get_json_object                                   22821          23073         225          0.4        2282.1       0.0X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          2887           2910          24         17.3          57.7       1.0X
-schema inferring                                  31793          31843          43          1.6         635.9       0.1X
-parsing                                           36791          37104         294          1.4         735.8       0.1X
+Text read                                          3078           3101          26         16.2          61.6       1.0X
+schema inferring                                  30225          30434         333          1.7         604.5       0.1X
+parsing                                           32237          32308          63          1.6         644.7       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                         10570          10611          45          4.7         211.4       1.0X
-Schema inferring                                  48729          48763          41          1.0         974.6       0.2X
-Parsing without charset                           35490          35648         141          1.4         709.8       0.3X
-Parsing with UTF-8                                63853          63994         163          0.8        1277.1       0.2X
+Text read                                         10835          10900          86          4.6         216.7       1.0X
+Schema inferring                                  37720          37805         110          1.3         754.4       0.3X
+Parsing without charset                           35464          35538         100          1.4         709.3       0.3X
+Parsing with UTF-8                                67311          67738         381          0.7        1346.2       0.2X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                     2187           2190           5          4.6         218.7       1.0X
-to_json(timestamp)                                16262          16503         323          0.6        1626.2       0.1X
-write timestamps to files                         11679          11692          12          0.9        1167.9       0.2X
-Create a dataset of dates                          2297           2310          12          4.4         229.7       1.0X
-to_json(date)                                     10904          10956          46          0.9        1090.4       0.2X
-write dates to files                               6610           6645          35          1.5         661.0       0.3X
+Create a dataset of timestamps                     2208           2222          14          4.5         220.8       1.0X
+to_json(timestamp)                                14299          14570         285          0.7        1429.9       0.2X
+write timestamps to files                         12955          12969          13          0.8        1295.5       0.2X
+Create a dataset of dates                          2297           2323          30          4.4         229.7       1.0X
+to_json(date)                                      8509           8561          74          1.2         850.9       0.3X
+write dates to files                               6786           6827          45          1.5         678.6       0.3X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                     2524           2530           9          4.0         252.4       1.0X
-read timestamps from files                        41002          41052          59          0.2        4100.2       0.1X
-infer timestamps from files                       84621          84939         526          0.1        8462.1       0.0X
-read date text from files                          2292           2302           9          4.4         229.2       1.1X
-read date from files                              16954          16976          21          0.6        1695.4       0.1X
-timestamp strings                                  3067           3077          13          3.3         306.7       0.8X
-parse timestamps from Dataset[String]             48690          48971         243          0.2        4869.0       0.1X
-infer timestamps from Dataset[String]             97463          97786         338          0.1        9746.3       0.0X
-date strings                                       3952           3956           3          2.5         395.2       0.6X
-parse dates from Dataset[String]                  24210          24241          30          0.4        2421.0       0.1X
-from_json(timestamp)                              71710          72242         629          0.1        7171.0       0.0X
-from_json(date)                                   42465          42481          13          0.2        4246.5       0.1X
+read timestamp text from files                     2598           2613          18          3.8         259.8       1.0X
+read timestamps from files                        42007          42028          19          0.2        4200.7       0.1X
+infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X

Review comment:
       It seems that we need to update `JsonBenchmark` together.
   ```scala
   readBench.addCase("infer timestamps from files", numIters) { _ =>
       spark.read.json(timestampDir).noop()
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652599271






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652535966






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652599839


   This https://github.com/apache/spark/pull/28966#issuecomment-652597041 took 7 hr 37 min :-(


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28966:
URL: https://github.com/apache/spark/pull/28966#discussion_r448621700



##########
File path: sql/core/benchmarks/JsonBenchmark-jdk11-results.txt
##########
@@ -7,106 +7,106 @@ OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-106
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON schema inferring:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       68879          68993         116          1.5         688.8       1.0X
-UTF-8 is set                                     115270         115602         455          0.9        1152.7       0.6X
+No encoding                                       69219          69342         116          1.4         692.2       1.0X
+UTF-8 is set                                     143950         143986          55          0.7        1439.5       0.5X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a short column:                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       47452          47538         113          2.1         474.5       1.0X
-UTF-8 is set                                      77330          77354          30          1.3         773.3       0.6X
+No encoding                                       57828          57913         136          1.7         578.3       1.0X
+UTF-8 is set                                      83649          83711          60          1.2         836.5       0.7X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 count a wide column:                      Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                       60470          60900         534          0.2        6047.0       1.0X
-UTF-8 is set                                     104733         104931         189          0.1       10473.3       0.6X
+No encoding                                       64560          65193        1023          0.2        6456.0       1.0X
+UTF-8 is set                                     102925         103174         216          0.1       10292.5       0.6X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 select wide row:                          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-No encoding                                      130302         131072         976          0.0      260604.6       1.0X
-UTF-8 is set                                     150860         151284         377          0.0      301720.1       0.9X
+No encoding                                      131002         132316        1160          0.0      262003.1       1.0X
+UTF-8 is set                                     152128         152371         332          0.0      304256.5       0.9X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Select a subset of 10 columns:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Select 10 columns                                 18619          18684          99          0.5        1861.9       1.0X
-Select 1 column                                   24227          24270          38          0.4        2422.7       0.8X
+Select 10 columns                                 19376          19514         160          0.5        1937.6       1.0X
+Select 1 column                                   24089          24156          58          0.4        2408.9       0.8X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 creation of JSON parser per line:         Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Short column without encoding                      7947           7971          21          1.3         794.7       1.0X
-Short column with UTF-8                           12700          12753          58          0.8        1270.0       0.6X
-Wide column without encoding                      92632          92955         463          0.1        9263.2       0.1X
-Wide column with UTF-8                           147013         147170         188          0.1       14701.3       0.1X
+Short column without encoding                      8131           8219         103          1.2         813.1       1.0X
+Short column with UTF-8                           13464          13508          44          0.7        1346.4       0.6X
+Wide column without encoding                     108012         108598         914          0.1       10801.2       0.1X
+Wide column with UTF-8                           150988         151369         412          0.1       15098.8       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 JSON functions:                           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                           713            734          19         14.0          71.3       1.0X
-from_json                                         22019          22429         456          0.5        2201.9       0.0X
-json_tuple                                        27987          28047          74          0.4        2798.7       0.0X
-get_json_object                                   21468          21870         350          0.5        2146.8       0.0X
+Text read                                           753            765          18         13.3          75.3       1.0X
+from_json                                         23182          23446         230          0.4        2318.2       0.0X
+json_tuple                                        31129          31304         181          0.3        3112.9       0.0X
+get_json_object                                   22821          23073         225          0.4        2282.1       0.0X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Dataset of json strings:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                          2887           2910          24         17.3          57.7       1.0X
-schema inferring                                  31793          31843          43          1.6         635.9       0.1X
-parsing                                           36791          37104         294          1.4         735.8       0.1X
+Text read                                          3078           3101          26         16.2          61.6       1.0X
+schema inferring                                  30225          30434         333          1.7         604.5       0.1X
+parsing                                           32237          32308          63          1.6         644.7       0.1X
 
 Preparing data for benchmarking ...
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Json files in the per-line mode:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Text read                                         10570          10611          45          4.7         211.4       1.0X
-Schema inferring                                  48729          48763          41          1.0         974.6       0.2X
-Parsing without charset                           35490          35648         141          1.4         709.8       0.3X
-Parsing with UTF-8                                63853          63994         163          0.8        1277.1       0.2X
+Text read                                         10835          10900          86          4.6         216.7       1.0X
+Schema inferring                                  37720          37805         110          1.3         754.4       0.3X
+Parsing without charset                           35464          35538         100          1.4         709.3       0.3X
+Parsing with UTF-8                                67311          67738         381          0.7        1346.2       0.2X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Write dates and timestamps:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-Create a dataset of timestamps                     2187           2190           5          4.6         218.7       1.0X
-to_json(timestamp)                                16262          16503         323          0.6        1626.2       0.1X
-write timestamps to files                         11679          11692          12          0.9        1167.9       0.2X
-Create a dataset of dates                          2297           2310          12          4.4         229.7       1.0X
-to_json(date)                                     10904          10956          46          0.9        1090.4       0.2X
-write dates to files                               6610           6645          35          1.5         661.0       0.3X
+Create a dataset of timestamps                     2208           2222          14          4.5         220.8       1.0X
+to_json(timestamp)                                14299          14570         285          0.7        1429.9       0.2X
+write timestamps to files                         12955          12969          13          0.8        1295.5       0.2X
+Create a dataset of dates                          2297           2323          30          4.4         229.7       1.0X
+to_json(date)                                      8509           8561          74          1.2         850.9       0.3X
+write dates to files                               6786           6827          45          1.5         678.6       0.3X
 
 OpenJDK 64-Bit Server VM 11.0.7+10-post-Ubuntu-2ubuntu218.04 on Linux 4.15.0-1063-aws
 Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
 Read dates and timestamps:                Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-read timestamp text from files                     2524           2530           9          4.0         252.4       1.0X
-read timestamps from files                        41002          41052          59          0.2        4100.2       0.1X
-infer timestamps from files                       84621          84939         526          0.1        8462.1       0.0X
-read date text from files                          2292           2302           9          4.4         229.2       1.1X
-read date from files                              16954          16976          21          0.6        1695.4       0.1X
-timestamp strings                                  3067           3077          13          3.3         306.7       0.8X
-parse timestamps from Dataset[String]             48690          48971         243          0.2        4869.0       0.1X
-infer timestamps from Dataset[String]             97463          97786         338          0.1        9746.3       0.0X
-date strings                                       3952           3956           3          2.5         395.2       0.6X
-parse dates from Dataset[String]                  24210          24241          30          0.4        2421.0       0.1X
-from_json(timestamp)                              71710          72242         629          0.1        7171.0       0.0X
-from_json(date)                                   42465          42481          13          0.2        4246.5       0.1X
+read timestamp text from files                     2598           2613          18          3.8         259.8       1.0X
+read timestamps from files                        42007          42028          19          0.2        4200.7       0.1X
+infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X

Review comment:
       Sorry, but this test case seems to be supposed to run `infer timestamp` always. Why does this become faster?
   ```
   - infer timestamps from files                       84621          84939         526          0.1        8462.1       0.0X
   + infer timestamps from files                       18102          18120          28          0.6        1810.2       0.1X
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652367972






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652367481


   **[Test build #124770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124770/testReport)** for PR 28966 at commit [`277a679`](https://github.com/apache/spark/commit/277a679aff7cbdb4c19a4d794615dc30eb387266).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #28966: [SPARK-32130][SQL] Disable the JSON option `inferTimestamp` by default

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #28966:
URL: https://github.com/apache/spark/pull/28966#issuecomment-652535432


   **[Test build #124800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124800/testReport)** for PR 28966 at commit [`43fc9aa`](https://github.com/apache/spark/commit/43fc9aaa55c7d56be2035c6cf4c69af930846693).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org