Posted to reviews@spark.apache.org by MaxGekk <gi...@git.apache.org> on 2018/12/02 20:57:12 UTC

[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/23201

    [SPARK-26246][SQL] Infer date and timestamp types from JSON

    ## What changes were proposed in this pull request?
    
    The `JsonInferSchema` class is extended to support inferring `DateType` and `TimestampType` from string fields in JSON input. It first tries to infer `TimestampType` as the tightest type. If timestamp parsing fails, `DateType` is inferred using the date pattern. As a fallback when both fail, it invokes `DateTimeUtils.stringToTime`.
    
    ## How was this patch tested?
    
    Added a new test suite, `JsonInferSchemaSuite`, to check inference of date and timestamp types from JSON. These changes were also tested by `JsonSuite`, `JsonExpressionsSuite` and `JsonFunctionsSuite`.
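
The inference order described in the pull request text can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: it uses `java.time` instead of the formatters held by `JSONOptions`, the two patterns and the names `InferredType`, `InferSketch` and `inferString` are hypothetical, and the Spark-internal `DateTimeUtils.stringToTime` fallback is left out so that the sketch runs without Spark.

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter
import scala.util.Try

// Hypothetical stand-ins for Spark's DataType objects, so the sketch runs without Spark.
sealed trait InferredType
case object TimestampT extends InferredType
case object DateT extends InferredType
case object StringT extends InferredType

object InferSketch {
  // Assumed patterns; in Spark they would come from the JSON options.
  val timestampFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss")
  val dateFormat: DateTimeFormatter =
    DateTimeFormatter.ofPattern("yyyy-MM-dd")

  // Try the timestamp pattern first, then the date pattern, else keep a string.
  def inferString(value: String): InferredType =
    if (Try(LocalDateTime.parse(value, timestampFormat)).isSuccess) TimestampT
    else if (Try(LocalDate.parse(value, dateFormat)).isSuccess) DateT
    else StringT
}
```

With these assumed patterns, a value such as `2018-12-02` fails the timestamp attempt and is then matched by the date pattern, while free text matches neither and stays a string.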

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 json-infer-time

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23201
    
----
commit 2a26e2c680b517e9e89a0f4bc4cc31884020188d
Author: Maxim Gekk <ma...@...>
Date:   2018-12-02T20:06:05Z

    Added a test for timestamp inferring

commit bd472072a39dbec2e1eec1396196c6c5e6a659dd
Author: Maxim Gekk <ma...@...>
Date:   2018-12-02T20:43:48Z

    Infer date and timestamp types

commit 9dbdf0a764c998875932e50faf460f36216ef58d
Author: Maxim Gekk <ma...@...>
Date:   2018-12-02T20:44:08Z

    Test for date type

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240153595
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    Yeah, I tried to match it with CSV a long time ago, but I gave up because of behaviour changes, IIRC. If that's possible, it would be awesome.
    
    If that's difficult, matching the behaviour within text-based datasources (meaning CSV and JSON, I guess) should be good enough.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240000411
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    The order can matter if you have the same (or a similar) pattern for dates and timestamps. `DateType` can be preferable because it requires less memory.
    
    It seems reasonable to move from `DateType` to `TimestampType` during schema inference, since the opposite is impossible without losing information.
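
The widening argument above can be illustrated with a small sketch. It is not Spark's actual type-merging routine: `ColType`, `WidenSketch`, `merge` and `columnType` are hypothetical names, and the fallback to a string type for every other combination is an assumption made for the example.

```scala
// Hypothetical stand-ins for Spark's DataType objects, so the sketch runs without Spark.
sealed trait ColType
case object DateCol extends ColType
case object TimestampCol extends ColType
case object StringCol extends ColType

object WidenSketch {
  // Merge the types inferred for the same field in two records.
  def merge(a: ColType, b: ColType): ColType = (a, b) match {
    case (x, y) if x == y => x
    // A date can be represented as a timestamp without losing information,
    // so a mix of dates and timestamps widens to timestamp.
    case (DateCol, TimestampCol) | (TimestampCol, DateCol) => TimestampCol
    // Assumption for this sketch: any other mix falls back to string.
    case _ => StringCol
  }

  // Fold the per-record types into a single column type.
  def columnType(perRecord: Seq[ColType]): ColType =
    perRecord.reduceOption(merge).getOrElse(StringCol)
}
```

A column that contains only dates keeps the smaller `DateCol`, and a single timestamp value anywhere in the column lifts it to `TimestampCol`. Starting from `TimestampCol` there is no step back to `DateCol`.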


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240090192
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    The partition feature is shared between all the file-based sources; I think it's overkill to make it differ across data sources.
    
    The simplest solution to me is asking all text sources to follow the behavior of partition value type inference.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240036225
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    Do you mean partition value type inference will have a different result than JSON value type inference?


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    **[Test build #99580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99580/testReport)** for PR 23201 at commit [`9dbdf0a`](https://github.com/apache/spark/commit/9dbdf0a764c998875932e50faf460f36216ef58d).


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239687213
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    I checked `PartitioningUtils.inferPartitionColumnValue`: we try timestamp first and then date. Shall we follow it?


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    **[Test build #99613 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99613/testReport)** for PR 23201 at commit [`05bbfea`](https://github.com/apache/spark/commit/05bbfea96dc4cf12f25e87a5c7749629156a05ef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240031238
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    Yes, date parsing can be less strict in that case, but if we prefer `TimestampType` over `DateType` for similar date and timestamp patterns, we will consume more memory. Also, we can lift `DateType` to `TimestampType` during schema inference, but the opposite direction is impossible.


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    @cloud-fan May I ask you to look at this PR, please.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239269170
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +121,18 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    +        if ((allCatch opt options.timestampFormat.parse(stringValue)).isDefined) {
    --- End diff --
    
    I made changes similar to those in https://github.com/apache/spark/pull/23202: strict `DateType` inference is tried before `TimestampType`.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240156871
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    > If we switch the order here, we don't need the length check here, right?
    
    @cloud-fan, that works only if we use the default date/timestamp patterns. Both should match the pattern exactly, which the current parsing library (`SimpleDateFormat`) unfortunately does not allow.
    
    The order here is just to make it look better; neither check should depend on the order. I think we should support those inferences after completely switching the library to `java.time.format.*`, without the legacy code path. That should make this change easier and leave no holes.
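
The exact-match problem mentioned above can be seen with plain JDK classes. The sketch below is only an illustration: the input string, the pattern and the name `StrictnessSketch` are made up for the example and are not taken from the patch.

```scala
import java.text.SimpleDateFormat
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import scala.util.Try

object StrictnessSketch {
  val input = "2018-12-02T20:57:12" // a timestamp-shaped string
  val pattern = "yyyy-MM-dd"        // a date-only pattern

  // SimpleDateFormat.parse(String) may stop before the end of the input,
  // so the date-only pattern accepts the timestamp-shaped string.
  val legacyAccepts: Boolean =
    Try(new SimpleDateFormat(pattern).parse(input)).isSuccess

  // LocalDate.parse requires the whole input to match the formatter,
  // so the trailing "T20:57:12" makes parsing fail.
  val javaTimeAccepts: Boolean =
    Try(LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern))).isSuccess
}
```

Because the legacy parser accepts a prefix of the input, a date check placed before the timestamp check would claim timestamp-shaped strings unless some extra guard is applied, which is presumably why the order and the length check come up in this thread.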


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240038837
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    I didn't mean type inference in partition values, but you are probably right that we should follow the same logic for schema inference in datasources and for partition value types.
    
    Just wondering how it works for now. Can this code: https://github.com/apache/spark/blob/5a140b7844936cf2b65f08853b8cfd8c499d4f13/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala#L474-L482 and this: https://github.com/apache/spark/blob/f982ca07e80074bdc1e3b742c5e21cf368e4ede2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala#L163 use different timestamp patterns, or is it supposed to work only with default settings?
    
    Maybe `inferPartitionColumnValue` should ask the datasource to infer date/timestamp types?


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5668/
    Test PASSed.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239534668
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    shall we abstract out this logic for all the text sources?


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    **[Test build #99613 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99613/testReport)** for PR 23201 at commit [`05bbfea`](https://github.com/apache/spark/commit/05bbfea96dc4cf12f25e87a5c7749629156a05ef).


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99580/
    Test PASSed.


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239537694
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    Yes, we can do that. There is some common code that could be shared. Can we do it in a separate PR?


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r240022552
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    If we switch the order here, we don't need the length check [here](https://github.com/apache/spark/pull/23201/files#diff-e925de14239f40430d05f9ffd0360f10R130), right?


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239687264
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    or the order doesn't matter?


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239539848
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    sure. How many text data sources already support it?


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r238141831
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +121,18 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    +        if ((allCatch opt options.timestampFormat.parse(stringValue)).isDefined) {
    --- End diff --
    
    I haven't tested this myself, but I think it has the same problem (https://github.com/apache/spark/pull/23202#discussion_r238141702).


---



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

Posted by MaxGekk <gi...@git.apache.org>.
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23201#discussion_r239547742
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala ---
    @@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
                 DecimalType(bigDecimal.precision, bigDecimal.scale)
             }
             decimalTry.getOrElse(StringType)
    -      case VALUE_STRING => StringType
    +      case VALUE_STRING =>
    +        val stringValue = parser.getText
    --- End diff --
    
    `DateType` is not inferred at all, but there is other type inference code that could be shared between JSON and CSV (and maybe elsewhere).


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5638/
    Test PASSed.


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    **[Test build #99580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99580/testReport)** for PR 23201 at commit [`9dbdf0a`](https://github.com/apache/spark/commit/9dbdf0a764c998875932e50faf460f36216ef58d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class JsonInferSchemaSuite extends SparkFunSuite `


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99613/
    Test PASSed.


---



[GitHub] spark issue #23201: [SPARK-26246][SQL] Infer date and timestamp types from J...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23201
  
    Merged build finished. Test PASSed.


---
