Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/05/12 02:47:04 UTC

[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/17956

    [SPARK-18772][SQL] Avoid unnecessary conversion try for special floats in JSON and add related tests

    ## What changes were proposed in this pull request?
    
    This PR is based on https://github.com/apache/spark/pull/16199 and extracts the valid change from https://github.com/apache/spark/pull/9759 to resolve SPARK-18772.
    
    It barely changes input/output; it just avoids the additional conversion attempt with `toFloat` and `toDouble`.
    
    **Before**
    
    ```scala
    scala> import org.apache.spark.sql.types._
    import org.apache.spark.sql.types._
    
    scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
    17/05/12 11:30:41 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
    java.lang.NumberFormatException: For input string: "nan"
    ...
    ```
    
    **After**
    
    ```scala
    scala> import org.apache.spark.sql.types._
    import org.apache.spark.sql.types._
    
    scala> spark.read.schema(StructType(Seq(StructField("a", DoubleType)))).option("mode", "FAILFAST").json(Seq("""{"a": "nan"}""").toDS).show()
    17/05/12 11:44:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.RuntimeException: Cannot parse nan as DoubleType.
    ...
    ```
    
    ## How was this patch tested?
    
    Unit tests added in `JsonSuite`.
    
    Closes #16199
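    
    The new logic can be sketched as follows. This is a rough Java approximation of the Scala `match` added to `JacksonParser` (the helper name `parseSpecialDouble` is illustrative, not part of Spark, and the real code wraps the final failure in a `RuntimeException`):
    
    ```java
    public final class SpecialFloatSketch {
    
        // Recognize the special spellings explicitly instead of lowercasing
        // the text and then attempting a conversion that may throw anyway.
        static double parseSpecialDouble(String text) {
            switch (text) {
                case "NaN":
                    return Double.NaN;
                case "+INF":
                case "INF":
                case "Infinity":
                case "+Infinity":
                    return Double.POSITIVE_INFINITY;
                case "-INF":
                case "-Infinity":
                    return Double.NEGATIVE_INFINITY;
                default:
                    // Fall back to the standard parser; anything it rejects
                    // (e.g. "nan") still fails, as in the "After" output above.
                    return Double.parseDouble(text);
            }
        }
    
        public static void main(String[] args) {
            System.out.println(parseSpecialDouble("-INF")); // -Infinity
            System.out.println(parseSpecialDouble("4.0"));  // 4.0
        }
    }
    ```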

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-18772

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17956.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17956
    
----
commit f83eec7a4c8fce223c5fd28b411fe2d1ae1da8dd
Author: Nathan Howell <nh...@godaddy.com>
Date:   2016-12-07T23:32:14Z

    [SPARK-18772][SQL] NaN/Infinite float parsing in JSON is inconsistent

commit 660a2843050d99b15ca7676ead8b8be4117267f1
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-05-12T02:22:27Z

    Avoid unnecessary cast try for special floats in JSON and add related tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116277511
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    I think we should not add all the variants in the default behaviour. We could add options for those cases. We already have different ideas; is it worth adding those cases in this PR right now? This is not the end of fixing this code path.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Before this PR, the option `allowNonNumericNumbers` affects the results of the following four cases:
    ```Json
    {"a": NaN}
    {"a": Infinity}
    {"a": +Infinity}
    {"a": -Infinity}
    ```
    
    We should also enable this flag for the following cases:
    ```Json
    {"a": "+INF"}
    {"a": "INF"}
    {"a": "-INF"}
    {"a": "NaN"}
    {"a": "+NaN"}
    {"a": "-NaN"}
    {"a": "Infinity"}
    {"a": "+Infinity"}
    {"a": "-Infinity"}
    ```
    
    In addition, we almost introduced a serious regression, because we did not have a test case covering the following scenario:
    
    ```Json
    {"a": "4.0"}
    ```
    Thus, we should also include the above test case.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Unfortunately, we already support 
    
    ```
    "NaN"
    "-Infinity"
    "Infinity"
    ```
    
    Now, this PR primarily targets avoiding the unnecessary conversion attempt.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76858 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76858/testReport)** for PR 17956 at commit [`fa63ff4`](https://github.com/apache/spark/commit/fa63ff4e63bfcb3d780183528663223ab63891a3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116159278
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    // positive cases
    +    val jsons = Seq(
    +      """{"a": "-INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    --- End diff --
    
    Do we want to test `+INF`?




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116163499
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,51 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    I do not understand which standards we follow. Are the above seven values all accepted by `toFloat` and `toDouble`?
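    
    For reference, `toFloat`/`toDouble` delegate to `java.lang.Float.parseFloat`/`java.lang.Double.parseDouble`, which accept only the (optionally signed) `Infinity` and `NaN` spellings; the `INF` forms are rejected. A quick check, independent of Spark:
    
    ```java
    public final class ParseDoubleCheck {
        public static void main(String[] args) {
            String[] inputs = {"+INF", "INF", "-INF", "NaN",
                               "Infinity", "+Infinity", "-Infinity"};
            for (String s : inputs) {
                try {
                    // Accepted spellings print their parsed value.
                    System.out.println(s + " -> " + Double.parseDouble(s));
                } catch (NumberFormatException e) {
                    // The INF variants end up here.
                    System.out.println(s + " -> NumberFormatException");
                }
            }
        }
    }
    ```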




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    thanks, merging to master/2.2!
    
    I think this change is pretty safe, we can discuss 2 things later:
    1. if we want to support more special strings like `Inf`
    2. if we want to make it case-insensitive




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    uh... That is different... This is String. That is Float. nvm. They are different issues.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    This PR description is misleading; this PR is actually a bug fix, I think.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Hey @gatorsmile, how about picking up the commits here and opening a PR?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    LGTM




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116163770
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,51 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    This follows Jackson + Scala. https://github.com/apache/spark/pull/9759#r63321521




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116276050
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    Actually, here we are not trying to educate users/customers about what is right or wrong. We are trying to improve usability. If other popular platforms accept some formats, we need to follow them even if they seem strange to us.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    retest this please




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76857 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76857/testReport)** for PR 17956 at commit [`d581467`](https://github.com/apache/spark/commit/d5814678e488f6a28c33f5e8a31fde9d6d0b25a7).




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116159435
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    This sounds like a bug fix, because `toFloat` is case-sensitive?
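    
    For context, Scala's `toFloat` delegates to `java.lang.Float.parseFloat`, which is case-sensitive, so the lowercased copy was only ever used for the equality checks while the original text was handed to `toFloat`. A minimal check, independent of Spark:
    
    ```java
    public final class CaseSensitivityCheck {
        public static void main(String[] args) {
            // The canonical spelling parses fine.
            System.out.println(Float.parseFloat("NaN")); // NaN
            // The lowercase spelling does not.
            try {
                Float.parseFloat("nan");
            } catch (NumberFormatException e) {
                System.out.println("nan -> NumberFormatException");
            }
        }
    }
    ```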




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116163926
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Okay, this adds some cases. I am still not convinced that we should take care of every other case. Let's add options `nanValue`, `positiveInf` and `negativeInf` in a separate PR, like our CSV datasource.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116154314
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    // positive cases
    +    val jsons = Seq(
    +      """{"a": "-INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    +
    +    val checks: Seq[Double => Boolean] = Seq(
    +      _.isNegInfinity,
    +      _.isPosInfinity,
    +      _.isNegInfinity,
    +      _.isNaN,
    +      _.isPosInfinity,
    +      _.isPosInfinity,
    +      _.isNegInfinity)
    +
    +    Seq(FloatType, DoubleType).foreach { dt =>
    +      jsons.zip(checks).foreach { case (json, check) =>
    +        val ds = spark.read
    +          .schema(StructType(Seq(StructField("a", dt))))
    +          .json(Seq(json).toDS())
    +          .select($"a".cast(DoubleType)).as[Double]
    +        assert(check(ds.first()))
    +      }
    +    }
    +
    +    // negative case
    +    Seq(FloatType, DoubleType).foreach { dt =>
    +      val e = intercept[SparkException] {
    +        spark.read
    +          .option("mode", "FAILFAST")
    +          .schema(StructType(Seq(StructField("a", dt))))
    +          .json(Seq( """{"a": "nan"}""").toDS())
    --- End diff --
    
    Shall we also test other negative cases?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)** for PR 17956 at commit [`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    @gatorsmile, I added some more cases I could identify based on http://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#valueOf(java.lang.String) (this appears to call `java.lang.Double.parseDouble`, which `toDouble` calls).
    
    And also, to be safe, I made this fall back to `toDouble` or `toFloat` and throw the same exceptions.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116177634
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    It seems to me that we didn't really support case-insensitive parsing before. Although we check the lowercased input string, we actually call `toFloat` on the original string, and `"nan".toFloat` throws `java.lang.NumberFormatException: For input string: "nan"`.





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Yes, thanks.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Yeah, I am aware of it. Since we support the special float cases as strings, I think we'd expect other doubles could appear as strings too?




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17956




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116160516
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    If possible, we want to be consistent with the others. Could you check Pandas? 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Given the existing logic below:
    
    ```
    if (lowerCaseValue.equals("nan") ||
      lowerCaseValue.equals("infinity") ||
      lowerCaseValue.equals("-infinity") ||
      lowerCaseValue.equals("inf") ||
      lowerCaseValue.equals("-inf")) {
        value.toFloat
    ```
    
    It looks like we have tried all the case combinations (lowercase and uppercase) of `"nan"`, `"infinity"`, `"-infinity"`, `"inf"` and `"-inf"` so far.
    
    ```python
    import itertools
    items = ["nan", "infinity", "-infinity", "inf", "-inf"]
    comb = []
    for item in items:
       comb.extend(map(''.join, itertools.product(*zip(item.upper(), item.lower()))))
    
    print("Set(%s)" % ", ".join(map(lambda c: '"%s"' % c, comb)))
    ```
    
    This code generates the combinations (I find Python convenient for dealing with strings).
    
    
    ```scala
    val s = Set("NAN", "NAn", ... "-inf")
    s.foreach { w =>
      try {
        w.toDouble // or w.toFloat
        println(w)
      } catch {
        case _: Throwable => // pass
      }
    }
    ```
    
    It looks like we have supported only the cases below so far:
    
    ```
    NaN
    -Infinity
    Infinity
    ```
    
    Let me update this.
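    As a standalone sketch (not part of the PR; the class name `SpecialFloats` is made up for illustration), the same experiment can be run directly against Java's `Double.parseDouble`, which is what Scala's `toDouble` delegates to:
    
    ```java
    import java.util.ArrayList;
    import java.util.List;
    
    public class SpecialFloats {
        // Expand every lower/upper-case combination of a spelling,
        // e.g. "inf" -> inf, inF, iNf, ..., INF.
        static List<String> caseCombos(String s) {
            List<String> acc = new ArrayList<>();
            acc.add("");
            for (char ch : s.toCharArray()) {
                List<String> next = new ArrayList<>();
                for (String prefix : acc) {
                    next.add(prefix + Character.toLowerCase(ch));
                    String upper = prefix + Character.toUpperCase(ch);
                    if (!next.contains(upper)) {
                        next.add(upper); // '-' has no distinct upper case
                    }
                }
                acc = next;
            }
            return acc;
        }
    
        // Collect the case variants that Double.parseDouble accepts.
        static List<String> acceptedSpellings() {
            List<String> accepted = new ArrayList<>();
            for (String base : List.of("nan", "infinity", "-infinity", "inf", "-inf")) {
                for (String variant : caseCombos(base)) {
                    try {
                        Double.parseDouble(variant);
                        if (!accepted.contains(variant)) {
                            accepted.add(variant);
                        }
                    } catch (NumberFormatException e) {
                        // variant is not a valid spelling; skip it
                    }
                }
            }
            return accepted;
        }
    
        public static void main(String[] args) {
            // Only the exact Java spellings survive.
            System.out.println(acceptedSpellings()); // [NaN, Infinity, -Infinity]
        }
    }
    ```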




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    If we do not upgrade `Jackson`, is that possible we can do it by checking the flag `allowNonNumericNumbers` before [doing the conversion](https://github.com/HyukjinKwon/spark/blob/fa63ff4e63bfcb3d780183528663223ab63891a3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala#L129-L131)?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    `allowNonNumericNumbers` option is undocumented and incomplete for now - https://github.com/apache/spark/pull/9724#r44883828
    
    > allowNonNumericNumbers is undocumented for now, since I can't figure out how it works.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116173562
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1989,55 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "+NaN"}""",
    +      """{"a": "-NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    It looks like `-Infinity` succeeds:
    
    ```
    scala> "-Infinity".toDouble
    res0: Double = -Infinity
    
    scala> "-Infinity".toFloat
    res1: Float = -Infinity
    ```




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116165105
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,51 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    Below is from `JsonParser.Feature ALLOW_NON_NUMERIC_NUMBERS`
    
    > "INF" (for positive infinity), as well as alias of "Infinity"
    > "-INF" (for negative infinity), alias "-Infinity"
    > "NaN" (for other not-a-numbers, like result of division by zero)
    
    Then, this PR also adds `+INF` and `+Infinity`. 
    
    What is the Scala standard?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76846/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76858 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76858/testReport)** for PR 17956 at commit [`fa63ff4`](https://github.com/apache/spark/commit/fa63ff4e63bfcb3d780183528663223ab63891a3).




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116165603
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Without this PR, `+INF`, `INF` and `-INF` do not work. 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76841/testReport)** for PR 17956 at commit [`660a284`](https://github.com/apache/spark/commit/660a2843050d99b15ca7676ead8b8be4117267f1).




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    This PR also introduces the following behavior change. 
    
    ```Scala
        def floatRecords: Dataset[String] =
          spark.createDataset(spark.sparkContext.parallelize(
            """{"f": "18.00"}""" ::
              """{"f": "18.30"}""" ::
              """{"f": "20.00"}""" :: Nil))(Encoders.STRING)
    
        val jsonDF = spark.read.schema("f float").json(floatRecords)
    ```
    
    Before this PR, the output is like
    ```
    +----+
    |   f|
    +----+
    |null|
    |null|
    |null|
    +----+
    ```
    After this PR, the output is like
    ```
    +-----+
    |    f|
    +-----+
    |18.00|
    |18.30|
    |20.00|
    +-----+
    ```
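    The root cause (a minimal sketch, assuming the new code falls through to the plain parse for non-special strings, as the diff suggests; the class name is made up) is that a quoted numeric is already valid input for Java's parse methods:
    
    ```java
    public class QuotedNumerics {
        public static void main(String[] args) {
            // Once the special-case branch no longer swallows arbitrary
            // strings, quoted numerics reach the plain parse and succeed.
            System.out.println(Double.parseDouble("18.30")); // 18.3
            System.out.println(Float.parseFloat("20.00"));   // 20.0
        }
    }
    ```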




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76854/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76857 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76857/testReport)** for PR 17956 at commit [`d581467`](https://github.com/apache/spark/commit/d5814678e488f6a28c33f5e8a31fde9d6d0b25a7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116166525
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,51 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    I think we also can address https://issues.apache.org/jira/browse/SPARK-11753 in this PR, right? Simply accepting all these non-numeric numbers without respecting our flag `allowNonNumericNumbers` does not sound right to me.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Do we need to follow `allowNonNumericNumbers` when parsing `"""{"a": "-Infinity"}"""`? cc @cloud-fan @viirya 




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116144531
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -127,13 +126,15 @@ class JacksonParser(
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
               val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    +          if (value.equals("NaN") ||
    --- End diff --
    
    https://github.com/apache/spark/pull/9759#r63321521
    
    > "infinity".toDouble, "inf".toDouble are not legal. These non-numeric numbers are case-sensitive, both for Jackson and Scala.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Could we update the JIRA, PR title, and PR description? 
    
    Also, could we check the JSON flag `allowNonNumericNumbers` when doing the conversion? 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    cc @NathanHowell, @cloud-fan and @viirya.
    
    (I just want to note that this will not change any input/output, but only the exception type, and it avoids an additional conversion attempt.)




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Yea, let's focus on the topic. For the cases below:
    
    ```
    {"a": NaN}
    {"a": Infinity}
    {"a": +Infinity}
    {"a": -Infinity}
    ```
    
    They are related to `allowNonNumericNumbers`, and the related tests are already being ignored. 
    
    Okay, so the point is the behaviour change, right? Let me take a shot at removing the behaviour changes here.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    We should upgrade Jackson to support that option correctly. That upgrade looks like it was reverted due to a dependency problem - https://github.com/apache/spark/pull/9759#issuecomment-222759764. It sounds like a bigger problem to me. (Also, strictly speaking, it looks more related to parsing itself.) We already support the special floating-point cases, and it might be fine just to improve this case.
    
    I updated the PR title and description already. I will fix the JIRA as well (I was hesitant because it was not opened by me).




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116160121
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Not sure whether we should follow it. How about our CSV parsers? How about the Pandas?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Based on my previous work experience, I am very conservative about introducing any behavior change unless we are sure it is the right thing to do. I do not think this is a high-priority PR. I can put it in my to-do list and work on it when I am free. 
    
    To justify this change, we need to check whether other projects/products have the same or similar behavior. We do not want to introduce anything new or different; this is my concern. 
    
    Below is the list we need to check. 
    ```
    {"a": "+INF"}
    {"a": "INF"}
    {"a": "-INF"}
    {"a": "NaN"}
    {"a": "+NaN"}
    {"a": "-NaN"}
    {"a": "Infinity"}
    {"a": "+Infinity"}
    {"a": "-Infinity"}
    ```
    
    Note, the above is different from the following JSON strings. 
    ```
    {"a": NaN}
    {"a": Infinity}
    {"a": +Infinity}
    {"a": -Infinity}
    ```
    
    In my previous comment, I just wanted to share something for people who want to become Spark committers. It is not related to this PR. 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    WDYT? I think it'd be faster to get this merged.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76894 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76894/testReport)** for PR 17956 at commit [`90330bc`](https://github.com/apache/spark/commit/90330bc7916c906fec10081bd012fc4efdb450f2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    We are handling the JSON data source here. Are the following inputs widely used?
    ```
    {"a": "+INF"}
    {"a": "INF"}
    {"a": "-INF"}
    {"a": "NaN"}
    {"a": "+NaN"}
    {"a": "-NaN"}
    {"a": "Infinity"}
    {"a": "+Infinity"}
    {"a": "-Infinity"}
    ``` 
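
    For reference, only some of these spellings are accepted by Scala's `toFloat` (which delegates to `java.lang.Float.parseFloat`) — a quick standalone check, independent of Spark:
    ```scala
    // Check which of the spellings above java.lang.Float.parseFloat accepts.
    // parseFloat allows an optional sign before "NaN" or "Infinity", but it
    // rejects the "INF" spellings with a NumberFormatException.
    val candidates = Seq("+INF", "INF", "-INF", "NaN", "+NaN", "-NaN",
      "Infinity", "+Infinity", "-Infinity")
    val parseable = candidates.filter { s =>
      try { s.toFloat; true } catch { case _: NumberFormatException => false }
    }
    // parseable: Seq("NaN", "+NaN", "-NaN", "Infinity", "+Infinity", "-Infinity")
    ```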




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116160371
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Is the JSON format itself related to CSV and Pandas? Please see the discussion in https://github.com/apache/spark/pull/9759#r63321521.
    
    Also, this does not change the existing behaviour.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116145813
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -127,13 +126,15 @@ class JacksonParser(
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
               val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    +          if (value.equals("NaN") ||
    +            value.equals("Infinity") ||
    +            value.equals("+Infinity") ||
    +            value.equals("-Infinity")) {
                 value.toFloat
    +          } else if (value.equals("+INF") || value.equals("INF")) {
    --- End diff --
    
    how about
    ```
    parser.getText match {
      case "NaN" => Float.NaN
      case "+INF" | "INF" | "+Infinity" | "Infinity" => Float.PositiveInfinity
      case "-INF" | "-Infinity" => Float.NegativeInfinity
      case other => throw new RuntimeException(s"Cannot parse $other as FloatType.")
    }
    ```
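
    As a self-contained sketch (the function name is hypothetical), the suggested match could look like:
    ```scala
    // Hypothetical standalone version of the suggested match, for illustration.
    // Jackson's allowed non-numeric tokens map to the corresponding Float
    // constants; anything else is rejected, mirroring the existing error message.
    def parseSpecialFloat(text: String): Float = text match {
      case "NaN" => Float.NaN
      case "+INF" | "INF" | "+Infinity" | "Infinity" => Float.PositiveInfinity
      case "-INF" | "-Infinity" => Float.NegativeInfinity
      case other => throw new RuntimeException(s"Cannot parse $other as FloatType.")
    }
    ```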




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116166046
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,51 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    Just let us assume we want to follow `JsonParser.Feature ALLOW_NON_NUMERIC_NUMBERS`, does our JSON option flag work? `allowNonNumericNumbers`. 
    
    If we turn it off, these test cases should fail, right?
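
    A hedged sketch of how the option could be checked against unquoted tokens (an active `spark` session and `spark.implicits._` are assumed, so this is not runnable standalone):
    ```scala
    import spark.implicits._  // assumes an active SparkSession named `spark`

    val ds = Seq("""{"a": NaN}""").toDS  // unquoted NaN: a non-numeric number token

    // With the option on, Jackson's ALLOW_NON_NUMERIC_NUMBERS accepts the token.
    spark.read.schema("a double")
      .option("allowNonNumericNumbers", "true").json(ds).show()

    // With the option off, the same token should be rejected; in FAILFAST mode
    // the read is expected to fail rather than yield a corrupt-record null.
    spark.read.schema("a double")
      .option("allowNonNumericNumbers", "false")
      .option("mode", "FAILFAST").json(ds).show()
    ```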




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)** for PR 17956 at commit [`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76857/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    BTW, if we want to support JSON strings like `{"a": "-Infinity"}` as the FLOAT type, I think we should also support float data in a string. Below is an example. 
    ```Scala
        def floatRecords: Dataset[String] =
          spark.createDataset(spark.sparkContext.parallelize(
            """{"f": "18.00"}""" ::
              """{"f": "18.30"}""" ::
              """{"f": "20.00"}""" :: Nil))(Encoders.STRING)
        val jsonDF = spark.read.schema("f float").json(floatRecords)
    ```
    
    If we want to support parsing float data in a string, we should also support the other data types. Thus, this is another to-do issue; we should investigate it first. 




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116161266
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Let me check and open a JIRA/PR when I have some time, if there are cases we should handle. Let's leave that out here. It sounds like it does not block this PR, and I don't want extra changes to hold it up like https://github.com/apache/spark/pull/17901.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116159720
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    For input/output, it is not a bug. Please see https://github.com/apache/spark/pull/17956#discussion_r116144531 and the PR description.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76894/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    (Ah, it was about Jackson's `JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS` ... I see.)




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76858/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    We really need to be careful when making code changes like this. If we support such a feature, we need to support it for all the data types; partial support is unexpected to users. 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116284193
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1989,55 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "+NaN"}""",
    +      """{"a": "-NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    It sounds like Jackson only supports the following four cases. 
    
    ```Scala
        val jsons = Seq(
          """{"a": NaN}""",
          """{"a": Infinity}""",
          """{"a": +Infinity}""",
          """{"a": -Infinity}""")
    ```
    
    Could you add a separate test case for them? Please also add the negative cases.





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    > Do we need to follow allowNonNumericNumbers when parsing """{"a": "-Infinity"}"""? cc @cloud-fan @viirya
    
    `allowNonNumericNumbers` supports it and Scala supports it too. It seems to me that we should follow them.





[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116179181
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    +            case "NaN" | "+NaN" | "-NaN" => Float.NaN
    --- End diff --
    
    Yea, but it is supported in Scala. These were added to address https://github.com/apache/spark/pull/17956#discussion_r116163499, to say we support the Scala + Jackson standard.
    
    ```scala
    scala> "+NaN".toDouble
    res2: Double = NaN
    
    scala> "-NaN".toDouble
    res3: Double = NaN
    ```




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76846/testReport)** for PR 17956 at commit [`aa7c658`](https://github.com/apache/spark/commit/aa7c6580a733c0d964cabd1fcabf1f2730227f10).




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116170510
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    To my knowledge, Python at least supports some case-insensitive spellings. I would rather leave this to be handled with options, treating such cases as a variant. At least, I think we can say here it follows the Scala + Jackson standard.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116159220
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    // positive cases
    +    val jsons = Seq(
    +      """{"a": "-INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    --- End diff --
    
    What is the difference between line 1995 and line 1997?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76894 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76894/testReport)** for PR 17956 at commit [`90330bc`](https://github.com/apache/spark/commit/90330bc7916c906fec10081bd012fc4efdb450f2).




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116178653
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    +            case "NaN" | "+NaN" | "-NaN" => Float.NaN
    --- End diff --
    
    I remember that Jackson's `JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS` doesn't support "+NaN" and "-NaN".
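    For a cross-check from outside Jackson (this is Python's standard `json` module, not Jackson itself), the same restriction shows up there: only the unsigned `NaN` constant and the signed `Infinity` constants are recognized, while `+NaN` and `-NaN` are parse errors.

    ```python
    import json
    import math

    # Python's json module accepts the non-standard bare constants
    # NaN, Infinity, and -Infinity...
    assert math.isnan(json.loads('{"a": NaN}')["a"])
    assert json.loads('{"a": Infinity}')["a"] == math.inf
    assert json.loads('{"a": -Infinity}')["a"] == -math.inf

    # ...but signed NaN variants are rejected, much like viirya
    # describes for Jackson's ALLOW_NON_NUMERIC_NUMBERS.
    for bad in ('{"a": +NaN}', '{"a": -NaN}'):
        try:
            json.loads(bad)
            raise AssertionError(f"expected {bad!r} to fail")
        except json.JSONDecodeError:
            pass
    ```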




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116170070
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    one last question is, should we be case sensitive here? can we check other systems like pandas?
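    One quick data point toward that question, from Python's built-in `float()` (this is the conversion plain Python users hit when coercing strings; it is not a claim about pandas' own C parser): string-to-float parsing there is fully case-insensitive and accepts optional signs.

    ```python
    import math

    # Python's float() parses special values case-insensitively,
    # with an optional leading + or -.
    assert math.isnan(float("NaN")) and math.isnan(float("nan")) and math.isnan(float("-nAn"))
    assert float("inf") == float("INF") == float("Infinity") == math.inf
    assert float("-infinity") == float("-Inf") == -math.inf
    ```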




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Note, we do not support the following cases
    ```
        def floatRecords: Dataset[String] =
          spark.createDataset(spark.sparkContext.parallelize(
            """{"f": "18.00"}""" ::
              """{"f": "18.30"}""" ::
              """{"f": "20.00"}""" :: Nil))(Encoders.STRING)
        val jsonDF = spark.read.schema("f float").json(floatRecords)
    ```
    
    Basically, we do not assume users put strings in double columns for JSON data sources.
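    To make the distinction concrete, here is a hedged Python sketch of the PR's match-based approach (the function name and the exact sentinel set are illustrative only; the accepted set was still being narrowed during review): only special-float sentinels are converted, and any other string, including a plain numeric one like `"18.00"`, is rejected.

    ```python
    import math

    # Hypothetical mirror of the JacksonParser change: a fixed table of
    # special-float sentinels; everything else raises instead of being
    # run through a general string-to-float conversion.
    _SPECIAL_FLOATS = {
        "NaN": math.nan,
        "Infinity": math.inf,
        "-Infinity": -math.inf,
    }

    def parse_float_field(text: str) -> float:
        try:
            return _SPECIAL_FLOATS[text]
        except KeyError:
            # Fail fast rather than attempting a toFloat-style
            # conversion on arbitrary strings.
            raise RuntimeError(f"Cannot parse {text} as FloatType.")
    ```

    Under this scheme `parse_float_field("18.00")` raises, which is exactly the unsupported case described above.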




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76854 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76854/testReport)** for PR 17956 at commit [`e858789`](https://github.com/apache/spark/commit/e85878917d6ff1fe107efc59bb1d403401a2441f).




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76841/
    Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    LGTM except for a minor comment.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116288749
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1989,55 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "+NaN"}""",
    +      """{"a": "-NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    See https://github.com/apache/spark/pull/17956#issuecomment-301143613




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    I didn't understand what ^ means. Could you elaborate, please? To my knowledge, the `allowNonNumericNumbers` option here is currently incomplete.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    1. It is not good to hold off PRs. If a PR looks good and coherent, I think we could merge it.
    
    2. If "How many JSON writers write a float "INF" as a string?" is your worry, I will take out all the newly added cases here. This is why I am thinking we should add this case via options.
    
    3. My question was to ask about taking this over. Please answer that question.
    
    4. The other information does not look related to the topic of this PR.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116159393
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1988,47 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    // positive cases
    +    val jsons = Seq(
    +      """{"a": "-INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    --- End diff --
    
    Yes ... 




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76851/
    Test FAILed.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116179129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,13 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    -          } else {
    -            throw new RuntimeException(s"Cannot parse $value as FloatType.")
    +          parser.getText match {
    --- End diff --
    
    It seems to me that NaN, Infinity, and INF are special tokens, so matching them case-insensitively does not really make sense.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116168266
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    Yea, I added some cases here. I will open another PR/JIRA to add those options into JSON related functionalities.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    @gatorsmile, @cloud-fan and @viirya, could you take another look, please? I tried to get rid of all the behaviour changes in both previous PRs, leaving only the change that avoids the conversion.
    
    I tried to generate combinations of the existing supports - https://github.com/apache/spark/pull/17956#issuecomment-301230679 - and extracted three cases, "NaN", "-Infinity" and "Infinity", which I believe we have supported so far.




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Thank you everybody sincerely.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116161668
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -126,16 +125,11 @@ class JacksonParser(
     
             case VALUE_STRING =>
               // Special case handling for NaN and Infinity.
    -          val value = parser.getText
    -          val lowerCaseValue = value.toLowerCase(Locale.ROOT)
    -          if (lowerCaseValue.equals("nan") ||
    -            lowerCaseValue.equals("infinity") ||
    -            lowerCaseValue.equals("-infinity") ||
    -            lowerCaseValue.equals("inf") ||
    -            lowerCaseValue.equals("-inf")) {
    -            value.toFloat
    --- End diff --
    
    This PR changes the behavior, right? Does your test case pass without the code changes?




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76854/testReport)** for PR 17956 at commit [`e858789`](https://github.com/apache/spark/commit/e85878917d6ff1fe107efc59bb1d403401a2441f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17956: [SPARK-18772][SQL] Avoid unnecessary conversion t...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17956#discussion_r116173265
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
    @@ -1988,4 +1989,55 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
           assert(errMsg.startsWith("The field for corrupt records must be string type and nullable"))
         }
       }
    +
    +  test("SPARK-18772: Parse special floats correctly") {
    +    val jsons = Seq(
    +      """{"a": "+INF"}""",
    +      """{"a": "INF"}""",
    +      """{"a": "-INF"}""",
    +      """{"a": "NaN"}""",
    +      """{"a": "+NaN"}""",
    +      """{"a": "-NaN"}""",
    +      """{"a": "Infinity"}""",
    +      """{"a": "+Infinity"}""",
    +      """{"a": "-Infinity"}""")
    --- End diff --
    
    I did not check it, but does `"""{"a": -Infinity}"""` fail?
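    As an outside data point (Python's `json` module, not Spark's Jackson-based parser), the unquoted and quoted forms are different token types, which is why the `VALUE_STRING` branch in the diff only ever sees the quoted variants:

    ```python
    import json
    import math

    # Unquoted -Infinity is lexed as a (non-standard) numeric constant...
    assert json.loads('{"a": -Infinity}')["a"] == -math.inf

    # ...while quoted "-Infinity" is just an ordinary JSON string, and any
    # special-float interpretation is left to the consumer.
    assert json.loads('{"a": "-Infinity"}')["a"] == "-Infinity"
    ```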




[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try and ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    Let me share my understanding.
    
    First, we do not have a deadline for any non-urgent fixes. This PR is an example. How many JSON writers write a float "INF" as a string?
    
    A committer needs to judge how critical a PR is. My personal judgement is that we do not need to rush to merge it. I even want to wait until we understand how the other platforms handle **such a string-represented "INF" in a float-type column**. I believe we are not the first platform to face such an issue. That is why @cloud-fan and I asked the question above.
    
    Second, we have to be careful to avoid introducing bugs while fixing the old issue. Spark is an infrastructure open-source project, so we have to be very careful when doing code reviews and code changes. If we accidentally introduce some support, it is painful to remove, and we may have to keep it forever. @viirya found one such case very recently. So please be careful when reviewing and changing code.
    
    If users really need such a fix, they can also fix it and make a build themselves. I believe that is why people like open-source projects.
    
    At the end, I want to share a slide about bug costs. You might have already seen a similar figure before.
    
    <img width="1395" alt="screen shot 2017-05-12 at 11 41 20 am" src="https://cloud.githubusercontent.com/assets/11567269/26012021/00f34d82-3708-11e7-9271-1a97ecef0fff.png">





[GitHub] spark issue #17956: [SPARK-18772][SQL] Avoid unnecessary conversion try for ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17956
  
    **[Test build #76851 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76851/testReport)** for PR 17956 at commit [`e858789`](https://github.com/apache/spark/commit/e85878917d6ff1fe107efc59bb1d403401a2441f).

