You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/19 22:13:32 UTC

[GitHub] [spark] andygrove commented on pull request #35573: [WIP] [SPARK-38060] [SQL] Respect allowNonNumericNumbers when parsing quoted NaN and Infinity values in JSON reader

andygrove commented on pull request #35573:
URL: https://github.com/apache/spark/pull/35573#issuecomment-1046114075


   > Dumb questions as I just don't know this code path -- would we parse `"1.0"` as a number or reject it for being a string? and you're saying that Jackson would parse `"NaN"` as NaN and not a string? It feels weird to parse strings that aren't even numbers as numbers, even when the option to do so is set, but if that is consistent with other behavior I could see it.
   
   Jackson will parse valid JSON numbers as either integers or floats. It will parse strings as strings and does not try to interpret them. Jackson has special behavior for tokens such as `NaN` and `Infinity` which are not valid JSON values (JSON does not support non-numeric numbers at all). Jackson will not attempt to interpret `"NaN"` and would just return a string.
   
   For some reason, the code we have today in Spark's `JacksonParser` will look at any returned strings and will parse them if they are `NaN`, `Infinity` or `-Infinity` (but not the other special values that Jackson supports, such as `+INF`). It will not attempt to parse strings that contain actual numeric values like `"1.0"`. This secondary parsing seems like a bad idea overall IMO. Users could get similar behavior by specifying that the column is `StringType` when reading and then performing an explicit `CAST` to a floating-point type and it would support reading numbers and not-numbers regardless of whether they are JSON strings or JSON numbers.
   
   If we have the option of removing this code (or putting it behind a configuration option) then I would be in favor of that. If not, then I would like it to support the same set of not-a-number strings as Jackson and I would like it to respect the `allowNonNumericNumbers` option.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org