Posted to issues@spark.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/12/15 16:15:00 UTC

[jira] [Commented] (SPARK-26376) Skip inputs without tokens by JSON datasource

    [ https://issues.apache.org/jira/browse/SPARK-26376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722209#comment-16722209 ] 

ASF GitHub Bot commented on SPARK-26376:
----------------------------------------

MaxGekk opened a new pull request #23325: [SPARK-26376][SQL] Skip inputs without tokens by JSON datasource
URL: https://github.com/apache/spark/pull/23325
 
 
   ## What changes were proposed in this pull request?
   
   Added a new flag for `JacksonParser`, `skipInputWithoutTokens`, which controls the parser's behaviour when its input doesn't contain any valid JSON tokens. The flag is set to `true` for the JSON datasource, so the datasource behaves as it does in Spark 2.4 and earlier. The flag is set to `false` for JSON functions like `from_json`. As a consequence, `from_json` produces bad records in the `PERMISSIVE` mode for strings without JSON tokens.
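
The effect of the flag can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's `JacksonParser`: the function name `parse_permissive` and its list-of-records return shape are hypothetical, "no JSON tokens" is approximated as "empty or JSON-whitespace only", and a bad record is represented by a dict keyed on `_corrupt_record`.

```python
import json

JSON_WHITESPACE = " \t\r\n"  # whitespace characters allowed between JSON tokens


def parse_permissive(text, skip_input_without_tokens):
    """Toy model of PERMISSIVE-mode parsing; hypothetical, not Spark code.

    Returns a list with zero or one records for the given input string.
    """
    if text.strip(JSON_WHITESPACE) == "":
        # The input contains no JSON tokens at all.
        if skip_input_without_tokens:
            # Datasource-like path (flag = true): silently skip the input.
            return []
        # from_json-like path (flag = false): report a bad record.
        return [{"_corrupt_record": text}]
    try:
        return [json.loads(text)]
    except json.JSONDecodeError:
        # Malformed JSON is a bad record regardless of the flag.
        return [{"_corrupt_record": text}]


# Datasource-like behaviour: empty input yields no record.
datasource_rows = parse_permissive("", skip_input_without_tokens=True)

# from_json-like behaviour: the same input yields one bad record.
function_rows = parse_permissive("", skip_input_without_tokens=False)
```

In this model the flag only changes what happens to token-less input; well-formed and malformed JSON are handled identically on both paths.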
   
   ## How was this patch tested?
   
   It was tested by `JsonSuite`.
   



> Skip inputs without tokens by JSON datasource
> ---------------------------------------------
>
>                 Key: SPARK-26376
>                 URL: https://issues.apache.org/jira/browse/SPARK-26376
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Maxim Gekk
>            Priority: Minor
>
> The changes in https://github.com/apache/spark/commit/38628dd1b8298d2686e5d00de17c461c70db99a8 can potentially break existing applications that don't expect a bad record for a string without any JSON tokens in the PERMISSIVE mode. This ticket aims to restore the previous behaviour of the JSON datasource and ignore such strings (including empty strings). The from_json function should keep the new behaviour and produce bad records for empty strings and strings without any JSON tokens.


