Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/11/11 17:07:00 UTC

[jira] [Commented] (SPARK-29806) Using multiline option for a JSON file which is not multiline results in silent truncation of data.

    [ https://issues.apache.org/jira/browse/SPARK-29806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16971730#comment-16971730 ] 

Hyukjin Kwon commented on SPARK-29806:
--------------------------------------

The {{multiLine}} option in the JSON source currently supports only a single JSON object or a single JSON array per file.
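
As a rough illustration of that constraint, here is a minimal pre-check sketch that runs outside Spark and counts how many top-level JSON values a piece of text contains. It is not what the JSON source does internally. The helper name {{count_top_level_json_values}} is hypothetical, and the sketch assumes the file is small enough to hold in memory as a string.

```python
import json


def count_top_level_json_values(text: str) -> int:
    """Count the top-level JSON values concatenated in `text`.

    Hypothetical helper for illustration only; it is not part of Spark.
    Raises json.JSONDecodeError if the text contains malformed JSON.
    """
    decoder = json.JSONDecoder()
    pos, count = 0, 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here
        # (this also steps over the newlines between records).
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return count
        # raw_decode parses one value and returns the index just past it.
        _, pos = decoder.raw_decode(text, pos)
        count += 1


# The input from the issue: two records, one per line.
json_lines = '{"name":"John", "id":"100"}\n{"name":"Marry","id":"200"}'
# The same records wrapped in a single JSON array.
json_array = '[{"name":"John", "id":"100"}, {"name":"Marry","id":"200"}]'

print(count_top_level_json_values(json_lines))
print(count_top_level_json_values(json_array))
```

A count above one suggests the input is line-delimited and should be read without {{multiLine}}, or that the records should first be wrapped in a single array.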

> Using multiline option for a JSON file which is not multiline results in silent truncation of data.
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29806
>                 URL: https://issues.apache.org/jira/browse/SPARK-29806
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4
>            Reporter: Dilip Biswal
>            Priority: Major
>
> The content of the input JSON file:
> {code:java}
> {"name":"John", "id":"100"}
> {"name":"Marry","id":"200"}{code}
> The above is a valid JSON file in which every record is on a single line. However, reading this file
>  with the multiLine option in FAILFAST mode truncates the data without raising any error.
> {code:java}
> scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+----+
> |id |name|
> +---+----+
> |100|John|
> +---+----+
> scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
> +---+-----+
> |id |name |
> +---+-----+
> |100|John |
> |200|Marry|
> +---+-----+{code}
> I think Spark should return an error in this case, especially in FAILFAST mode. This is likely a common user error, and we should not silently truncate data.


