
Posted to issues@spark.apache.org by "Dilip Biswal (Jira)" <ji...@apache.org> on 2019/11/09 01:21:00 UTC

[jira] [Created] (SPARK-29806) Using multiline option for a JSON file which is not multiline results in silent truncation of data.

Dilip Biswal created SPARK-29806:
------------------------------------

             Summary: Using multiline option for a JSON file which is not multiline results in silent truncation of data.
                 Key: SPARK-29806
                 URL: https://issues.apache.org/jira/browse/SPARK-29806
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.4
            Reporter: Dilip Biswal


Contents of the input JSON file:
{code:java}
{"name":"John", "id":"100"}
{"name":"Marry","id":"200"}{code}
The above is a valid JSON Lines file: each record is a complete JSON object on its own line. However, reading this file with the multiLine option in FAILFAST mode silently truncates the data. Only the first record is returned, and no error is raised.
{code:java}
scala> spark.read.option("multiLine", true).option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+----+
|id |name|
+---+----+
|100|John|
+---+----+

scala> spark.read.option("mode", "FAILFAST").format("json").load("/tmp/json").show(false)
+---+-----+
|id |name |
+---+-----+
|100|John |
|200|Marry|
+---+-----+{code}

I think Spark should raise an error in this case, especially in FAILFAST mode. Enabling multiLine on a JSON Lines file is likely a common user mistake, and Spark should not silently truncate data.
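One plausible explanation, which I have not confirmed against the Spark source, is that with multiLine enabled each file is parsed as a single JSON document and parsing stops after the first top-level value, so the second line is never read. The following Scala sketch shows the kind of check a FAILFAST mode could apply: find where the first top-level object or array ends and fail if any non-whitespace text follows it. The names `TrailingJsonCheck`, `endOfFirstValue` and `checkSingleDocument` are hypothetical and are not part of Spark's API. The sketch assumes the top-level value is an object or array and does not validate the rest of the JSON syntax.

```scala
// Hypothetical helper (not Spark API): detects content after the first
// top-level JSON object/array, i.e. the data that would be silently dropped.
object TrailingJsonCheck {

  /** Index just past the first top-level object or array, or None if it never closes. */
  def endOfFirstValue(text: String): Option[Int] = {
    var depth = 0
    var inString = false // inside a string literal, where braces do not count
    var escaped = false  // previous char was a backslash inside a string
    var started = false
    var i = 0
    while (i < text.length) {
      val c = text.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"'       => inString = true
        case '{' | '[' => depth += 1; started = true
        case '}' | ']' =>
          depth -= 1
          if (started && depth == 0) return Some(i + 1)
        case _ => ()
      }
      i += 1
    }
    None
  }

  /** FAILFAST-style check: throws if anything but whitespace follows the first value. */
  def checkSingleDocument(text: String): Unit = {
    val end = endOfFirstValue(text).getOrElse(
      throw new IllegalArgumentException("Unterminated JSON value"))
    val rest = text.substring(end).trim
    if (rest.nonEmpty)
      throw new IllegalArgumentException(
        s"Unexpected content after the first JSON value at offset $end: " + rest.take(30))
  }
}
```

Applied to the two-line input above, `checkSingleDocument` would throw instead of keeping only John's record, while a single JSON object spread across several lines would pass.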




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
