Posted to issues@spark.apache.org by "yaniv oren (Jira)" <ji...@apache.org> on 2020/05/20 16:28:00 UTC

[jira] [Created] (SPARK-31772) Json schema reading is not consistent between int and string types

yaniv oren created SPARK-31772:
----------------------------------

             Summary: Json schema reading is not consistent between int and string types
                 Key: SPARK-31772
                 URL: https://issues.apache.org/jira/browse/SPARK-31772
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.4
            Reporter: yaniv oren


When reading a JSON file with an explicit schema, an int value is converted to a string if the field is declared as string, but a string value is not converted to an int if the field is declared as int.

Sample Code:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

read_schema = StructType([StructField("a", IntegerType()),
                          StructField("b", StringType())])
df = self.spark_session.read.schema(read_schema).json("input/json/temp_test")
df.show()

 

Contents of input/json/temp_test (one JSON object per line):

{"a": 1,"b": "b1"}
{"a": 2,"b": "b2"}
{"a": 3,"b": 3}
{"a": "4","b": 4}

 

actual:

+----+----+
|   a|   b|
+----+----+
|   1|  b1|
|   2|  b2|
|   3|   3|
|null|null|
+----+----+

 

expected:

The third row should be nulled, like the fourth row, because its "b" value is an int in the JSON while the schema declares "b" as a string.
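
To make the asymmetry concrete, here is a minimal pure-Python sketch that models the two behaviours side by side. It is not Spark's parser: the names SCHEMA, convert, read_row, and the strict flag are hypothetical and exist only for illustration. It assumes that a type mismatch nulls the whole row, which is what the output above shows for the fourth line.

```python
import json

# Hypothetical model of the reported behaviour; this is NOT Spark's JSON reader.
SCHEMA = {"a": int, "b": str}

LINES = [
    '{"a": 1,"b": "b1"}',
    '{"a": 2,"b": "b2"}',
    '{"a": 3,"b": 3}',
    '{"a": "4","b": 4}',
]


def is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def convert(value, declared, strict):
    """Return the converted value, or raise ValueError on a type mismatch."""
    if value is None:
        return None
    if declared is int and is_int(value):
        return value
    if declared is str:
        if isinstance(value, str):
            return value
        # Lenient mode mirrors the observed behaviour: int -> string is allowed.
        if not strict and is_int(value):
            return str(value)
    raise ValueError(f"cannot read {value!r} as {declared.__name__}")


def read_row(line, strict):
    """Parse one JSON line; null the whole row if any field mismatches."""
    record = json.loads(line)
    try:
        return tuple(convert(record.get(name), typ, strict)
                     for name, typ in SCHEMA.items())
    except ValueError:
        return tuple(None for _ in SCHEMA)


observed = [read_row(line, strict=False) for line in LINES]
expected = [read_row(line, strict=True) for line in LINES]

print(observed)
print(expected)
```

With strict=False the third row comes back as (3, '3'), matching the actual output; with strict=True it is nulled like the fourth row, which is the behaviour this report asks for.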


