Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/06/22 08:32:00 UTC
[jira] [Resolved] (SPARK-23603) When the length of the json is in a
range, get_json_object will result in missing tail data
[ https://issues.apache.org/jira/browse/SPARK-23603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-23603.
----------------------------------
Resolution: Duplicate
Jackson 2.7.x has a regression, so we had to revert the upgrade. See also https://github.com/apache/spark/pull/9759
> When the length of the json is in a range, get_json_object will result in missing tail data
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-23603
> URL: https://issues.apache.org/jira/browse/SPARK-23603
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.2.0, 2.3.0
> Reporter: dzcxzl
> Priority: Major
>
> Jackson (>= 2.7.7) fixes a bug in which tail data can be lost when the length of the value falls within a certain range:
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.7.7]
> [https://github.com/FasterXML/jackson-core/issues/307]
> spark-shell:
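> {code:scala}
> // Build a JSON document whose only field holds a 3000-character string.
> val value = "x" * 3000
> val json = s"""{"big": "$value"}"""
> // Extract the field and measure its length; "$$" is a literal "$" in an interpolated string.
> spark.sql(s"select length(get_json_object('$json', '$$.big'))").collect
> res0: Array[org.apache.spark.sql.Row] = Array([2991])
> {code}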
> Expected result: 3000
> Actual result: 2991
> There are two possible solutions.
> One is:
> *Bump Jackson from 2.6.7 / 2.6.7.1 to 2.7.7*
> The other is:
> *Replace writeRaw(char[] text, int offset, int len) with writeRaw(String text)*
>
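For illustration, the second option quoted above might look roughly like the following. This is only a minimal sketch against the Jackson streaming API, not Spark's actual code: the helper name copyFirstString is hypothetical, and it assumes jackson-core is on the classpath.

```scala
import java.io.StringWriter
import com.fasterxml.jackson.core.{JsonFactory, JsonToken}

// Hypothetical helper (not part of Spark): copies the first string value
// found in `json` to the output using writeRaw(String).
def copyFirstString(json: String): String = {
  val factory = new JsonFactory()
  val out = new StringWriter()
  val parser = factory.createParser(json)
  val generator = factory.createGenerator(out)
  try {
    // Advance until the first string value or the end of input.
    var token = parser.nextToken()
    while (token != null && token != JsonToken.VALUE_STRING) {
      token = parser.nextToken()
    }
    if (token == JsonToken.VALUE_STRING) {
      // The char[] overload that the report proposes to avoid would be:
      //   generator.writeRaw(parser.getTextCharacters, parser.getTextOffset, parser.getTextLength)
      generator.writeRaw(parser.getText)
    }
  } finally {
    generator.close() // close() also flushes buffered output to `out`
    parser.close()
  }
  out.toString
}
```

Calling copyFirstString on the json value from the reproduction and comparing the length of the result with 3000 is one way to check the behaviour against a given Jackson version.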