Posted to dev@hive.apache.org by "Dain Sundstrom (Jira)" <ji...@apache.org> on 2023/01/18 00:48:00 UTC
[jira] [Created] (HIVE-26958) JsonSerDe data corruption when scalar type is a json object
Dain Sundstrom created HIVE-26958:
-------------------------------------
Summary: JsonSerDe data corruption when scalar type is a json object
Key: HIVE-26958
URL: https://issues.apache.org/jira/browse/HIVE-26958
Project: Hive
Issue Type: Bug
Components: File Formats
Reporter: Dain Sundstrom
JsonSerDe uses Jackson's {{JsonParser.getText}} to decode scalar values from JSON strings. The problem is that this method converts any token to text, including {{START_OBJECT}}, which becomes the text '{'. As a result, when the value of a scalar field is actually a JSON object, JsonSerDe processes the opening curly bracket as the field's value for {{BOOLEAN}}, {{DECIMAL}}, {{CHAR}}, {{VARCHAR}}, and {{VARBINARY}} columns. It then continues processing the fields inside the nested JSON object as if they were part of the outer object. When the nested closing curly bracket is encountered, it pops a level, which can end parsing of the row early. This bug results in corrupted data for the following JSON:
{code:java}
{ "boolean_field" : {}, "other_field" : 99 }
=> [boolean_field=false, other_field=null]
{ "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
=> [boolean_field=false, other_field=42]
{code}
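To make the mechanism concrete, here is a simplified, hypothetical Java model of the buggy loop; it is not Hive or Jackson code. The {{Kind}} values stand in for Jackson's {{JsonToken}} types, {{Token.text}} stands in for what {{JsonParser.getText}} returns, and the two-column schema plus the use of {{Boolean.parseBoolean}} on the '{' text are assumptions chosen so the model reproduces the rows reported above.

```java
import java.util.List;

// Hypothetical, simplified model of the bug; this is not Hive or Jackson code.
// Kind stands in for Jackson's JsonToken, Token.text for JsonParser.getText().
class NaiveRowParser {
    enum Kind { START_OBJECT, END_OBJECT, FIELD_NAME, SCALAR }

    static final class Token {
        final Kind kind;
        final String text; // e.g. "{" for START_OBJECT, as getText() would return

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private static Token t(Kind kind, String text) {
        return new Token(kind, text);
    }

    // { "boolean_field" : {}, "other_field" : 99 }
    static final List<Token> EMPTY_OBJECT_ROW = List.of(
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "boolean_field"),
            t(Kind.START_OBJECT, "{"), t(Kind.END_OBJECT, "}"),
            t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "99"), t(Kind.END_OBJECT, "}"));

    // { "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
    static final List<Token> NESTED_OBJECT_ROW = List.of(
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "boolean_field"),
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "42"),
            t(Kind.END_OBJECT, "}"), t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "99"),
            t(Kind.END_OBJECT, "}"));

    // Parses a row with the hypothetical schema (boolean_field BOOLEAN, other_field INT).
    static String parse(List<Token> tokens) {
        Boolean booleanField = null;
        Integer otherField = null;
        int i = 1; // skip the outer START_OBJECT
        while (i < tokens.size()) {
            Token name = tokens.get(i++);
            if (name.kind == Kind.END_OBJECT) {
                break; // any '}' is treated as the end of the row
            }
            Token value = tokens.get(i++);
            // Bug: the value token is never checked to be a scalar. A START_OBJECT
            // is consumed as the text "{", and the nested object's fields are then
            // read as if they belonged to the outer row.
            if (name.text.equals("boolean_field")) {
                booleanField = Boolean.parseBoolean(value.text); // "{" -> false
            } else if (name.text.equals("other_field")) {
                otherField = Integer.valueOf(value.text);
            }
        }
        return "[boolean_field=" + booleanField + ", other_field=" + otherField + "]";
    }
}
```

In this model, parsing the two token lists gives [boolean_field=false, other_field=null] and [boolean_field=false, other_field=42]: the nested '}' is mistaken for the end of the row, so the outer "other_field" : 99 is never read.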
By the way, when a JSON array is passed instead of an object, you get an error, because the array does not contain the field names the code checks for.
I think a JSON object should also result in an error, just as a JSON array does when it is the value of a scalar field. If so, the fix is to make sure the value token is a scalar for non-complex types in {{extractCurrentField}}, with something like this:
{code:java}
// Non-complex (scalar) Hive types must not accept a JSON object or array as their value
if (!hcatFieldSchema.isComplex() && !valueToken.isScalarValue()) {
    throw new IOException(type + " value must be a scalar json value");
}
{code}
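A matching sketch of the proposed guard follows, again as a hypothetical, simplified model rather than the real {{extractCurrentField}}. Both columns are assumed to be non-complex, so the {{hcatFieldSchema.isComplex()}} half of the check is implicit, and {{Kind.isScalarValue}} plays the role of Jackson's {{JsonToken.isScalarValue}}. A nested object or array now fails the row instead of being read as the text '{' or '['.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical, simplified model of the proposed guard; this is not Hive code.
// Kind stands in for Jackson's JsonToken, Token.text for JsonParser.getText().
class GuardedRowParser {
    enum Kind {
        START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, SCALAR;

        // Plays the role of JsonToken.isScalarValue()
        boolean isScalarValue() {
            return this == SCALAR;
        }
    }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private static Token t(Kind kind, String text) {
        return new Token(kind, text);
    }

    // { "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
    static final List<Token> NESTED_OBJECT_ROW = List.of(
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "boolean_field"),
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "42"),
            t(Kind.END_OBJECT, "}"), t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "99"),
            t(Kind.END_OBJECT, "}"));

    // { "boolean_field" : [], "other_field" : 99 }
    static final List<Token> ARRAY_ROW = List.of(
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "boolean_field"),
            t(Kind.START_ARRAY, "["), t(Kind.END_ARRAY, "]"),
            t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "99"), t(Kind.END_OBJECT, "}"));

    // { "boolean_field" : true, "other_field" : 99 }
    static final List<Token> SCALAR_ROW = List.of(
            t(Kind.START_OBJECT, "{"), t(Kind.FIELD_NAME, "boolean_field"), t(Kind.SCALAR, "true"),
            t(Kind.FIELD_NAME, "other_field"), t(Kind.SCALAR, "99"), t(Kind.END_OBJECT, "}"));

    // Parses a row with the hypothetical schema (boolean_field BOOLEAN, other_field INT).
    static String parse(List<Token> tokens) throws IOException {
        Boolean booleanField = null;
        Integer otherField = null;
        int i = 1; // skip the outer START_OBJECT
        while (i < tokens.size()) {
            Token name = tokens.get(i++);
            if (name.kind == Kind.END_OBJECT) {
                break;
            }
            Token value = tokens.get(i++);
            String type = name.text.equals("boolean_field") ? "BOOLEAN" : "INT";
            // The guard: both columns are non-complex, so reject any value token
            // that is not a scalar instead of reading its text.
            if (!value.kind.isScalarValue()) {
                throw new IOException(type + " value must be a scalar json value");
            }
            if (type.equals("BOOLEAN")) {
                booleanField = Boolean.parseBoolean(value.text);
            } else {
                otherField = Integer.valueOf(value.text);
            }
        }
        return "[boolean_field=" + booleanField + ", other_field=" + otherField + "]";
    }
}
```

With the guard in place, both NESTED_OBJECT_ROW and ARRAY_ROW throw an IOException with the message "BOOLEAN value must be a scalar json value", while SCALAR_ROW still parses to [boolean_field=true, other_field=99]. Objects and arrays therefore fail in the same way, which is the behavior proposed above.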