Posted to dev@hive.apache.org by "Dain Sundstrom (Jira)" <ji...@apache.org> on 2023/01/18 00:48:00 UTC

[jira] [Created] (HIVE-26958) JsonSerDe data corruption when scalar type is a json object

Dain Sundstrom created HIVE-26958:
-------------------------------------

             Summary: JsonSerDe data corruption when scalar type is a json object
                 Key: HIVE-26958
                 URL: https://issues.apache.org/jira/browse/HIVE-26958
             Project: Hive
          Issue Type: Bug
          Components: File Formats
            Reporter: Dain Sundstrom

JsonSerDe uses the Jackson {{JsonParser.getText}} method to decode scalar values from JSON text. The problem is that this method converts any token to text, including {{START_OBJECT}}, whose text is the opening curly bracket '{'. As a result, when a field declared as {{BOOLEAN}}, {{DECIMAL}}, {{CHAR}}, {{VARCHAR}}, or {{VARBINARY}} actually contains a JSON object, JsonSerDe decodes the '{' as the field value. It then continues processing the fields inside the nested object as if they belonged to the outer object. When the nested object's closing curly bracket is encountered, the parser pops a level, which can end parsing of the row early. This bug results in corrupted data for the following JSON:

{code:java}
{ "boolean_field" : {}, "other_field" : 99 }
  => [boolean_field=false, other_field=null]

{ "boolean_field" : { "other_field" : 42 }, "other_field" : 99 }
  => [boolean_field=false, other_field=42]
{code}

By contrast, when a JSON array is passed instead of an object, you get an error, because the code checks for field names and the array does not contain any.
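
The corruption can be reproduced outside Hive with the self-contained sketch below. It is a simplified model written from the description above, not Hive's {{JsonSerDe}} code: the tokenizer, the {{parseRow}} loop, the "field name expected" check and the two-column schema ({{boolean_field}} BOOLEAN, {{other_field}} INT) are all hypothetical. It assumes only what the report states: a scalar value is taken from the token's text, as {{JsonParser.getText}} would return it, and the first {{END_OBJECT}} seen ends the row.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of the reported behaviour, NOT Hive's JsonSerDe source.
// Hypothetical table schema: boolean_field BOOLEAN, other_field INT.
class ScalarFieldCorruptionDemo {
    enum Kind { START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME, VALUE }

    static final class Token {
        final Kind kind;
        final String text; // what Jackson's getText() would return, e.g. "{"

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Tiny tokenizer for the JSON subset in the examples: objects, arrays,
    // strings without escapes, integers and true/false.
    static List<Token> tokenize(String json) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < json.length()) {
            char c = json.charAt(i);
            if (Character.isWhitespace(c) || c == ':' || c == ',') { i++; continue; }
            if (c == '{') { out.add(new Token(Kind.START_OBJECT, "{")); i++; continue; }
            if (c == '}') { out.add(new Token(Kind.END_OBJECT, "}")); i++; continue; }
            if (c == '[') { out.add(new Token(Kind.START_ARRAY, "[")); i++; continue; }
            if (c == ']') { out.add(new Token(Kind.END_ARRAY, "]")); i++; continue; }
            if (c == '"') {
                int end = json.indexOf('"', i + 1);
                String s = json.substring(i + 1, end);
                i = end + 1;
                // A string followed by ':' is a field name; otherwise it is a value.
                int j = i;
                while (j < json.length() && Character.isWhitespace(json.charAt(j))) j++;
                boolean isName = j < json.length() && json.charAt(j) == ':';
                out.add(new Token(isName ? Kind.FIELD_NAME : Kind.VALUE, s));
                continue;
            }
            int start = i;
            while (i < json.length() && ",:{}[]".indexOf(json.charAt(i)) < 0
                    && !Character.isWhitespace(json.charAt(i))) i++;
            out.add(new Token(Kind.VALUE, json.substring(start, i)));
        }
        return out;
    }

    // Reads one top-level object. A scalar column takes the text of whatever
    // token follows its field name, even START_OBJECT, and the first
    // END_OBJECT seen (possibly a nested one) ends the row.
    static Map<String, Object> parseRow(String json) {
        List<Token> t = tokenize(json);
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("boolean_field", null);
        row.put("other_field", null);
        int i = 1; // skip the outer START_OBJECT
        while (t.get(i).kind != Kind.END_OBJECT) {
            if (t.get(i).kind != Kind.FIELD_NAME) {
                throw new IllegalStateException("field name expected, got " + t.get(i).text);
            }
            String name = t.get(i).text;
            String text = t.get(i + 1).text; // "{" or "[" for a nested object/array
            i += 2;
            if (name.equals("boolean_field")) {
                row.put(name, Boolean.valueOf(text)); // any text other than "true" gives false
            } else if (name.equals("other_field")) {
                row.put(name, Integer.valueOf(text));
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // {boolean_field=false, other_field=null}
        System.out.println(parseRow("{ \"boolean_field\" : {}, \"other_field\" : 99 }"));
        // {boolean_field=false, other_field=42}
        System.out.println(parseRow("{ \"boolean_field\" : { \"other_field\" : 42 }, \"other_field\" : 99 }"));
        try {
            parseRow("{ \"boolean_field\" : [1], \"other_field\" : 99 }");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage()); // error: field name expected, got 1
        }
    }
}
{code}

Running {{main}} prints the two corrupted rows from the examples, then the error for the array case. Since {{Boolean.valueOf}} returns {{false}} for the string "{", the corrupted boolean looks like an ordinary value and goes unnoticed.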

I think a JSON object in a scalar field should result in an error, just like a JSON array does. If so, the fix is to check in {{extractCurrentField}} that the value token is a scalar for non-complex types, with something like this:
{code:java}
// Reject START_OBJECT / START_ARRAY (and any other non-scalar token) for primitive fields
if (!hcatFieldSchema.isComplex() && !valueToken.isScalarValue()) {
    throw new IOException(type + " value must be a scalar json value");
}
{code}
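
The effect of the proposed check can be sketched in isolation. {{Tok}} is a hypothetical stand-in for Jackson's {{JsonToken}} that models only the behaviour of {{isScalarValue()}}, and {{extractPrimitive}} is a hypothetical reduction of {{extractCurrentField}} to the single guard above; neither is Hive's code.

{code:java}
import java.io.IOException;

class ScalarGuardSketch {
    // Hypothetical stand-in for Jackson's JsonToken, trimmed to a few constants.
    // Like JsonToken.isScalarValue(), VALUE_* tokens (including VALUE_NULL) are
    // scalar; object/array delimiters and field names are not.
    enum Tok {
        START_OBJECT, END_OBJECT, START_ARRAY, END_ARRAY, FIELD_NAME,
        VALUE_STRING, VALUE_NUMBER_INT, VALUE_NUMBER_FLOAT, VALUE_TRUE, VALUE_FALSE, VALUE_NULL;

        boolean isScalarValue() {
            return name().startsWith("VALUE_");
        }
    }

    // Sketch of the proposed guard for a non-complex column: fail before the
    // token text (e.g. "{" for START_OBJECT) can be decoded as a value.
    static String extractPrimitive(String type, Tok valueToken, String text) throws IOException {
        if (!valueToken.isScalarValue()) {
            throw new IOException(type + " value must be a scalar json value");
        }
        return text;
    }

    public static void main(String[] args) {
        Tok[] tokens = {Tok.VALUE_TRUE, Tok.START_OBJECT, Tok.START_ARRAY};
        String[] texts = {"true", "{", "["};
        for (int k = 0; k < tokens.length; k++) {
            try {
                System.out.println(tokens[k] + " -> " + extractPrimitive("BOOLEAN", tokens[k], texts[k]));
            } catch (IOException e) {
                System.out.println(tokens[k] + " -> error: " + e.getMessage());
            }
        }
    }
}
{code}

With the guard, a {{START_OBJECT}} or {{START_ARRAY}} token for a {{BOOLEAN}} column raises an {{IOException}} instead of being decoded as {{false}}, while scalar tokens pass through unchanged. Objects and arrays then fail the same way.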

--
This message was sent by Atlassian Jira
(v8.20.10#820010)