Posted to issues@hive.apache.org by "gonglinglei (JIRA)" <ji...@apache.org> on 2018/05/07 09:47:00 UTC

[jira] [Commented] (HIVE-18956) AvroSerDe Race Condition

    [ https://issues.apache.org/jira/browse/HIVE-18956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465693#comment-16465693 ] 

gonglinglei commented on HIVE-18956:
------------------------------------


{code:java}
  @Override
  public void initialize(Configuration configuration, Properties properties) throws SerDeException {
...
    if(!badSchema) {
      this.avroSerializer = new AvroSerializer();
      this.avroDeserializer = new AvroDeserializer();
    }
  }
{code}

This is already fixed by [HIVE-18410|https://issues.apache.org/jira/browse/HIVE-18410]: both {{AvroSerializer}} and {{AvroDeserializer}} are now instantiated in {{initialize}}, so the lazy null checks in {{getSerializer}} and {{getDeserializer}} are no longer needed.
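
As a rough illustration of why eager creation helps, here is a minimal standalone sketch. It is not the Hive code: {{EagerInitSketch}}, the nested {{Serializer}}/{{Deserializer}} classes, the {{"schema"}} property check, and {{sameInstanceAcrossThreads}} are all hypothetical stand-ins. It assumes {{initialize}} finishes before the object is handed to worker threads, and it makes no claim about whether the serializer object itself is safe to share.

```java
import java.util.Properties;

// Hypothetical stand-in for a SerDe that creates its helpers eagerly.
final class EagerInitSketch {
    static final class Serializer {}
    static final class Deserializer {}

    private boolean badSchema = true;
    private Serializer serializer;
    private Deserializer deserializer;

    // Both helpers are created once, here, before any worker thread exists.
    void initialize(Properties properties) {
        // Hypothetical validity check; the real schema handling is elided in the comment above.
        badSchema = properties.getProperty("schema") == null;
        if (!badSchema) {
            serializer = new Serializer();
            deserializer = new Deserializer();
        }
    }

    // No null check and no assignment: callers only read the field.
    Serializer serializer() {
        if (badSchema) {
            throw new IllegalStateException("bad schema");
        }
        return serializer;
    }

    // Starts `threads` workers after initialize() and reports whether they all saw one instance.
    static boolean sameInstanceAcrossThreads(int threads) {
        EagerInitSketch serde = new EagerInitSketch();
        Properties props = new Properties();
        props.setProperty("schema", "placeholder");
        serde.initialize(props);

        Serializer[] seen = new Serializer[threads];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int slot = i;
            workers[i] = new Thread(() -> seen[slot] = serde.serializer());
            workers[i].start(); // Thread.start() publishes the writes made in initialize()
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        for (Serializer s : seen) {
            if (s == null || s != seen[0]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the only write to the field happens in {{initialize}}, the check-then-act sequence from the report disappears entirely; nothing is left for concurrent callers to race on.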

> AvroSerDe Race Condition
> ------------------------
>
>                 Key: HIVE-18956
>                 URL: https://issues.apache.org/jira/browse/HIVE-18956
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.0.0, 2.3.2
>            Reporter: BELUGA BEHR
>            Priority: Trivial
>
> {code}
>   @Override
>   public Writable serialize(Object o, ObjectInspector objectInspector) throws SerDeException {
>     if(badSchema) {
>       throw new BadSchemaException();
>     }
>     return getSerializer().serialize(o, objectInspector, columnNames, columnTypes, schema);
>   }
>   @Override
>   public Object deserialize(Writable writable) throws SerDeException {
>     if(badSchema) {
>       throw new BadSchemaException();
>     }
>     return getDeserializer().deserialize(columnNames, columnTypes, writable, schema);
>   }
> ...
>   private AvroDeserializer getDeserializer() {
>     if(avroDeserializer == null) {
>       avroDeserializer = new AvroDeserializer();
>     }
>     return avroDeserializer;
>   }
>   private AvroSerializer getSerializer() {
>     if(avroSerializer == null) {
>       avroSerializer = new AvroSerializer();
>     }
>     return avroSerializer;
>   }
> {code}
> The {{getDeserializer}} and {{getSerializer}} methods are not thread-safe, so neither are the {{deserialize}} and {{serialize}} methods.  It probably didn't matter with MapReduce, but now that we have Spark/Tez, it may be an issue.
> For example, three threads could all enter {{getSerializer}}, all see that {{avroSerializer}} is _null_, and each create an instance; they would then race to assign their new object to the {{avroSerializer}} field.
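
The interleaving described in the quoted report can be reproduced deterministically with a small standalone sketch. This is an illustration under stated assumptions, not Hive code: {{LazyInitRaceSketch}}, its nested {{Serializer}} class, and {{instancesCreated}} are hypothetical, and the {{CyclicBarrier}} exists only to hold every thread between the null check and the assignment so that the bad schedule always occurs.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reproduction of the unguarded lazy getter from the report.
final class LazyInitRaceSketch {
    /** Stand-in for AvroSerializer. */
    static final class Serializer {}

    private final AtomicInteger created = new AtomicInteger();
    private final CyclicBarrier afterNullCheck;
    private Serializer serializer; // unguarded, like the field in the report

    LazyInitRaceSketch(int threads) {
        this.afterNullCheck = new CyclicBarrier(threads);
    }

    // Same check-then-act shape as getSerializer() in the report.
    Serializer getSerializer() {
        if (serializer == null) {
            waitForOthers(); // test-only: every thread passes the null check before any assigns
            serializer = new Serializer();
            created.incrementAndGet();
        }
        return serializer;
    }

    private void waitForOthers() {
        try {
            afterNullCheck.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs `threads` concurrent callers and returns how many instances were constructed.
    static int instancesCreated(int threads) {
        LazyInitRaceSketch serde = new LazyInitRaceSketch(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(serde::getSerializer);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return serde.created.get();
    }

    public static void main(String[] args) {
        System.out.println(instancesCreated(3)); // prints 3: one instance per thread
    }
}
```

Without the barrier the outcome depends on scheduling, which is why the problem is easy to miss in practice: most runs would create a single instance.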



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)