Posted to issues@hive.apache.org by "gonglinglei (JIRA)" <ji...@apache.org> on 2018/05/07 09:47:00 UTC
[jira] [Commented] (HIVE-18956) AvroSerDe Race Condition
[ https://issues.apache.org/jira/browse/HIVE-18956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16465693#comment-16465693 ]
gonglinglei commented on HIVE-18956:
------------------------------------
{code:java}
@Override
public void initialize(Configuration configuration, Properties properties) throws SerDeException {
  ...
  if (!badSchema) {
    this.avroSerializer = new AvroSerializer();
    this.avroDeserializer = new AvroDeserializer();
  }
}
{code}
This is already fixed by [HIVE-18410|https://issues.apache.org/jira/browse/HIVE-18410]: both {{AvroSerializer}} and {{AvroDeserializer}} are now instantiated in {{initialize}}.
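To illustrate the idea of eager instantiation, here is a minimal, self-contained sketch. It does not use the real Hive classes: {{FakeSerializer}}, {{FakeDeserializer}}, {{SketchSerDe}} and the {{"schema"}} property are hypothetical stand-ins, and the sketch assumes the object is handed to worker threads only after {{initialize}} has returned.

```java
import java.util.Properties;

class EagerInitSketch {
    // Hypothetical stand-ins for AvroSerializer / AvroDeserializer.
    static class FakeSerializer {}
    static class FakeDeserializer {}

    static class SketchSerDe {
        private boolean badSchema;
        private FakeSerializer serializer;
        private FakeDeserializer deserializer;

        // Both helpers are created once, up front, instead of lazily in a getter.
        void initialize(Properties properties) {
            // Assumption for this sketch: a missing "schema" property means a bad schema.
            badSchema = properties.getProperty("schema") == null;
            if (!badSchema) {
                serializer = new FakeSerializer();
                deserializer = new FakeDeserializer();
            }
        }

        // No check-then-create step here, so concurrent callers cannot race on creation.
        FakeSerializer getSerializer() {
            if (badSchema) {
                throw new IllegalStateException("bad schema");
            }
            return serializer;
        }

        FakeDeserializer getDeserializer() {
            if (badSchema) {
                throw new IllegalStateException("bad schema");
            }
            return deserializer;
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("schema", "placeholder");
        SketchSerDe serDe = new SketchSerDe();
        serDe.initialize(props);
        // Every call returns the single instance created in initialize.
        System.out.println(serDe.getSerializer() == serDe.getSerializer()); // prints "true"
    }
}
```

The getters only read fields that were written before any worker thread started, which is why no locking appears in this sketch. If the object could be used before {{initialize}} completes, that assumption would not hold.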
> AvroSerDe Race Condition
> ------------------------
>
> Key: HIVE-18956
> URL: https://issues.apache.org/jira/browse/HIVE-18956
> Project: Hive
> Issue Type: Bug
> Components: Serializers/Deserializers
> Affects Versions: 3.0.0, 2.3.2
> Reporter: BELUGA BEHR
> Priority: Trivial
>
> {code}
> @Override
> public Writable serialize(Object o, ObjectInspector objectInspector) throws SerDeException {
>   if (badSchema) {
>     throw new BadSchemaException();
>   }
>   return getSerializer().serialize(o, objectInspector, columnNames, columnTypes, schema);
> }
>
> @Override
> public Object deserialize(Writable writable) throws SerDeException {
>   if (badSchema) {
>     throw new BadSchemaException();
>   }
>   return getDeserializer().deserialize(columnNames, columnTypes, writable, schema);
> }
> ...
> private AvroDeserializer getDeserializer() {
>   if (avroDeserializer == null) {
>     avroDeserializer = new AvroDeserializer();
>   }
>   return avroDeserializer;
> }
>
> private AvroSerializer getSerializer() {
>   if (avroSerializer == null) {
>     avroSerializer = new AvroSerializer();
>   }
>   return avroSerializer;
> }
> {code}
> The {{getDeserializer}} and {{getSerializer}} methods are not thread-safe, so neither are the {{deserialize}} and {{serialize}} methods. It probably didn't matter with MapReduce, but now that we have Spark/Tez, it may be an issue.
> Consider a scenario where three threads all enter {{getSerializer}}, all see that {{avroSerializer}} is _null_, and each create an instance; they would then race to assign their new object to the {{avroSerializer}} variable.
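The three-thread scenario described in the issue can be reproduced deterministically with a small sketch. This is not Hive code: {{FakeSerializer}} and {{LazySerDe}} are hypothetical stand-ins, and the {{CyclicBarrier}} is added purely to force every thread past the null check before any of them assigns the field.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class LazyInitRaceSketch {
    // Hypothetical stand-in for AvroSerializer that counts how often it is constructed.
    static class FakeSerializer {
        static final AtomicInteger CREATED = new AtomicInteger();
        FakeSerializer() { CREATED.incrementAndGet(); }
    }

    static class LazySerDe {
        private FakeSerializer serializer;
        private final CyclicBarrier afterNullCheck;

        LazySerDe(int threads) {
            this.afterNullCheck = new CyclicBarrier(threads);
        }

        // Same check-then-act shape as the getter quoted in the issue.
        FakeSerializer getSerializer() {
            if (serializer == null) {
                // Test-only hook: wait until every thread has seen null.
                pause();
                serializer = new FakeSerializer();
            }
            return serializer;
        }

        private void pause() {
            try {
                afterNullCheck.await();
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Runs the given number of threads through the lazy getter and
    // returns how many serializer instances were constructed.
    static int runRace(int threads) {
        FakeSerializer.CREATED.set(0);
        LazySerDe serDe = new LazySerDe(threads);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> serDe.getSerializer());
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return FakeSerializer.CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("instances created: " + runRace(3)); // prints "instances created: 3"
    }
}
```

Without the barrier the outcome depends on thread scheduling, so the extra instances would appear only occasionally; the barrier just makes the worst case repeatable.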
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)