Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2020/10/15 11:35:00 UTC

[jira] [Updated] (IMPALA-10243) ConcurrentModificationException during parallel INSERTs

     [ https://issues.apache.org/jira/browse/IMPALA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy updated IMPALA-10243:
---------------------------------------
    Description: 
Impala might throw a ConcurrentModificationException during a high load of INSERTs to the same table.

The exception happens during Thrift serialization of TUpdateCatalogResponse, which has a reference to the metastore table. The serialization happens without a lock, so another thread might modify the metastore table object in the meantime. This can happen in CatalogOpExecutor.updateCatalog(), which updates the catalog version and unsets table column statistics. A high load of INSERT statements increases the probability of such a concurrent modification.
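
For illustration, here is a minimal, hypothetical Java sketch of the failure mode (not Impala code; class and variable names are made up): one thread iterates a HashMap while another thread mutates it, which is essentially what happens when the generated Thrift write() code walks the shared metastore table's parameters while another thread updates them.

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: reproduces the generic HashMap race, not the
// actual Impala code path.
public class ConcurrentModificationSketch {
  public static void main(String[] args) throws InterruptedException {
    final Map<String, String> params = new HashMap<>();
    params.put("numRows", "0");

    // "Serializer" thread: repeatedly iterates the map, like the generated
    // Thrift write() code does.
    Thread serializer = new Thread(() -> {
      while (true) {
        for (Map.Entry<String, String> e : params.entrySet()) {
          e.getValue();  // stand-in for writing the entry to the wire
        }
      }
    });

    // "Updater" thread: mutates the same map, analogous to unsetting column
    // statistics or updating the catalog version on the shared object.
    Thread updater = new Thread(() -> {
      for (int i = 0; ; i++) {
        params.put("key" + (i % 10), "value");
        params.remove("key" + ((i + 5) % 10));
      }
    });
    updater.setDaemon(true);

    serializer.start();
    updater.start();
    // The serializer thread typically dies within milliseconds with
    // java.util.ConcurrentModificationException.
    serializer.join();
  }
}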

I think the problem is that in Table.toThrift() we set a reference to the metastore table object instead of deep copying it:

[https://github.com/apache/impala/blob/481ea4ab0d476a4aa491f99c2a4e376faddc0b03/fe/src/main/java/org/apache/impala/catalog/Table.java#L505]
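
A possible remedy, sketched below under the assumption that the linked line in Table.toThrift() passes the shared org.apache.hadoop.hive.metastore.api.Table object into the TTable via setMetastore_table(), would be to serialize a private snapshot instead, e.g. via the deepCopy() method that Thrift generates for its Java classes:

// Sketch only; the getter/setter names are assumed from the linked code, and
// deepCopy() is the copy method Thrift generates for
// org.apache.hadoop.hive.metastore.api.Table.
//
// Before: the response shares the live HMS table object, so a concurrent
// updateCatalog() can mutate it mid-serialization.
//   table.setMetastore_table(getMetaStoreTable());
//
// After: the response carries its own snapshot, which the Thrift writer can
// serialize without racing against catalog updates.
table.setMetastore_table(getMetaStoreTable().deepCopy());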

The stack trace looks like the following:

[1] java.util.HashMap$HashIterator.nextNode (HashMap.java:1,445)
 [2] java.util.HashMap$EntryIterator.next (HashMap.java:1,479)
 [3] java.util.HashMap$EntryIterator.next (HashMap.java:1,477)
 [4] org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.write (Table.java:2,641)
 [5] org.apache.hadoop.hive.metastore.api.Table$TableStandardScheme.write (Table.java:2,324)
 [6] org.apache.hadoop.hive.metastore.api.Table.write (Table.java:2,082)
 [7] org.apache.impala.thrift.TTable$TTableStandardScheme.write (TTable.java:1,829)
 [8] org.apache.impala.thrift.TTable$TTableStandardScheme.write (TTable.java:1,569)
 [9] org.apache.impala.thrift.TTable.write (TTable.java:1,357)
 [10] org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write (TCatalogObject.java:1,433)
 [11] org.apache.impala.thrift.TCatalogObject$TCatalogObjectStandardScheme.write (TCatalogObject.java:1,272)
 [12] org.apache.impala.thrift.TCatalogObject.write (TCatalogObject.java:1,086)
 [13] org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write (TCatalogUpdateResult.java:908)
 [14] org.apache.impala.thrift.TCatalogUpdateResult$TCatalogUpdateResultStandardScheme.write (TCatalogUpdateResult.java:780)
 [15] org.apache.impala.thrift.TCatalogUpdateResult.write (TCatalogUpdateResult.java:682)
 [16] org.apache.impala.thrift.TUpdateCatalogResponse$TUpdateCatalogResponseStandardScheme.write (TUpdateCatalogResponse.java:363)
 [17] org.apache.impala.thrift.TUpdateCatalogResponse$TUpdateCatalogResponseStandardScheme.write (TUpdateCatalogResponse.java:325)
 [18] org.apache.impala.thrift.TUpdateCatalogResponse.write (TUpdateCatalogResponse.java:273)
 [19] org.apache.thrift.TSerializer.serialize (TSerializer.java:79)
 [20] org.apache.impala.service.JniCatalog.updateCatalog (JniCatalog.java:314)

  was:
Impala might throw a ConcurrentModificationException during a high load of INSERTs to the same table.

The exception happens during Thrift serialization of TUpdateCatalogResponse, which has a reference to the metastore table. The serialization happens without a lock, so another thread might modify the metastore table object in the meantime. This can happen in CatalogOpExecutor.updateCatalog(), which updates the catalog version and unsets table column statistics. A high load of INSERT statements increases the probability of such a concurrent modification.

I think the problem is that in Table.toThrift() we set a reference to the metastore table object instead of deep copying it:

https://github.com/apache/impala/blob/481ea4ab0d476a4aa491f99c2a4e376faddc0b03/fe/src/main/java/org/apache/impala/catalog/Table.java#L505


> ConcurrentModificationException during parallel INSERTs
> -------------------------------------------------------
>
>                 Key: IMPALA-10243
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10243
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>



