You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/03/10 06:03:00 UTC

[jira] [Commented] (IMPALA-10563) Stress insert tests timeout on GCS

    [ https://issues.apache.org/jira/browse/IMPALA-10563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298541#comment-17298541 ] 

Quanlong Huang commented on IMPALA-10563:
-----------------------------------------

Running test_insert_stress.py, I see catalogd is hanging in loading table metadata:
{code:java}
W0309 18:35:24.708657 20674 FileSystemUtil.java:828] listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000): '
gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000' does not exist.
W0309 18:35:24.763520 20674 FileSystemUtil.java:828] listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000): '
gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000' does not exist.
W0309 18:35:24.804994 20674 FileSystemUtil.java:828] listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000): '
gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000' does not exist.
W0309 18:35:24.850267 20674 FileSystemUtil.java:828] listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000): '
gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/cd4865bb423380fa_0a87bec000000000' does not exist.{code}
It keeps throwing FileNotFoundException on the same path. I add some logs to print the exception in another run:
{code:java}
W0309 19:17:31.889636 14215 FileSystemUtil.java:828] listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/a84ed4b83cfaf266_cebf17f000000000): 'gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/a84ed4b83cfaf266_cebf17f000000000' does not exist.
Java exception follows:
java.io.FileNotFoundException: listStatus(hadoopPath: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/a84ed4b83cfaf266_cebf17f000000000): 'gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/a84ed4b83cfaf266_cebf17f000000000' does not exist.
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.listStatus(GoogleHadoopFileSystemBase.java:915)
        at org.apache.hadoop.fs.FileSystem.listStatusBatch(FileSystem.java:1923)
        at org.apache.hadoop.fs.FileSystem$DirListingIterator.fetchMore(FileSystem.java:2196)
        at org.apache.hadoop.fs.FileSystem$DirListingIterator.hasNext(FileSystem.java:2185)
        at org.apache.impala.common.FileSystemUtil$RecursingIterator.hasNext(FileSystemUtil.java:871)
        at org.apache.impala.common.FileSystemUtil$FilterIterator.hasNext(FileSystemUtil.java:819)
        at org.apache.impala.catalog.FileMetadataLoader.load(FileMetadataLoader.java:202)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.lambda$loadInternal$1(ParallelFileMetadataLoader.java:157)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:322)
        at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:134)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:66)
        at com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:36)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.loadInternal(ParallelFileMetadataLoader.java:157)
        at org.apache.impala.catalog.ParallelFileMetadataLoader.load(ParallelFileMetadataLoader.java:115)
        at org.apache.impala.catalog.HdfsTable.loadFileMetadataForPartitions(HdfsTable.java:747)
        at org.apache.impala.catalog.HdfsTable.updateUnpartitionedTableFileMd(HdfsTable.java:1302)
        at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1188)
        at org.apache.impala.service.CatalogOpExecutor.loadTableMetadata(CatalogOpExecutor.java:1015)
        at org.apache.impala.service.CatalogOpExecutor.updateCatalog(CatalogOpExecutor.java:4808)
        at org.apache.impala.service.JniCatalog.updateCatalog(JniCatalog.java:327)
Caused by: java.io.FileNotFoundException: Item not found: gs://quanlong-impala-test-02/test-warehouse/test_inserts_ab08196b.db/test_concurrent_inserts/_impala_insert_staging/a84ed4b83cfaf266_cebf17f000000000
        at com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystem.listFileInfo(GoogleCloudStorageFileSystem.java:1043)
        at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase.listStatus(GoogleHadoopFileSystemBase.java:906)
        ... 22 more{code}

> Stress insert tests timeout on GCS
> ----------------------------------
>
>                 Key: IMPALA-10563
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10563
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> The following tests failed with TimeoutError when running on GCS:
> {code:java}
> F stress/test_acid_stress.py::TestConcurrentAcidInserts::()::test_concurrent_inserts[unique_database0]
>  stress/test_acid_stress.py:255: in test_concurrent_inserts
>      run_tasks(writers + checkers)
>  stress/stress_util.py:43: in run_tasks
>      pool.map_async(Task.run, tasks).get(timeout_seconds)
>  ../toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:568: in get
>      raise TimeoutError
>  E   TimeoutError
> F stress/test_acid_stress.py::TestFailingAcidInserts::()::test_failing_inserts[unique_database0]
>  stress/test_acid_stress.py:333: in test_failing_inserts
>      self._run_test_failing_inserts(unique_database, is_partitioned)
>  stress/test_acid_stress.py:324: in _run_test_failing_inserts
>      run_tasks(writers + checkers)
>  stress/stress_util.py:43: in run_tasks
>      pool.map_async(Task.run, tasks).get(timeout_seconds)
>  ../toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:568: in get
>      raise TimeoutError
>  E   TimeoutError
> F stress/test_insert_stress.py::TestInsertStress::()::test_inserts[unique_database0]
>  stress/test_insert_stress.py:102: in test_inserts
>      run_tasks(writers + checkers)
>  stress/stress_util.py:43: in run_tasks
>      pool.map_async(Task.run, tasks).get(timeout_seconds)
>  ../toolchain/toolchain-packages-gcc7.5.0/python-2.7.16/lib/python2.7/multiprocessing/pool.py:568: in get
>      raise TimeoutError
>  E   TimeoutError
> {code}
> It seems the statements take too much time in waiting for HMS. Thus resulting in the whole slow down. Found some error logs in HMS's log:
> {code}
> 2021-03-08T01:29:10,479  INFO [pool-6-thread-181] txn.TxnHandler: Allocated writeId: 1 for txnId: 3667
> 2021-03-08T01:29:10,492  INFO [pool-6-thread-177] txn.TxnHandler: Added entries to MIN_HISTORY_LEVEL with a single query for current txn: [3669]
> 2021-03-08T01:29:10,493  INFO [pool-6-thread-181] txn.TxnHandler: Allocated write ids for dbName=test_concurrent_inserts_8933345c, tblName=test_concurrent_inserts (txnIds: [3667])
> 2021-03-08T01:29:10,502  INFO [pool-6-thread-195] txn.TxnHandler: Added entries to MIN_HISTORY_LEVEL with a single query for current txn: [3670]
> 2021-03-08T01:29:10,507 ERROR [pool-6-thread-197] txn.TxnHandler: Exception during write ids allocation for request=AllocateTableWriteIdsRequest(dbName:test_concurrent_inserts_8933345c, tableName:test_concurrent_inserts, txnIds:[3668]). Will retry if possible.
> org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "NEXT_WRITE_ID_IDX"
>   Detail: Key ("NWI_DATABASE", "NWI_TABLE")=(test_concurrent_inserts_8933345c, test_concurrent_inserts) already exists.
>         at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2532) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2267) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:312) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:448) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:369) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:153) ~[postgresql-42.2.14.jar:42.2.14]
>         at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:142) ~[postgresql-42.2.14.jar:42.2.14]
>         at com.zaxxer.hikari.pool.ProxyPreparedStatement.execute(ProxyPreparedStatement.java:44) ~[HikariCP-2.6.1.jar:?]
>         at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.execute(HikariProxyPreparedStatement.java) ~[HikariCP-2.6.1.jar:?]
>         at org.apache.hadoop.hive.metastore.txn.TxnHandler.allocateTableWriteIds(TxnHandler.java:1820) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.allocate_table_write_ids(HiveMetaStore.java:8370) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_282]
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_282]
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_282]
>         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_282]
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at com.sun.proxy.$Proxy28.allocate_table_write_ids(Unknown Source) [?:?]
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:20126) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$allocate_table_write_ids.getResult(ThriftHiveMetastore.java:20110) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_282]
>         at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_282]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) [hadoop-common-3.1.1.7.2.7.0-44.jar:?]
>         at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) [hive-exec-3.1.3000.7.2.7.0-44.jar:3.1.3000.7.2.7.0-44]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_282]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_282]
>         at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-03-08T01:29:10,513  WARN [pool-6-thread-197] txn.TxnHandler: Retryable error detected in allocateTableWriteIds(AllocateTableWriteIdsRequest(dbName:test_concurrent_inserts_8933345c, tableName:test_concurrent_inserts, txnIds:[3668])).  Will wait 2000ms and retry up to 10 times.  Error: ERROR: duplicate key value violates unique constraint "NEXT_WRITE_ID_IDX"
>   Detail: Key ("NWI_DATABASE", "NWI_TABLE")=(test_concurrent_inserts_8933345c, test_concurrent_inserts) already exists.
> 2021-03-08T01:29:10,554  INFO [pool-6-thread-177] txn.TxnHandler: Allocated writeId: 2 for txnId: 3669
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org