You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Joe McDonnell (Jira)" <ji...@apache.org> on 2020/06/24 18:22:00 UTC

[jira] [Commented] (IMPALA-9887) ASAN builds timeout frequently

    [ https://issues.apache.org/jira/browse/IMPALA-9887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144160#comment-17144160 ] 

Joe McDonnell commented on IMPALA-9887:
---------------------------------------

I took a look into the ASAN slowness. In particular, one set of tests (query_test/test_parquet_stats.py::TestParquetStats::test_page_index) runs for hours. 

If you run that test once on its own, it is fast. It finishes in about 5 minutes, even on the same type of machine where it runs for hours (i.e. it is not an OS issue).

However, if you leave Impala running and run the test multiple times, its performance degrades:

 
{noformat}
First run:
real    4m50.435s
user    0m42.311s
sys     0m2.803s

Second run:
real    11m1.804s
user    0m46.601s
sys     0m3.140s

Third run:
real    20m26.775s
user    0m45.118s
sys     0m3.813s
{noformat}
This reproduces consistently on my Ubuntu 16.04 machine, and I assume that it continues to degrade. In our usual ASAN tests, this test runs towards the end, and so if things have already run for multiple hours, then this may be super-slow at that point.

 

I took a stack trace of impalad while it was spending hours running query_test/test_parquet_stats.py::TestParquetStats::test_page_index, and the Impalad is basically idle except for this stack:
{noformat}
Thread 2 (Thread 0x7fcff526d700 (LWP 21811)):
#0  eq (args=<synthetic pointer>, hash=<optimized out>, this=0x7fd0b497e068) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cc:40
#1  find (hash=<optimized out>, args=..., s=0x7fd0b497e068) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/sanitizer_common/sanitizer_stackdepotbase.h:66
#2  Put (inserted=0x0, args=..., this=0x88338a0 <__sanitizer::theDepot>) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/sanitizer_common/sanitizer_stackdepotbase.h:107
#3  __sanitizer::StackDepotPut (stack=...) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/sanitizer_common/sanitizer_stackdepot.cc:112
#4  0x00000000019eab5f in __asan::Allocator::Allocate (this=this@entry=0x83cef20 <__asan::instance>, size=<optimized out>, size@entry=16, alignment=<optimized out>, alignment@entry=0, stack=stack@entry=0x7fcff5261130, alloc_type=alloc_type@entry=__asan::FROM_NEW_BR, can_fill=can_fill@entry=true) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_allocator.cc:476
#5  0x00000000019ebaf5 in __asan::asan_memalign (alignment=alignment@entry=0, size=size@entry=16, stack=stack@entry=0x7fcff5261130, alloc_type=alloc_type@entry=__asan::FROM_NEW_BR) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_allocator.cc:854
#6  0x0000000001ad5470 in operator new[] (size=16) at /mnt/source/llvm/llvm-5.0.1.src-p2/projects/compiler-rt/lib/asan/asan_new_delete.cc:95
#7  0x0000000005488000 in llvm::APInt::initSlowCase(llvm::APInt const&) ()
#8  0x0000000005197340 in computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) ()
#9  0x00000000018eedea in computeKnownBitsAddSub(bool, llvm::Value const*, llvm::Value const*, bool, llvm::KnownBits&, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) ()
#10 0x0000000005199631 in computeKnownBits(llvm::Value const*, llvm::KnownBits&, unsigned int, (anonymous namespace)::Query const&) ()
#11 0x000000000519a1ea in computeKnownBits(llvm::Value const*, unsigned int, (anonymous namespace)::Query const&) ()
#12 0x000000000519a270 in llvm::computeKnownBits(llvm::Value const*, llvm::DataLayout const&, unsigned int, llvm::AssumptionCache*, llvm::Instruction const*, llvm::DominatorTree const*, llvm::OptimizationRemarkEmitter*) ()
#13 0x0000000005089a7d in llvm::SimplifyInstruction(llvm::Instruction*, llvm::SimplifyQuery const&, llvm::OptimizationRemarkEmitter*) ()
#14 0x0000000004c83d7e in (anonymous namespace)::EarlyCSE::processNode(llvm::DomTreeNodeBase<llvm::BasicBlock>*) ()
#15 0x0000000004c85c14 in (anonymous namespace)::EarlyCSE::run() ()
#16 0x0000000004c884f7 in (anonymous namespace)::EarlyCSELegacyCommonPass<true>::runOnFunction(llvm::Function&) [clone .part.401] ()
#17 0x0000000005406f09 in llvm::FPPassManager::runOnFunction(llvm::Function&) ()
#18 0x0000000005020289 in (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) ()
#19 0x0000000005406651 in llvm::legacy::PassManagerImpl::run(llvm::Module&) ()
#20 0x0000000002bf3c5c in impala::LlvmCodeGen::OptimizeModule (this=<optimized out>) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen.cc:1242
#21 0x0000000002bf20b5 in impala::LlvmCodeGen::FinalizeModule (this=0x6170004f3b00) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/codegen/llvm-codegen.cc:1127
#22 0x00000000025b106f in impala::FragmentState::CodegenHelper (this=0x611000f9aa00) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/fragment-state.cc:119
#23 0x00000000025b0979 in impala::FragmentState::InvokeCodegen (this=0x611000f9aa00) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/fragment-state.cc:89
#24 0x00000000024f664b in impala::FragmentInstanceState::Open (this=<optimized out>) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/fragment-instance-state.cc:388
#25 0x00000000024f31f6 in impala::FragmentInstanceState::Exec (this=0x613000ab7b00) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/fragment-instance-state.cc:98
#26 0x00000000025365ef in impala::QueryState::ExecFInstance (this=0x61d003a5ac80, fis=0x613000ab7b00) at /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/query-state.cc:791{noformat}
I did stack traces multiple minutes apart with basically the same stack.

query_test/test_parquet_stats.py::TestParquetStats::test_page_index runs a similar set of queries multiple times, so it is making progress and it may not be the same query.

It seems related to codegen, and in particular, tracking stacktraces for malloc. If I run the test with disable_codegen=True, then it is very fast (~2.5 minutes) with no degradation on multiple runs. If I run with ASAN_OPTIONS malloc_context_size=0 (so it doesn't track stack traces of mallocs), there also is no degradation.

> ASAN builds timeout frequently
> ------------------------------
>
>                 Key: IMPALA-9887
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9887
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.0
>            Reporter: Vihang Karajgaonkar
>            Priority: Blocker
>              Labels: broken-build
>
> It has happened atleast couple of times in this week on the ASAN builds. The custom cluster tests fails with the test setup error and logs suggest that coordinator nodes don't start up due to the following exception trace:
> {noformat}
> F0623 17:22:42.725920 25786 frontend.cc:136] IllegalStateException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> CAUSED BY: RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
> CAUSED BY: InvocationTargetException: null
> CAUSED BY: MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:631)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:241)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.hive.metastore.utils.JavaUtils.newInstance(JavaUtils.java:84)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:95)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:148)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:119)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:112)
>         at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:99)
>         at org.apache.impala.catalog.MetaStoreClientPool$MetaStoreClient.<init>(MetaStoreClientPool.java:78)
>         at org.apache.impala.catalog.MetaStoreClientPool.initClients(MetaStoreClientPool.java:174)
>         at org.apache.impala.catalog.MetaStoreClientPool.<init>(MetaStoreClientPool.java:163)
>         at org.apache.impala.catalog.MetaStoreClientPool.<init>(MetaStoreClientPool.java:155)
>         at org.apache.impala.service.Frontend.<init>(Frontend.java:331)
>         at org.apache.impala.service.Frontend.<init>(Frontend.java:288)
>         at org.apache.impala.service.JniFrontend.<init>(JniFrontend.java:144)
> Caused by: java.net.ConnectException: Connection refused (Connection refused)
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:589)
>         at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>         ... 19 more
> . Impalad exiting.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org