Posted to issues@impala.apache.org by "Michael Ho (JIRA)" <ji...@apache.org> on 2019/04/22 18:25:00 UTC

[jira] [Resolved] (IMPALA-6316) impalad crashes after hadoopZeroCopyRead failure

     [ https://issues.apache.org/jira/browse/IMPALA-6316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Ho resolved IMPALA-6316.
--------------------------------
       Resolution: Duplicate
    Fix Version/s: Not Applicable

> impalad crashes after hadoopZeroCopyRead failure
> ------------------------------------------------
>
>                 Key: IMPALA-6316
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6316
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 2.11.0
>            Reporter: Pranay Singh
>            Priority: Major
>             Fix For: Not Applicable
>
>
> End-to-end tests fail
> ---------------------------
> 20:00:40 [gw0] PASSED query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 20:04:17 query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 20:04:17 [gw1] FAILED query_test/test_queries.py::TestQueries::test_union[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: rc/snap/block] 
> 20:04:17 query_test/test_queries.py::TestQueries::test_union[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: rc/snap/block] 
> 20:04:17 [gw2] FAILED query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record] 
> 20:04:17 [gw3] FAILED query_test/test_queries.py::TestQueries::test_analytic_fns[exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 100, 'batch_size': 0, 'num_nodes': 0} | table_format: seq/def/block] 
> 20:04:17 query_test/test_queries.py::TestQueries::test_subquery[exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: seq/def/record] 
> 20:04:17 [gw0] FAILED query_test/test_join_queries.py::TestJoinQueries::test_single_node_joins_with_limits_exhaustive[batch_size: 1 | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]
> #0  0x00000031bea328e5 in raise () from /lib64/libc.so.6
> #1  0x00000031bea340c5 in abort () from /lib64/libc.so.6
> #2  0x0000000003be91a4 in google::DumpStackTraceAndExit() ()
> #3  0x0000000003bdfc1d in google::LogMessage::Fail() ()
> #4  0x0000000003be14c2 in google::LogMessage::SendToLog() ()
> #5  0x0000000003bdf5f7 in google::LogMessage::Flush() ()
> #6  0x0000000003be2bbe in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x000000000189390a in impala::FragmentInstanceState::Close (this=0xc188ee0) at repos/Impala/be/src/runtime/fragment-instance-state.cc:315
> #8  0x0000000001890a12 in impala::FragmentInstanceState::Exec (this=0xc188ee0) at repos/Impala/be/src/runtime/fragment-instance-state.cc:95
> #9  0x00000000018797b8 in impala::QueryState::ExecFInstance (this=0x20584000, fis=0xc188ee0) at repos/Impala/be/src/runtime/query-state.cc:382
> #10 0x000000000187807a in impala::QueryState::<lambda()>::operator()(void) const (__closure=0x7fc1fafd9bc8) at repos/Impala/be/src/runtime/query-state.cc:325
> #11 0x000000000187a3f7 in boost::detail::function::void_function_obj_invoker0<impala::QueryState::StartFInstances()::<lambda()>, void>::invoke(boost::detail::function::function_buffer &) (function_obj_ptr=...) at Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #12 0x00000000017c6ed4 in boost::function0<void>::operator() (this=0x7fc1fafd9bc0) at Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> #13 0x0000000001abdbc9 in impala::Thread::SuperviseThread (name=..., category=..., functor=..., thread_started=0x7fc0cc476ab0) at repos/Impala/be/src/util/thread.cc:352
> #14 0x0000000001ac6754 in boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> >::operator()<void (*)(const std::basic_string<char>&, const std::basic_string<char>&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list0>(boost::_bi::type<void>, void (*&)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, const std::basic_string<char, std::char_traits<char>, std::allocator<char> > &, boost::function<void()>, impala::Promise<long> *), boost::_bi::list0 &, int) (this=0x1eec8f7c0, f=@0x1eec8f7b8, a=...) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind.hpp:457
> #15 0x0000000001ac6697 in boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > >::operator()(void) (this=0x1eec8f7b8) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/bind_template.hpp:20
> #16 0x0000000001ac665a in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, const std::basic_string<char, std::char_traits<char>, std::allocator<char> >&, boost::function<void()>, impala::Promise<long int>*), boost::_bi::list4<boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, boost::_bi::value<boost::function<void()> >, boost::_bi::value<impala::Promise<long int>*> > > >::run(void) (this=0x1eec8f600) at workspace/impala-cdh5-trunk-exhaustive/Impala-Toolchain/boost-1.57.0-p3/include/boost/thread/detail/thread.hpp:116
> #17 0x0000000002d6966a in thread_proxy ()
> #18 0x00000031bee07851 in start_thread () from /lib64/libpthread.so.0
> #19 0x00000031beae894d in clone () from /lib64/libc.so.6
> Log traces from impalad.INFO when this happened
> --------------------------------------------------------------------
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> E1208 15:03:55.125169  2169 Analyzer.java:2375] Failed to load metadata for table: alltypes
> Failed to load metadata for table: functional.alltypes. Running 'invalidate metadata functional.alltypes' may resolve this problem.
> CAUSED BY: MetaException: Could not connect to meta store using any of the URIs provided. Most recent failure: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
>   at org.apache.thrift.transport.TSocket.open(TSocket.java:226)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:472)
>   at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.reconnect(HiveMetaStoreClient.java:337)
>   at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:98)
>   at com.sun.proxy.$Proxy5.getTable(Unknown Source)
>   at org.apache.impala.catalog.TableLoader.load(TableLoader.java:65)
>   at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:241)
>   at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:238)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>   at java.net.PlainSocketImpl.socketConnect(Native Method)
>   at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>   at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>   at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>   at java.net.Socket.connect(Socket.java:579)
>   at org.apache.thrift.transport.TSocket.open(TSocket.java:221)
>   ... 11 more
> Picked up JAVA_TOOL_OPTIONS: -agentlib:jdwp=transport=dt_socket,address=30000,server=y,suspend=n
> hdfsOpenFile(hdfs://localhost:20500/test-warehouse/file_open_fail/564e4332cbb6e8de-c0c5101c00000000_2005391775_data.0.): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
> RemoteException: File does not exist: /test-warehouse/file_open_fail/564e4332cbb6e8de-c0c5101c00000000_2005391775_data.0.
>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
>   at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2100)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2070)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1983)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:579)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getBlockLocations(AuthorizationProviderProxyClientProtocol.java:92)
> .
> .
> .
> FSDataOutputStream#close error:
> RemoteException: No lease on /test-warehouse/tpch_parquet.db/ctas_cancel/_impala_insert_staging/a14b1ee198cd7327_a46f833a00000000/.a14b1ee198cd7327-a46f833a00000002_567821133_dir/a14b1ee198cd7327-a46f833a00000002_1272243926_data.0.parq (inode 37350): File does not exist. Holder DFSClient_NONMAPREDUCE_307426671_1 does not have any open files.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2275)
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /test-warehouse/tpch_parquet.db/ctas_cancel/_impala_insert_staging/a14b1ee198cd7327_a46f833a00000000/.a14b1ee198cd7327-a46f833a00000002_567821133_dir/a14b1ee198cd7327-a46f833a00000002_1272243926_data.0.parq (inode 37350): File does not exist. Holder DFSClient_NONMAPREDUCE_307426671_1 does not have any open files.
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2281)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2277)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGr.
> .
> .
> FSDataOutputStream#close error:
> RemoteException: No lease on /test-warehouse/functional_parquet.db/alltypesinsert/_impala_insert_staging/3949d68930d0228e_c655177500000000/.3949d68930d0228e-c655177500000008_357133657_dir/year=2009/month=0/3949d68930d0228e-c655177500000008_24906809_data.0.parq (inode 88180): File does not exist. [Lease.  Holder: DFSClient_NONMAPREDUCE_307426671_1, pending creates: 1]
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3760)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3561)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3417)
>   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:690)
>   at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:217)
>   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:506)
>   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$E1208 17:14:10.009800 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> tcmalloc: large alloc 2147483648 bytes == 0x2708a6000 @  0x3d039c6 0x7fc28438ac49
> tcmalloc: large alloc 4294967296 bytes == 0x7fc0dd294000 @  0x3d039c6 0x7fc28438ac49
> E1208 17:17:29.387645 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> E1208 17:17:30.425915 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> E1208 17:18:41.971148 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> tcmalloc: large alloc 4294967296 bytes == 0x7fc0dd294000 @  0x3d039c6 0x7fc28438ac49
> E1208 17:21:30.161092 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> E1208 17:21:30.913319 15893 LiteralExpr.java:186] Failed to evaluate expr 'space(1073741830)'
> E1208 17:25:00.198657 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
> E1208 17:25:00.199533 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
> E1208 17:25:00.200562 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_978e0f35.memtest(10485760)'
> E1208 17:25:08.581363 18963 LiteralExpr.java:186] Failed to evaluate expr 'test_mem_limits_ae6bd38e.memtest(10485760)'
> .
> .
> E1208 19:13:11.192692  7603 LiteralExpr.java:186] Failed to evaluate expr 'TIMESTAMP '1400-01-01 21:00:00' - INTERVAL 1 DAYS'
> E1208 19:13:11.224931  7603 LiteralExpr.java:186] Failed to evaluate expr 'TIMESTAMP '1400-01-01 21:00:00' - INTERVAL 1 DAYS'
> .
> .
> hadoopZeroCopyRead: ZeroCopyCursor#read failed error:
> ReadOnlyBufferException: java.nio.ReadOnlyBufferException
>   at java.nio.DirectByteBufferR.put(DirectByteBufferR.java:344)
>   at org.apache.hadoop.crypto.CryptoInputStream.decrypt(CryptoInputStream.java:53F1208 20:00:45.213917 25256 fragment-instance-state.cc:315] Check failed: other_time <= total_time + 1 (481986958 vs. 481986956)
> *** Check failure stack trace: ***
>     @          0x3bdfc1d  google::LogMessage::Fail()
>     @          0x3be14c2  google::LogMessage::SendToLog()
>     @          0x3bdf5f7  google::LogMessage::Flush()
>     @          0x3be2bbe  google::LogMessageFatal::~LogMessageFatal()
>     @          0x189390a  impala::FragmentInstanceState::Close()
>     @          0x1890a12  impala::FragmentInstanceState::Exec()
>     @          0x18797b8  impala::QueryState::ExecFInstance()
>     @          0x187807a  _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
>     @          0x187a3f7  _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
>     @          0x17c6ed4  boost::function0<>::operator()()
>     @          0x1abdbc9  impala::Thread::SuperviseThread()
>     @          0x1ac6754  boost::_bi::list4<>::operator()<>()
>     @          0x1ac6697  boost::_bi::bind_t<>::operator()()
>     @          0x1ac665a  boost::detail::thread_data<>::run()
>     @          0x2d6966a  thread_proxy
>     @       0x31bee07851  (unknown)
>     @       0x31beae894d  (unknown)
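
For context, the fatal message above ("Check failed: other_time <= total_time + 1 (481986958 vs. 481986956)") is a timing-consistency check: the time attributed to other operators exceeded the fragment instance's total time by 2ns. Below is a minimal glog-based sketch of that kind of invariant; it is not the actual Impala code at fragment-instance-state.cc:315, and apart from other_time/total_time (taken from the log) the names are hypothetical.

    // Minimal sketch of a profile-timer consistency check, assuming glog.
    #include <cstdint>
    #include <glog/logging.h>

    // Hypothetical helper: sanity-check profile timers when a fragment
    // instance closes. DCHECK_LE is fatal only in builds with DCHECKs
    // enabled, which is why this aborts impalad instead of surfacing as
    // a query error.
    static void ValidateProfileTimers(int64_t other_time_ns, int64_t total_time_ns) {
      // Allow 1ns of slack for rounding, mirroring "other_time <= total_time + 1".
      DCHECK_LE(other_time_ns, total_time_ns + 1)
          << "time attributed to child operators exceeds the fragment's total time";
    }

    int main(int argc, char** argv) {
      google::InitGoogleLogging(argv[0]);
      // Values from the log: 481986958 > 481986956 + 1, so a DCHECK-enabled
      // build aborts here with a "Check failed" message like the one above.
      ValidateProfileTimers(481986958, 481986956);
      return 0;
    }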



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)