Posted to issues@impala.apache.org by "Sahil Takiar (Jira)" <ji...@apache.org> on 2020/06/01 17:22:00 UTC

[jira] [Resolved] (IMPALA-9806) Multiple data load failures on HDFS errors for erasure coding builds

     [ https://issues.apache.org/jira/browse/IMPALA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sahil Takiar resolved IMPALA-9806.
----------------------------------
    Resolution: Duplicate

Closing as a duplicate of IMPALA-9794 and IMPALA-9777.

> Multiple data load failures on HDFS errors for erasure coding builds
> --------------------------------------------------------------------
>
>                 Key: IMPALA-9806
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9806
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.0
>            Reporter: Laszlo Gaal
>            Priority: Blocker
>
> The erasure coding build shows data load failures for the TPC-H, TPC-DS and functional-query data sets, all on HDFS errors. The errors are triggered from both Hive and Impala. Pasting the failure log section for TPC-H as it is much shorter; the Java backtrace for functional-query (which breaks in Hive/Tez) eventually runs into the same HDFS log pattern:
> {code}
> INSERT OVERWRITE TABLE tpch_parquet.region SELECT * FROM tpch.region
> Summary: Inserted 5 rows
> Success: True
> Took: 0.264951944351(s)
> Data:
> : 5
> ERROR: INSERT OVERWRITE TABLE tpch_parquet.orders SELECT * FROM tpch.orders
> Traceback (most recent call last):
>   File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/bin/load-data.py", line 208, in exec_impala_query_from_file
>     result = impala_client.execute(query)
>   File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 187, in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
>   File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 365, in __execute_query
>     self.wait_for_finished(handle)
>   File "/data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/tests/beeswax/impala_beeswax.py", line 386, in wait_for_finished
>     raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> ImpalaBeeswaxException: ImpalaBeeswaxException:
>  Query aborted:Failed to write data (length: 159515) to Hdfs file: hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq 
> Error(255): Unknown error 255
> Root cause: RemoteException: File /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There are 5 datanode(s) running and 5 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> Failed to close HDFS file: hdfs://localhost:20500/test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq
> Error(255): Unknown error 255
> Root cause: RemoteException: File /test-warehouse/tpch.orders_parquet/_impala_insert_staging/7c411965970f926e_f61b13b700000000/.7c411965970f926e-f61b13b700000000_2077531399_dir/7c411965970f926e-f61b13b700000000_1445532249_data.0.parq could only be written to 0 of the 3 required nodes for RS-3-2-1024k. There are 5 datanode(s) running and 5 node(s) are excluded in this operation.
> 	at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2266)
> 	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2773)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:879)
> 	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
> 	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:985)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:913)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> {code}
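
For context (not part of the original report), the "could only be written to 0 of the 3 required nodes for RS-3-2-1024k" message follows directly from the erasure coding policy's arithmetic: RS-3-2 stripes each block group across 3 data units plus 2 parity units, so a full write wants 5 distinct datanodes and the namenode requires at least the 3 data units to proceed. With 5 datanodes running but all 5 excluded, 0 usable targets remain. A minimal sketch of that arithmetic (illustrative helper names, not HDFS API):

```python
# Hedged sketch: why the RS-3-2-1024k write in the log above fails.
# Function names are hypothetical, for illustration only.

def total_nodes(data_units: int, parity_units: int) -> int:
    """An RS(d,p) block group stripes across d + p distinct datanodes."""
    return data_units + parity_units

def minimum_required(data_units: int) -> int:
    """HDFS refuses the write if fewer nodes than the data units are available."""
    return data_units

def usable_targets(running: int, excluded: int) -> int:
    """Datanodes the namenode can still choose after exclusions."""
    return running - excluded

# RS-3-2-1024k: 3 data units + 2 parity units, 1024 KiB cell size.
full = total_nodes(3, 2)          # 5 nodes for an undegraded write
minimum = minimum_required(3)     # the "3 required nodes" in the error
available = usable_targets(5, 5)  # 5 running, 5 excluded -> 0

print(f"full stripe: {full}, minimum: {minimum}, available: {available}")
assert available < minimum  # matches the log: written to 0 of the 3 required nodes
```

Note that the 5-datanode minicluster used by the data load has no slack at all: excluding even one datanode already drops below a full RS-3-2 stripe, and excluding all five leaves nothing to write to.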



--
This message was sent by Atlassian Jira
(v8.3.4#803005)