Posted to hdfs-issues@hadoop.apache.org by "zhengchenyu (Jira)" <ji...@apache.org> on 2022/08/18 07:37:00 UTC

[jira] [Updated] (HDFS-16732) [SBN READ] Avoid get location from observer when the block report is delayed.

     [ https://issues.apache.org/jira/browse/HDFS-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HDFS-16732:
-------------------------------
    Description: 
Hive on Tez applications fail occasionally after the observer NameNode is enabled. The log is shown below.
{code:java}
2022-08-18 15:22:06,914 [ERROR] [Dispatcher thread {Central}] |impl.VertexImpl|: Vertex Input: namenodeinfo_stg initializer failed, vertex=vertex_1660618571916_4839_1_00 [Map 1]
org.apache.tez.dag.app.dag.impl.AMUserCodeException: java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallback.onFailure(RootInputInitializerManager.java:329)
	at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1056)
	at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1138)
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:958)
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:748)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.afterRanInterruptibly(TrustedListenableFutureTask.java:133)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:80)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
	at org.apache.hadoop.mapred.FileInputFormat.identifyHosts(FileInputFormat.java:748)
	at org.apache.hadoop.mapred.FileInputFormat.getSplitHostsAndCachedHosts(FileInputFormat.java:714)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:378)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:159)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:279)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:270)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:254)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
	... 4 more {code}
As described in MAPREDUCE-7082, this exception is thrown when a block is missing, but my cluster had no missing blocks.
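
The failure mode can be sketched in isolation. This is only an illustration, assuming the split-planning code indexes into a per-block host array that it expects to be non-empty; `EmptyHostsDemo`, `firstHostUnchecked` and `firstHost` are hypothetical names and do not reproduce the actual `FileInputFormat.identifyHosts` logic.

```java
import java.util.Optional;

final class EmptyHostsDemo {
  // Hypothetical helper: naive access that assumes at least one host per block.
  static String firstHostUnchecked(String[] hosts) {
    return hosts[0]; // throws ArrayIndexOutOfBoundsException when hosts is empty
  }

  // Defensive variant: reports "no host" instead of throwing.
  static Optional<String> firstHost(String[] hosts) {
    if (hosts == null || hosts.length == 0) {
      return Optional.empty();
    }
    return Optional.of(hosts[0]);
  }

  public static void main(String[] args) {
    // Stand-in for a block that was returned without any location.
    String[] noHosts = new String[0];
    System.out.println("host present: " + firstHost(noHosts).isPresent());
    try {
      firstHostUnchecked(noHosts);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("caught " + e.getClass().getSimpleName());
    }
  }
}
```

A block with an empty host array therefore looks the same to the client as a missing block, even though the data is intact on the DataNodes.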

In this case, I found that getListing also returns location information. When the observer's block report is delayed, the observer returns blocks without locations.

HDFS-13924 was introduced to solve this problem, but it only covers getBlockLocations.

On the observer node, every method that may return locations should check whether the locations are empty.
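
A minimal sketch of such a shared check, under stated assumptions: `Block`, `RetryOnActiveException` and `checkLocations` are hypothetical stand-ins written for illustration, not the NameNode's real types or the actual patch. The idea is that every observer-side call that returns locations (getBlockLocations, getListing, and so on) runs the same guard before replying.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ObserverLocationCheck {
  /** Hypothetical stand-in for a located block: an id plus its replica hosts. */
  static final class Block {
    final long id;
    final List<String> hosts;

    Block(long id, List<String> hosts) {
      this.id = id;
      this.hosts = hosts;
    }
  }

  /** Hypothetical signal telling the client to retry on the active NameNode. */
  static final class RetryOnActiveException extends Exception {
    RetryOnActiveException(String message) {
      super(message);
    }
  }

  /** Rejects a reply from an observer if any block in it has no locations. */
  static void checkLocations(boolean isObserver, String src, List<Block> blocks)
      throws RetryOnActiveException {
    if (!isObserver) {
      return; // only the observer can lag behind on block reports
    }
    for (Block b : blocks) {
      if (b.hosts == null || b.hosts.isEmpty()) {
        throw new RetryOnActiveException(
            "Zero block locations for " + src + ", block " + b.id);
      }
    }
  }

  public static void main(String[] args) {
    List<Block> blocks = Arrays.asList(
        new Block(1L, Arrays.asList("dn1", "dn2")),
        new Block(2L, Collections.<String>emptyList())); // block report not yet received
    try {
      checkLocations(true, "/warehouse/t/part-0", blocks);
      System.out.println("served by observer");
    } catch (RetryOnActiveException e) {
      System.out.println("retry on active: " + e.getMessage());
    }
  }
}
```

Keeping the guard in one helper means a newly added location-returning call only needs a single extra line, rather than a copy of the loop.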


> [SBN READ] Avoid get location from observer when the block report is delayed.
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16732
>                 URL: https://issues.apache.org/jira/browse/HDFS-16732
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.2.1
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Critical


