Posted to dev@hive.apache.org by "Toshihiko Uchida (Jira)" <ji...@apache.org> on 2019/10/20 16:34:00 UTC

[jira] [Created] (HIVE-22373) File Merge tasks fail when containers are reused

Toshihiko Uchida created HIVE-22373:
---------------------------------------

             Summary: File Merge tasks fail when containers are reused
                 Key: HIVE-22373
                 URL: https://issues.apache.org/jira/browse/HIVE-22373
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.2
            Reporter: Toshihiko Uchida


h1. Problems
Setting tez.am.container.reuse.enabled=true allows a container to be reused across multiple tasks.
When two File Merge tasks run on the same container, the later task fails to rename its output path.

Below is the error log of task 000001_0 on container container_e87_1570604853053_11564_01_000003, where task 000004_0 had run before task 000001_0.
It shows that the output file name of task 000001_0 is mistakenly derived from the previous task id, 000004_0.
{code}
2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
	at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
	... 17 more
Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
	... 20 more
{code}

h1. Causes
When AbstractFileMergeOperator is initialized, taskId is assigned only the first time. On later initializations it keeps the value from the earlier task.

- AbstractFileMergeOperator.java
{code}
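private void updatePaths(Path tp, Path ttp) {
  // taskId is assigned only while it is still null,
  // so later initializations skip this assignment.
  if (taskId == null) {
    taskId = Utilities.getTaskId(jc);
  }
  // ... (remainder of the method omitted)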
{code}

As a result, the later task builds its output paths from the stale task id, which causes the file name conflict shown above.
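
The effect of the null check can be reproduced outside Hive with a small simulation. This is only a sketch: FakeJobConf and MergeOperator are hypothetical stand-ins rather than Hive classes, and it assumes that the same operator object stays alive between the two tasks, which is what the behaviour described above suggests.

```java
// Hypothetical simulation of the stale taskId; none of these types are Hive classes.
class StaleTaskIdDemo {

    // Stand-in for JobConf: holds the id of the task that is currently running.
    static final class FakeJobConf {
        String currentTaskId;
    }

    // Stand-in for the merge operator, assumed to survive across tasks in a reused container.
    static final class MergeOperator {
        private final FakeJobConf jc;
        private String taskId;
        private String finalPath;

        MergeOperator(FakeJobConf jc) {
            this.jc = jc;
        }

        // Mirrors the quoted null check: taskId is read from the conf only once.
        void updatePaths(String tmpDir) {
            if (taskId == null) {
                taskId = jc.currentTaskId;
            }
            finalPath = tmpDir + "/" + taskId;
        }

        String finalPath() {
            return finalPath;
        }
    }

    // Runs two tasks through one operator and returns the output path computed for each.
    static String[] runTwoTasks() {
        FakeJobConf jc = new FakeJobConf();
        MergeOperator op = new MergeOperator(jc);

        jc.currentTaskId = "000004_0";   // first task on the container
        op.updatePaths("_tmp.-ext-10000");
        String first = op.finalPath();

        jc.currentTaskId = "000001_0";   // second task on the same container
        op.updatePaths("_tmp.-ext-10000");
        String second = op.finalPath();

        return new String[] {first, second};
    }

    public static void main(String[] args) {
        for (String path : runTwoTasks()) {
            System.out.println(path);
        }
    }
}
```

Both lines printed by main are _tmp.-ext-10000/000004_0: the second task targets the file name that belongs to the first task, matching the rename failure in the log.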

h1. Solutions
Remove the null check, which was introduced in HIVE-14640, so that taskId is refreshed from JobConf every time the operator is initialized.
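
A minimal sketch of the proposed behaviour follows. It is not the actual patch: FakeJobConf and MergeOperator are hypothetical stand-ins for JobConf and the merge operator, and the operator object is assumed to be shared by both tasks on the reused container.

```java
// Hypothetical sketch of the proposed fix; none of these types are Hive classes.
class RefreshedTaskIdDemo {

    // Stand-in for JobConf: holds the id of the task that is currently running.
    static final class FakeJobConf {
        String currentTaskId;
    }

    // Stand-in for the merge operator, assumed to survive across tasks in a reused container.
    static final class MergeOperator {
        private final FakeJobConf jc;
        private String taskId;
        private String finalPath;

        MergeOperator(FakeJobConf jc) {
            this.jc = jc;
        }

        // No null check: taskId is re-read from the conf on every initialization.
        void updatePaths(String tmpDir) {
            taskId = jc.currentTaskId;
            finalPath = tmpDir + "/" + taskId;
        }

        String finalPath() {
            return finalPath;
        }
    }

    // Runs two tasks through one operator and returns the output path computed for each.
    static String[] runTwoTasks() {
        FakeJobConf jc = new FakeJobConf();
        MergeOperator op = new MergeOperator(jc);

        jc.currentTaskId = "000004_0";   // first task on the container
        op.updatePaths("_tmp.-ext-10000");
        String first = op.finalPath();

        jc.currentTaskId = "000001_0";   // second task on the same container
        op.updatePaths("_tmp.-ext-10000");
        String second = op.finalPath();

        return new String[] {first, second};
    }

    public static void main(String[] args) {
        for (String path : runTwoTasks()) {
            System.out.println(path);
        }
    }
}
```

With the unconditional assignment, the two tasks get distinct output paths (_tmp.-ext-10000/000004_0 and _tmp.-ext-10000/000001_0), so the second rename no longer targets the first task's file.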


