Posted to issues@hive.apache.org by "xiepengjie (Jira)" <ji...@apache.org> on 2019/09/27 01:05:00 UTC

[jira] [Updated] (HIVE-22247) HiveHFileOutputFormat throws FileNotFoundException when partition's task output empty

     [ https://issues.apache.org/jira/browse/HIVE-22247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

xiepengjie updated HIVE-22247:
------------------------------
    Description: 
When a partition's task produces no output, HiveHFileOutputFormat throws a FileNotFoundException like this:
{code:java}
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing... 
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
2019-09-24 19:15:55,915 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2019-09-24 19:15:55,954 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2019-09-24 19:15:56,089 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
2019-09-24 19:15:56,090 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:200)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1016)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
  ... 7 more
Caused by: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:880)
  at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:109)
  at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:938)
  at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:934)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:945)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1592)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1632)
  at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:153)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:197)
  ... 11 more

2019-09-24 19:15:56,093 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{code}
I think we should skip this step if srcDir does not exist. A possible fix looks like this:
{code:java}
@Override
public void close(boolean abort) throws IOException {
  try {

    ...

    FileStatus [] files = null;
    for (;;) {
      try {
        files = fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER);
      } catch (FileNotFoundException fnfe) {
        // srcDir does not exist, presumably because this task wrote no records; stop instead of failing.
        LOG.error(String.format("Output data is empty, please check Task [ %s ]", tac.getTaskAttemptID().toString()), fnfe);
        break;
      }

   ...

  } catch (InterruptedException ex) {
    throw new IOException(ex);
  }
}
{code}
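The idea behind the fix can be tried outside Hive with a small, self-contained sketch. It is only an illustration under stated assumptions: it uses java.nio.file instead of the Hadoop FileSystem API, and the class name EmptyOutputSketch and the helper listOutputs are hypothetical names, not part of HiveHFileOutputFormat. It shows a missing directory being treated as "no output" rather than as an error:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EmptyOutputSketch {

  // Hypothetical helper: lists the entries of srcDir and treats a missing
  // directory as "the task produced no output" instead of failing.
  static List<Path> listOutputs(Path srcDir) throws IOException {
    List<Path> files = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir)) {
      for (Path p : stream) {
        files.add(p);
      }
    } catch (NoSuchFileException e) {
      // Stand-in for the FileNotFoundException seen in the stack trace:
      // nothing was written, so there is nothing to list or move.
      return new ArrayList<>();
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    Path staging = Files.createTempDirectory("staging");

    // Directory that was never created, like _tmp.000002_0 in the log.
    Path missing = staging.resolve("_tmp.000002_0");
    System.out.println(listOutputs(missing).size()); // prints 0

    // Directory that contains one file.
    Files.createFile(staging.resolve("part-0"));
    System.out.println(listOutputs(staging).size()); // prints 1
  }
}
```

Whether an empty result should be logged at error level, as in the snippet above, or at a lower level is a separate choice; an empty reducer may be a normal condition rather than a failure.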

  was:
When a partition's task produces no output, HiveHFileOutputFormat throws a FileNotFoundException like this:
{code:java}
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 1 finished. closing... 
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[1]: records written - 0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0
2019-09-24 19:15:55,886 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_tmp.-ext-10002/000002_0
2019-09-24 19:15:55,915 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
2019-09-24 19:15:55,954 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
2019-09-24 19:15:56,089 ERROR [main] ExecReducer: Hit error while closing operators - failing tree
2019-09-24 19:15:56,090 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: Hive Runtime Error while closing operators: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:287)
  at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1923)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:200)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:1016)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:617)
  at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
  at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:278)
  ... 7 more
Caused by: java.io.FileNotFoundException: File hdfs://Hdptest-mini-nmg/tmp/hive-staging/hadoop_hive_2019-09-24_19-15-26_453_1697529445006435790-5/_task_tmp.-ext-10002/_tmp.000002_0 does not exist.
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:880)
  at org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:109)
  at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:938)
  at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:934)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:945)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1592)
  at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1632)
  at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$1.close(HiveHFileOutputFormat.java:153)
  at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:197)
  ... 11 more

2019-09-24 19:15:56,093 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
{code}
I think we should skip this step if srcDir does not exist. The fixed code looks like this:
{code:java}
@Override
public void close(boolean abort) throws IOException {
  try {

    ...

    FileStatus [] files = null;
    for (;;) {
      try {
        files = fs.listStatus(srcDir, FileUtils.STAGING_DIR_PATH_FILTER);
      } catch (FileNotFoundException fnfe) {
        LOG.error(String.format("Output data is empty, please check Task [ %s ]", tac.getTaskAttemptID().toString()), fnfe);
        break;
      }

   ...

  } catch (InterruptedException ex) {
    throw new IOException(ex);
  }
}
{code}


> HiveHFileOutputFormat throws FileNotFoundException when partition's task output empty
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-22247
>                 URL: https://issues.apache.org/jira/browse/HIVE-22247
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.2.0, 3.0.0
>            Reporter: xiepengjie
>            Assignee: xiepengjie
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)