Posted to issues@hive.apache.org by "Zhihua Deng (Jira)" <ji...@apache.org> on 2021/09/28 03:15:00 UTC

[jira] [Commented] (HIVE-25295) "File already exist exception" during mapper/reducer retry with old hive(0.13)

    [ https://issues.apache.org/jira/browse/HIVE-25295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421137#comment-17421137 ] 

Zhihua Deng commented on HIVE-25295:
------------------------------------

Have you tried [HIVE-17963](https://issues.apache.org/jira/browse/HIVE-17963)? It adds a check that catches runaway processes writing additional files into the staging directory.
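To illustrate the kind of check being referred to: Hive's task outputs are named by task ID and attempt ID (e.g. 000001_0, 000001_1), so duplicates left behind by failed or runaway attempts can be detected by grouping on the task ID. The sketch below is a simplified, hypothetical version of that dedup logic (class and method names are invented for illustration; the real implementation in Hive is more involved):

```java
import java.util.*;

// Hypothetical sketch of a duplicate-attempt check over staging-directory
// file names of the form taskId_attemptId (e.g. "000001_0", "000001_2").
// Illustration only; not the actual Hive code.
public class DuplicateTaskFileCheck {

    // Return file names to delete: for each task id, keep only the
    // highest-numbered attempt and mark earlier attempts as duplicates.
    public static List<String> duplicatesToRemove(List<String> fileNames) {
        Map<String, String> keep = new HashMap<>();   // taskId -> file kept so far
        List<String> remove = new ArrayList<>();
        for (String name : fileNames) {
            int us = name.lastIndexOf('_');
            if (us < 0) continue;                     // not a task output file
            String taskId = name.substring(0, us);
            String kept = keep.get(taskId);
            if (kept == null) {
                keep.put(taskId, name);
            } else if (attempt(name) > attempt(kept)) {
                remove.add(kept);                     // older attempt loses
                keep.put(taskId, name);
            } else {
                remove.add(name);
            }
        }
        return remove;
    }

    private static int attempt(String name) {
        return Integer.parseInt(name.substring(name.lastIndexOf('_') + 1));
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("000000_0", "000001_0", "000001_1");
        System.out.println(duplicatesToRemove(files)); // [000001_0]
    }
}
```

With this kind of pass over the staging directory at commit time, an extra file written by a killed-then-retried attempt is cleaned up instead of colliding with the survivor's output.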

> "File already exist exception" during mapper/reducer retry with old hive(0.13)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-25295
>                 URL: https://issues.apache.org/jira/browse/HIVE-25295
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 0.13.0
>            Reporter: yuquan wang
>            Priority: Blocker
>
> We are still using a very old Hive version (0.13) for historical reasons, and we often hit the following issue:
> {code:java}
> Caused by: java.io.IOException: File already exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXXXXXX
> {code}
> We have investigated this issue for quite a long time but have not found a good fix, so we would like to ask the Hive community whether there are any known solutions.
>  
> The error occurs during the map/reduce stage: once a task instance fails for some unexpected reason (for example, an unstable spot instance gets killed), the subsequent retry throws the above exception instead of overwriting the file.
>  
> We have several guesses:
> 1. Is it caused by the ORC file format? We found a similar issue, https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments, and our table is stored as ORC.
> 2. Is the problem fixed in a newer Hive version? We also run Hive 2.3.6 and have not hit this issue there, so we wonder whether an upgrade would solve it.
> 3. Is there a config that always cleans up existing output files when a mapper/reducer is retried? We have searched all the MapReduce configs but could not find one.
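Regarding question 3 in the quoted description, the behavior being asked for amounts to a delete-before-write step on retry. The sketch below shows that pattern against the local filesystem with java.nio.file (a stand-in, since running against S3/HDFS would use Hadoop's FileSystem.delete() instead); the class and method names are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.*;

// Local-filesystem sketch of cleanup-on-retry: before a retried task
// writes its output, delete any file left behind by the failed attempt.
// On a real cluster this step would call Hadoop's FileSystem.delete()
// against the S3/HDFS output path rather than java.nio.file.
public class RetrySafeWriter {

    public static void writeTaskOutput(Path target, byte[] data) throws IOException {
        Files.deleteIfExists(target);  // clear leftovers from a killed attempt
        // CREATE_NEW still fails fast if another writer races us in between
        Files.write(target, data, StandardOpenOption.CREATE_NEW);
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("retry-demo").resolve("000000_0");
        writeTaskOutput(out, "attempt 1".getBytes());
        // A retry overwrites instead of throwing "File already exists".
        writeTaskOutput(out, "attempt 2".getBytes());
        System.out.println(new String(Files.readAllBytes(out))); // attempt 2
    }
}
```

Without the delete step, the second write would fail with a FileAlreadyExistsException, which mirrors the retry failure described above.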



--
This message was sent by Atlassian Jira
(v8.3.4#803005)