Posted to user@hive.apache.org by Yuquan Wang <yu...@smartnews.com> on 2021/07/02 03:06:26 UTC

Old Hive version (0.13) fails with "File already exists"

Hi Hive users,
We need help from the Hive community!

We are still running a very old Hive version (0.13) for historical reasons, and
we often hit the following issue:

Caused by: java.io.IOException: File already
exists:s3://smart-dmp/warehouse/uploaded/ad_dmp_pixel/dt=2021-06-21/key=259f3XXXXXXX


We have investigated this issue for quite a long time without finding a
good fix, so we would like to ask the Hive community whether there are any
known solutions. The error occurs during the map/reduce stage: once a task
attempt fails for some unexpected reason (for example, an unstable spot
instance getting killed), the subsequent retry throws the above exception
instead of overwriting the partial output left behind by the first attempt.
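To make the failure mode concrete, here is a minimal sketch (plain Python, not Hive or Hadoop code; the file names are made up) of what we believe is happening: the first attempt leaves a partial output file behind on S3, and the retry opens the same path with exclusive-create semantics instead of overwrite semantics.

```python
import os
import tempfile

def write_output(path, data, overwrite=False):
    # 'x' = exclusive create: raises FileExistsError if the path exists,
    # mirroring the retry behavior we observe on Hive 0.13.
    # 'w' = overwrite: the behavior we would want on a retry.
    mode = "w" if overwrite else "x"
    with open(path, mode) as f:
        f.write(data)

tmpdir = tempfile.mkdtemp()
part = os.path.join(tmpdir, "part-00000")

write_output(part, "partial")           # first attempt, then the task dies
try:
    write_output(part, "complete")      # retry with the same create semantics
except FileExistsError:
    print("retry failed: File already exists")

# A retry that overwrites (or deletes the leftover file first) succeeds.
write_output(part, "complete", overwrite=True)
print(open(part).read())                # -> complete
```

This is only an analogy for the create-vs-overwrite semantics; the real fix presumably lives in how the ORC writer or the output committer handles leftover files from failed attempts.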

We have several guesses:
1. Is it caused by the ORC file format? We found a similar issue,
https://issues.apache.org/jira/browse/HIVE-6341, but it has no comments,
and our table is stored as ORC.
2. Is the problem fixed in a newer Hive version? We also run Hive 2.3.6
and have not seen this issue there, so we wonder whether upgrading would
solve it.
3. Is there a configuration option that always cleans up existing output
files during a mapper/reducer retry? We have searched all the MapReduce
configs but could not find one.

We would appreciate any pointers from the community. Thanks a lot in
advance!