Posted to user@hive.apache.org by Pradeep Kamath <pr...@yahoo-inc.com> on 2010/11/02 23:26:48 UTC

Does output directory remain in case of map/reduce task failures

Hi,
  While doing an insert into partitions, if there is a failure in the map/reduce task (maybe due to a UDF bug), does Hive clean up the output directory corresponding to the partition? The behavior in Hadoop is to NOT clean up the output location in case of task failures (maybe to allow the user to debug). What is Hive's behavior? Specifically, if I have a table foo and I am writing to partition datestamp=20101102, then the write would go to /user/hive/warehouse/foo/datestamp=20101102. If the task(s) writing to this fail, does Hive remove this dir on exit? If it doesn't, a subsequent attempt to write (presumably after fixing the cause of the earlier failure) would also fail unless the dir is removed first.

Pointers appreciated.

Thanks,
Pradeep

RE: Does output directory remain in case of map/reduce task failures

Posted by Namit Jain <nj...@facebook.com>.
yes

________________________________
From: Pradeep Kamath [pradeepk@yahoo-inc.com]
Sent: Tuesday, November 02, 2010 5:16 PM
To: user@hive.apache.org; hive-user@hadoop.apache.org
Subject: RE: Does output directory remain in case of map/reduce task failures

Just to confirm – is this true for both partitioned and non-partitioned tables?

________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Tuesday, November 02, 2010 4:22 PM
To: user@hive.apache.org; hive-user@hadoop.apache.org
Subject: RE: Does output directory remain in case of map/reduce task failures

Hive writes to a temporary directory first, and if the UDF fails, the temporary directory is removed.
The expected final directory is not touched.


-namit



RE: Does output directory remain in case of map/reduce task failures

Posted by Pradeep Kamath <pr...@yahoo-inc.com>.
Just to confirm - is this true for both partitioned and non-partitioned tables?


RE: Does output directory remain in case of map/reduce task failures

Posted by Namit Jain <nj...@facebook.com>.
Hive writes to a temporary directory first, and if the UDF fails, the temporary directory is removed.
The expected final directory is not touched.


-namit
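
For readers who want to see the pattern described in the reply above, here is a rough local-filesystem sketch in Python. It is not Hive's code and makes no claim about Hive's internals: the function name write_partition, the "_tmp." prefix, the part-00000 file name, and the choice to replace an existing target on success are all assumptions made for illustration. It only shows the general idea of writing to a scratch directory, deleting it on failure, and touching the final directory only on success.

```python
import os
import shutil
import tempfile


def write_partition(final_dir, rows, transform):
    """Write rows under a scratch directory, then move it into place.

    Hypothetical helper for illustration only. On any failure the scratch
    directory is deleted and final_dir is left exactly as it was.
    """
    parent = os.path.dirname(os.path.abspath(final_dir))
    os.makedirs(parent, exist_ok=True)
    # Keep the scratch directory next to the target so the final rename
    # stays on one filesystem.
    tmp_dir = tempfile.mkdtemp(prefix="_tmp.", dir=parent)
    try:
        with open(os.path.join(tmp_dir, "part-00000"), "w") as out:
            for row in rows:
                # A buggy transform (standing in for a UDF) would raise here.
                out.write(transform(row) + "\n")
    except Exception:
        # Clean up the scratch output only; the final directory is untouched.
        shutil.rmtree(tmp_dir, ignore_errors=True)
        raise
    # Success: publish the result. Replacing an existing target is an
    # assumption of this sketch, not a statement about Hive.
    if os.path.isdir(final_dir):
        shutil.rmtree(final_dir)
    os.rename(tmp_dir, final_dir)
```

With this shape, a retry after fixing the failing transform does not hit a leftover output directory, because the failed attempt never created one.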

