Posted to dev@hive.apache.org by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org> on 2008/12/07 11:07:44 UTC

[jira] Created: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
--------------------------------------------------------------------------------

                 Key: HIVE-131
                 URL: https://issues.apache.org/jira/browse/HIVE-131
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Joydeep Sen Sarma
            Priority: Critical


_tmp files are getting left behind on insert overwrite directory:

/user/jssarma/ctst1/40422_m_000195_0.deflate  <r 3> 13285 2008-12-07 01:47  rw-r--r-- jssarma supergroup
/user/jssarma/ctst1/40422_m_000196_0.deflate  <r 3> 3055  2008-12-07 01:46  rw-r--r-- jssarma supergroup
/user/jssarma/ctst1/_tmp.40422_m_000033_0 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup
/user/jssarma/ctst1/_tmp.40422_m_000037_1 <r 3> 0 2008-12-07 01:53  rw-r--r-- jssarma supergroup


this happened with speculative execution. the code looks good (in fact, in this case many speculative tasks were launched and only a couple caused problems). It almost seems like these files did not appear in the namespace until after the map-reduce job had finished and the movetask had already listed the output dir.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674776#action_12674776 ] 

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

please commit this to 0.2 also since it's a pretty severe bug



[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654504#action_12654504 ] 

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

no - no failures here (from hive perspective). no ctrl-C. speculative map-reduce tasks are causing this problem.



[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654322#action_12654322 ] 

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

btw - if we can solve a problem at the application layer, i would really prefer doing that. we are trying to get hive to compile and run against different hadoop versions, and the fewer dependencies we have on non-critical hadoop apis/mechanisms, the less trouble we will have with portability.



[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654313#action_12654313 ] 

Zheng Shao commented on HIVE-131:
---------------------------------

We should start using the standard way to create side-effect files. This could remove the potential race condition.

http://hadoop.apache.org/core/docs/r0.17.2/api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)
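The javadoc linked above describes the side-effect file mechanism as a per-attempt work directory, ${mapred.output.dir}/_temporary/_${taskid}: each task attempt writes there, only the files of the attempt that completes successfully are promoted into ${mapred.output.dir}, and the directories of unsuccessful attempts are discarded. The sketch below models that idea on the local filesystem with java.nio so it runs without Hadoop. AttemptOutputModel and its methods are hypothetical names, not Hadoop or Hive APIs, and the model is a simplification rather than the framework's actual commit code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical local-filesystem model (not Hadoop code) of per-attempt
// side-effect directories: each attempt writes under
// <output>/_temporary/_<attemptId>/ and only the committed attempt's
// files are promoted into <output>.
class AttemptOutputModel {

    // Where a given attempt writes its side-effect files.
    static Path workDir(Path outputDir, String attemptId) {
        return outputDir.resolve("_temporary").resolve("_" + attemptId);
    }

    // Simulates a running attempt writing one side-effect file.
    static void writeSideFile(Path outputDir, String attemptId, String name, String data) {
        try {
            Path dir = workDir(outputDir, attemptId);
            Files.createDirectories(dir);
            Files.write(dir.resolve(name), data.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Promotes every file of the one attempt chosen to commit.
    static void commitAttempt(Path outputDir, String attemptId) {
        try {
            List<Path> files;
            try (Stream<Path> s = Files.list(workDir(outputDir, attemptId))) {
                files = s.collect(Collectors.toList());
            }
            for (Path f : files) {
                Files.move(f, outputDir.resolve(f.getFileName().toString()),
                        StandardCopyOption.REPLACE_EXISTING);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Job cleanup: removes _temporary, including output of attempts that
    // were killed or never committed.
    static void cleanupJob(Path outputDir) {
        Path tmp = outputDir.resolve("_temporary");
        if (!Files.exists(tmp)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(tmp)) {
            // Reverse order visits children before their parent directories.
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With speculative execution, both attempts of a task may write the same file name, but in this model only the committed attempt's copy reaches the output directory and the other copy disappears with _temporary.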




[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654329#action_12654329 ] 

Namit Jain commented on HIVE-131:
---------------------------------

we already have this infrastructure in place: intermediate results, describe output, etc. are created in the scratch directory, which is deleted at the end of query execution. Driver has a close() call that is supposed to be called at the end of query execution.



[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654479#action_12654479 ] 

Ashish Thusoo commented on HIVE-131:
------------------------------------

Maybe the driver close is not getting called for this failure. Is this a ctrl-C case or an actual failure? Can you post the stack trace of the failure?



[jira] Assigned: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma reassigned HIVE-131:
--------------------------------------

    Assignee: Joydeep Sen Sarma



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-131:
-----------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-131:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.3.0
                   0.2.0
     Release Note: HIVE-131. Remove uncommitted files from failed tasks. (Joydeep Sen Sarma via zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

trunk: Committed revision 745709.
branch-0.2: Committed revision 745710.





[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674382#action_12674382 ] 

Zheng Shao commented on HIVE-131:
---------------------------------

+1 Looks good to me.




[jira] Commented: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654321#action_12654321 ] 

Joydeep Sen Sarma commented on HIVE-131:
----------------------------------------

i quickly glanced at the doc mentioned. seems like hadoop will move the files automatically to mapred.output.dir - but we don't have a single output directory for the task (since we can have multiple outputs).

anyway - i am not sure why the problem happens (it almost seems like map-reduce can declare a job complete while (speculative) tasks are still running) - but a trivial fix is to just create the tmp files in a completely different directory (say scratch dir per query) and then move from there. we can discard the scratch dir entirely on query completion. there's still some risk of these runaway files leaking inodes/space. but if these are known hive scratchdir locations - they can always be cleaned up later on.
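A minimal sketch of the scratch-directory approach proposed above, assuming local java.nio file operations stand in for HDFS and that an attempt marks its output as committed by renaming away a "_tmp." prefix (the prefix matches the leftover files in the listing; the rename-on-success step is an assumption). ScratchDirCommit and its methods are hypothetical names; this is not the code from the attached patches.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: attempts write "_tmp.<attemptId>" files into a
// per-query scratch dir and rename them on success; at query completion
// only the renamed files are moved to the final directory and the whole
// scratch dir is discarded.
class ScratchDirCommit {

    static final String TMP_PREFIX = "_tmp.";

    // A task attempt writes its output under a temporary name first.
    static Path writeTmp(Path scratchDir, String attemptId, String data) {
        try {
            Files.createDirectories(scratchDir);
            Path tmp = scratchDir.resolve(TMP_PREFIX + attemptId);
            Files.write(tmp, data.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // On successful close, the attempt drops the "_tmp." prefix.
    static Path commitTmp(Path tmpFile) {
        String name = tmpFile.getFileName().toString().substring(TMP_PREFIX.length());
        try {
            return Files.move(tmpFile, tmpFile.resolveSibling(name));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // At query completion: move committed files, skip "_tmp." leftovers,
    // then delete the scratch dir. Returns the number of files moved.
    static int finishQuery(Path scratchDir, Path finalDir) {
        try {
            Files.createDirectories(finalDir);
            List<Path> entries;
            try (Stream<Path> s = Files.list(scratchDir)) {
                entries = s.collect(Collectors.toList());
            }
            int moved = 0;
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (name.startsWith(TMP_PREFIX)) {
                    continue; // uncommitted output of a failed or killed attempt
                }
                Files.move(p, finalDir.resolve(name), StandardCopyOption.REPLACE_EXISTING);
                moved++;
            }
            deleteRecursively(scratchDir);
            return moved;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deletes a directory tree, children before parents.
    static void deleteRecursively(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            for (Path p : walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }
}
```

A straggling speculative attempt that is still running after finishQuery could recreate files under the scratch path; as the comment notes, those would at least sit in a known scratch location that a later sweep can remove.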



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-131:
-----------------------------------

    Attachment: hive-131.patch.2

Dhruba said:

> 1. I see that execute returns values 1, 2, and 3. It will be good to document what these values mean.
> 2. Starting with hadoop 0.19, it might make sense to set FileSystem.deleteOnExit() for files that are temporary.
> 3. It is interesting to note that now there is an extra step jobClose() that gets triggered on the client-side after the job is complete. Prior to this patch, a job would be successful even if the client-side has disappeared before the job is completed. This patch requires that the client remains active and healthy till the entire job is complete. This probably is ok for Hive, especially because Hive anyway requires job-chaining and I do not see any other way to do it.

- incorporated the suggestion to use deleteOnExit where available.
- return codes are always accompanied by a corresponding message on the console/log, so i don't see much point in creating additional documentation around them.
- hive has always depended on the client-side code path for query completion.
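The thread only says the patch uses deleteOnExit "where available"; it does not show how. Given the earlier concern about compiling against several Hadoop versions, one common way to do this is to look the method up reflectively at runtime and skip it when the running version lacks it. The sketch below assumes that approach; OptionalCall and invokeIfPresent are hypothetical names, not part of Hive or Hadoop.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical helper: call a public method only if the runtime class
// provides it, so code built against an older API can still use it.
class OptionalCall {

    // Returns true if the method exists and was invoked, false if absent.
    static boolean invokeIfPresent(Object target, String methodName,
                                   Class<?>[] paramTypes, Object... args) {
        final Method m;
        try {
            m = target.getClass().getMethod(methodName, paramTypes);
        } catch (NoSuchMethodException e) {
            return false; // older version: feature not available, skip it
        }
        try {
            m.invoke(target, args);
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the invoked method itself.
            throw new RuntimeException(e.getCause());
        }
    }
}
```

Against Hadoop this might be called as OptionalCall.invokeIfPresent(fs, "deleteOnExit", new Class<?>[] {Path.class}, tmpPath), where fs is an org.apache.hadoop.fs.FileSystem and Path is org.apache.hadoop.fs.Path; FileSystem.deleteOnExit(Path) marks the path for deletion when that FileSystem is closed.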



[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-131:
--------------------------------

    Fix Version/s:     (was: 0.6.0)

> insert overwrite directory leaves behind uncommitted/tmp files from failed tasks
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-131
>                 URL: https://issues.apache.org/jira/browse/HIVE-131
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>             Fix For: 0.3.0
>
>         Attachments: HIVE-131.patch.1, hive-131.patch.2


[jira] Updated: (HIVE-131) insert overwrite directory leaves behind uncommitted/tmp files from failed tasks

Posted by "Joydeep Sen Sarma (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joydeep Sen Sarma updated HIVE-131:
-----------------------------------

    Attachment: HIVE-131.patch.1
