Posted to dev@hive.apache.org by "Namit Jain (JIRA)" <ji...@apache.org> on 2010/04/14 01:27:49 UTC

[jira] Created: (HIVE-1306) cleanup the jobscratchdir

cleanup the jobscratchdir
-------------------------

                 Key: HIVE-1306
                 URL: https://issues.apache.org/jira/browse/HIVE-1306
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Ning Zhang
             Fix For: 0.6.0


Currently, the job scratch directory is cleaned up at the end of the Hive query, in a finally block.
That does not work if the process is killed.

The cleanup should be done in a shutdown hook instead, which will take care of these scenarios.
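Below is a minimal sketch of the proposed change, written in Java because Hive is. It makes several assumptions:

- A local directory accessed through java.nio.file stands in for the HDFS scratch directory. Hive itself would go through Hadoop's FileSystem API.
- The class and method names (ScratchDirCleanup, cleanupHook, registerCleanup) are hypothetical. They are not taken from any of the patches.

In the JDK, shutdown hooks run on normal exit, on System.exit(), and on interrupts such as Ctrl-C or SIGTERM. However, the Runtime.addShutdownHook documentation notes that hooks are skipped when the JVM is terminated externally with SIGKILL.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: java.nio.file on the local disk stands in for HDFS here;
// Hive itself would delete its scratch dirs through Hadoop's FileSystem API.
final class ScratchDirCleanup {

    // Recursively deletes dir (children before parents); no-op if it is already gone.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse lexicographic order puts every child ahead of its parent.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }

    // Builds the hook thread without registering it, so a test can call run() directly.
    static Thread cleanupHook(Path scratchDir) {
        return new Thread(() -> {
            try {
                deleteRecursively(scratchDir);
            } catch (IOException e) {
                // The JVM is exiting; reporting the failure is all that is left to do.
                System.err.println("scratch dir cleanup failed: " + e);
            }
        }, "scratch-dir-cleanup");
    }

    // The registered hook runs on normal exit, System.exit() and Ctrl-C/SIGTERM,
    // whereas a finally block in the query thread is skipped in those exit paths.
    static Thread registerCleanup(Path scratchDir) {
        Thread hook = cleanupHook(scratchDir);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

A caller would register the hook right after creating the scratch directory, e.g. `Thread hook = ScratchDirCleanup.registerCleanup(scratchDir);`.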

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1306:
-----------------------------

    Status: Patch Available  (was: Open)

Uploaded a new patch in which all query-specific directories hang off a single given sub-directory.
Verified the same on a test cluster.

Running tests right now.
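When every query-specific directory hangs off one sub-directory, a single recursive delete of that sub-directory removes all of them. That delete could come, for example, from one shutdown hook.

Below is a sketch under that assumption, using the local filesystem. The following are all hypothetical and are not the patch's actual code:

- the SessionScratchRoot class
- the "hive-session-" prefix
- the root/queryId layout

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical layout: <parent>/hive-session-XXXX/<queryId>/...
// Every query-specific directory lives under one session root, so deleting
// that root once cleans up all of them.
final class SessionScratchRoot implements AutoCloseable {
    private final Path root;

    SessionScratchRoot(Path parent) throws IOException {
        Files.createDirectories(parent);
        // createTempDirectory picks a unique name, so concurrent sessions don't collide.
        this.root = Files.createTempDirectory(parent, "hive-session-");
    }

    Path root() {
        return root;
    }

    // Creates (if needed) and returns the scratch directory for one query.
    Path queryDir(String queryId) throws IOException {
        return Files.createDirectories(root.resolve(queryId));
    }

    // One call removes the session root and every query directory beneath it.
    @Override
    public void close() throws IOException {
        if (!Files.exists(root)) {
            return; // already cleaned up
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            // Children sort after their parent, so reverse order deletes them first.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.deleteIfExists(p);
        }
    }
}
```

With this layout, a shutdown hook only needs a reference to the session root and calls close() on it. It does not need to track each query's directory.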

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856758#action_12856758 ] 

Ning Zhang commented on HIVE-1306:
----------------------------------

It seems we need to register an OutputCommitter: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#OutputCommitter

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857021#action_12857021 ] 

Namit Jain commented on HIVE-1306:
----------------------------------

I haven't tested it yet; will do so after checking the unit tests.

scratchDir is the directory that all file sinks write to;
the MoveTask then copies the output to the actual location.

If you look at the task log, the default FileOutputCommitter cleans up jobScratchDir. I am not sure
what jobScratchDir is used for; it is not used for the final output. Maybe it is for intermediate output or something like
that. Can we ask Dhruba/Joy?
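The scratchDir-then-MoveTask flow described above can be modeled roughly on a local disk. In this sketch, writeToScratch and moveToFinal are hypothetical stand-ins for a file sink and MoveTask; they are not Hive APIs. The sketch shows that the scratch directory itself still exists after the move, which is why it has to be cleaned up separately.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative only: hypothetical stand-ins for a file sink and MoveTask,
// operating on the local filesystem instead of HDFS.
final class ScratchThenMove {

    // "File sink": every output file is written under the scratch directory first.
    static Path writeToScratch(Path scratchDir, String fileName, List<String> rows) throws IOException {
        Files.createDirectories(scratchDir);
        return Files.write(scratchDir.resolve(fileName), rows, StandardCharsets.UTF_8);
    }

    // "Move step": relocates each scratch file into the final location.
    // The (now empty) scratch directory is left behind and still needs cleanup.
    static void moveToFinal(Path scratchDir, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        List<Path> files;
        try (Stream<Path> listing = Files.list(scratchDir)) {
            files = listing.collect(Collectors.toList()); // snapshot before moving anything
        }
        for (Path f : files) {
            Files.move(f, finalDir.resolve(f.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```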


> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1306:
-----------------------------

    Attachment: hive.1306.3.patch

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch, hive.1306.3.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856768#action_12856768 ] 

Ning Zhang commented on HIVE-1306:
----------------------------------

I didn't find any use of OutputCommitter in Hive. Can you point me to the file? Also, it seems the scratch dir is actually configured under hdfs://tmp/<userid>-hive/... Let's talk tomorrow.

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856999#action_12856999 ] 

Ning Zhang commented on HIVE-1306:
----------------------------------

Does this work if the user runs kill -9 on the Hive CLI? Also, is it possible to add a unit test for this?

Also, what is the difference between jobScratchDir and scratchDir? If the tmp directory is under hdfs://tmp, we should be able to clean it up using an OutputCommitter, which IMO is a cleaner solution. Am I missing something?
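On the kill -9 question: neither a finally block nor a shutdown hook runs when the JVM receives SIGKILL, so no in-process cleanup covers that case. What a unit test can cover is the normal path and the failure path.

Below is a sketch of that approach. It assumes the cleanup stays in a finally block, with a shutdown hook acting only as a backstop. The hook is deregistered once the finally block has run. GuardedScratchDir and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: finally covers normal completion and exceptions; the shutdown hook is a
// backstop for System.exit()/SIGTERM. Neither runs on kill -9 (SIGKILL).
final class GuardedScratchDir {

    // Recursively deletes dir, logging instead of throwing (safe to call from a hook).
    static void deleteQuietly(Path dir) {
        try {
            if (!Files.exists(dir)) {
                return;
            }
            List<Path> paths;
            try (Stream<Path> walk = Files.walk(dir)) {
                paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            }
            for (Path p : paths) {
                Files.deleteIfExists(p);
            }
        } catch (IOException e) {
            System.err.println("scratch dir cleanup failed: " + e);
        }
    }

    // Runs query with scratchDir guarded by both a finally block and a shutdown hook.
    static void runWithScratchDir(Path scratchDir, Runnable query) {
        Thread hook = new Thread(() -> deleteQuietly(scratchDir), "scratch-dir-cleanup");
        Runtime.getRuntime().addShutdownHook(hook);
        try {
            query.run();
        } finally {
            deleteQuietly(scratchDir);
            try {
                // Done cleaning up; don't keep one hook per query alive in a long-lived CLI.
                Runtime.getRuntime().removeShutdownHook(hook);
            } catch (IllegalStateException alreadyShuttingDown) {
                // JVM is exiting; the hook will just find the directory already gone.
            }
        }
    }
}
```

Deregistering the hook after each query keeps hooks from accumulating when many queries run in the same CLI process.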

 

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856764#action_12856764 ] 

Namit Jain commented on HIVE-1306:
----------------------------------

I think FileOutputCommitter is already used.

The problem reported by Dmytro was not jobScratchDir, but the temp directory created by intermediate tasks.

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1306:
-----------------------------

    Attachment: hive.1306.2.patch

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856994#action_12856994 ] 

Namit Jain commented on HIVE-1306:
----------------------------------

Can you look at the patch? There were some diffs; I am still looking at them.
But, as I said, the scratchDir was not cleaned up, which it should be if the shutdown hook works as expected.

jobScratchDir is not a problem. I guess we could do the same for it as well, but that is not a priority right now.

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-1306:
-----------------------------

        Status: Resolved  (was: Patch Available)
    Resolution: Fixed

Committed. Thanks Namit!

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch, hive.1306.3.patch, hive.1306.4.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1306:
-----------------------------

    Attachment: hive.1306.1.patch

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Commented: (HIVE-1306) cleanup the jobscratchdir

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857946#action_12857946 ] 

Ning Zhang commented on HIVE-1306:
----------------------------------

+1, will commit after tests.

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch, hive.1306.3.patch, hive.1306.4.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Assigned: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain reassigned HIVE-1306:
--------------------------------

    Assignee: Namit Jain  (was: Ning Zhang)

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.


[jira] Updated: (HIVE-1306) cleanup the jobscratchdir

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1306:
-----------------------------

    Attachment: hive.1306.4.patch

> cleanup the jobscratchdir
> -------------------------
>
>                 Key: HIVE-1306
>                 URL: https://issues.apache.org/jira/browse/HIVE-1306
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.6.0
>
>         Attachments: hive.1306.1.patch, hive.1306.2.patch, hive.1306.3.patch, hive.1306.4.patch
>
>
> Currently, the job scratch directory is cleaned up at the end of the hive query - 
> It is in a finally block, which will not work if the process is killed.
> It should be a shutdown hook - which will take care of these scenarios.
