Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/06/06 20:16:45 UTC

[jira] Created: (PIG-258) Pig should cleanup output directory of a failed query

Pig should cleanup output directory of a failed query
-----------------------------------------------------

                 Key: PIG-258
                 URL: https://issues.apache.org/jira/browse/PIG-258
             Project: Pig
          Issue Type: Bug
            Reporter: Olga Natkovich
            Priority: Minor


Currently, after a failed store, the output directory is left behind and cannot be reused without manual cleanup.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-258) Pig should cleanup output directory of a failed query

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608608#action_12608608 ] 

Daniel Dai commented on PIG-258:
--------------------------------

Currently there is only one POStore per physical plan, so if execution of a physical plan fails, we can remove the associated output file.

Here is the plan:
1. Create a deleteOnFail entry in FileLocalizer
2. Capture the POStore in the physical plan. We can put this code in MRCompiler.compile
3. Clear deleteOnFail before executing any physical plan
4. After an unsuccessful execution of a physical plan, remove the output file if it exists. Add this to PigServer.execute

I will create a patch shortly if there are no objections.
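
The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the contents of the patch: the class DeleteOnFailSketch, its method names, and the Plan interface are hypothetical, and it uses java.nio on the local filesystem rather than Pig's FileLocalizer, MRCompiler, or PigServer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the proposed deleteOnFail entry (step 1).
// All names here are assumptions made for illustration.
class DeleteOnFailSketch {
    private static final List<Path> deleteOnFail = new ArrayList<>();

    // Step 3: forget outputs registered by an earlier plan.
    static void clearDeleteOnFail() {
        deleteOnFail.clear();
    }

    // Step 2: record the store's output location.
    static void registerDeleteOnFail(Path output) {
        deleteOnFail.add(output);
    }

    // Step 4: remove every registered output that exists.
    static void triggerDeleteOnFail() throws IOException {
        for (Path output : deleteOnFail) {
            deleteRecursively(output);
        }
        deleteOnFail.clear();
    }

    // Deletes a file or a directory tree; does nothing if the path is absent.
    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        }
    }

    // Hypothetical stand-in for a physical plan with a single store.
    interface Plan {
        void run() throws Exception;
    }

    // Runs the plan and cleans up its output if the plan fails.
    static boolean execute(Path output, Plan plan) throws IOException {
        clearDeleteOnFail();
        registerDeleteOnFail(output);
        try {
            plan.run();
            return true;
        } catch (Exception e) {
            triggerDeleteOnFail();
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("pig258").resolve("out");
        boolean ok = execute(out, () -> {
            Files.createDirectories(out);
            throw new IOException("simulated store failure");
        });
        System.out.println("succeeded=" + ok + ", output exists=" + Files.exists(out));
    }
}
```

Clearing the list before each run keeps a failure in one plan from deleting the output of an earlier, successful plan.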



[jira] Updated: (PIG-258) Pig should cleanup output directory of a failed query

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-258:
---------------------------

    Attachment: clearoutput.patch

The attached patch targets branches/types.



[jira] Commented: (PIG-258) Pig should cleanup output directory of a failed query

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12609752#action_12609752 ] 

Olga Natkovich commented on PIG-258:
------------------------------------

Hi Daniel,

Looks good. I have a couple of questions/comments:

(1) I don't think we should throw an exception if we can't register the file to delete. We should just log a warning.
(2) Also, if we can't delete, we should log a warning, not an error.
(3) I think you might be double deleting in the case of dump. dump stores into a temp file, which would get cleaned up when we remove temp files. This might be ok if both check for file existence before trying to delete.
(4) What kind of tests did you run?

I think we need to test the following:

(1) Pig script with a single successful store
(2) Pig script with multiple successful stores
(3) Pig script with one failed store 
(4) Pig script with multiple failures 
(5) Pig script with one successful and one failed store
(6) Pig script with a single successful dump
(7) Pig script with a single failed dump





[jira] Resolved: (PIG-258) Pig should cleanup output directory of a failed query

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-258.
--------------------------------

    Resolution: Fixed

Changes committed to the types branch. Thanks, Daniel, for contributing!



[jira] Updated: (PIG-258) Pig should cleanup output directory of a failed query

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-258:
---------------------------

    Attachment: clearoutput2.patch

Some changes in the patch:
1. Use log.warn instead of throwing an exception or calling log.error
2. To prevent double deletion, check whether the output file is in the temp file list before adding it to deleteOnFail
3. Add a new test case, TestDeleteOnFail
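
A rough sketch of changes 1 and 2, under stated assumptions: the class DeleteOnFailRevisedSketch and its methods are hypothetical names, java.util.logging stands in for the logger Pig uses, and the files are plain local files handled through java.nio. It is not the code in clearoutput2.patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.logging.Logger;

// Hypothetical illustration of the revised behavior; names are assumptions.
class DeleteOnFailRevisedSketch {
    private static final Logger log =
            Logger.getLogger(DeleteOnFailRevisedSketch.class.getName());

    // Temp files are removed by a separate cleanup pass (for example after dump).
    private final Set<Path> tempFiles = new HashSet<>();
    private final Set<Path> deleteOnFail = new LinkedHashSet<>();

    void registerTempFile(Path p) {
        tempFiles.add(p);
    }

    // Returns true if the output was newly registered for deletion on failure.
    boolean registerDeleteOnFail(Path output) {
        if (output == null) {
            // Change 1: warn instead of throwing when registration is impossible.
            log.warning("Cannot register a null output for deletion on failure");
            return false;
        }
        if (tempFiles.contains(output)) {
            // Change 2: temp files are cleaned up elsewhere, so skip them here.
            return false;
        }
        return deleteOnFail.add(output);
    }

    // Deletes registered outputs and returns how many were actually removed.
    int triggerDeleteOnFail() {
        int deleted = 0;
        for (Path output : deleteOnFail) {
            try {
                if (Files.deleteIfExists(output)) {
                    deleted++;
                }
            } catch (IOException e) {
                // Change 1: a failed delete is logged as a warning, not an error.
                log.warning("Could not delete " + output + ": " + e.getMessage());
            }
        }
        deleteOnFail.clear();
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        DeleteOnFailRevisedSketch sketch = new DeleteOnFailRevisedSketch();
        Path temp = Files.createTempFile("pig258-dump", ".tmp");
        Path out = Files.createTempFile("pig258-store", ".out");
        sketch.registerTempFile(temp);
        System.out.println("temp registered: " + sketch.registerDeleteOnFail(temp));
        System.out.println("store registered: " + sketch.registerDeleteOnFail(out));
        System.out.println("deleted: " + sketch.triggerDeleteOnFail());
        Files.deleteIfExists(temp); // stands in for the separate temp cleanup
    }
}
```

Because the temp file is never added to deleteOnFail, only the temp-file cleanup touches it, which avoids the double deletion raised in the review.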
