Posted to dev@pig.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2009/11/30 19:35:20 UTC

[jira] Created: (PIG-1115) [zebra] temp files are not cleaned.

[zebra] temp files are not cleaned.
-----------------------------------

                 Key: PIG-1115
                 URL: https://issues.apache.org/jira/browse/PIG-1115
             Project: Pig
          Issue Type: Bug
            Reporter: Hong Tang


Temp files created by Zebra during table creation are not cleaned up when a task fails, which wastes disk space.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834445#action_12834445 ] 

Yan Zhou commented on PIG-1115:
-------------------------------

Hudson results on the load-store-redesign branch:

+1 overall.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of release audit warnings.

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834409#action_12834409 ] 

Yan Zhou commented on PIG-1115:
-------------------------------

patch reviewed +1

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1115:
----------------------------

    Fix Version/s: 0.7.0

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>             Fix For: 0.7.0
>
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1115:
--------------------------

    Affects Version/s: 0.7.0

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain updated PIG-1115:
-----------------------------

    Attachment: PIG-1115.patch


Patch for the fix.

We rely on the application to call BTOF.close() for successful jobs, because the Hadoop 0.21 OutputCommitter cannot differentiate between failed and successful jobs. A Hadoop patch for this issue is available in Hadoop 0.22.

For the same reason, we rely on applications to clean up any unwanted files/dirs for failed jobs, as they do currently.

Once the Hadoop patch/release is available, we can move the above into the Zebra libraries.
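
For illustration only, the division of responsibility described above might look roughly like this on the application side. This is a minimal sketch, not the patch: TableOutput is a hypothetical interface standing in for the output-format hooks (its close() plays the role of BTOF.close()), and the job is modelled as a plain BooleanSupplier rather than a real Hadoop job.

```java
import java.util.function.BooleanSupplier;

/** Illustrative sketch only; none of these names come from Zebra or Hadoop. */
public class JobCleanupSketch {

    /** Hypothetical stand-in for the hooks an application would call. */
    public interface TableOutput {
        void close();           // plays the role of BTOF.close() after a successful job
        void deleteLeftovers(); // application-side removal of unwanted files/dirs
    }

    /**
     * Runs the job, then cleans up according to its outcome.
     * The application, not the committer, decides which path to take,
     * because the committer cannot tell success from failure.
     */
    public static boolean runAndCleanUp(BooleanSupplier job, TableOutput output) {
        boolean succeeded;
        try {
            succeeded = job.getAsBoolean();
        } catch (RuntimeException e) {
            succeeded = false; // treat an exception as a failed job
        }
        if (succeeded) {
            output.close();
        } else {
            output.deleteLeftovers();
        }
        return succeeded;
    }

    public static void main(String[] args) {
        TableOutput output = new TableOutput() {
            public void close() { System.out.println("close() called"); }
            public void deleteLeftovers() { System.out.println("deleteLeftovers() called"); }
        };
        runAndCleanUp(() -> true, output);
        runAndCleanUp(() -> false, output);
    }
}
```

Keeping the branch in application code mirrors the constraint stated above; once the committer can see the job outcome, the same branch could live inside the library.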

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Assigned: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gaurav Jain reassigned PIG-1115:
--------------------------------

    Assignee: Gaurav Jain

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Closed: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1115.
---------------------------


> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>             Fix For: 0.7.0
>
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Resolved: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou resolved PIG-1115.
---------------------------

    Resolution: Fixed

Patch committed to the load-store-redesign branch.

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834373#action_12834373 ] 

Hong Tang commented on PIG-1115:
--------------------------------

Why not request that the patch be backported to Hadoop 0.21 (btw, do you mean Hadoop 0.21 or 0.20)?

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830383#action_12830383 ] 

Gaurav Jain commented on PIG-1115:
----------------------------------

Proposed Solution:

-- Zebra will implement ZebraOutputCommitter.

-- The Zebra front end will create all the final directories and schema files:

                    $basicTable/.btschema
                    $basicTable/CG0/.schema
                    $basicTable/CG1/.schema


-- Zebra will create a temporary directory per BasicTable and write all data there during RecordWriter.write(), under

                     $basicTable/_temporary/CG0/part-0000
                     $basicTable/_temporary/CG1/part-0000

-- The _temporary directory will always be created under $basicTable.

-- In the back end, Zebra creates RecordWriters, which in turn create CGInserters. A CGInserter works on a directory, which we call 'workOutputPath':
                                  $basicTable/_temporary/$CG/
             But it needs the .schema file, which is located 2 levels up. So it reads the schema file from
                                  $basicTable/$workOutputPath.getName()

-- In CGInserter.close(), the part file is moved:
                     $basicTable/_temporary/CG0/part-0000       ----------->              $basicTable/CG0/part-0000
-- In ZebraOutputCommitter.cleanupJob(), BasicTableOutputFormat.close() will be called.
-- In BasicTableOutputFormat.close():
                      remove (                $basicTable/_temporary/               )
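
As a rough illustration of the directory flow proposed above, the steps can be sketched on a local filesystem with java.nio. This is an assumption-laden sketch, not Zebra code: the class and method names (TempDirCommitSketch, setUpTable, writePart, commitPart, removeTemporary) are hypothetical, schema contents are placeholders, and a real implementation would go through Hadoop's FileSystem API rather than local files.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Illustrative local-filesystem stand-in for the proposed layout; all names are hypothetical. */
public class TempDirCommitSketch {
    static final String TEMP_DIR = "_temporary";

    /** Front end: create the final directories and (placeholder) schema files. */
    public static void setUpTable(Path basicTable, String... columnGroups) throws IOException {
        Files.createDirectories(basicTable);
        Files.write(basicTable.resolve(".btschema"), "placeholder".getBytes(StandardCharsets.UTF_8));
        for (String cg : columnGroups) {
            Path cgDir = Files.createDirectories(basicTable.resolve(cg));
            Files.write(cgDir.resolve(".schema"), "placeholder".getBytes(StandardCharsets.UTF_8));
        }
    }

    /** workOutputPath for a column group: $basicTable/_temporary/$CG */
    public static Path workOutputPath(Path basicTable, String cg) {
        return basicTable.resolve(TEMP_DIR).resolve(cg);
    }

    /** Go two levels up from workOutputPath, then into the directory with the same name. */
    public static Path schemaFileFor(Path workOutputPath) {
        Path basicTable = workOutputPath.getParent().getParent();
        return basicTable.resolve(workOutputPath.getFileName()).resolve(".schema");
    }

    /** Back end: data is written under the temporary directory only. */
    public static Path writePart(Path workOutputPath, String partName, String data) throws IOException {
        Files.createDirectories(workOutputPath);
        return Files.write(workOutputPath.resolve(partName), data.getBytes(StandardCharsets.UTF_8));
    }

    /** On inserter close: move the part file from _temporary/$CG to $CG. */
    public static Path commitPart(Path tempPart) throws IOException {
        Path finalDir = schemaFileFor(tempPart.getParent()).getParent();
        return Files.move(tempPart, finalDir.resolve(tempPart.getFileName()));
    }

    /** On output-format close: remove $basicTable/_temporary and everything under it. */
    public static void removeTemporary(Path basicTable) throws IOException {
        Path temp = basicTable.resolve(TEMP_DIR);
        if (!Files.exists(temp)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(temp)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("basicTable");
        setUpTable(table, "CG0", "CG1");
        Path part = writePart(workOutputPath(table, "CG0"), "part-0000", "rows");
        System.out.println("committed to " + commitPart(part));
        removeTemporary(table);
        System.out.println("_temporary exists: " + Files.exists(table.resolve(TEMP_DIR)));
    }
}
```

Because every part file lives under a single _temporary directory until it is committed, one recursive delete is enough to discard whatever failed tasks left behind.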






> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.



[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.

Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834380#action_12834380 ] 

Gaurav Jain commented on PIG-1115:
----------------------------------


We discussed the backport with the M/R team (patch MAPREDUCE-947); the earliest it can be done is in the next release of Hadoop.

I meant Hadoop 0.20/0.21 (any release other than trunk).

> [zebra] temp files are not cleaned.
> -----------------------------------
>
>                 Key: PIG-1115
>                 URL: https://issues.apache.org/jira/browse/PIG-1115
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Hong Tang
>            Assignee: Gaurav Jain
>         Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned where there is any task failure, which results in waste of disk space.
