Posted to dev@pig.apache.org by "Hong Tang (JIRA)" <ji...@apache.org> on 2009/11/30 19:35:20 UTC
[jira] Created: (PIG-1115) [zebra] temp files are not cleaned.
[zebra] temp files are not cleaned.
-----------------------------------
Key: PIG-1115
URL: https://issues.apache.org/jira/browse/PIG-1115
Project: Pig
Issue Type: Bug
Reporter: Hong Tang
Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834445#action_12834445 ]
Yan Zhou commented on PIG-1115:
-------------------------------
Hudson results on the load-store-redesign branch:
+1 overall.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 7 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834409#action_12834409 ]
Yan Zhou commented on PIG-1115:
-------------------------------
patch reviewed +1
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1115:
----------------------------
Fix Version/s: 0.7.0
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Fix For: 0.7.0
>
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yan Zhou updated PIG-1115:
--------------------------
Affects Version/s: 0.7.0
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Updated: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaurav Jain updated PIG-1115:
-----------------------------
Attachment: PIG-1115.patch
Patch for the fix.
We rely on the application to call BTOF.close() for successful jobs because, in Hadoop 0.21, the OutputCommitter cannot differentiate between failed and successful jobs. A Hadoop patch for this issue is available in Hadoop 0.22.
For the same reason, we rely on applications to clean up any unwanted files/dirs for FAILED JOBS, as they do currently.
Once the Hadoop patch/release is available, we can move the above logic into the zebra libraries.
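The application-side responsibilities described above can be sketched as follows. This minimal sketch uses java.nio.file on a local directory instead of HDFS. The class and method names (DriverCleanupSketch, closeOnSuccess, cleanupOnFailure, finish) are hypothetical. closeOnSuccess only stands in for BTOF.close() by removing $basicTable/_temporary, and the real BasicTableOutputFormat may do more than that. Removing the entire table directory on failure is an assumption about what "unwanted files/dirs" means.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DriverCleanupSketch {

    // Delete a directory tree, children before parents (reverse path order).
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Hypothetical stand-in for BTOF.close(): only $basicTable/_temporary is removed.
    static void closeOnSuccess(Path basicTable) throws IOException {
        deleteRecursively(basicTable.resolve("_temporary"));
    }

    // Assumed failure policy: nothing under the table is trusted, so remove all of it.
    static void cleanupOnFailure(Path basicTable) throws IOException {
        deleteRecursively(basicTable);
    }

    // Called by the application once it knows the job outcome.
    static void finish(Path basicTable, boolean jobSucceeded) throws IOException {
        if (jobSucceeded) {
            closeOnSuccess(basicTable);
        } else {
            cleanupOnFailure(basicTable);
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("basicTable");
        Files.createDirectories(table.resolve("_temporary/CG0"));
        Files.createFile(table.resolve("_temporary/CG0/part-0000"));
        Files.createDirectories(table.resolve("CG0"));
        finish(table, true);
        System.out.println(Files.exists(table.resolve("_temporary"))); // false
        System.out.println(Files.exists(table.resolve("CG0")));        // true
    }
}
```

The design keeps the success/failure decision in the application, mirroring the comment above. Once the OutputCommitter can tell the two cases apart, the same branch could move into the library.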
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Assigned: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gaurav Jain reassigned PIG-1115:
--------------------------------
Assignee: Gaurav Jain
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Closed: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai closed PIG-1115.
---------------------------
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Fix For: 0.7.0
>
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Resolved: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Yan Zhou (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yan Zhou resolved PIG-1115.
---------------------------
Resolution: Fixed
Patch committed to the load-store-redesign branch.
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834373#action_12834373 ]
Hong Tang commented on PIG-1115:
--------------------------------
Why not request that the patch be backported to Hadoop 0.21 (by the way, do you mean Hadoop 0.21 or 0.20)?
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830383#action_12830383 ]
Gaurav Jain commented on PIG-1115:
----------------------------------
Proposed Solution:
-- Zebra will implement a ZebraOutputCommitter.
-- The Zebra front end will create all the final directories and schema files:
$basicTable/.btschema
$basicTable/CG0/.schema
$basicTable/CG1/.schema
-- Zebra will create one temporary directory per BasicTable, and RecordWriter.write() will write all data under it:
$basicTable/_temporary/CG0/part-0000
$basicTable/_temporary/CG1/part-0000
-- The _temporary directory will always be created under $basicTable.
-- In the back end, Zebra creates RecordWriters, which in turn create CGInserters. A CGInserter works on a directory that we call 'workOutputPath':
$basicTable/_temporary/$CG/
However, it needs the .schema file, which lives two levels up (under $basicTable/$CG/), so it reads the schema file from
$basicTable/$workOutputPath.getName()
-- In CGInserter.close(), each part file is moved to its final location:
$basicTable/_temporary/CG0/part-0000 -----------> $basicTable/CG0/part-0000
-- In ZebraOutputCommitter.cleanupJob(), BasicTableOutputFormat.close() will be called.
-- In BasicTableOutputFormat.close(), the temporary directory is removed:
remove ( $basicTable/_temporary/ )
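The proposed layout and commit flow can be sketched on a local filesystem with java.nio.file. This is an illustration under stated assumptions, not Zebra's code. Local paths replace HDFS, and the schema contents are placeholder strings. Every class and method name (ZebraCommitSketch, setupTable, workOutputPath, schemaFile, writePart, closeInserter, cleanupJob) is hypothetical. The sketch covers three phases:

- front-end creation of the final directories and schema files;
- task output under _temporary, moved into place when the inserter closes;
- job cleanup that removes _temporary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZebraCommitSketch {
    static final String TEMP = "_temporary";

    // Front end: create .btschema plus one CG directory with a .schema file per column group.
    static void setupTable(Path basicTable, int numColumnGroups) throws IOException {
        Files.createDirectories(basicTable);
        Files.write(basicTable.resolve(".btschema"), "btschema".getBytes(StandardCharsets.UTF_8));
        for (int i = 0; i < numColumnGroups; i++) {
            Path cg = Files.createDirectories(basicTable.resolve("CG" + i));
            Files.write(cg.resolve(".schema"), ("schema CG" + i).getBytes(StandardCharsets.UTF_8));
        }
    }

    // Back end: workOutputPath is $basicTable/_temporary/$CG/.
    static Path workOutputPath(Path basicTable, String cg) {
        return basicTable.resolve(TEMP).resolve(cg);
    }

    // Go two levels up from workOutputPath to $basicTable, then into the CG directory of the same name.
    static Path schemaFile(Path workOutputPath) {
        Path basicTable = workOutputPath.getParent().getParent();
        return basicTable.resolve(workOutputPath.getFileName()).resolve(".schema");
    }

    // Task output is written only under _temporary.
    static Path writePart(Path basicTable, String cg, String part, byte[] data) throws IOException {
        Path dir = Files.createDirectories(workOutputPath(basicTable, cg));
        return Files.write(dir.resolve(part), data);
    }

    // Analogue of CGInserter.close(): move the part file into the final CG directory.
    static void closeInserter(Path basicTable, String cg, String part) throws IOException {
        Files.move(workOutputPath(basicTable, cg).resolve(part), basicTable.resolve(cg).resolve(part));
    }

    // Analogue of cleanupJob() -> close(): delete $basicTable/_temporary and anything left in it.
    static void cleanupJob(Path basicTable) throws IOException {
        Path temp = basicTable.resolve(TEMP);
        if (!Files.exists(temp)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(temp)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p); // children sort after their parents, so reverse order deletes them first
        }
    }

    public static void main(String[] args) throws IOException {
        Path table = Files.createTempDirectory("basicTable");
        setupTable(table, 2);
        for (String cg : new String[] {"CG0", "CG1"}) {
            writePart(table, cg, "part-0000", "rows".getBytes(StandardCharsets.UTF_8));
            closeInserter(table, cg, "part-0000");
        }
        cleanupJob(table);
        System.out.println(Files.exists(table.resolve("CG1/part-0000"))); // true
        System.out.println(Files.exists(table.resolve(TEMP)));            // false
        System.out.println(schemaFile(workOutputPath(table, "CG0")));     // <table>/CG0/.schema
    }
}
```

Because data only reaches the final CG directories through the move in closeInserter, anything a failed task leaves behind stays confined to _temporary. A single directory delete in cleanupJob then removes it.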
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.
[jira] Commented: (PIG-1115) [zebra] temp files are not cleaned.
Posted by "Gaurav Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834380#action_12834380 ]
Gaurav Jain commented on PIG-1115:
----------------------------------
We discussed the backport with the M/R team (patch MAPREDUCE-947); the earliest it can be done is in the next release of Hadoop.
I meant Hadoop 0.20/0.21 (any release other than trunk).
> [zebra] temp files are not cleaned.
> -----------------------------------
>
> Key: PIG-1115
> URL: https://issues.apache.org/jira/browse/PIG-1115
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Hong Tang
> Assignee: Gaurav Jain
> Attachments: PIG-1115.patch
>
>
> Temp files created by zebra during table creation are not cleaned up when any task fails, which wastes disk space.