Posted to dev@pig.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2012/08/15 03:04:38 UTC

[jira] [Updated] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

     [ https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Reisman updated PIG-1891:
-----------------------------

    Attachment: PIG-1891-2.patch

Hey Alan, what do you think of this?

It restores cleanupOnFailureImpl (why is this exposed in the interface at all, btw?) and does not attempt to implement cleanupOnSuccess; it just adds the hook where relevant. This way users can implement it themselves in their StoreFunc if they need it.

Also: would you look at the way it is wired into PigServer#launchPlan()? I'm giving it the same args that cleanupOnFailure() gets, but I'm not certain this is the information a user would want it to receive. I expect that if they do implement cleanupOnSuccess, these args will provide the data to delete? In the DB example in this thread, will the data already have been successfully loaded into the DB by the user code, so that this merely has to erase the unneeded files the data was staged in during processing? Would cleanupOnSuccess include both the 'load to database' and the 'erase leftover files' code?
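To make the question concrete, here is a minimal, self-contained sketch of how an implementor might use the two hooks. This is a plain-JDK model, not the real Pig interface: the actual signatures take a location String plus a Hadoop Job, and here the location is assumed to reduce to a plain filesystem path; `DemoStoreFunc` and `uploadToDatabase` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Comparator;
import java.util.stream.Stream;

// Simplified stand-in for a StoreFunc carrying the proposed job-level hooks.
class DemoStoreFunc {

    // Existing hook: the whole job failed, so remove the partial output.
    public void cleanupOnFailure(String location) throws IOException {
        deleteRecursively(Paths.get(location));
    }

    // Proposed hook (PIG-1891): the whole job succeeded. A DB-backed store
    // func could push its staged files to the database here, then erase the
    // now-unneeded staging files.
    public void cleanupOnSuccess(String location) throws IOException {
        // uploadToDatabase(location);  // hypothetical user code
        deleteRecursively(Paths.get(location));
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) return;
        try (Stream<Path> paths = Files.walk(root)) {
            // Sort in reverse so children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```

Under this reading, cleanupOnSuccess would indeed hold both the 'load to database' and the 'erase leftover files' steps, with the location arg pointing at the staged output.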

Anyway, let me know if this is what we need or whether I'm on the right track. Thanks again.
                
> Enable StoreFunc to make intelligent decision based on job success or failure
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1891
>                 URL: https://issues.apache.org/jira/browse/PIG-1891
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Alex Rovner
>            Priority: Minor
>              Labels: patch
>         Attachments: PIG-1891-1.patch, PIG-1891-2.patch
>
>
> We are in the process of using Pig for various data processing and component integration tasks. Here is where we feel Pig's storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a problem for storage funcs which need to "upload" results into another system:
> a DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup) and what I see is essentially a mechanism which, for each task, does the following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB)
> 2. Opens a transaction
> 3. Writes records into a batch
> 4. Executes a commit or rollback depending on whether the task was successful
> While this approach works great at the task level, it does not work at all at the job level.
> If certain tasks succeed but the overall job fails, partial records get uploaded into the DB.
> Any ideas on the workaround? 
> Our current workaround is fairly ugly: we created a Java wrapper that launches Pig jobs and then uploads to the DB once the Pig job is successful. While the approach works, it's not really integrated into Pig.
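The failure mode described in the quoted issue can be shown in a few lines. This is a toy, self-contained simulation of the per-task commit pattern, not the real DBStorage code; `TaskWriter` and its methods are hypothetical stand-ins, and a plain list stands in for the database table.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of per-task commit: each task batches its records and then
// commits or rolls back based only on its OWN success, with no view of
// whether the overall job succeeded.
class TaskWriter {
    // Stands in for the target database table.
    static final List<String> DB = new ArrayList<>();

    private final List<String> batch = new ArrayList<>();

    // Step 3: records accumulate in a per-task batch.
    void write(String record) { batch.add(record); }

    // Step 4: commit on task success, roll back on task failure.
    void close(boolean taskSucceeded) {
        if (taskSucceeded) DB.addAll(batch);  // commit
        batch.clear();                        // rollback = discard the batch
    }
}
```

If one task succeeds and commits while another fails and the overall job is aborted, the first task's rows are already in the table and nothing at the task level can undo them. That is exactly the partial-upload problem a job-level success/failure hook on the StoreFunc is meant to solve.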

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira