Posted to dev@pig.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2012/09/01 21:59:07 UTC

[jira] [Commented] (PIG-1891) Enable StoreFunc to make intelligent decision based on job success or failure

    [ https://issues.apache.org/jira/browse/PIG-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446793#comment-13446793 ] 

Eli Reisman commented on PIG-1891:
----------------------------------

Now when I run my local machine tests with 'ant test-commit' on PIG-1891-3.patch + trunk, I get this error (and only this error):


Testcase: testNumSamples took 22.016 sec
	FAILED
expected:<47> but was:<42>
junit.framework.AssertionFailedError: expected:<47> but was:<42>
	at org.apache.pig.test.TestPoissonSampleLoader.testNumSamples(TestPoissonSampleLoader.java:125)


I did not alter the number of allowed instantiations for loads in the TestLoadStoreFuncLifeCycle test, only for stores, so perhaps this set off a ripple effect of other problems; it's odd that the failure is in a loader. I am unsure whether this is directly related to this patch or an existing problem you already know about, so I thought I'd post here before hunting it down. Thanks again!

                
> Enable StoreFunc to make intelligent decision based on job success or failure
> -----------------------------------------------------------------------------
>
>                 Key: PIG-1891
>                 URL: https://issues.apache.org/jira/browse/PIG-1891
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.10.0
>            Reporter: Alex Rovner
>            Priority: Minor
>              Labels: patch
>         Attachments: PIG-1891-1.patch, PIG-1891-2.patch, PIG-1891-3.patch
>
>
> We are in the process of using Pig for various data processing and component integration tasks. Here is where we feel Pig storage funcs fall short:
> They are not aware of whether the overall job has succeeded. This creates a problem for storage funcs that need to "upload" results into another system:
> a DB, FTP, another file system, etc.
> I looked at the DBStorage in the piggybank (http://svn.apache.org/viewvc/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java?view=markup) and what I see is essentially a mechanism which for each task does the following:
> 1. Creates a RecordWriter (in this case, opens a connection to the DB).
> 2. Opens a transaction.
> 3. Writes records into a batch.
> 4. Commits or rolls back depending on whether the task was successful.
> While this approach works well at the task level, it does not work at all at the job level.
> If some tasks succeed but the overall job fails, partial records will be uploaded into the DB.
> Any ideas on a workaround?
> Our current workaround is fairly ugly: we created a Java wrapper that launches Pig jobs and then uploads to the DBs once the Pig job has succeeded. While this approach works, it's not really integrated into Pig.
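
The per-task pattern described in the quoted issue can be sketched as a small in-memory simulation. Each task opens a connection, begins a transaction, batches records, and then commits or rolls back on its own. The sketch assumes a hypothetical `FakeTable` in place of a real database, and the `PerTaskWriter`/`finish` names are also hypothetical. It is not DBStorage's actual code. It only illustrates why a failed job can still leave partial rows behind.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a database table (not a Pig or JDBC API).
class FakeTable {
    final List<String> committedRows = new ArrayList<>();
}

// Hypothetical per-task writer mirroring steps 1-4 of the quoted description.
class PerTaskWriter {
    private final FakeTable table;
    // Step 3: records are held in a pending batch until the task finishes.
    private final List<String> batch = new ArrayList<>();

    // Steps 1-2 ("open connection", "begin transaction") are simulated by
    // binding to the table and starting with an empty batch.
    PerTaskWriter(FakeTable table) {
        this.table = table;
    }

    void write(String record) {
        batch.add(record);
    }

    // Step 4: commit or roll back based only on THIS task's outcome.
    void finish(boolean taskSucceeded) {
        if (taskSucceeded) {
            table.committedRows.addAll(batch); // commit: rows become visible
        }
        batch.clear(); // rollback (or post-commit cleanup)
    }
}

public class PerTaskCommitDemo {
    public static void main(String[] args) {
        FakeTable table = new FakeTable();

        PerTaskWriter task1 = new PerTaskWriter(table);
        task1.write("a");
        task1.write("b");
        task1.finish(true);  // task 1 succeeds and commits

        PerTaskWriter task2 = new PerTaskWriter(table);
        task2.write("c");
        task2.finish(false); // task 2 fails and rolls back

        // The job as a whole failed, yet task 1's rows are already committed.
        System.out.println(table.committedRows); // prints [a, b]
    }
}
```

Every task makes a correct local decision, but no component ever sees the outcome of the whole job. That gap is what the issue asks Pig to close.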

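Tasks could instead only stage their output, with publishing deferred to a hook that runs once the outcome of the whole job is known. That is the kind of job-level decision the issue title asks for. The sketch below uses hypothetical names (`JobOutcomeListener`, `onJobSuccess`, `onJobFailure`, `StagedUploader`) that are not part of Pig's StoreFunc API. It also uses in-memory lists where a real implementation would presumably use a staging table or temporary directory, since tasks run in separate JVMs. Hadoop's `OutputCommitter`, with its `commitJob` and `abortJob` methods, is an existing mechanism of a broadly similar shape.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical job-level hooks; NOT Pig's StoreFunc API.
interface JobOutcomeListener {
    void onJobSuccess();
    void onJobFailure();
}

// Hypothetical uploader: tasks only stage rows, and nothing reaches the
// target until the job-level success hook publishes them.
class StagedUploader implements JobOutcomeListener {
    // Stands in for e.g. a staging table or temporary directory.
    private final List<String> staging = new ArrayList<>();
    // Stands in for the real target (DB, FTP server, other file system).
    final List<String> published = new ArrayList<>();

    // Called by a task for each record it stores.
    void stage(String record) {
        staging.add(record);
    }

    @Override
    public void onJobSuccess() {
        published.addAll(staging); // publish all staged rows in one step
        staging.clear();
    }

    @Override
    public void onJobFailure() {
        staging.clear(); // discard partial output; the target is untouched
    }
}

public class JobLevelCommitDemo {
    public static void main(String[] args) {
        StagedUploader uploader = new StagedUploader();
        uploader.stage("a");     // from task 1
        uploader.stage("b");     // from task 1
        uploader.stage("c");     // from task 2, which later fails
        uploader.onJobFailure(); // the job as a whole failed
        System.out.println(uploader.published); // prints []
    }
}
```

With this arrangement, a failed job leaves the target system untouched. The trade-off is that the final publish step must itself be atomic, or at least idempotent, for the guarantee to hold.
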