Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2011/02/01 20:06:28 UTC

[jira] Updated: (PIG-1831) Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf

     [ https://issues.apache.org/jira/browse/PIG-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1831:
----------------------------

    Attachment: PIG-1831-1.patch

Richard suggested a better fix: use a ThreadLocal variable instead of a static variable. The static sJobConf is still kept for backward compatibility, although it has been marked as deprecated since 0.7. In theory, UDFs that still use the deprecated sJobConf might see the same issue, but the chance of that should be very low.
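
For readers unfamiliar with the difference, here is a minimal, self-contained sketch of why a static field can be clobbered when two jobs run as threads in one JVM, while a ThreadLocal keeps one value per thread. It is not the contents of PIG-1831-1.patch; the class, field, and method names (JobConfHolderSketch, sharedConf, threadConf, runTwoJobs) are hypothetical, and a plain String stands in for the real job configuration object.

```java
import java.util.concurrent.CountDownLatch;

public class JobConfHolderSketch {
    // Hypothetical stand-in for a static configuration field shared by every thread.
    static volatile String sharedConf;

    // Hypothetical per-thread slot: each thread reads back only what it stored itself.
    static final ThreadLocal<String> threadConf = new ThreadLocal<>();

    // Runs two "jobs" on separate threads and returns what each one read back:
    // {static seen by A, static seen by B, ThreadLocal seen by A, ThreadLocal seen by B}.
    static String[] runTwoJobs() {
        String[] seen = new String[4];
        CountDownLatch bothWritten = new CountDownLatch(2);
        Thread a = new Thread(() -> job("conf-A", 0, seen, bothWritten));
        Thread b = new Thread(() -> job("conf-B", 1, seen, bothWritten));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    static void job(String conf, int idx, String[] seen, CountDownLatch bothWritten) {
        sharedConf = conf;      // the later writer overwrites the earlier one
        threadConf.set(conf);   // isolated per thread
        bothWritten.countDown();
        try {
            bothWritten.await(); // wait until both jobs have stored their configuration
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        seen[idx] = sharedConf;
        seen[idx + 2] = threadConf.get();
    }

    public static void main(String[] args) {
        String[] seen = runTwoJobs();
        System.out.println("static:      " + seen[0] + ", " + seen[1]);
        System.out.println("ThreadLocal: " + seen[2] + ", " + seen[3]);
    }
}
```

In this sketch both threads read the same value from the static field, and which value that is depends on thread scheduling, so one job always sees the other's configuration. The ThreadLocal reads are always conf-A and conf-B respectively.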

> Indeterministic behavior in local mode due to static variable PigMapReduce.sJobConf
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1831
>                 URL: https://issues.apache.org/jira/browse/PIG-1831
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
>            Assignee: Daniel Dai
>         Attachments: PIG-1831-0.patch, PIG-1831-1.patch
>
>
> The script below, when run in local mode, gives me a different output from run to run. It looks like in local mode I have to store a relation obtained through streaming in order to use it afterwards.
> For example, consider the script below:
> DEFINE MySTREAMUDF `test.sh`;
> A  = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3,data4 );
> B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
> --STORE B into 'output.B';
> C = JOIN B by wId LEFT OUTER, A by myId;
> D = FOREACH C GENERATE B::wId,B::num,data4 ;
> D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);
> --STORE D into 'output.D';
> E = foreach B GENERATE wId,num;
> F = DISTINCT E;
> G = GROUP F ALL;
> H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
> I = CROSS D,H;
> STORE I  into 'output.I';
> test.sh
> ---------
> #!/bin/bash
> cut -f1,3
> And the input is:
> abcd    label1  11      feature1
> acbd    label2  22      feature2
> adbc    label3  33      feature3
> Here, if I store relations B and D, then every time I get the result:
> acbd            3
> abcd            3
> adbc            3
> But if I don't store relations B and D, then I get an empty output. I have also observed that this behaviour is random, i.e. sometimes (about 1 out of 5 runs) there will be output.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira