You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/01/28 11:07:43 UTC

[jira] Updated: (PIG-1831) Variation in output while using streaming udfs in local mode

     [ https://issues.apache.org/jira/browse/PIG-1831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Padmanabhan updated PIG-1831:
-----------------------------------

    Description: 
The below script when run in local mode gives me a different output. It looks like in local mode I have to store a relation obtained through streaming in order to use it afterwards.

 For example consider the below script : 

DEFINE MySTREAMUDF `test.sh`;
A  = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3,data4 );
B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);

--STORE B into 'output.B';

C = JOIN B by wId LEFT OUTER, A by myId;
D = FOREACH C GENERATE B::wId,B::num,data4 ;
D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);

--STORE D into 'output.D';

E = foreach B GENERATE wId,num;
F = DISTINCT E;
G = GROUP F ALL;
H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
I = CROSS D,H;
STORE I  into 'output.I';


#/bin/bash
cut -f1,3


And input is 
abcd    label1  11      feature1
acbd    label2  22      feature2
adbc    label3  33      feature3


Here if I store relation B and D then everytime i get the result  :
acbd            3
abcd            3
adbc            3

But if i dont store relations B and D then I get an empty output.  




  was:
The below script when run in local mode gives me a different output. It looks like in local mode I have to store a relation obtained through streaming in order to use it afterwards.

 For example consider the below script : 
{code:lang=scala|title=} 
DEFINE MySTREAMUDF `test.sh`;

A  = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3,data4 );
B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);

--STORE B into 'output.B';

C = JOIN B by wId LEFT OUTER, A by myId;
D = FOREACH C GENERATE B::wId,B::num,data4 ;
D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);

--STORE D into 'output.D';

E = foreach B GENERATE wId,num;
F = DISTINCT E;
G = GROUP F ALL;
H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
I = CROSS D,H;
STORE I  into 'output.I';
{code}

{code:lang=scala|title=test.sh}
#/bin/bash
cut -f1,3
{code}

And input is 
>abcd    label1  11      feature1
>acbd    label2  22      feature2
>adbc    label3  33      feature3


Here if I store relation B and D then everytime i get the result  :
acbd            3
abcd            3
adbc            3

But if i dont store relations B and D then I get an empty output.  





> Variation in output while using streaming udfs in local mode
> ------------------------------------------------------------
>
>                 Key: PIG-1831
>                 URL: https://issues.apache.org/jira/browse/PIG-1831
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Vivek Padmanabhan
>
> The below script when run in local mode gives me a different output. It looks like in local mode I have to store a relation obtained through streaming in order to use it afterwards.
>  For example consider the below script : 
> DEFINE MySTREAMUDF `test.sh`;
> A  = LOAD 'myinput' USING PigStorage() AS (myId:chararray, data2, data3,data4 );
> B = STREAM A THROUGH MySTREAMUDF AS (wId:chararray, num:int);
> --STORE B into 'output.B';
> C = JOIN B by wId LEFT OUTER, A by myId;
> D = FOREACH C GENERATE B::wId,B::num,data4 ;
> D = STREAM D THROUGH MySTREAMUDF AS (f1:chararray,f2:int);
> --STORE D into 'output.D';
> E = foreach B GENERATE wId,num;
> F = DISTINCT E;
> G = GROUP F ALL;
> H = FOREACH G GENERATE COUNT_STAR(F) as TotalCount;
> I = CROSS D,H;
> STORE I  into 'output.I';
> #/bin/bash
> cut -f1,3
> And input is 
> abcd    label1  11      feature1
> acbd    label2  22      feature2
> adbc    label3  33      feature3
> Here if I store relation B and D then everytime i get the result  :
> acbd            3
> abcd            3
> adbc            3
> But if i dont store relations B and D then I get an empty output.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.