Posted to user@pig.apache.org by Joel D <ga...@gmail.com> on 2016/07/09 23:19:31 UTC

Code works in MR but not in Tez

Hi,

The code below works in Pig MapReduce mode but not in Tez mode: mstat should
return 'matches', but it returns nothing when executed in Tez mode.

cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:chararray;
cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:chararray;

combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;

mstat = FOREACH combined GENERATE (
  CASE
    WHEN cd1.first == cd2.second THEN 'matches'
    else 'mismatch'
  END
) as match_status;

dump mstat;

Suggestions please.

Thanks,
Joel

Re: Code works in MR but not in Tez

Posted by Joel D <ga...@gmail.com>.
Thanks, Rohini.

I changed the code to use "::", and it worked in both MR and Tez.

Changed code:
mstat = FOREACH combined GENERATE (
  CASE
    WHEN (cd1::first is null ? 'null' : cd1::first) ==
         (cd2::second is null ? 'null' : cd2::second) THEN 'matches'
    else 'mismatch'
  END
) as match_status;
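The null handling in the changed code can also be written as two separate steps. This is a minimal sketch, not the poster's script. It reloads the same input as the original, and the intermediate relation and field names (normalized, left_val, right_val) are hypothetical. In Pig, == returns null when either operand is null, and a FULL OUTER join fills the unmatched side with null. For that reason each side is replaced with the placeholder string 'null' before the comparison, as the CASE expression above does.

```pig
-- Same inputs as the original script (path taken from the thread).
cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS (first:chararray);
cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS (second:chararray);

combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;

-- Step 1 (hypothetical names): reference the joined fields with the ::
-- operator and replace nulls produced by the outer join with 'null'.
normalized = FOREACH combined GENERATE
    (cd1::first IS NULL ? 'null' : cd1::first) AS left_val,
    (cd2::second IS NULL ? 'null' : cd2::second) AS right_val;

-- Step 2: both fields are now non-null, so == yields true or false.
mstat = FOREACH normalized GENERATE
    (left_val == right_val ? 'matches' : 'mismatch') AS match_status;

DUMP mstat;
```

Because the placeholder is the literal string 'null', a row that pairs a missing value with an input line containing the text null would be reported as 'matches'. A placeholder that cannot occur in the data avoids this, if it matters.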


Thanks.

On Tue, Jul 12, 2016 at 2:01 PM, Rohini Palaniswamy <rohini.aditya@gmail.com> wrote:

> Are you sure it worked in MR? You should have gotten an error like:
>
> *Scalar has more than one row in the output. 1st : (xxxx), 2nd :(yyyy)
> (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be
> "foo::bar" )*
>
> cd1.first == cd2.second should be written as cd1::first == cd2::second.
> Refer to http://pig.apache.org/docs/r0.16.0/basic.html#disambiguate
>
> On Sat, Jul 9, 2016 at 4:19 PM, Joel D <ga...@gmail.com> wrote:
>
>> Hi,
>>
>> The code below works in Pig MapReduce mode but not in Tez mode: mstat
>> should return 'matches', but it returns nothing when executed in Tez mode.
>>
>> cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:chararray;
>> cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:chararray;
>>
>> combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;
>>
>> mstat = FOREACH combined GENERATE (
>>   CASE
>>     WHEN cd1.first == cd2.second THEN 'matches'
>>     else 'mismatch'
>>   END
>> ) as match_status;
>>
>> dump mstat;
>>
>> Suggestions please.
>>
>> Thanks,
>> Joel


Re: Code works in MR but not in Tez

Posted by Rohini Palaniswamy <ro...@gmail.com>.
Are you sure it worked in MR? You should have gotten an error like:

*Scalar has more than one row in the output. 1st : (xxxx), 2nd :(yyyy)
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be
"foo::bar" )*

cd1.first == cd2.second should be written as cd1::first == cd2::second.
Refer to http://pig.apache.org/docs/r0.16.0/basic.html#disambiguate
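The following is a minimal sketch of why the :: form is needed. It assumes the same input as the original script. After a JOIN, Pig prefixes each output field with the alias of the relation it came from, so DESCRIBE combined should list cd1::first and cd2::second. The dot form cd1.first instead refers to the whole relation cd1 as a scalar, which is what the error above complains about. The FOREACH renames the prefixed fields so that later statements can use short names. The names renamed, first_val and second_val are hypothetical.

```pig
-- Same inputs as the original script (path taken from the thread).
cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS (first:chararray);
cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS (second:chararray);

combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;

-- Prints the schema of the join output; each field name carries its
-- source alias as a prefix (cd1::first, cd2::second).
DESCRIBE combined;

-- Hypothetical renaming step: project the prefixed fields under plain names.
-- Writing cd1.first here would be a scalar projection of relation cd1,
-- not a reference to the field in the current row.
renamed = FOREACH combined GENERATE
    cd1::first  AS first_val,
    cd2::second AS second_val;

DESCRIBE renamed;
```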

On Sat, Jul 9, 2016 at 4:19 PM, Joel D <ga...@gmail.com> wrote:

> Hi,
>
> The code below works in Pig MapReduce mode but not in Tez mode: mstat
> should return 'matches', but it returns nothing when executed in Tez mode.
>
> cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:chararray;
> cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:chararray;
>
> combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;
>
> mstat = FOREACH combined GENERATE (
>   CASE
>     WHEN cd1.first == cd2.second THEN 'matches'
>     else 'mismatch'
>   END
> ) as match_status;
>
> dump mstat;
>
> Suggestions please.
>
> Thanks,
> Joel
