Posted to user@pig.apache.org by Joel D <ga...@gmail.com> on 2016/07/09 23:19:31 UTC
Code works in MR but not in Tez
Hi,
The code below works in Pig MapReduce mode but not in Tez. That is, mstat
should return 'matches' but returns nothing when executed in Tez mode.
cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:chararray;
cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:chararray;
combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;
mstat = FOREACH combined GENERATE (
CASE
WHEN cd1.first == cd2.second THEN 'matches'
else 'mismatch'
END
) as match_status;
dump mstat;
Suggestions please.
Thanks,
Joel
Re: Code works in MR but not in Tez
Posted by Joel D <ga...@gmail.com>.
Thanks Rohini.
I changed the code to use "::". It now works in both MR and Tez.
Changed code:
mstat = FOREACH combined GENERATE (
CASE
WHEN (cd1::first is null ? 'null' : cd1::first) == (cd2::second is null ? 'null' : cd2::second) THEN 'matches'
else 'mismatch'
END
) as match_status;
Thanks.
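A note on the null checks in the changed code: with a FULL OUTER join, a row that has no partner on the other side gets null for the missing field, and in Pig a comparison with a null operand evaluates to null rather than true or false. Substituting the string 'null' is one workaround, though it also makes a literal 'null' value in the data compare equal to a missing one. A possible alternative, sketched here under the assumption that unmatched rows should be reported as 'mismatch', is to test for null explicitly before comparing:

```pig
-- Sketch only: assumes 'combined' is the FULL OUTER join from the original script
-- and that a row missing on either side should count as a mismatch.
mstat = FOREACH combined GENERATE (
    CASE
        WHEN cd1::first IS NULL OR cd2::second IS NULL THEN 'mismatch'
        WHEN cd1::first == cd2::second THEN 'matches'
        ELSE 'mismatch'
    END
) AS match_status;

DUMP mstat;
```

The first WHEN branch keeps the null handling visible instead of hiding it inside the compared expressions.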
On Tue, Jul 12, 2016 at 2:01 PM, Rohini Palaniswamy <rohini.aditya@gmail.com> wrote:
> Are you sure it worked in MR. You should have got an error like
>
> *Scalar has more than one row in the output. 1st : (xxxx), 2nd :(yyyy)
> (common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be
> "foo::bar" )*
>
> cd1.first == cd2.second should be written as cd1::first == cd2::second.
> Refer http://pig.apache.org/docs/r0.16.0/basic.html#disambiguate
>
> On Sat, Jul 9, 2016 at 4:19 PM, Joel D <ga...@gmail.com> wrote:
>
>> Hi,
>>
>> Below code work in pig MapReduce mode but doesn't in Tez. In the sense
>> mstat should return 'matches' but returns nothing when executed in tez mode.
>>
>> cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:
>> chararray;
>> cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:
>> chararray;
>>
>>
>> combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;
>>
>>
>> mstat = FOREACH combined GENERATE (
>> CASE
>> WHEN cd1.first == cd2.second THEN 'matches'
>> else 'mismatch'
>> END
>> ) as match_status;
>>
>> dump mstat;
>>
>>
>>
>> Suggestions please.
>>
>> Thanks,
>> Joel
>>
>>
>>
>>
>
Re: Code works in MR but not in Tez
Posted by Rohini Palaniswamy <ro...@gmail.com>.
Are you sure it worked in MR? You should have got an error like
*Scalar has more than one row in the output. 1st : (xxxx), 2nd :(yyyy)
(common cause: "JOIN" then "FOREACH ... GENERATE foo.bar" should be
"foo::bar" )*
cd1.first == cd2.second should be written as cd1::first == cd2::second.
Refer to http://pig.apache.org/docs/r0.16.0/basic.html#disambiguate
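To illustrate the difference, here is a minimal sketch that reuses the relation and field names from the script quoted below. After a JOIN, the fields of the result carry their source relation as a prefix, and "::" selects one of them from the current row. By contrast, cd1.first is parsed as a scalar projection that tries to read a single value from the whole cd1 relation, which is what leads to the error above when cd1 has more than one row.

```pig
cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:chararray;
cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:chararray;
combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;

-- Inspect the joined schema; the fields should be listed as cd1::first and cd2::second.
DESCRIBE combined;

-- Reference the joined fields of the current row with '::', not '.'.
mstat = FOREACH combined GENERATE (
    CASE
        WHEN cd1::first == cd2::second THEN 'matches'
        ELSE 'mismatch'
    END
) AS match_status;

DUMP mstat;
```

Since first and second are already unique field names in this script, the unprefixed form (first == second) should resolve as well; the prefix becomes necessary only when the same field name exists on both sides of the join.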
On Sat, Jul 9, 2016 at 4:19 PM, Joel D <ga...@gmail.com> wrote:
> Hi,
>
> Below code work in pig MapReduce mode but doesn't in Tez. In the sense
> mstat should return 'matches' but returns nothing when executed in tez mode.
>
> cd1 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS first:
> chararray;
> cd2 = LOAD '/user/falcon/data/cd1.txt' USING PigStorage('\n') AS second:
> chararray;
>
>
> combined = JOIN cd1 BY first FULL OUTER, cd2 BY second;
>
>
> mstat = FOREACH combined GENERATE (
> CASE
> WHEN cd1.first == cd2.second THEN 'matches'
> else 'mismatch'
> END
> ) as match_status;
>
> dump mstat;
>
>
>
> Suggestions please.
>
> Thanks,
> Joel
>
>
>
>