Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2014/06/13 21:15:04 UTC

[jira] [Comment Edited] (PIG-3938) Type cast doesn't work after flatten result of UDF

    [ https://issues.apache.org/jira/browse/PIG-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031029#comment-14031029 ] 

Rohini Palaniswamy edited comment on PIG-3938 at 6/13/14 7:14 PM:
------------------------------------------------------------------

[~polisan],
  I don't think the simple script in the description reproduces the exact issue faced by the user. 

> A = load 'input.txt' using PigStorage(' ') as (id:chararray, kv:[]);
 The schema of the map in this case would be key:chararray, value:bytearray.
> E = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
  The AS clause does not work in FOREACH GENERATE (PIG-2315). You have to
explicitly type cast. 
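
For illustration, the explicit cast could look roughly like the sketch below.
It reuses the relation and field names from the script in the description.
The alias C_typed is made up for this example, and it assumes MAPTOBAG emits
a bag of two-field (key, value) tuples. Note that in the user's case this
was reported as still not working, so treat it as the intended workaround
rather than a confirmed fix.

```pig
-- Sketch only: C_typed is a hypothetical alias, not from the original script.
B = foreach A generate id, MAPTOBAG(kv) as to_bag;
-- Name the flattened fields without relying on AS to convert their types.
C = foreach B generate id, flatten(to_bag) as (key, value);
-- Cast explicitly so that downstream operators see chararray values.
C_typed = foreach C generate id, (chararray)key as key, (chararray)value as value;
D = group C_typed by (id, key);
E = foreach D generate group, MIN(C_typed.value);
dump E;
```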

In the user's case, it still did not work after doing the equivalent of
kv:[chararray] in LOAD and doing an explicit type cast in FOREACH
after flattening the bag. Doing either one of the above should have been
enough. So there are two issues to be investigated and fixed.


was (Author: rohini):
[~polisan],
  I don't think the simple script in the description reproduces the exact issue faced by the user. 

> A = load 'input.txt' using PigStorage(' ') as (id:chararray, kv:[]);
 The schema of the map in this case would be key:chararray, value:bytearray.
> E = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
  The AS clause does not work in FOREACH GENERATE (PIG-2315). You have to
explicitly type cast. 

In the user's case, it still did not work after doing the equivalent of
kv:[chararray] in LOAD (Comment 14) and doing an explicit type cast in FOREACH
after flattening the bag (Comment 1). Doing either one of the above should have
been enough. So there are two issues to be investigated and fixed.

> Type cast doesn't work after flatten result of UDF
> --------------------------------------------------
>
>                 Key: PIG-3938
>                 URL: https://issues.apache.org/jira/browse/PIG-3938
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0, 0.11.1
>            Reporter: Hongchang Li
>
> This ticket is very close to http://stackoverflow.com/questions/8828839/how-can-correct-data-types-on-apache-pig-be-enforced
> To reproduce the issue, we first need a UDF that converts a map to a bag. The code is almost the same as in http://stackoverflow.com/questions/12476929/group-key-value-of-map-in-pig?answertab=votes#tab-top
> {code:title=test.pig}
> $ cat test.pig
> register polisan/maptobag.jar;
> define MAPTOBAG maptobag.MAPTOBAG();
> A = load 'polisan/input1.txt' using PigStorage(' ') as (id:chararray, kv:[]);
> B = foreach A generate id, MAPTOBAG(kv) as to_bag;
> C = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
> D = group C by (id, key);
> E = foreach D generate group, MIN(C.value);
> dump E;
> {code}
> {code:title=polisan/input1.txt}
> 1 [x#1,y#ab]
> 1 [x#2,y#cd]
> {code}
> Then I ran the script and got the following exception:
> {noformat}
> 2014-05-15 19:44:52,944 [Thread-2] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: D: Local Rearrange[tuple]{tuple}(false) - scope-42 Operator Key: scope-42): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:1)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:81)
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:1)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:352)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:391)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
> 	... 8 more
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:73)
> 	... 15 more
> {noformat}


