Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2016/12/01 23:43:58 UTC

[jira] [Commented] (PIG-3938) Add LoadCaster to EvalFunc(UDF)

    [ https://issues.apache.org/jira/browse/PIG-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15713436#comment-15713436 ] 

Daniel Dai commented on PIG-3938:
---------------------------------

I prefer the following logic:
1. If getLoadCaster returns non-null, we should always use it.
2. Otherwise, if all parent LoadCasters are the same, use that common LoadCaster.

The definition of getLoadCasterFuncSpecFromParams is confusing, and I would rather avoid it.
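
The two rules above could be sketched roughly as follows. This is only an illustration, not code from the patch: casters are represented by plain strings, and the class and method names (LoadCasterResolution, resolve) are hypothetical. Returning null when neither rule applies is also an assumption, since the rules do not cover that case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the proposed resolution logic; not Pig's actual API.
class LoadCasterResolution {

    /**
     * Chooses the caster to use for a UDF's bytearray output.
     *
     * @param udfCaster     what the UDF's own getLoadCaster reports, or null
     * @param parentCasters casters of the UDF's inputs (strings used as stand-ins)
     * @return the chosen caster, or null if none can be chosen (assumption)
     */
    static String resolve(String udfCaster, List<String> parentCasters) {
        // Rule 1: a non-null caster from the UDF itself always wins.
        if (udfCaster != null) {
            return udfCaster;
        }
        // Rule 2: otherwise all parents must agree on one non-null caster.
        if (parentCasters == null || parentCasters.isEmpty()) {
            return null;
        }
        String first = parentCasters.get(0);
        if (first == null) {
            return null;
        }
        for (String caster : parentCasters) {
            if (!first.equals(caster)) {
                return null; // parents disagree, so there is no common caster
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // UDF supplies its own caster: parents are ignored.
        System.out.println(resolve("UdfCaster", Arrays.asList("A", "B")));
        // No UDF caster, parents agree: the common caster is used.
        System.out.println(resolve(null, Arrays.asList("A", "A")));
        // No UDF caster, parents disagree: nothing is chosen.
        System.out.println(resolve(null, Arrays.asList("A", "B")));
    }
}
```

With this shape there is a single decision point and no need for a separate helper that derives a FuncSpec from the parameters.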

> Add LoadCaster to EvalFunc(UDF) 
> --------------------------------
>
>                 Key: PIG-3938
>                 URL: https://issues.apache.org/jira/browse/PIG-3938
>             Project: Pig
>          Issue Type: Bug
>          Components: internal-udfs
>    Affects Versions: 0.12.0, 0.11.1
>            Reporter: Hongchang Li
>            Assignee: Koji Noguchi
>         Attachments: pig-3938-v01.patch, pig-3938-v02.patch
>
>
> This ticket is closely related to http://stackoverflow.com/questions/8828839/how-can-correct-data-types-on-apache-pig-be-enforced.
> To reproduce the issue, start with a UDF that converts a map to a bag (the code is almost the same as in http://stackoverflow.com/questions/12476929/group-key-value-of-map-in-pig?answertab=votes#tab-top):
> {code:title=test.pig}
> $ cat test.pig
> register polisan/maptobag.jar;
> define MAPTOBAG maptobag.MAPTOBAG();
> A = load 'polisan/input1.txt' using PigStorage(' ') as (id:chararray, kv:[]);
> B = foreach A generate id, MAPTOBAG(kv) as to_bag;
> C = foreach B generate id, flatten(to_bag) as (key:chararray, value:chararray);
> D = group C by (id, key);
> E = foreach D generate group, MIN(C.value);
> dump E;
> {code}
> {code:title=polisan/input1.txt}
> 1 [x#1,y#ab]
> 1 [x#2,y#cd]
> {code}
> Then run the Pig script. I got the following exception:
> {noformat}
> 2014-05-15 19:44:52,944 [Thread-2] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
> org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: D: Local Rearrange[tuple]{tuple}(false) - scope-42 Operator Key: scope-42): org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:289)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:263)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:282)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:1)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing min in Initial
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:81)
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:1)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:352)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNextTuple(POUserFunc.java:391)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:334)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:378)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:298)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:281)
> 	... 8 more
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.String
> 	at org.apache.pig.builtin.StringMin$Initial.exec(StringMin.java:73)
> 	... 15 more
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)