Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/03/31 11:47:05 UTC

[jira] [Created] (PIG-1948) java.lang.ClassCastException while using double value from result of a group

java.lang.ClassCastException while using double value from result of a group
----------------------------------------------------------------------------

                 Key: PIG-1948
                 URL: https://issues.apache.org/jira/browse/PIG-1948
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.0, 0.7.0, 0.9.0
            Reporter: Vivek Padmanabhan


I have a fairly simple script (though with many columns) that fails with a ClassCastException.


{code}
register myudf.jar;
A = load 'newinput' as (datestamp: chararray,vtestid: chararray,src_kt1: chararray,f1: chararray,f2: chararray,f3: chararray,f4: chararray,f5: chararray,f6: int,ipc: chararray,woeid: long,woeid_place: chararray,f7: chararray,f8: double,woeid_latitude: double,f9: chararray,woeid_town: chararray,woeid_county: chararray,a1: chararray,a2: chararray,woeid_country: chararray,a3: chararray,connection_speed: chararray,isp_name: chararray,isp_domain: chararray,ecnt: int,vcnt: int,ccnt: int,startts: int,duration: int,endts: int,stqust: chararray,startqc: chararray,starts_con: chararray,starts_lng: chararray,startv_pk1: int,startv_pk2: int,startv_pk3: int,startv_pk4: int,startv_pk5: int,lastquerystring: chararray,lastqc: chararray,lasts_con: chararray,lasts_lng: chararray,lastv_pk1: int,lastv_pk2: int,lastv_pk3: int,lastv_pk4: int,lastv_pk5: int,b1: chararray,lastsection: chararray,lastseclink: chararray,lasturl: chararray,path: chararray,pathtype: chararray,firstlastquerymatch: int,log_duration: double,log_duration_sq: double,duration_sq: double);

B = foreach A generate  datestamp,src_kt1,vtestid,stqust,ecnt,vcnt,ccnt,log_duration,duration;
C = group B by ( datestamp, src_kt1,vtestid, stqust ) parallel 4;
D = foreach C generate COUNT( B ) as total, MyEval( B.log_duration ) as log_duration_summary;
store D into 'output';

{code}

The above script fails with the following ClassCastException:

{code}
java.lang.ClassCastException: java.lang.Double cannot be cast to java.lang.String
	at org.apache.pig.data.BinInterSedes.readMap(BinInterSedes.java:193)
	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:280)
	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
	at org.apache.pig.data.BinInterSedes.readTuple(BinInterSedes.java:111)
	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:270)
	at org.apache.pig.data.BinInterSedes.readDatum(BinInterSedes.java:251)
	at org.apache.pig.data.BinInterSedes.addColsToTuple(BinInterSedes.java:555)
	at org.apache.pig.data.BinSedesTuple.readFields(BinSedesTuple.java:64)
	at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1376)
        .
        .
{code}

The problem occurs at the line MyEval( B.log_duration ). Even though log_duration is defined as a double field, BinInterSedes treats it as a map value (TINYMAP, to be exact) and therefore tries to cast the double value to the map key type, i.e. a String. This bug also exists in 0.9.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PIG-1948) java.lang.ClassCastException while using double value from result of a group

Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair resolved PIG-1948.
--------------------------------

    Resolution: Invalid

The UDF in this case was returning a map with a double as the key, which Pig does not support. This caused the error during deserialization.
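
The failure mode can be illustrated with a small standalone Java sketch. It does not use Pig: the class MapKeyDemo and the methods totalKeyLength and stringifyKeys are hypothetical names made up for illustration. totalKeyLength only mimics a reader that assumes every map key is a String, as the cast in the stack trace suggests, and stringifyKeys shows one way a UDF might convert its keys to Strings before returning the map.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapKeyDemo {

    // Mimics a reader that assumes every map key is a String.
    // This is an illustration only, NOT Pig's actual deserialization code.
    static int totalKeyLength(Map<?, ?> map) {
        int total = 0;
        for (Object key : map.keySet()) {
            String s = (String) key; // throws ClassCastException for a Double key
            total += s.length();
        }
        return total;
    }

    // Hypothetical helper a UDF could apply before returning its map:
    // every key is replaced by its String form.
    static Map<String, Object> stringifyKeys(Map<?, ?> map) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : map.entrySet()) {
            out.put(String.valueOf(e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<Object, Object> summary = new HashMap<>();
        summary.put(0.5d, 10L); // Double key: the unsupported case

        try {
            totalKeyLength(summary);
        } catch (ClassCastException ex) {
            System.out.println("cast failed: " + ex.getMessage());
        }

        // With String keys the same reader succeeds.
        System.out.println(totalKeyLength(stringifyKeys(summary)));
    }
}
```

In a real UDF the equivalent change would be to build the returned map with String keys in the first place, so that nothing but Strings ever reaches the serializer.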



[jira] [Assigned] (PIG-1948) java.lang.ClassCastException while using double value from result of a group

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-1948:
-----------------------------------

    Assignee: Thejas M Nair


[jira] [Updated] (PIG-1948) java.lang.ClassCastException while using double value from result of a group

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1948:
--------------------------------

    Fix Version/s: 0.8.0
