Posted to dev@pig.apache.org by "Vivek Padmanabhan (JIRA)" <ji...@apache.org> on 2011/05/06 08:40:03 UTC

[jira] [Created] (PIG-2045) Pig treating map values as String causing ClassCastException in CONCAT

Pig treating map values as String causing ClassCastException in CONCAT
----------------------------------------------------------------------

                 Key: PIG-2045
                 URL: https://issues.apache.org/jira/browse/PIG-2045
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.8.0, 0.9.0
            Reporter: Vivek Padmanabhan


I have the following script:

{code}
register mymapudf.jar;
a = load '4523893_1' as (f1);
a1 = foreach a generate org.vivek.udfs.mToMapUDF(f1);
a2 = foreach a1 generate mapout#'k1' as str1,mapout#'k3' as str2;
b = load '4523893_2' as (f1,f2);
c = join a2 by CONCAT(str1,str2) , b by CONCAT(f1,f2);
dump c;
{code}

The UDF looks like this:
{code}
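package org.vivek.udfs;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class mToMapUDF extends EvalFunc<Map> {

	@Override
	public Map<String, Object> exec(Tuple arg0) throws IOException {
		Map<String, Object> myMapTResult = new HashMap<String, Object>();
		// The values stored here are java.lang.String objects at runtime.
		myMapTResult.put("k1", "SomeString");
		myMapTResult.put("k3", "SomeOtherString");
		return myMapTResult;
	}

	@Override
	public Schema outputSchema(Schema input) {
		// Declares a single map field named "mapout"; the type of its values is left unspecified.
		return new Schema(new Schema.FieldSchema("mapout", DataType.MAP));
	}
}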
{code}

The script fails with the following exception:
 java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.DataByteArray
	at org.apache.pig.builtin.CONCAT.exec(CONCAT.java:51)


The values of the map output, i.e. str1 and str2, are automatically treated as String by Pig, and this causes the ClassCastException when they are used in subsequent UDFs.
Since no explicit cast is done and no types are declared, Pig should treat the values as the default bytearray. This issue is also observed in 0.9.
The workaround in this case is to explicitly cast all keys involved in the join to chararray.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (PIG-2045) Pig treating map values as String causing ClassCastException in CONCAT

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-2045.
---------------------------------

    Resolution: Invalid

This is expected behavior, and map keys need a cast. Otherwise, there is a mismatch between the function selected (the one that handles bytearray) and the actual data, which are strings.
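The mismatch can be reproduced outside Pig with a minimal, self-contained Java sketch that uses only the plain JDK and has no Pig dependency. The sketch assumes what the resolution describes: the implementation is picked from the declared type (bytearray for an untyped map value), while the runtime object is a java.lang.String. `ByteArray`, `DeclaredType`, `concatBytes`, `concatChars` and `toChararray` are hypothetical stand-ins. This is not Pig's CONCAT source or its real function-selection code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.apache.pig.data.DataByteArray; NOT Pig's class.
final class ByteArray {
    final byte[] bytes;

    ByteArray(byte[] bytes) { this.bytes = bytes; }

    @Override
    public String toString() { return new String(bytes, StandardCharsets.UTF_8); }
}

public class ConcatMismatchSketch {
    // Declared (plan-time) types, loosely mirroring Pig's bytearray/chararray.
    enum DeclaredType { BYTEARRAY, CHARARRAY }

    // Variant chosen when the declared type is bytearray: it trusts the schema
    // and casts blindly, so a String argument throws ClassCastException.
    static Object concatBytes(Object a, Object b) {
        byte[] x = ((ByteArray) a).bytes;
        byte[] y = ((ByteArray) b).bytes;
        byte[] out = Arrays.copyOf(x, x.length + y.length);
        System.arraycopy(y, 0, out, x.length, y.length);
        return new ByteArray(out);
    }

    // Variant chosen when the declared type is chararray.
    static Object concatChars(Object a, Object b) {
        return (String) a + (String) b;
    }

    // Selection looks only at the declared type; the runtime class is never checked.
    static Object concat(DeclaredType declared, Object a, Object b) {
        return declared == DeclaredType.BYTEARRAY ? concatBytes(a, b) : concatChars(a, b);
    }

    // Hypothetical analogue of an explicit (chararray) cast on a map value.
    static String toChararray(Object v) {
        return v instanceof ByteArray ? v.toString() : (String) v;
    }

    public static void main(String[] args) {
        // Same map contents as the mToMapUDF in the report.
        Map<String, Object> mapout = new HashMap<>();
        mapout.put("k1", "SomeString");
        mapout.put("k3", "SomeOtherString");

        // Untyped map values: declared bytearray, but the objects are Strings.
        try {
            concat(DeclaredType.BYTEARRAY, mapout.get("k1"), mapout.get("k3"));
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: declared bytearray, runtime "
                    + mapout.get("k1").getClass().getName());
        }

        // After the explicit cast, the chararray variant is selected and succeeds.
        Object ok = concat(DeclaredType.CHARARRAY,
                toChararray(mapout.get("k1")), toChararray(mapout.get("k3")));
        System.out.println(ok); // prints SomeStringSomeOtherString
    }
}
```

Calling `toChararray` before `concat` plays the role of the explicit `(chararray)` cast. It changes the declared type that the selection sees, so the String variant runs instead of the one that assumes bytes.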
