Posted to user@avro.apache.org by Michael Moss <mi...@gmail.com> on 2013/05/11 00:16:59 UTC

avro.java.string vs utf8 compatibility in recent pig and hive versions

Hello,

It looks like representing Avro strings as Utf8 provides some interesting
performance enhancements, but I'm wondering if folks out there are actually
using it in practice, or have had any issues with it.

We have recently run into an issue where our Avro files, which represent
strings as "avro.java.string", are causing ClassCastExceptions because Pig
and Hive are expecting them to be Utf8. The exceptions occur when using
avro-1.7.x.jar, but disappear when using avro-1.5.3.jar.

I'm wondering if this is something that should be addressed in the avro
jar, or in pig and hive like this thread suggests:
https://issues.apache.org/jira/browse/PIG-3297

Here are the exceptions we are seeing:
*Hive:*
Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
        at org.apache.hadoop.hive.serde2.avro.AvroDeserializer.deserializeMap(AvroDeserializer.java:253)

*Pig:*
Caused by: java.io.IOException: java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.avro.util.Utf8
        at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:275)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:194)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:532)

Thanks.

-Mike

Re: avro.java.string vs utf8 compatibility in recent pig and hive versions

Posted by Scott Carey <sc...@apache.org>.
The change in the Pig loader in PIG-3297 seems correct: they must use
CharSequence, not Utf8.
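
To illustrate the CharSequence point, here is a minimal, self-contained Java
sketch. It does not use Avro itself: StringBuilder stands in for
org.apache.avro.util.Utf8 (both implement CharSequence), and the class and
method names are made up for the example. It is not the code from PIG-3297.

```java
class CharSequenceRead {
    // Convert a decoded string datum to java.lang.String without assuming
    // which CharSequence implementation the reader produced.
    static String asJavaString(Object datum) {
        if (datum == null) {
            return null;
        }
        if (datum instanceof CharSequence) {
            return datum.toString();
        }
        throw new IllegalArgumentException(
                "not a string datum: " + datum.getClass().getName());
    }

    // Brittle variant: casts to one concrete type (StringBuilder here,
    // standing in for Utf8) and fails for any other CharSequence.
    static String viaConcreteCast(Object datum) {
        return ((StringBuilder) datum).toString();
    }

    public static void main(String[] args) {
        Object asString = "abc";                    // what a String-typed reader yields
        Object asOther = new StringBuilder("abc");  // stand-in for a Utf8 datum

        System.out.println(asJavaString(asString));
        System.out.println(asJavaString(asOther));

        try {
            viaConcreteCast(asString);
        } catch (ClassCastException e) {
            // Same failure mode as the exceptions reported for Pig and Hive.
            System.out.println("concrete cast failed: " + e.getMessage());
        }
    }
}
```

Code written against CharSequence keeps working whichever string
representation the schema asks for; only the cast to a concrete class breaks.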

I suspect that avro-1.5.3.jar does not respect the "avro.java.string"
property and is using Utf8 (for the API that Pig is using), but I have not
confirmed it. "avro.java.string" is an optional hint for the Java
implementation.

On the Avro side, we may be able to make a modification that allows one to
configure a decoder or encoder to ignore the "avro.java.string" property.
Perhaps it could look for a system property as an override to help with
cases like this.
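
As a rough sketch of what such an override could look like: the property
name "example.avro.ignoreJavaString" and the class below are hypothetical
and are not part of Avro. The sketch only shows the shape of the decision,
where a JVM system property, when set to "true", makes the reader ignore the
schema's "avro.java.string" hint. StringBuilder stands in for Utf8 so that
the example runs without Avro on the classpath.

```java
class StringTypeOverride {
    // Hypothetical property name, for illustration only; Avro does not define it.
    static final String IGNORE_HINT_PROPERTY = "example.avro.ignoreJavaString";

    // Decide how to materialize a decoded string.
    // schemaHint is the value of "avro.java.string" in the schema, or null if absent.
    static CharSequence materialize(String decoded, String schemaHint) {
        // Boolean.getBoolean is true only if the system property is set to "true".
        boolean ignoreHint = Boolean.getBoolean(IGNORE_HINT_PROPERTY);
        if (!ignoreHint && "String".equals(schemaHint)) {
            return decoded;                  // honor the hint: java.lang.String
        }
        return new StringBuilder(decoded);   // default: stand-in for Utf8
    }

    public static void main(String[] args) {
        // Hint honored: prints java.lang.String
        System.out.println(materialize("abc", "String").getClass().getName());

        // Override set: the hint is ignored, prints java.lang.StringBuilder
        System.setProperty(IGNORE_HINT_PROPERTY, "true");
        System.out.println(materialize("abc", "String").getClass().getName());
    }
}
```

With something like this, a job could be started with
-Dexample.avro.ignoreJavaString=true to get the older behavior without
rewriting the data files.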

