Posted to user@pig.apache.org by Andrew Kenworthy <ad...@yahoo.com> on 2012/01/09 10:15:53 UTC

Simple AvroStorage LOAD and STORE with Avro 1.6.0

Hello,

When I run a simple Pig script to LOAD and STORE Avro data, I get:

java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord


Script:

REGISTER /tmp/avro-1.6.0.jar;
--REGISTER /tmp/avro-1.5.4.jar;
--REGISTER /tmp/avro-1.4.1.jar;

REGISTER /tmp/piggybank-0.9.1.jar;
REGISTER /tmp/json-simple-1.1.jar;
REGISTER /tmp/jackson-core-asl-1.8.4.jar;
REGISTER /tmp/jackson-mapper-asl-1.8.4.jar;

avroData = LOAD '$DATA_INPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

dataSubset = FOREACH avroData GENERATE myField1, myField2;
describe dataSubset;
-----------------------------------------------
-- shows: 
-- dataSubset : {myField1: int,myField2: int}
----------------------------------------------- 
STORE dataSubset INTO '$OUTPUTDIR' USING org.apache.pig.piggybank.storage.avro.AvroStorage();

With the 1.5.4 jar I get the same error, but the script works with 1.4.1. If I write only one field, it also works with 1.6.0.

I see there's been a related issue fixed here:

https://issues.apache.org/jira/browse/PIG-2202 
https://issues.apache.org/jira/browse/PIG-2195 

Can anyone confirm that this, or something similar, works with Avro 1.6.0, and/or point me in the right direction concerning where the problem may lie?

Many thanks,

Andrew

Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Scott Carey <sc...@richrelevance.com>.
FYI:
https://issues.apache.org/jira/browse/AVRO-993

I expect that Avro 1.6.2 will add these methods back in.

Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Andrew Kenworthy <ad...@yahoo.com>.
Hi Stan,

Thank you for your feedback. I've run the script passing "-D mapred.child.java.opts=-verbose:class" and have the following in my logs:

[Loaded org.apache.avro.generic.GenericDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]
[Loaded org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter from file:/var/lib/hadoop-0.20/cache/mapred/mapred/local/taskTracker/ankenworthy/jobcache/job_201111230039_0146/jars/job.jar]

I assume the .../job_201111230039_0146/jars/job.jar is the one prepared by pig using the jars I have REGISTER-ed, in which case the classes are the ones I expect, or have I misread that?

Regards,

Andrew
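Stan's suggestion to grep the verbose class-loading output can also be scripted. The following is a minimal Java sketch, assuming each relevant log line has the `[Loaded <class> from <location>]` shape shown in the log excerpt above; `LoadedClassReport` and the watch list are illustrative names, not part of Pig or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: maps watched class names to the location they were loaded from. */
public class LoadedClassReport {
    // Matches lines such as "[Loaded org.example.Foo from file:/path/job.jar]".
    private static final Pattern LOADED =
        Pattern.compile("^\\[Loaded (\\S+) from (.+)\\]$");

    /** Returns class -> source for every watched class found in the log lines. */
    public static Map<String, String> sources(List<String> logLines, List<String> watched) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line.trim());
            if (m.matches() && watched.contains(m.group(1))) {
                // Keep only the first occurrence of each class.
                result.putIfAbsent(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "[Loaded org.apache.avro.generic.GenericDatumWriter from file:/tmp/job.jar]",
            "[Loaded java.lang.String from shared objects file]");
        List<String> watched = List.of(
            "org.apache.avro.generic.GenericDatumWriter",
            "org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter");
        sources(log, watched).forEach((cls, src) -> System.out.println(cls + " <- " + src));
    }
}
```

A watched class that is missing from the result was simply never loaded in that task's JVM, which is itself useful to know.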




Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Andrew,

Something looks odd in this stack trace:

> Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
>         at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
>         at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
>         at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
>         at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
>         at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)

PigAvroDatumWriter overrides 'GenericDatumWriter.writeRecord' in order
to extract values from a tuple. Thus, I would expect the third frame
from the top to be PigAvroDatumWriter.writeRecord. Perhaps someone
else has more insight into why it's not getting invoked. In the
meantime, please confirm that both PigAvroDatumWriter and
GenericDatumWriter are loaded from the right jar files. (You can do
this by temporarily changing the pig script to invoke the JVM with
'java -verbose' and grepping the output for these classes.)

Best,

stan
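Besides `-verbose` output, the jar a class came from can be checked from inside the JVM. A minimal sketch, assuming it runs under the same class loader as the code being investigated (for example, called temporarily from a UDF); `WhichJar` is a hypothetical helper, while `Class.forName`, `getProtectionDomain()` and `getCodeSource()` are standard JDK calls.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: reports where a class was loaded from. */
public class WhichJar {
    public static String origin(String className) {
        try {
            // initialize=false: load the class without running its static initializers.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Typically a bootstrap class such as java.lang.String.
                return "<no code source>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not found>";
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {
                "org.apache.avro.generic.GenericDatumWriter",
                "org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter"}) {
            System.out.println(name + " -> " + origin(name));
        }
    }
}
```

If both classes report the same job jar, the jar-mixup theory is ruled out and the cause is more likely a change in the Avro API itself.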


Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Andrew Kenworthy <ad...@yahoo.com>.
Hi Stan,

here's the full stack trace:

org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:261)
        at org.apache.pig.piggybank.storage.avro.PigAvroRecordWriter.write(PigAvroRecordWriter.java:49)
        at org.apache.pig.piggybank.storage.avro.AvroStorage.putNext(AvroStorage.java:580)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:138)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:530)
        at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:238)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:322)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
        at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.avro.generic.IndexedRecord
        at org.apache.avro.generic.GenericData.getField(GenericData.java:525)
        at org.apache.avro.generic.GenericData.getField(GenericData.java:540)
        at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:103)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:65)
        at org.apache.pig.piggybank.storage.avro.PigAvroDatumWriter.write(PigAvroDatumWriter.java:99)
        at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:57)
        at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:255)
        ... 18 more


Andrew
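The informative part of a trace like this is the innermost `Caused by:` section; the outer `AppendWriteException` is only a wrapper, and the `... 18 more` line means the remaining frames are identical to those of the enclosing trace. When such exceptions are caught in your own code, a minimal sketch for extracting the root cause and the frame where it was thrown might look like this (`RootCause` is a hypothetical helper; `getCause()` and `getStackTrace()` are standard `Throwable` methods):

```java
/** Hypothetical helper: reports the innermost cause of an exception chain. */
public class RootCause {
    public static Throwable root(Throwable t) {
        Throwable current = t;
        // Follow getCause(); the depth bound guards against cyclic cause chains.
        for (int depth = 0; depth < 64 && current.getCause() != null; depth++) {
            current = current.getCause();
        }
        return current;
    }

    /** Exception type of the root cause plus the frame where it was thrown. */
    public static String summary(Throwable t) {
        Throwable r = root(t);
        StackTraceElement[] frames = r.getStackTrace();
        String where = frames.length > 0 ? frames[0].toString() : "<no frames>";
        return r.getClass().getName() + " at " + where;
    }

    public static void main(String[] args) {
        try {
            Object datum = "not a record";
            Integer n = (Integer) datum; // provoke a ClassCastException
            System.out.println(n);
        } catch (ClassCastException cce) {
            // Wrap it, similar to how the trace above shows AppendWriteException wrapping the cast failure.
            RuntimeException wrapper = new RuntimeException("append failed", cce);
            System.out.println(summary(wrapper));
        }
    }
}
```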




Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Andrew,

The source of the problem may be AvroStorage in piggybank.  Could you
please include the entire stack trace?

stan


Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Bill Graham <bi...@gmail.com>.
I'd be cautious about using AvroStorage in its current state with 1.6.0.

Running the piggybank unit tests against 1.6.0 causes compile failures due
to non-backward-compatible Avro changes in 1.6.0:
GenericDatumReader.newRecord(Object old, Schema schema) has gone away in
Avro 1.6.0.

    [javac] /Users/billg/ws/git/pig/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/avro/PigAvroDatumReader.java:136: method does not override or implement a method from a supertype
    [javac]     @Override
    [javac]     ^
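One plausible reading, consistent with both this compile error and Stan's observation that PigAvroDatumWriter.writeRecord is never reached, is a general Java pitfall: when a base class stops calling a hook method (or changes its signature), a subclass method written against the old signature silently becomes an ordinary overload that nothing calls, unless it carries `@Override`, in which case compilation fails as above. The sketch below uses hypothetical classes (`BaseWriter`, `OldStyleWriter`, `Record`), not the real Avro or Pig types, and is not a reconstruction of the actual Avro 1.6.0 change.

```java
import java.util.List;

/** Hypothetical illustration of a subclass hook that silently stops being an override. */
public class OverrideDrift {
    /** Stand-in for the record type the base class expects. */
    interface Record {
        Object get(int pos);
    }

    /** "New" base class: every field read goes through getField(Object, int). */
    static class BaseWriter {
        protected Object getField(Object record, int pos) {
            return ((Record) record).get(pos); // ClassCastException for any non-Record datum
        }

        String writeRecord(Object datum, int fieldCount) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < fieldCount; i++) {
                out.append(getField(datum, i)).append(';');
            }
            return out.toString();
        }
    }

    /** Subclass written against an "old" hook that also took a field name. */
    static class OldStyleWriter extends BaseWriter {
        // No @Override here: the signature no longer matches, so this is only an
        // overload and BaseWriter.writeRecord never calls it.
        protected Object getField(Object record, String name, int pos) {
            return ((List<?>) record).get(pos);
        }
    }

    public static String run() {
        Object tuple = List.of(1, 2); // plays the role of a Pig tuple
        try {
            return new OldStyleWriter().writeRecord(tuple, 2);
        } catch (ClassCastException e) {
            return "bypassed hook: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "bypassed hook: ClassCastException"
    }
}
```

Adding `@Override` to `OldStyleWriter.getField` makes javac reject it with "method does not override or implement a method from a supertype", which is why annotating every intended override is a cheap guard against this kind of upgrade breakage.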


I've just created this FYI:
https://issues.apache.org/jira/browse/PIG-2463
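Because the breakage shows up as a missing method, one defensive option is to probe for the expected signature at startup and fail fast with a clear message, instead of hitting a ClassCastException deep inside a map task. A minimal sketch, assuming the caller knows the exact parameter types; `SignatureProbe` is hypothetical, and JDK classes stand in for the Avro classes named above so that it runs on its own.

```java
/** Hypothetical startup check: is a method with this exact signature declared on a class or its superclasses? */
public class SignatureProbe {
    public static boolean declares(Class<?> type, String name, Class<?>... params) {
        // getDeclaredMethod also sees protected and private methods, but only on the
        // class it is called on, so walk up the superclass chain explicitly.
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod(name, params);
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared on this class; try its superclass.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // With Avro on the classpath, a check for the removed method could be, e.g.:
        //   declares(GenericDatumReader.class, "newRecord", Object.class, Schema.class)
        System.out.println(declares(java.util.ArrayList.class, "add", Object.class));       // true
        System.out.println(declares(java.util.ArrayList.class, "newRecord", Object.class)); // false
    }
}
```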



Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Russell Jurney <ru...@gmail.com>.
Avro 1.4.1 only works for me with the PIG-2411 patch applied.

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com

On Jan 9, 2012, at 12:52 PM, Stan Rosenberg
<sr...@proclivitysystems.com> wrote:

> Generally, AvroStorage works fine for us with Avro 1.6.  However, we
> also patched AvroStorage on a couple of occasions, e.g., see PIG-2330.
>
> stan
>
> On Mon, Jan 9, 2012 at 3:47 PM, Russell Jurney <ru...@gmail.com> wrote:
>> I could only make AvroStorage work with Avro 1.4.1.
>>
>> Russell Jurney
>> twitter.com/rjurney
>> russell.jurney@gmail.com
>> datasyndrome.com

Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Generally, AvroStorage works fine for us with Avro 1.6. However, we have
also patched AvroStorage on a couple of occasions (e.g., PIG-2330).

stan

On Mon, Jan 9, 2012 at 3:47 PM, Russell Jurney <ru...@gmail.com> wrote:
> I could only make AvroStorage work with Avro 1.4.1.
>
> Russell Jurney
> twitter.com/rjurney
> russell.jurney@gmail.com
> datasyndrome.com

Re: Simple AvroStorage LOAD and STORE with Avro 1.6.0

Posted by Russell Jurney <ru...@gmail.com>.
I could only make AvroStorage work with Avro 1.4.1.

Russell Jurney
twitter.com/rjurney
russell.jurney@gmail.com
datasyndrome.com
