Posted to user@pig.apache.org by Jeff Zhang <zj...@gmail.com> on 2011/09/22 01:42:10 UTC

Re: Pig duplicate records

This looks like a Pig bug, possibly in AvroStorage.
According to the log, Pig read 4 records and output 4 records.
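One way to rule out the file itself is to count the records independently of both Pig and the Avro reader API. Below is a minimal, hedged sketch in pure-stdlib Python: it writes the thread's two records ((r1,1), (r2,2)) into an Avro object container file by hand, then counts records by reading only the per-block record counts from the container format (magic, metadata map, sync marker, then blocks of count/size/data/sync). The schema and record values come from the thread; the helper names and the fixed sync marker are illustrative, not part of any Avro library API.

```python
# Sketch: hand-roll an Avro object container file holding the two records
# from the thread, then count records from the block headers alone.
# Assumptions: null codec, non-negative map counts, fixed all-zero sync marker.
import io
import json

def write_long(buf, n):
    # Avro long: zigzag then base-128 varint, little-endian groups.
    z = (n << 1) ^ (n >> 63)
    while True:
        b = z & 0x7F
        z >>= 7
        if z:
            buf.write(bytes([b | 0x80]))
        else:
            buf.write(bytes([b]))
            break

def read_long(buf):
    shift, result = 0, 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zigzag

SCHEMA = json.dumps({"name": "Record", "type": "record",
                     "fields": [{"name": "name", "type": "string"},
                                {"name": "id", "type": "int"}]})
SYNC = b"\x00" * 16  # fixed sync marker, fine for a sketch

def write_container(records):
    buf = io.BytesIO()
    buf.write(b"Obj\x01")                      # container magic
    write_long(buf, 2)                         # metadata map: 2 entries
    for k, v in ((b"avro.schema", SCHEMA.encode()), (b"avro.codec", b"null")):
        write_long(buf, len(k)); buf.write(k)
        write_long(buf, len(v)); buf.write(v)
    write_long(buf, 0)                         # end of metadata map
    buf.write(SYNC)
    body = io.BytesIO()
    for name, rid in records:
        nb = name.encode()
        write_long(body, len(nb)); body.write(nb)  # string field
        write_long(body, rid)                       # int field
    data = body.getvalue()
    write_long(buf, len(records))              # block: record count
    write_long(buf, len(data))                 # block: serialized byte size
    buf.write(data)
    buf.write(SYNC)
    return buf.getvalue()

def count_records(raw):
    buf = io.BytesIO(raw)
    assert buf.read(4) == b"Obj\x01"
    while True:                                # skip metadata map
        n = read_long(buf)
        if n == 0:
            break
        for _ in range(n):
            klen = read_long(buf); buf.read(klen)
            vlen = read_long(buf); buf.read(vlen)
    buf.read(16)                               # skip sync marker
    total = 0
    while buf.read(1):                         # more blocks?
        buf.seek(-1, 1)
        count = read_long(buf)                 # records in this block
        size = read_long(buf)
        buf.read(size)                         # skip record data
        buf.read(16)                           # skip trailing sync
        total += count
    return total

raw = write_container([("r1", 1), ("r2", 2)])
print(count_records(raw))  # 2 -- the file genuinely holds two records
```

If a check like this against the real test.v1.avro also says 2, the doubling has to be happening on the read side, which points at the AvroStorage loader rather than the file.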



On Wed, Sep 21, 2011 at 1:55 PM, Scott Carey <sc...@apache.org> wrote:

> You will want to ask this question on the Pig user mailing list.
>
> org.apache.pig.piggybank.storage.avro.AvroStorage is maintained by the Pig
> project and you will get more help from there.
>
> On 9/21/11 4:34 AM, "Alex Holmes" <gr...@gmail.com> wrote:
>
> >Hi all,
> >
> >I have a simple schema
> >
> >{"name": "Record", "type": "record",
> >  "fields": [
> >    {"name": "name", "type": "string"},
> >    {"name": "id", "type": "int"}
> >  ]
> >}
> >
> >which I use to write 2 records to an Avro file, and my reader code
> >(which reads the file and dumps the records) verifies that there are 2
> >records in the file:
> >
> >Record@1e9e5c73[name=r1,id=1]
> >Record@ed42d08[name=r2,id=2]
> >
> >When using this file with pig and AvroStorage, pig seems to think
> >there are 4 records:
> >
> >grunt> REGISTER /app/hadoop/lib/avro-1.5.4.jar;
> >grunt> REGISTER /app/pig-0.9.0/contrib/piggybank/java/piggybank.jar;
> >grunt> REGISTER /app/pig-0.9.0/build/ivy/lib/Pig/json-simple-1.1.jar;
> >grunt> REGISTER
> >/app/pig-0.9.0/build/ivy/lib/Pig/jackson-core-asl-1.6.0.jar;
> >grunt> REGISTER
> >/app/pig-0.9.0/build/ivy/lib/Pig/jackson-mapper-asl-1.6.0.jar;
> >grunt> raw = LOAD 'test.v1.avro' USING
> >org.apache.pig.piggybank.storage.avro.AvroStorage;
> >grunt> dump raw;
> >..
> >Input(s):
> >Successfully read 4 records (825 bytes) from:
> >"hdfs://localhost:9000/user/aholmes/test.v1.avro"
> >
> >Output(s):
> >Successfully stored 4 records (46 bytes) in:
> >"hdfs://localhost:9000/tmp/temp2039109003/tmp1924774585"
> >
> >Counters:
> >Total records written : 4
> >Total bytes written : 46
> >..
> >(r1,1)
> >(r2,2)
> >(r1,1)
> >(r2,2)
> >
> >I'm sure I'm doing something wrong (again)!
> >
> >Many thanks,
> >Alex
>
>
>


-- 
Best Regards

Jeff Zhang