Posted to dev@pig.apache.org by "Viraj Bhat (JIRA)" <ji...@apache.org> on 2015/04/06 23:40:12 UTC

[jira] [Updated] (PIG-4498) AvroStorage in Piggybank does not handle bad records and fails

     [ https://issues.apache.org/jira/browse/PIG-4498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Viraj Bhat updated PIG-4498:
----------------------------
    Affects Version/s: 0.13.1
                       0.12.0

> AvroStorage in Piggybank does not handle bad records and fails
> --------------------------------------------------------------
>
>                 Key: PIG-4498
>                 URL: https://issues.apache.org/jira/browse/PIG-4498
>             Project: Pig
>          Issue Type: Bug
>          Components: piggybank
>    Affects Versions: 0.12.0, 0.11.1, 0.13.1, 0.14.1
>            Reporter: Viraj Bhat
>            Assignee: Viraj Bhat
>             Fix For: 0.14.1
>
>
> The following Pig script fails when records within the input files are corrupted, even though AvroStorage is given the 'ignore_bad_files' option.
> {code}
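> -- Piggybank AvroStorage, asked to ignore bad input files
> DEFINE AvroLoader org.apache.pig.piggybank.storage.avro.AvroStorage('ignore_bad_files');
> -- The glob matches Avro files whose records may be corrupted
> DH_RAW = LOAD 'bad_data*' USING AvroLoader();
> STORE DH_RAW INTO 'output' USING PigStorage();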
> {code}
> Here is the stack trace:
> {quote}
> java.lang.ArrayIndexOutOfBoundsException: -49
>     at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:230)
>     at org.apache.pig.piggybank.storage.avro.AvroStorage.getNext(AvroStorage.java:407)
>     ... 12 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: -49
>     at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:364)
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readMap(PigAvroDatumReader.java:89)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:73)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>     at org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readRecord(PigAvroDatumReader.java:73)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>     at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>     at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>     at org.apache.pig.piggybank.storage.avro.PigAvroRecordReader.getCurrentValue(PigAvroRecordReader.java:198)
>     ..
> {quote}
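The -49 in the trace is plausibly a corrupt union branch index. Per the Avro specification, a union value is written as a zig-zag varint branch index followed by the value, and the trace shows ResolvingDecoder.readIndex feeding that index into Symbol$Alternative.getSymbol, which selects a branch by array position. A garbage byte in that slot can therefore decode to a negative index. The sketch below is not Avro's code; it only mimics the spec's varint plus zig-zag decoding, and the byte 0x61 and the ["null", "string"] union are hypothetical examples, since the actual contents of bad_data* are unknown.

```java
// Sketch only: decodes one int using the Avro binary rules
// (little-endian base-128 varint, then zig-zag). Not Avro's implementation.
class ZigZagDemo {

    /** Reads a zig-zag varint int starting at buf[pos]. */
    static int readZigZagInt(byte[] buf, int pos) {
        int raw = 0;
        // An int needs at most 5 varint bytes (shifts 0, 7, 14, 21, 28).
        for (int shift = 0; shift <= 28; shift += 7) {
            if (pos >= buf.length) {
                throw new IllegalArgumentException("buffer ended inside a varint");
            }
            int b = buf[pos++] & 0xFF;
            raw |= (b & 0x7F) << shift;      // low 7 bits carry the payload
            if ((b & 0x80) == 0) {           // high bit clear: last byte
                // Zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
                return (raw >>> 1) ^ -(raw & 1);
            }
        }
        throw new IllegalArgumentException("varint too long for an int");
    }

    public static void main(String[] args) {
        byte[] corrupt = {0x61};  // hypothetical garbage byte where a union index was expected
        int index = readZigZagInt(corrupt, 0);
        System.out.println("decoded union index = " + index);  // prints -49
        String[] branches = {"null", "string"};  // hypothetical union ["null", "string"]
        try {
            System.out.println(branches[index]);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Same exception type as in the trace above
            System.out.println("bad branch index: " + index);
        }
    }
}
```

A single byte below 0x80 is a complete varint, so any stray byte with its low bit set decodes to a negative number; 0x61 (97) happens to give exactly -49.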

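One way a loader could tolerate bad records instead of failing the task is to catch the decoding exception per record, count it, and continue. The Java sketch below is hypothetical: SkippingReader, its decoder function, and skippedCount are illustrative names, not piggybank or Avro API, and this is not the patch attached to PIG-4498. It shows only the skip-and-count pattern on records that are already split apart.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical sketch: wraps a source of raw records plus a decoder and skips
// (while counting) any record whose decoding throws a RuntimeException.
class SkippingReader<R, T> implements Iterator<T> {
    private final Iterator<R> source;
    private final Function<R, T> decoder;
    private T pending;
    private boolean hasPending;
    private long skipped;

    SkippingReader(Iterator<R> source, Function<R, T> decoder) {
        this.source = source;
        this.decoder = decoder;
    }

    /** Pulls raw records until one decodes cleanly or the source is exhausted. */
    private void fill() {
        while (!hasPending && source.hasNext()) {
            R raw = source.next();
            try {
                pending = decoder.apply(raw);
                hasPending = true;
            } catch (RuntimeException e) {
                skipped++;  // a real loader would also log e and the record position
            }
        }
    }

    @Override
    public boolean hasNext() {
        fill();
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        T result = pending;
        pending = null;
        hasPending = false;
        return result;
    }

    long skippedCount() {
        return skipped;
    }

    public static void main(String[] args) {
        String[] branches = {"null", "string"};  // hypothetical union branches
        // Each raw record is a branch index; -49 mimics the bad index in the trace.
        List<Integer> rawIndexes = List.of(1, 0, -49, 1);
        SkippingReader<Integer, String> reader =
            new SkippingReader<>(rawIndexes.iterator(), i -> branches[i]);
        List<String> good = new ArrayList<>();
        reader.forEachRemaining(good::add);
        System.out.println(good + ", skipped " + reader.skippedCount());
        // prints [string, null, string], skipped 1
    }
}
```

In a real Avro container file the records are not pre-split: after a decode error the position inside the current block is unreliable, so a reader would more likely abandon that block and resynchronise at the next sync marker (for example with Avro's DataFileReader.sync(long)) rather than simply continue.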

