You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Derek Brown (JIRA)" <ji...@apache.org> on 2010/02/22 18:16:27 UTC

[jira] Commented: (PIG-1246) SequenceFileLoader problem with compressed values

    [ https://issues.apache.org/jira/browse/PIG-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836731#action_12836731 ] 

Derek Brown commented on PIG-1246:
----------------------------------

Also- if I create a sequence file bzip2 compressed, with params of "CompressionType.BLOCK, new Bzip2Codec()" added to SequenceFile.createWriter(), that file is read in pig, with the SequenceFileLoader. It's using GzipCodec() that results in the output file that causes the error mentioned before.


> SequenceFileLoader problem with compressed values
> -------------------------------------------------
>
>                 Key: PIG-1246
>                 URL: https://issues.apache.org/jira/browse/PIG-1246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Derek Brown
>            Assignee: Dmitriy V. Ryaboy
>
> I sent the following to the pig-users list, and Dmitriy said to open a ticket.
> http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/201002.mbox/%3C357a70951002191451n6136a3en8475652fc0bd32c8@mail.gmail.com%3E
> > I'm having a problem getting the SequenceFileLoader, from the Piggybank, to
> > read sequence files whose values are block comressed (gzip'd). I'm using
> > Pig
> > 0.4.99.0+10, and Hadoop hadoop-0.20.1+152, via Cloudera.
> >
> > Did the following:
> >
> > * Copied the SequenceFileLoader class into my own project
> >
> > * Removed
> >
> > public LoadFunc.RequiredFieldResponse
> > fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList)
> >
> > because LoadFunc.RequiredFieldList isn't resolvable, and added
> >
> > public void fieldsToRead(Schema schema)
> >
> > * Jarred up the .class file
> >
> > * Programmatically created a trivial sequence file of a few lines, with
> > IntWritable keys and Text values, using the basic code in an example in
> > Hadoop The Definitive Guide
> >
> > * That file is successfully read and keys/values displayed, with "hadoop fs
> > -text", as well as with pig, doing the following:
> >
> > grunt> register sequencefileloader.jar;
> > grunt> r = load '/path/to/sequence_file' using
> > com.foobar.SequenceFileLoader();
> > grunt> dump r;
> >
> > * The sequence file with the compressed values is successfully read with
> > hadoop fs -text
> >
> > * When doing the load step in pig with that file, the following results:
> >
> > --
> > 2010-02-19 16:59:14,489 [main] WARN
> >  org.apache.hadoop.util.NativeCodeLoader
> > - Unable to load native-hadoop library for your platform..
> > . using builtin-java classes where applicable
> > 2010-02-19 16:59:14,490 [main] INFO
> >  org.apache.hadoop.io.compress.CodecPool
> > - Got brand-new decompressor
> > 2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt -
> > ERROR 1018: Problem determining schema during load
> > Details at logfile: /path/to/pig_1266616744562.log
> > --
> >
> > That log file contains the following:
> >
> > --
> > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error
> > during
> > parsing. Problem determining schema during load
> >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
> >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
> >        at
> > org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
> >        at
> >
> > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
> >        at
> >
> > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> >        at
> >
> > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> >        at org.apache.pig.Main.main(Main.java:363)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem
> > determining schema during load
> >        at
> >
> > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
> >        at
> >
> > org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
> >        ... 8 more
> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018:
> > Problem determining schema during load
> >        at
> > org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
> >        at
> >
> > org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
> >        ... 10 more
> > Caused by: java.io.EOFException
> >        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
> >        at
> > java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
> >        at
> > java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
> >        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
> >        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
> >        at
> >
> > org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
> >        at
> >
> > org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
> >        at
> >
> > org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
> >        at
> >
> > org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
> >        at
> > org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
> >        at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
> >        at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> >        at
> > org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> >        at
> > com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140)
> >        at
> > com.media6.SequenceFileLoader.determineSchema(SequenceFileLoader.java:106)
> >        at
> > org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
> >        ... 11 more
> > --
> >
> > Maybe there's something that needs to be added to SequenceFileLoader to
> > account for the compressed values, which hadoop's "fs -text" accounts for.
> > Thanks for any ideas/pointers.
> >
> > Derek

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.