Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2010/02/22 17:58:27 UTC
[jira] Assigned: (PIG-1246) SequenceFileLoader problem with compressed values

     [ https://issues.apache.org/jira/browse/PIG-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy reassigned PIG-1246:
--------------------------------------

    Assignee: Dmitriy V. Ryaboy

> SequenceFileLoader problem with compressed values
> -------------------------------------------------
>
>                 Key: PIG-1246
>                 URL: https://issues.apache.org/jira/browse/PIG-1246
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Derek Brown
>            Assignee: Dmitriy V. Ryaboy
>
> I sent the following to the pig-users list, and Dmitriy said to open a ticket.
> http://mail-archives.apache.org/mod_mbox/hadoop-pig-user/201002.mbox/%3C357a70951002191451n6136a3en8475652fc0bd32c8@mail.gmail.com%3E
> > I'm having a problem getting the SequenceFileLoader, from the Piggybank, to
> > read sequence files whose values are block compressed (gzip'd). I'm using
> > Pig 0.4.99.0+10 and Hadoop (hadoop-0.20.1+152), via Cloudera.
> >
> > Did the following:
> >
> > * Copied the SequenceFileLoader class into my own project
> >
> > * Removed
> >
> > public LoadFunc.RequiredFieldResponse fieldsToRead(LoadFunc.RequiredFieldList requiredFieldList)
> >
> > because LoadFunc.RequiredFieldList isn't resolvable, and added
> >
> > public void fieldsToRead(Schema schema)
> >
> > * Jarred up the .class file
> >
> > * Programmatically created a trivial sequence file of a few lines, with
> > IntWritable keys and Text values, using the basic example code from
> > Hadoop: The Definitive Guide
> >
> > * That file is read successfully, and its keys/values displayed, both with
> > "hadoop fs -text" and with Pig, doing the following:
> >
> > grunt> register sequencefileloader.jar;
> > grunt> r = load '/path/to/sequence_file' using com.foobar.SequenceFileLoader();
> > grunt> dump r;
> >
> > * The sequence file with the compressed values is also read successfully
> > with "hadoop fs -text"
> >
> > * When doing the load step in pig with that file, the following results:
> >
> > --
> > 2010-02-19 16:59:14,489 [main] WARN  org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> > 2010-02-19 16:59:14,490 [main] INFO  org.apache.hadoop.io.compress.CodecPool - Got brand-new decompressor
> > 2010-02-19 16:59:14,498 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1018: Problem determining schema during load
> > Details at logfile: /path/to/pig_1266616744562.log
> > --
> >
> > That log file contains the following:
> >
> > --
> > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Problem determining schema during load
> >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1037)
> >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:981)
> >        at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
> >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:717)
> >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:273)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> >        at org.apache.pig.Main.main(Main.java:363)
> > Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem determining schema during load
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
> >        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1031)
> >        ... 8 more
> > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: Problem determining schema during load
> >        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
> >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
> >        ... 10 more
> > Caused by: java.io.EOFException
> >        at java.util.zip.GZIPInputStream.readUByte(GZIPInputStream.java:207)
> >        at java.util.zip.GZIPInputStream.readUShort(GZIPInputStream.java:197)
> >        at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:136)
> >        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:58)
> >        at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:68)
> >        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream$ResetableGZIPInputStream.<init>(GzipCodec.java:92)
> >        at org.apache.hadoop.io.compress.GzipCodec$GzipInputStream.<init>(GzipCodec.java:101)
> >        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:169)
> >        at org.apache.hadoop.io.compress.GzipCodec.createInputStream(GzipCodec.java:179)
> >        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1520)
> >        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1428)
> >        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> >        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> >        at com.media6.SequenceFileLoader.inferReader(SequenceFileLoader.java:140)
> >        at com.media6.SequenceFileLoader.determineSchema(SequenceFileLoader.java:106)
> >        at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
> >        ... 11 more
> > --
> >
> > Maybe SequenceFileLoader needs something added to handle the compressed
> > values, the way hadoop's "fs -text" does.
> > Thanks for any ideas/pointers.
> >
> > Derek
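The innermost frames of the trace show the failure inside the java.util.zip.GZIPInputStream constructor, reached through GzipCodec$GzipInputStream$ResetableGZIPInputStream, i.e. Hadoop's pure-Java gzip path (consistent with the WARN that the native-hadoop library could not be loaded). GZIPInputStream parses the gzip header as soon as it is constructed, so wrapping a stream that has no bytes available throws EOFException from readHeader. The JDK-only sketch below reproduces just that low-level behavior. The class name GzipHeaderDemo and its methods are hypothetical, no Hadoop or Pig code is involved, and it does not establish why SequenceFile$Reader.init handed the codec an empty stream; the trace alone doesn't say.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: reproduces the EOFException seen at the bottom of
// the Pig stack trace using only the JDK (no Hadoop/Pig classes).
public class GzipHeaderDemo {

    // Returns true if constructing a GZIPInputStream over an empty stream
    // fails with EOFException, because the constructor reads the header eagerly.
    public static boolean headerFailsOnEmpty() {
        InputStream empty = new ByteArrayInputStream(new byte[0]);
        try {
            new GZIPInputStream(empty).close();
            return false; // constructor returned: a header was found
        } catch (EOFException e) {
            return true;  // same exception type as in the reported trace
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Gzips the text and reads it back: the same constructor succeeds
    // once real gzip bytes (header included) are available.
    public static String roundTrip(String text) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
                out.write(text.getBytes(StandardCharsets.UTF_8));
            }
            try (GZIPInputStream in =
                     new GZIPInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty stream -> EOFException: " + headerFailsOnEmpty());
        System.out.println("round trip: " + roundTrip("hello sequence file"));
    }
}
```

Running main prints "empty stream -> EOFException: true" followed by "round trip: hello sequence file". The contrast only shows that the exception means "no gzip header bytes were available when the decompressor stream was opened", not that the file itself is corrupt (it reads fine with "hadoop fs -text").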

-- 
This message is automatically generated by JIRA.