Posted to user@hive.apache.org by Parag Arora <pa...@webaroo.com> on 2010/10/15 13:15:28 UTC

Need help ignoring corrupted gzipped files while running a query

Hello

I have a question and would appreciate a little help with it. I have a Hive
table that loads its data from files partitioned by timestamp (every 15
minutes) and placed there in gzipped format. Some of these gzip files may be
corrupted (for example, a network error during transfer can leave behind a
truncated file).

Now, when I run any job on this table that dumps data into another table,
the Hive job fails with the following error:
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.ExecDriver

*Is there a way to catch this error so that I can ignore the corrupted
files and still get the job to complete?*
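
The only workaround I can think of so far is to test every .gz end to end
before it is loaded into the partition directory and set aside anything that
does not decompress cleanly. A rough sketch of that kind of pre-check is
below (the class name and staging path are made up, and it uses plain
java.util.zip rather than the Hadoop codec); I would much prefer a
Hive/Hadoop-side option that just skips the bad files.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Rough pre-check sketch: flag .gz files that cannot be decompressed end to end.
// The class name and the default staging path are made up for illustration.
public class GzipPreCheck {

    // Returns true only if the whole file decompresses without error.
    static boolean decompressesCleanly(File gz) {
        byte[] buf = new byte[64 * 1024];
        GZIPInputStream in = null;
        try {
            in = new GZIPInputStream(new FileInputStream(gz));
            while (in.read(buf) != -1) {
                // Drain the stream; we only care whether it completes.
            }
            return true;
        } catch (IOException e) {
            // Truncated or otherwise corrupted gzip data ends up here.
            return false;
        } finally {
            if (in != null) {
                try { in.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "/data/staging"); // made-up staging dir
        File[] files = dir.listFiles();
        if (files == null) {
            return;
        }
        for (File f : files) {
            if (f.getName().endsWith(".gz") && !decompressesCleanly(f)) {
                System.out.println("CORRUPT: " + f.getAbsolutePath());
            }
        }
    }
}

I could run something like this over the staging directory before each
15-minute load, but that means reading every file twice, which is why I am
hoping there is a setting on the Hive/Hadoop side instead.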

If I am reading it right, the *Hadoop log* shows that the error occurs while
decompressing my gzipped file:
2010-10-15 10:38:48,027 INFO org.apache.hadoop.mapred.TaskInProgress (IPC Server handler 2 on 9001): Error from attempt_201010150837_0002_m_000041_0:
java.io.EOFException: Unexpected end of input stream
        at org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(DecompressorStream.java:98)
        at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:86)
        at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:74)
        at java.io.InputStream.read(InputStream.java:85)
        at org.apache.hadoop.mapred.LineRecordReader$LineReader.backfill(LineRecordReader.java:94)
        at org.apache.hadoop.mapred.LineRecordReader$LineReader.readLine(LineRecordReader.java:124)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:266)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:39)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:58)
        at org.apache.hadoop.hive.ql.io.HiveRecordReader.next(HiveRecordReader.java:27)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:167)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:45)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:231)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
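
To confirm that it really is the gzip data and not something on the Hive
side, I suppose I could drain the suspect file through the same decompression
path that shows up in the trace, along the lines of the sketch below; the
class name and HDFS path there are made up, and in practice I would point it
at the file behind the failed split.

import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Sketch: drain one HDFS .gz file through the Hadoop codec to see whether it
// fails the same way as the map task. The default path below is made up.
public class CheckOneGzOnHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args.length > 0 ? args[0]
                : "/user/hive/warehouse/mytable/dt=20101015_1030/data.gz"); // made-up path
        FileSystem fs = path.getFileSystem(conf);

        CompressionCodecFactory factory = new CompressionCodecFactory(conf);
        CompressionCodec codec = factory.getCodec(path); // resolves GzipCodec for *.gz

        byte[] buf = new byte[64 * 1024];
        InputStream in = codec.createInputStream(fs.open(path));
        try {
            while (in.read(buf) != -1) {
                // Drain; a corrupted file should fail here with an EOFException
                // like the "Unexpected end of input stream" in the task log.
            }
            System.out.println("OK: " + path);
        } finally {
            in.close();
        }
    }
}

If that drain throws the same EOFException, it would at least confirm that
the file is the problem and not the query itself.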


