Posted to mapreduce-user@hadoop.apache.org by Pedro Costa <ps...@gmail.com> on 2011/02/16 22:37:04 UTC
How to read compressed files?
Hi,
1 - I'm trying to read parts of a compressed file to generate message
digests, but I can't fetch the right parts. I searched for an example
that reads compressed files, but I couldn't find one.
My example has 3 partitions; below are the index entries for the file:
raw bytes: 54632 / offset: 0 / partLength: 20307
raw bytes: 53771 / offset: 20307 / partLength: 19882
raw bytes: 53568 / offset: 40189 / partLength: 19814
Here's my code:
[code]
readCompressedFile(InputStream input) {
    decompressor.reset();
    CompressionInputStream input2 = codec.createInputStream(input, decompressor);
    IndexRecord index = spillRec.getIndex(part);
    long size = index.rawLength;
    //long size2 = index.partLength;
    long offset = index.startOffset;
    hash[part] = hashGen.generateHash(input2, (int) offset, (int) size);
}

public String generateHash(CompressionInputStream input, int offset,
        int mapOutputLength) {
    MessageDigest md = null;
    StringBuffer buf = new StringBuffer();
    try {
        md = MessageDigest.getInstance("SHA-1");
        int totalBytes = 0;
        int size = mapOutputLength < (60 * 1024) ? mapOutputLength : (60 * 1024);
        byte[] buffer = new byte[size];
        int n = input.read(buffer, 0, size);
        if (n > 0)
            md.update(buffer);
        while (n > 0) {
            totalBytes += n;
            mapOutputLength -= n;
            // Handle the case where fewer bytes remain than the default size:
            // the message digest must not contain trash.
            size = mapOutputLength < (60 * 1024) ? mapOutputLength : (60 * 1024);
            if (size == 0)
                break;
            buffer = new byte[size];
            n = input.read(buffer, 0, size);
            if (n > 0) {
                md.update(buffer);
            }
        }
        System.out.println("END: " + totalBytes + " - ");
        // DO THE HASH
    } catch (NoSuchAlgorithmException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return HASH;
}
[/code]
I can't get the right portions of the compressed file, and I don't
know why. What am I doing wrong?
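For comparison, here is a minimal sketch of how a per-partition read could look. It is not a confirmed fix. It assumes that offset and partLength are positions in the compressed file, so the seek and the length limit are applied before decompression, and that each partition was compressed on its own. java.util.zip stands in for the Hadoop codec and decompressor so the sketch runs by itself; PartitionDigest, compress, sha1Hex and hashPartition are hypothetical names. It also feeds only the n bytes actually read into the digest, since read() may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class PartitionDigest {

    /** Compresses one partition on its own (stand-in for the codec's output side). */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    /** Hex-encoded SHA-1 of every byte remaining in the stream. */
    static String sha1Hex(InputStream in) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[60 * 1024];
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            // Only the first n bytes are valid; the rest of the buffer is stale.
            md.update(buffer, 0, n);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /**
     * Hashes the decompressed content of one partition.
     * startOffset and partLength are assumed to refer to the COMPRESSED file.
     */
    static String hashPartition(byte[] file, int startOffset, int partLength)
            throws IOException, NoSuchAlgorithmException {
        // Restrict the view to the compressed segment first, then decompress it.
        InputStream segment = new ByteArrayInputStream(file, startOffset, partLength);
        try (InputStream decompressed = new InflaterInputStream(segment)) {
            return sha1Hex(decompressed);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] c0 = compress("partition 0".getBytes(StandardCharsets.UTF_8));
        byte[] c1 = compress("partition 1".getBytes(StandardCharsets.UTF_8));

        // Concatenate the two compressed segments into one "file".
        byte[] file = new byte[c0.length + c1.length];
        System.arraycopy(c0, 0, file, 0, c0.length);
        System.arraycopy(c1, 0, file, c0.length, c1.length);

        System.out.println(hashPartition(file, 0, c0.length));
        System.out.println(hashPartition(file, c0.length, c1.length));
    }
}
```

With the Hadoop classes the same idea would presumably be to position the underlying stream at startOffset and limit it to partLength bytes before handing it to codec.createInputStream, instead of wrapping the whole file from byte 0. That is an assumption about the file layout, not something verified here.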
2 - When I read a compressed file through a CompressionInputStream created with
CompressionInputStream input2 = codec.createInputStream(input, decompressor);
does calling the "read" method return uncompressed data?
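On this second point, a decompressing stream wrapper normally hands back decompressed bytes from read(), so the byte count seen by the reader tracks the raw length and not the on-disk length. Below is a small sketch of that behaviour using java.util.zip as a stand-in. This is an assumption made so the example runs by itself; it is not the Hadoop CompressionInputStream, and ReadSideLengths, compress and countDecompressed are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class ReadSideLengths {

    /** Compresses a byte array in memory. */
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(raw);
        }
        return out.toByteArray();
    }

    /** Counts how many bytes read() delivers from the decompressing wrapper. */
    static long countDecompressed(byte[] compressed) throws IOException {
        long total = 0;
        byte[] buffer = new byte[4096];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[50000];
        Arrays.fill(raw, (byte) 'a'); // highly repetitive, so it compresses well

        byte[] compressed = compress(raw);
        System.out.println("bytes on disk (compressed): " + compressed.length);
        System.out.println("bytes returned by read():   " + countDecompressed(compressed));
    }
}
```

If the Hadoop stream behaves the same way, the number of bytes to read from input2 for one partition would correspond to rawLength, while partLength would describe how many compressed bytes that partition occupies in the file.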
Thanks,
--
Pedro