Posted to common-user@hadoop.apache.org by Espen Amble Kolstad <es...@trank.no> on 2007/06/01 12:49:45 UTC

Re: LzoCodec not working correctly?

Hi Arun,

TestCodec only writes once to the deflateFilter; that's why the test
passes with LzoCodec.
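
A quick way to make a test trip over it is to split the write in two.
This is just a sketch, not the actual TestCodec code; 'codec', 'rawOut'
and 'data' are stand-in names:

  // Writing the payload in two calls re-enters setInput() after the
  // first chunk has been consumed; that is what triggers the premature
  // 'finished' state that a single big write never hits.
  CompressionOutputStream cout = codec.createOutputStream(rawOut);
  cout.write(data, 0, data.length / 2);
  cout.write(data, data.length / 2, data.length - data.length / 2);
  cout.close();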

I've tried the change mentioned below on streaming data as well as on
compressing files, and it works.
I have one app that repeatedly sends ~200MB (uncompressed size) of
lzo-compressed data over HTTP, and it only works with the change.

- Espen

Arun C Murthy wrote:
> Espen Amble Kolstad wrote:
>> Hi,
>>
>> I changed LzoCompressor.finished() from:
>>
>>   public synchronized boolean finished() {
>>     // ...
>>     return (finished && compressedDirectBuf.remaining() == 0);
>>   }
>>
>> to:
>>
>>   public synchronized boolean finished() {
>>     // ...
>>     return (finish && compressedDirectBuf.remaining() == 0);
>>   }
>>
>> And it seems to work correctly now. I used CompressionCodecFactory.main
>> to test this: it failed before the change and works after it. Both
>> compress and decompress work.
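>>
>> To spell out my reading of the two flags, here's a rough sketch
>> against the Compressor interface (just an illustration I put
>> together, not real Hadoop code; the no-arg LzoCompressor constructor
>> is assumed):
>>
>>   import org.apache.hadoop.io.compress.Compressor;
>>   import org.apache.hadoop.io.compress.lzo.LzoCompressor;
>>
>>   public class TwoWrites {
>>     public static void main(String[] args) throws Exception {
>>       Compressor c = new LzoCompressor();
>>       byte[] out = new byte[64 * 1024];
>>
>>       // First write: feed input, then drain until more is needed.
>>       c.setInput("abc".getBytes(), 0, 3);
>>       while (!c.needsInput()) c.compress(out, 0, out.length);
>>
>>       // Here userBufLen == 0, so the old finished() returned true
>>       // and the stream rejected the next write. A second write must
>>       // still be legal on a stream that hasn't been finished:
>>       c.setInput("def".getBytes(), 0, 3);
>>       while (!c.needsInput()) c.compress(out, 0, out.length);
>>
>>       // Only after finish() should finished() ever report true.
>>       c.finish();
>>       while (!c.finished()) c.compress(out, 0, out.length);
>>     }
>>   }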
>>
>> Could you verify Arun? I'll do some more testing.
>>
> 
> Sorry Espen, I've been busy with some 0.13.0 blockers...
> 
> It's been a while, so I rechecked; that part of the code is correct.
> The 'finished' variable is set in the native code and needs to be
> checked to ensure all data is compressed/decompressed.
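>
> For reference, the consumer side relies on that flag roughly like
> this (a simplified sketch of the drain loop, not the exact stream
> code; 'buffer' and 'rawOut' are stand-in names):
>
>   // After the caller signals end-of-input via finish(), keep calling
>   // compress() until the native side reports that all remaining
>   // compressed output has been produced and drained.
>   compressor.finish();
>   while (!compressor.finished()) {
>     int n = compressor.compress(buffer, 0, buffer.length);
>     rawOut.write(buffer, 0, n);
>   }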
> 
> Could you take a look at TestCodec.java
> (src/test/org/apache/hadoop/io/compress) and see if there is something
> you can pick up from there? I'll keep looking at my end.
> 
> thanks,
> Arun
> 
>> thanks,
>> Espen
>>
>> Espen Amble Kolstad wrote:
>>
>>> Hi Arun,
>>>
>>> Arun C Murthy wrote:
>>>
>>>> Espen,
>>>>
>>>> On Thu, May 24, 2007 at 03:49:38PM +0200, Espen Amble Kolstad wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I've been trying to use LzoCodec to write a compressed file:
>>>>>
>>>>
>>>> Could you try this command:
>>>> $ bin/hadoop jar build/hadoop-0.12.4-dev-test.jar testsequencefile \
>>>>     -seed 0 -count 10000 -compressType RECORD blah.seq \
>>>>     -codec org.apache.hadoop.io.compress.LzoCodec -check
>>>
>>> This works like it should:
>>> 07/05/25 08:29:07 INFO io.SequenceFile: count = 10000
>>> 07/05/25 08:29:07 INFO io.SequenceFile: megabytes = 1
>>> 07/05/25 08:29:07 INFO io.SequenceFile: factor = 10
>>> 07/05/25 08:29:07 INFO io.SequenceFile: create = true
>>> 07/05/25 08:29:07 INFO io.SequenceFile: seed = 0
>>> 07/05/25 08:29:07 INFO io.SequenceFile: rwonly = false
>>> 07/05/25 08:29:07 INFO io.SequenceFile: check = true
>>> 07/05/25 08:29:07 INFO io.SequenceFile: fast = false
>>> 07/05/25 08:29:07 INFO io.SequenceFile: merge = false
>>> 07/05/25 08:29:07 INFO io.SequenceFile: compressType = RECORD
>>> 07/05/25 08:29:07 INFO io.SequenceFile: compressionCodec = org.apache.hadoop.io.compress.LzoCodec
>>> 07/05/25 08:29:07 INFO io.SequenceFile: file = blah.seq
>>> 07/05/25 08:29:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>> 07/05/25 08:29:07 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
>>> 07/05/25 08:29:07 INFO io.SequenceFile: creating 10000 records with RECORD compression
>>> 07/05/25 08:29:13 INFO io.SequenceFile: writing intermediate results to /tmp/hadoop-espen/mapred/local/intermediate.1
>>> 07/05/25 08:29:15 INFO io.SequenceFile: done sorting 10000 debug
>>> 07/05/25 08:29:15 INFO io.SequenceFile: sorting 10000 records in memory for debug
>>>
>>> I think the difference is that I write to the stream twice; the Hadoop
>>> code seems to always write all bytes at once.
>>>
>>> The code in LzoCompressor checks for userBufLen <= 0 and sets
>>> finished = true; userBufLen is set in setInput(). Doesn't that mean you
>>> can only write to the stream once?!
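>>>
>>> Roughly what I'm looking at (paraphrased and simplified, not the
>>> exact source):
>>>
>>>   // LzoCompressor.compress(), simplified: once the user buffer is
>>>   // drained, the compressor marks itself finished, even though the
>>>   // caller never called finish().
>>>   if (userBufLen <= 0) {
>>>     finished = true;
>>>   }
>>>
>>>   // BlockCompressorStream.write() then refuses any further write:
>>>   if (compressor.finished()) {
>>>     throw new IOException("write beyond end of stream");
>>>   }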
>>>
>>> - Espen
>>>
>>>
>>>> LzoCodec seems to work fine for me... maybe your FileOutputStream
>>>> was somehow corrupted?
>>>>
>>>> thanks,
>>>> Arun
>>>>
>>>>
>>>>> import java.io.FileOutputStream;
>>>>>
>>>>> import org.apache.hadoop.conf.Configuration;
>>>>> import org.apache.hadoop.io.compress.CompressionOutputStream;
>>>>> import org.apache.hadoop.io.compress.LzoCodec;
>>>>>
>>>>> public class LzoTest {
>>>>>
>>>>>   public static void main(String[] args) throws Exception {
>>>>>     final LzoCodec codec = new LzoCodec();
>>>>>     codec.setConf(new Configuration());
>>>>>     final CompressionOutputStream out =
>>>>>         codec.createOutputStream(new FileOutputStream("test.lzo"));
>>>>>     out.write("abc".getBytes());  // first write succeeds
>>>>>     out.write("def".getBytes());  // second write throws (see below)
>>>>>     out.close();
>>>>>   }
>>>>> }
>>>>>
>>>>> I get the following output:
>>>>>
>>>>> 07/05/24 15:44:22 INFO util.NativeCodeLoader: Loaded the native-hadoop library
>>>>> 07/05/24 15:44:22 INFO compress.LzoCodec: Successfully loaded & initialized native-lzo library
>>>>> Exception in thread "main" java.io.IOException: write beyond end of stream
>>>>>     at org.apache.hadoop.io.compress.BlockCompressorStream.write(BlockCompressorStream.java:68)
>>>>>     at java.io.OutputStream.write(OutputStream.java:58)
>>>>>     at no.trank.tI.LzoTest.main(LzoTest.java:19)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:90)
>>>>>
>>>>> Isn't it possible to use LzoCodec for this purpose, or is this a bug?
>>>>>
>>>>> - Espen
>>>
>>
>