You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Luke Nezda <ln...@gmail.com> on 2010/07/10 07:39:46 UTC
empty input array causes infinite loop in o.a.l.document.CompressionTools#decompress
(LUCENE-772 fix)
Hello -
Me and a co-worker actually got burned by this code twice in 1 week - once
via Lucene 2.2 (
http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_2_0/src/java/org/apache/lucene/index/FieldsReader.java)
and again via a copy of the same source (moved to
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/document/CompressionTools.java)
we were using elsewhere. Due to an edge case misuse of the
java.util.zip.Inflater API, an empty byte array sends {@link
org.apache.lucene.document.CompressionTools#decompress} into a tight,
infinite loop.
I realize this is degenerate case, but it happened to us 2 totally
different ways -- a more robust behavior like returning the input empty
array, or throwing an exception would be preferable. The following source
demonstrates the issue and a couple trivial solution alternatives.
I was scanning bugs -- this should solve
https://issues.apache.org/jira/browse/LUCENE-772
Kind regards,
- Luke
import java.util.zip.Inflater;
import java.util.zip.Deflater;
import java.util.zip.DataFormatException;
import java.io.ByteArrayOutputStream;
/** Demonstrates bad empty input edge case behavior of {@link
org.apache.lucene.document.CompressionTools#decompress} */
class Infiniflater {
/** Compresses the specified byte range using the
* specified compressionLevel (constants are defined in
* java.util.zip.Deflater). */
public static byte[] compress(byte[] value, int offset, int length, int
compressionLevel) {
/* Create an expandable byte array to hold the compressed data.
* You cannot use an array that's the same size as the orginal because
* there is no guarantee that the compressed data will be smaller than
* the uncompressed data. */
ByteArrayOutputStream bos = new ByteArrayOutputStream(length);
Deflater compressor = new Deflater();
try {
compressor.setLevel(compressionLevel);
compressor.setInput(value, offset, length);
compressor.finish();
// Compress the data
final byte[] buf = new byte[1024];
while (!compressor.finished()) {
int count = compressor.deflate(buf);
bos.write(buf, 0, count);
}
} finally {
compressor.end();
}
return bos.toByteArray();
}
/** Decompress the byte array previously returned by
* compress */
public static byte[] decompress(byte[] value) throws DataFormatException {
// Create an expandable byte array to hold the decompressed data
ByteArrayOutputStream bos = new ByteArrayOutputStream(value.length);
Inflater decompressor = new Inflater();
try {
decompressor.setInput(value);
// Decompress the data
final byte[] buf = new byte[1024];
//*** original form with potential for inifinite loop below
//while (!decompressor.finished()) {
// bos.write(buf, 0, count);
//}
//*** simple refined guard condition
//while (!decompressor.finished() && ! decompressor.needsInput()) {
// bos.write(buf, 0, count);
//}
//*** slightly more efficient (less synchronization (within
decompressor.needsInput()) for instance)
while (!decompressor.finished()) {
int count = decompressor.inflate(buf);
if (count == 0 && decompressor.needsInput()) {
break;
}
bos.write(buf, 0, count);
}
} finally {
decompressor.end();
}
return bos.toByteArray();
}
public static void main(String[] args) throws Exception {
// simplest demonstration of case which will cause infinite loop on
empty byte array input
final Inflater decompressor = new Inflater();
final byte[] noBytes = new byte[0];
decompressor.setInput(noBytes);
final boolean notFinished = false == decompressor.finished();
System.err.println("notFinished?: "+notFinished);
assert notFinished;
// exercise methods to roundtrip empty array
// since this "compresses" to 8 bytes, maybe we should just toss
DataFormatException or IOException
// if arg to decompress has length < 8 - would be better than current
infinite loop
final byte[] compressed = compress(noBytes, 0, noBytes.length,
Deflater.BEST_COMPRESSION);
assert compressed.length == 8 : "compressed.length: "+compressed.length;
final byte[] decompressed = decompress(compressed);
assert decompressed.length == 0 : "decompressed.length:
"+decompressed.length;
// spin condition
final byte[] output = decompress(noBytes);
assert output.length == 0;
}
}
Re: empty input array causes infinite loop in o.a.l.document.CompressionTools#decompress
(LUCENE-772 fix)
Posted by Luke Nezda <ln...@gmail.com>.
On Sat, Jul 10, 2010 at 7:32 AM, Robert Muir <rc...@gmail.com> wrote:
> I am curious why we have utility methods to zip compress byte arrays and
> strings in the lucene core at all?!...
Historical reasons -- compressed binary fields used to be a (ill conceived)
feature (pre-exclusive opaque byte array fields).
> i think we should get rid of compressionutils completely.
>
Agreed going forward - they are no longer used in trunk (except in
TestBinaryDocument), and the code serves as a bad example for others :). As
for LUCENE-772, I guess it can be closed. Hopefully this isn't coming up
too often for folks using older releases for whatever reason (time/cost of
upgrade).
- Luke
>
> On Sat, Jul 10, 2010 at 1:39 AM, Luke Nezda <ln...@gmail.com> wrote:
>
>> Hello -
>>
>> Me and a co-worker actually got burned by this code twice in 1 week - once
>> via Lucene 2.2 (
>> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_2_0/src/java/org/apache/lucene/index/FieldsReader.java)
>> and again via a copy of the same source (moved to
>> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/document/CompressionTools.java)
>> we were using elsewhere. Due to an edge case misuse of the
>> java.util.zip.Inflater API, an empty byte array sends {@link
>> org.apache.lucene.document.CompressionTools#decompress} into a tight,
>> infinite loop.
>> I realize this is degenerate case, but it happened to us 2 totally
>> different ways -- a more robust behavior like returning the input empty
>> array, or throwing an exception would be preferable. The following source
>> demonstrates the issue and a couple trivial solution alternatives.
>>
>> I was scanning bugs -- this should solve
>> https://issues.apache.org/jira/browse/LUCENE-772
>>
>> Kind regards,
>> - Luke
>>
>> import java.util.zip.Inflater;
>> import java.util.zip.Deflater;
>> import java.util.zip.DataFormatException;
>> import java.io.ByteArrayOutputStream;
>>
>> /** Demonstrates bad empty input edge case behavior of {@link
>> org.apache.lucene.document.CompressionTools#decompress} */
>> class Infiniflater {
>> /** Compresses the specified byte range using the
>> * specified compressionLevel (constants are defined in
>> * java.util.zip.Deflater). */
>> public static byte[] compress(byte[] value, int offset, int length, int
>> compressionLevel) {
>> /* Create an expandable byte array to hold the compressed data.
>> * You cannot use an array that's the same size as the orginal because
>> * there is no guarantee that the compressed data will be smaller than
>> * the uncompressed data. */
>> ByteArrayOutputStream bos = new ByteArrayOutputStream(length);
>>
>> Deflater compressor = new Deflater();
>>
>> try {
>> compressor.setLevel(compressionLevel);
>> compressor.setInput(value, offset, length);
>> compressor.finish();
>>
>> // Compress the data
>> final byte[] buf = new byte[1024];
>> while (!compressor.finished()) {
>> int count = compressor.deflate(buf);
>> bos.write(buf, 0, count);
>> }
>> } finally {
>> compressor.end();
>> }
>>
>> return bos.toByteArray();
>> }
>>
>> /** Decompress the byte array previously returned by
>> * compress */
>> public static byte[] decompress(byte[] value) throws DataFormatException
>> {
>> // Create an expandable byte array to hold the decompressed data
>> ByteArrayOutputStream bos = new ByteArrayOutputStream(value.length);
>>
>> Inflater decompressor = new Inflater();
>>
>> try {
>> decompressor.setInput(value);
>>
>> // Decompress the data
>> final byte[] buf = new byte[1024];
>> //*** original form with potential for inifinite loop below
>> //while (!decompressor.finished()) {
>> // bos.write(buf, 0, count);
>> //}
>> //*** simple refined guard condition
>> //while (!decompressor.finished() && ! decompressor.needsInput()) {
>> // bos.write(buf, 0, count);
>> //}
>> //*** slightly more efficient (less synchronization (within
>> decompressor.needsInput()) for instance)
>> while (!decompressor.finished()) {
>> int count = decompressor.inflate(buf);
>> if (count == 0 && decompressor.needsInput()) {
>> break;
>> }
>> bos.write(buf, 0, count);
>> }
>> } finally {
>> decompressor.end();
>> }
>>
>> return bos.toByteArray();
>> }
>>
>> public static void main(String[] args) throws Exception {
>> // simplest demonstration of case which will cause infinite loop on
>> empty byte array input
>> final Inflater decompressor = new Inflater();
>> final byte[] noBytes = new byte[0];
>> decompressor.setInput(noBytes);
>> final boolean notFinished = false == decompressor.finished();
>> System.err.println("notFinished?: "+notFinished);
>> assert notFinished;
>>
>> // exercise methods to roundtrip empty array
>> // since this "compresses" to 8 bytes, maybe we should just toss
>> DataFormatException or IOException
>> // if arg to decompress has length < 8 - would be better than current
>> infinite loop
>> final byte[] compressed = compress(noBytes, 0, noBytes.length,
>> Deflater.BEST_COMPRESSION);
>> assert compressed.length == 8 : "compressed.length:
>> "+compressed.length;
>> final byte[] decompressed = decompress(compressed);
>> assert decompressed.length == 0 : "decompressed.length:
>> "+decompressed.length;
>>
>> // spin condition
>> final byte[] output = decompress(noBytes);
>> assert output.length == 0;
>> }
>> }
>>
>>
>
>
> --
> Robert Muir
> rcmuir@gmail.com
>
Re: empty input array causes infinite loop in o.a.l.document.CompressionTools#decompress
(LUCENE-772 fix)
Posted by Robert Muir <rc...@gmail.com>.
I am curious why we have utility methods to zip compress byte arrays and
strings in the lucene core at all?!...
i think we should get rid of compressionutils completely.
On Sat, Jul 10, 2010 at 1:39 AM, Luke Nezda <ln...@gmail.com> wrote:
> Hello -
>
> Me and a co-worker actually got burned by this code twice in 1 week - once
> via Lucene 2.2 (
> http://svn.apache.org/repos/asf/lucene/java/tags/lucene_2_2_0/src/java/org/apache/lucene/index/FieldsReader.java)
> and again via a copy of the same source (moved to
> https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/java/org/apache/lucene/document/CompressionTools.java)
> we were using elsewhere. Due to an edge case misuse of the
> java.util.zip.Inflater API, an empty byte array sends {@link
> org.apache.lucene.document.CompressionTools#decompress} into a tight,
> infinite loop.
> I realize this is degenerate case, but it happened to us 2 totally
> different ways -- a more robust behavior like returning the input empty
> array, or throwing an exception would be preferable. The following source
> demonstrates the issue and a couple trivial solution alternatives.
>
> I was scanning bugs -- this should solve
> https://issues.apache.org/jira/browse/LUCENE-772
>
> Kind regards,
> - Luke
>
> import java.util.zip.Inflater;
> import java.util.zip.Deflater;
> import java.util.zip.DataFormatException;
> import java.io.ByteArrayOutputStream;
>
> /** Demonstrates bad empty input edge case behavior of {@link
> org.apache.lucene.document.CompressionTools#decompress} */
> class Infiniflater {
> /** Compresses the specified byte range using the
> * specified compressionLevel (constants are defined in
> * java.util.zip.Deflater). */
> public static byte[] compress(byte[] value, int offset, int length, int
> compressionLevel) {
> /* Create an expandable byte array to hold the compressed data.
> * You cannot use an array that's the same size as the orginal because
> * there is no guarantee that the compressed data will be smaller than
> * the uncompressed data. */
> ByteArrayOutputStream bos = new ByteArrayOutputStream(length);
>
> Deflater compressor = new Deflater();
>
> try {
> compressor.setLevel(compressionLevel);
> compressor.setInput(value, offset, length);
> compressor.finish();
>
> // Compress the data
> final byte[] buf = new byte[1024];
> while (!compressor.finished()) {
> int count = compressor.deflate(buf);
> bos.write(buf, 0, count);
> }
> } finally {
> compressor.end();
> }
>
> return bos.toByteArray();
> }
>
> /** Decompress the byte array previously returned by
> * compress */
> public static byte[] decompress(byte[] value) throws DataFormatException
> {
> // Create an expandable byte array to hold the decompressed data
> ByteArrayOutputStream bos = new ByteArrayOutputStream(value.length);
>
> Inflater decompressor = new Inflater();
>
> try {
> decompressor.setInput(value);
>
> // Decompress the data
> final byte[] buf = new byte[1024];
> //*** original form with potential for inifinite loop below
> //while (!decompressor.finished()) {
> // bos.write(buf, 0, count);
> //}
> //*** simple refined guard condition
> //while (!decompressor.finished() && ! decompressor.needsInput()) {
> // bos.write(buf, 0, count);
> //}
> //*** slightly more efficient (less synchronization (within
> decompressor.needsInput()) for instance)
> while (!decompressor.finished()) {
> int count = decompressor.inflate(buf);
> if (count == 0 && decompressor.needsInput()) {
> break;
> }
> bos.write(buf, 0, count);
> }
> } finally {
> decompressor.end();
> }
>
> return bos.toByteArray();
> }
>
> public static void main(String[] args) throws Exception {
> // simplest demonstration of case which will cause infinite loop on
> empty byte array input
> final Inflater decompressor = new Inflater();
> final byte[] noBytes = new byte[0];
> decompressor.setInput(noBytes);
> final boolean notFinished = false == decompressor.finished();
> System.err.println("notFinished?: "+notFinished);
> assert notFinished;
>
> // exercise methods to roundtrip empty array
> // since this "compresses" to 8 bytes, maybe we should just toss
> DataFormatException or IOException
> // if arg to decompress has length < 8 - would be better than current
> infinite loop
> final byte[] compressed = compress(noBytes, 0, noBytes.length,
> Deflater.BEST_COMPRESSION);
> assert compressed.length == 8 : "compressed.length:
> "+compressed.length;
> final byte[] decompressed = decompress(compressed);
> assert decompressed.length == 0 : "decompressed.length:
> "+decompressed.length;
>
> // spin condition
> final byte[] output = decompress(noBytes);
> assert output.length == 0;
> }
> }
>
>
--
Robert Muir
rcmuir@gmail.com