Posted to issues@impala.apache.org by "Alexander Behm (JIRA)" <ji...@apache.org> on 2017/09/26 22:07:00 UTC

[jira] [Created] (IMPALA-5987) LZ4 Codec silently produces bogus compressed data for large inputs

Alexander Behm created IMPALA-5987:
--------------------------------------

             Summary: LZ4 Codec silently produces bogus compressed data for large inputs
                 Key: IMPALA-5987
                 URL: https://issues.apache.org/jira/browse/IMPALA-5987
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.9.0, Impala 2.8.0, Impala 2.7.0, Impala 2.6.0, Impala 2.5.0, Impala 2.10.0
            Reporter: Alexander Behm
            Priority: Critical


LZ4 has a built-in limit on the payload size that it can compress (LZ4_MAX_INPUT_SIZE, 0x7E000000 bytes). This limit can be checked indirectly via LZ4_compressBound(), but our code does not handle the case where LZ4_compressBound() returns 0, which means the payload is too big.

As a result, large payloads are silently compressed to a bogus result. The bogus result even decompresses successfully, but not to the data that was originally compressed.

Relevant LZ4 code snippet:
https://github.com/lz4/lz4/blob/dev/lib/lz4.h#L153
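
For readers without the LZ4 source at hand, the bound check at that link can be sketched roughly as follows. This is a standalone mirror of the LZ4_MAX_INPUT_SIZE constant and the LZ4_COMPRESSBOUND macro, not a call into the library; the names kLz4MaxInputSize and Lz4CompressBoundSketch are made up for illustration.

```cpp
#include <cstdint>
#include <limits>

// Mirrors LZ4_MAX_INPUT_SIZE from lz4.h: 0x7E000000 bytes, a little under 2 GiB.
// (Illustrative name; the real constant is a macro in lz4.h.)
constexpr int kLz4MaxInputSize = 0x7E000000;

// Mirrors the LZ4_COMPRESSBOUND macro from lz4.h.
// Returns 0 when the input is too large (or negative) to compress,
// otherwise the worst-case compressed size for input_size bytes.
constexpr int Lz4CompressBoundSketch(int input_size) {
  return static_cast<unsigned>(input_size) > static_cast<unsigned>(kLz4MaxInputSize)
      ? 0
      : input_size + (input_size / 255) + 16;
}

// A payload of numeric_limits<int>::max() bytes, as in the reproduction,
// is above the limit, so the bound is 0.
static_assert(Lz4CompressBoundSketch(std::numeric_limits<int>::max()) == 0,
              "oversized input must yield a bound of 0");
```

The point of the sketch is that 0 is a sentinel, not a usable buffer size: a caller that allocates an output buffer of that size without checking it is already in the failure case.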

Reproduction:
Add the following test case to decompress-test.cc:
{code}
TEST_F(DecompressorTest, LZ4Huge) {
  // Generate a big random payload.
  int payload_len = numeric_limits<int>::max();
  uint8_t* payload = new uint8_t[payload_len];
  for (int i = 0; i < payload_len; ++i) payload[i] = rand();

  scoped_ptr<Codec> compressor;
  EXPECT_OK(Codec::CreateCompressor(nullptr, true, impala::THdfsCompression::LZ4,
      &compressor));

  // The returned max_size is 0 because the payload is too big.
  int64_t max_size = compressor->MaxOutputLen(payload_len);

  // Compression succeeds!
  int64_t compressed_len = max_size;
  uint8_t* compressed = new uint8_t[max_size];
  EXPECT_OK(compressor->ProcessBlock(true, payload_len, payload,
      &compressed_len, &compressed));

  // Decompression succeeds!
  scoped_ptr<Codec> decompressor;
  EXPECT_OK(Codec::CreateDecompressor(nullptr, true, impala::THdfsCompression::LZ4,
      &decompressor));
  int64_t decompressed_len = compressed_len;
  uint8_t* decompressed = new uint8_t[compressed_len];
  EXPECT_OK(decompressor->ProcessBlock(true, compressed_len,
      compressed, &decompressed_len, &decompressed));

  // Assert fails. The uncompressed data is not the same as the original payload.
  ASSERT_EQ(memcmp(payload, decompressed, payload_len), 0);
}
{code}
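
One possible shape for a guard, as a minimal sketch only: treat a worst-case output size of 0 for a non-empty LZ4 input as an error before allocating or compressing. OutputBoundIsUsable is a hypothetical helper, not an existing Impala function, and it assumes that 0 is only ever returned in the "payload too big" case described above.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Impala): decides whether the value
// returned by a MaxOutputLen()-style call can be used as an output buffer size.
// Assumption: for a non-empty input, a bound of 0 (or less) means the
// payload exceeds the codec's size limit and compression must not proceed.
constexpr bool OutputBoundIsUsable(int64_t input_len, int64_t max_output_len) {
  return input_len <= 0 || max_output_len > 0;
}

// With the values from the test above (payload_len = INT_MAX, max_size = 0),
// the guard rejects the payload instead of compressing into an empty buffer.
static_assert(!OutputBoundIsUsable(2147483647, 0),
              "oversized payload must be rejected");
```

A caller would return an error status when the check fails rather than passing the zero-sized buffer on to ProcessBlock().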


