Posted to commits@avro.apache.org by bl...@apache.org on 2016/09/12 16:08:21 UTC

avro git commit: AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.

Repository: avro
Updated Branches:
  refs/heads/master 4c992a587 -> 832512edc


AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.

Java and other implementations require a CRC32 checksum of the
uncompressed content in order to read Snappy-compressed blocks. This
implements the checksum, with backward compatibility for files written
by old versions of avro-ruby. If the checksum doesn't match, or if
decompression fails once the last 4 bytes are removed as the checksum,
avro-ruby decompresses all of the incoming bytes and passes them on,
assuming that the file was written by an old version.

Closes #121.
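
The block layout and fallback described above can be sketched as follows.
This is a minimal illustration, not the committed code: to stay runnable
with the standard library alone it assumes Zlib deflate/inflate as a
stand-in for Snappy, and the names StandInCodec, frame_block, and
unframe_block are hypothetical.

```ruby
require 'zlib'

# Stand-in codec (assumption): Zlib replaces Snappy so that the sketch
# needs no extra gems. The framing logic is the part being illustrated.
module StandInCodec
  def self.deflate(data)
    Zlib::Deflate.deflate(data)
  end

  def self.inflate(data)
    Zlib::Inflate.inflate(data)
  end
end

# Compress the data, then append the CRC32 of the *uncompressed* bytes
# as a 4-byte big-endian integer.
def frame_block(data)
  [StandInCodec.deflate(data), Zlib.crc32(data)].pack('a*N')
end

# Treat the last 4 bytes as the checksum. If it does not match, or if
# decompression of the shortened payload fails, assume a legacy block
# with no checksum and decompress the whole buffer instead.
def unframe_block(block)
  expected = block.slice(-4..-1).unpack('N').first
  payload = StandInCodec.inflate(block.slice(0..-5))
  return payload if expected == Zlib.crc32(payload)

  StandInCodec.inflate(block)
rescue Zlib::Error
  StandInCodec.inflate(block)
end

data = 'avro block payload'.b
unframe_block(frame_block(data))          # new layout: payload + CRC32
unframe_block(StandInCodec.deflate(data)) # legacy layout: no checksum
```

Both calls at the end return the original bytes. In the legacy case the
stand-in codec fails on the truncated input, so the rescue branch is the
one that recovers the data.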


Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/832512ed
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/832512ed
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/832512ed

Branch: refs/heads/master
Commit: 832512edcd7591c238c35b5a479e15ac0709e4cb
Parents: 4c992a5
Author: Ryan Blue <bl...@apache.org>
Authored: Sat Sep 10 15:57:30 2016 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Mon Sep 12 09:08:07 2016 -0700

----------------------------------------------------------------------
 CHANGES.txt                     |  3 +++
 lang/ruby/lib/avro/data_file.rb | 19 ++++++++++++++++++-
 lang/ruby/test/test_io.rb       | 11 +++++++++++
 3 files changed, 32 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index b88e798..253c356 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -51,6 +51,9 @@ Trunk (not yet released)
     AVRO-1908: Fix TestSpecificCompiler reference to private method.
     (blue)
 
+    AVRO-1873: Ruby: Add CRC32 checksum to Snappy-compressed blocks.
+    (blue)
+
 Avro 1.8.1 (14 May 2016)
 
   INCOMPATIBLE CHANGES

http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/lib/avro/data_file.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/lib/avro/data_file.rb b/lang/ruby/lib/avro/data_file.rb
index c27c2dc..e465055 100644
--- a/lang/ruby/lib/avro/data_file.rb
+++ b/lang/ruby/lib/avro/data_file.rb
@@ -338,12 +338,29 @@ module Avro
 
       def decompress(data)
         load_snappy!
+        crc32 = data.slice(-4..-1).unpack('N').first
+        uncompressed = Snappy.inflate(data.slice(0..-5))
+
+        if crc32 == Zlib.crc32(uncompressed)
+          uncompressed
+        else
+          # older versions of avro-ruby didn't write the checksum, so if it
+          # doesn't match, assume that it wasn't there and decompress the
+          # entire payload.
+          Snappy.inflate(data)
+        end
+      rescue Snappy::Error
+        # older versions of avro-ruby didn't write the checksum, so removing
+        # the last 4 bytes may cause Snappy to fail. recover by assuming the
+        # payload is from an older file and uncompress the entire buffer.
         Snappy.inflate(data)
       end
 
       def compress(data)
         load_snappy!
-        Snappy.deflate(data)
+        crc32 = Zlib.crc32(data)
+        compressed = Snappy.deflate(data)
+        [compressed, crc32].pack('a*N')
       end
 
       private

http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/test/test_io.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/test/test_io.rb b/lang/ruby/test/test_io.rb
index 153cb94..09d725d 100644
--- a/lang/ruby/test/test_io.rb
+++ b/lang/ruby/test/test_io.rb
@@ -340,6 +340,17 @@ EOS
       assert_equal(incorrect, 0)
     end
   end
+
+  def test_snappy_backward_compat
+    # a snappy-compressed block payload without the checksum
+    # this has no back-references, just one literal so the last 9
+    # bytes are the uncompressed payload.
+    old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    uncompressed_bytes = "\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+    snappy = Avro::DataFile::SnappyCodec.new
+    assert_equal(uncompressed_bytes, snappy.decompress(old_snappy_bytes))
+  end
+
   private
 
   def check_no_default(schema_json)
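
The byte string used in test_snappy_backward_compat can be checked by hand
without the Snappy library. The sketch below assumes the Snappy block
format as commonly documented: a varint length preamble, then elements
whose tag byte has 00 in its low two bits for a literal and, for short
literals, the length minus one in its upper six bits. The helper name
decode_single_literal is hypothetical, and it handles only this one-literal
case, not Snappy in general.

```ruby
# Decode a Snappy block made of a one-byte length preamble followed by a
# single short literal. Anything else is rejected rather than guessed at.
def decode_single_literal(block)
  bytes = block.b
  length = bytes.getbyte(0) # varint preamble; a single byte when < 128
  raise ArgumentError, 'multi-byte length not handled' if length >= 0x80

  tag = bytes.getbyte(1)
  raise ArgumentError, 'not a literal element' unless (tag & 0x03).zero?

  literal_len = (tag >> 2) + 1 # upper six bits hold length - 1
  raise ArgumentError, 'long literal not handled' if literal_len > 60
  raise ArgumentError, 'length mismatch' unless literal_len == length
  raise ArgumentError, 'unexpected size' unless bytes.bytesize == 2 + length

  bytes.byteslice(2, literal_len)
end

old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e".b
uncompressed = decode_single_literal(old_snappy_bytes)
```

Here 0x09 is the uncompressed length (9) and 0x20 is a literal tag with
(0x20 >> 2) + 1 = 9, so the remaining 9 bytes are the payload itself, which
matches uncompressed_bytes in the test. The last five of them are the ASCII
text "green".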