Posted to commits@avro.apache.org by bl...@apache.org on 2016/09/12 16:08:21 UTC
avro git commit: AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.
Repository: avro
Updated Branches:
refs/heads/master 4c992a587 -> 832512edc
AVRO-1873: Add CRC32 checksum to Snappy-compressed blocks.
Java and other implementations require a CRC32 checksum of the
uncompressed content, appended to each Snappy-compressed block, in order
to read the data. This implements the checksum, with backward
compatibility for files written by old versions of avro-ruby. If the
checksum doesn't match, or if decompression fails after the last 4 bytes
are removed as the checksum, avro-ruby decompresses the entire incoming
payload and passes it on, assuming that the file is from an old writer.
Closes #121.
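The block framing described above can be sketched without the snappy gem. This is a minimal illustration, not the code from the commit: append_crc32 and split_crc32 are hypothetical helper names, and the "compressed" bytes are a hand-built Snappy block holding a single literal, standing in for real Snappy.deflate output.

```ruby
require 'zlib'

# Hypothetical helper: append the CRC32 of the *uncompressed* data to an
# already-compressed payload, as a 32-bit big-endian integer.
def append_crc32(compressed, uncompressed)
  # 'a*' copies the payload bytes as-is; 'N' packs a 32-bit unsigned
  # big-endian integer.
  [compressed, Zlib.crc32(uncompressed)].pack('a*N')
end

# Hypothetical helper: split a framed block back into payload and checksum.
def split_crc32(block)
  payload = block.byteslice(0, block.bytesize - 4)
  crc32 = block.byteslice(-4, 4).unpack('N').first
  [payload, crc32]
end

uncompressed = "green".b
# Stand-in for Snappy output: length 5, then one 5-byte literal.
compressed = "\x05\x10green".b

block = append_crc32(compressed, uncompressed)
payload, crc32 = split_crc32(block)

puts block.bytesize                      # 4 bytes longer than the payload
puts crc32 == Zlib.crc32(uncompressed)   # checksum round-trips
```

A reader that sees a matching checksum can trust that the trailing 4 bytes were a checksum; a mismatch is the signal to retry with the whole buffer as the payload.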
Project: http://git-wip-us.apache.org/repos/asf/avro/repo
Commit: http://git-wip-us.apache.org/repos/asf/avro/commit/832512ed
Tree: http://git-wip-us.apache.org/repos/asf/avro/tree/832512ed
Diff: http://git-wip-us.apache.org/repos/asf/avro/diff/832512ed
Branch: refs/heads/master
Commit: 832512edcd7591c238c35b5a479e15ac0709e4cb
Parents: 4c992a5
Author: Ryan Blue <bl...@apache.org>
Authored: Sat Sep 10 15:57:30 2016 -0700
Committer: Ryan Blue <bl...@apache.org>
Committed: Mon Sep 12 09:08:07 2016 -0700
----------------------------------------------------------------------
CHANGES.txt | 3 +++
lang/ruby/lib/avro/data_file.rb | 19 ++++++++++++++++++-
lang/ruby/test/test_io.rb | 11 +++++++++++
3 files changed, 32 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index b88e798..253c356 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -51,6 +51,9 @@ Trunk (not yet released)
AVRO-1908: Fix TestSpecificCompiler reference to private method.
(blue)
+ AVRO-1873: Ruby: Add CRC32 checksum to Snappy-compressed blocks.
+ (blue)
+
Avro 1.8.1 (14 May 2016)
INCOMPATIBLE CHANGES
http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/lib/avro/data_file.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/lib/avro/data_file.rb b/lang/ruby/lib/avro/data_file.rb
index c27c2dc..e465055 100644
--- a/lang/ruby/lib/avro/data_file.rb
+++ b/lang/ruby/lib/avro/data_file.rb
@@ -338,12 +338,29 @@ module Avro
def decompress(data)
load_snappy!
+ crc32 = data.slice(-4..-1).unpack('N').first
+ uncompressed = Snappy.inflate(data.slice(0..-5))
+
+ if crc32 == Zlib.crc32(uncompressed)
+ uncompressed
+ else
+ # older versions of avro-ruby didn't write the checksum, so if it
+ # doesn't match this must assume that it wasn't there and return
+ # the entire payload uncompressed.
+ Snappy.inflate(data)
+ end
+ rescue Snappy::Error
+ # older versions of avro-ruby didn't write the checksum, so removing
+ # the last 4 bytes may cause Snappy to fail. recover by assuming the
+ # payload is from an older file and uncompress the entire buffer.
Snappy.inflate(data)
end
def compress(data)
load_snappy!
- Snappy.deflate(data)
+ crc32 = Zlib.crc32(data)
+ compressed = Snappy.deflate(data)
+ [compressed, crc32].pack('a*N')
end
private
http://git-wip-us.apache.org/repos/asf/avro/blob/832512ed/lang/ruby/test/test_io.rb
----------------------------------------------------------------------
diff --git a/lang/ruby/test/test_io.rb b/lang/ruby/test/test_io.rb
index 153cb94..09d725d 100644
--- a/lang/ruby/test/test_io.rb
+++ b/lang/ruby/test/test_io.rb
@@ -340,6 +340,17 @@ EOS
assert_equal(incorrect, 0)
end
end
+
+ def test_snappy_backward_compat
+ # a snappy-compressed block payload without the checksum
+ # this has no back-references, just one literal so the last 9
+ # bytes are the uncompressed payload.
+ old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+ uncompressed_bytes = "\x02\x06\x02\x0a\x67\x72\x65\x65\x6e"
+ snappy = Avro::DataFile::SnappyCodec.new
+ assert_equal(uncompressed_bytes, snappy.decompress(old_snappy_bytes))
+ end
+
private
def check_no_default(schema_json)
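
Note on the bytes in test_snappy_backward_compat: the claim that the last 9 bytes are the uncompressed payload can be checked by decoding the block by hand. The sketch below is not part of the commit and assumes only the basic Snappy block layout (a varint length preamble followed by tagged elements); the variable names are illustrative.

```ruby
# Bytes copied from test_snappy_backward_compat.
old_snappy_bytes = "\x09\x20\x02\x06\x02\x0a\x67\x72\x65\x65\x6e".b
bytes = old_snappy_bytes.bytes

# Preamble: the uncompressed length as a varint. A value below 0x80 fits
# in a single byte, which is the case here.
uncompressed_length = bytes[0]

# Tag byte: the low 2 bits give the element type (0 = literal). For short
# literals the upper 6 bits hold (length - 1).
tag = bytes[1]
element_type = tag & 0x03
literal_length = (tag >> 2) + 1

# The literal data follows the tag byte directly.
literal = old_snappy_bytes.byteslice(2, literal_length)

puts uncompressed_length
puts element_type
puts literal_length
puts literal.inspect
```

Stripping 4 bytes from this 11-byte block would leave only 5 of the 9 literal bytes, so the truncated input is not a valid Snappy block. That is presumably why this payload exercises the fallback that decompresses the entire buffer.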