You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2021/02/23 18:18:23 UTC

[GitHub] [avro] andrewthauer opened a new pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

andrewthauer opened a new pull request #1102:
URL: https://github.com/apache/avro/pull/1102


   ### Jira
   
   - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO-3054) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
     - https://issues.apache.org/jira/browse/AVRO-3054
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [  ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   ### Commits
   
   - [  ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [  ] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   
   ### Credit
   
   This upstream PR was originally created by @ziggythehamster & @johvet as part of #918.
   
   ### Supercedes PRs (from $918)
   
   The following PRs are currently open and implement an incorrect version of this feature, and should be closed:
   
   - #829
   - #840
   
   Both PRs shove a string like "1.234" into the bytes, rather than encoding them according to the specification. Both PRs do not validate inputs nor introduce infrastructure to do that.
   
   ### Notes from #918
   
   > The Avro specification is imprecise about how decimals are to be implemented, which required us to dig into the source code of Avro for Java as well as dig into Java's BigDecimal and BigInteger to make sure we were doing the same thing. Perhaps the specification could include a Java one-liner that implements the encoder/decoder? Here's a Scala one-liner that we used to test our implementation:
   
   ```java
   val encoded = new java.math.BigDecimal("3.4562").setScale(6).unscaledValue().toByteArray()
   val decoded = new java.math.BigDecimal(new java.math.BigInteger(encoded), 6)
   
   encoded.map("%02x".format(_)).mkString(" ") // 34 bc c8: String
   decoded // 3.456200: java.math.BigDecimal
   ```
   
   > - We tested this in Ruby 2.3, 2.4, 2.5, 2.6, and 2.7. This is the reason for the <= check for retained objects, as some Ruby versions retain objects where others don't. We think this is either a bug in memory_profiler or a bug in Ruby itself.
   > - Your build system depends on the echoe gem, but echoe is not compatible with RubyGems > 2.7. RubyGems 3.x has been out since 2018, and RubyGems 2.7 barely works in newer versions of Ruby. Consider upgrading this.
   > - This PR is against master, but 1.9 is the current stable version. This branch is a version based on 1.9 with #761 incorporated (#761 was the PR that incompletely implemented AVRO-2677).
   > - There's a gem published to GitHub packages as well, if you're like us and need a version with decimal support before this hits an official channel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-835370335


   @tjwp - I've rebased #1082 into this branch and re-tested. Everything seems good. Thanks!
   
   I've added commit https://github.com/apache/avro/pull/1102/commits/fa8b7254e76f3af5b2ec7d2ebdfdfafb26fd5316 which avoids frozen string errors in tests since I'm testing on ruby 2.7.
   
   Is there anything else you'd like addressed in this PR or does that mostly cover it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] ziggythehamster commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
ziggythehamster commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-784503696


   This looks correct to me. Thanks for taking over on getting this landed; we have not had the time to work on the comments raised in the original PR or keep it fresh. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on a change in pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on a change in pull request #1102:
URL: https://github.com/apache/avro/pull/1102#discussion_r632956389



##########
File path: lang/ruby/test/test_logical_types.rb
##########
@@ -99,6 +100,140 @@ def test_parse_fixed_duration
     assert_equal 'duration', schema.logical_type
   end
 
+  def test_bytes_decimal
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 6 }
+    SCHEMA
+
+    assert_equal 'decimal', schema.logical_type
+    assert_equal 9, schema.precision
+    assert_equal 6, schema.scale
+
+    assert_encode_and_decode BigDecimal('-3.4562'), schema
+    assert_encode_and_decode BigDecimal('3.4562'), schema
+    assert_encode_and_decode 15.123, schema
+    assert_encode_and_decode 15, schema
+    assert_encode_and_decode BigDecimal('0.123456'), schema
+    assert_encode_and_decode BigDecimal('0'), schema
+    assert_encode_and_decode BigDecimal('1'), schema
+    assert_encode_and_decode BigDecimal('-1'), schema
+
+    assert_raise ArgumentError do
+      type = Avro::LogicalTypes::BytesDecimal.new(schema)
+      type.encode('1.23')
+    end
+  end
+
+  def test_bytes_decimal_range_errors
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('345'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-100.2'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-99.991'))
+    end
+  end
+
+  def test_bytes_decimal_conversion
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    enc = "\xcb\x43\x38".dup.force_encoding('BINARY')
+    assert_equal enc, type.encode(BigDecimal('-3.4562'))
+    assert_equal BigDecimal('-3.4562'), type.decode(enc)
+
+    assert_equal "\x34\xbc\xc8".dup.force_encoding('BINARY'), type.encode(BigDecimal('3.4562'))
+    assert_equal BigDecimal('3.4562'), type.decode("\x34\xbc\xc8".dup.force_encoding('BINARY'))
+
+    assert_equal "\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'), type.encode(BigDecimal('456123.123456'))
+    assert_equal BigDecimal('456123.123456'), type.decode("\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'))
+  end
+
+  def test_logical_type_with_schema
+    exception = assert_raises(ArgumentError) do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(nil)
+    end
+    assert_equal exception.to_s, 'schema is required'
+
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    assert_nothing_raised do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema)
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).encode(BigDecimal('2'))
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).decode('foo')
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_encode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_value = BigDecimal('5.2')
+    negative_value = BigDecimal('-5.2')
+
+    [positive_value, negative_value].each do |value|
+      report = MemoryProfiler.report do
+        type.encode(value)
+      end
+
+      assert_equal 5, report.total_allocated
+      # Ruby 2.3, 2.4 and 2.7 do not retain anything. Ruby 2.5 and some 2.6 retain 1
+      assert_operator 1, :>=, report.total_retained
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_decode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_enc = "\x02\b".dup.force_encoding('BINARY')
+    negative_enc = "\xFD\xF8".dup.force_encoding('BINARY')
+
+    [positive_enc, negative_enc].each do |encoded|
+      report = MemoryProfiler.report do
+        type.decode(encoded)
+      end
+
+      assert_equal 5, report.total_allocated
+      assert_equal 0, report.total_retained

Review comment:
       Updated based on suggestion




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-808789659


   @tjwp - Sounds good. I'll keep an eye on #1082, and once that lands I'll rebase and look at incorporating https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841668234


   I've created issues for the fixed validation and encoding/decoding:
   - https://issues.apache.org/jira/browse/AVRO-3141
   - https://issues.apache.org/jira/browse/AVRO-3142
   I'm optimistic that I'll have some time to work on those.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp edited a comment on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp edited a comment on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841668234


   I've created issues for the fixed validation and encoding/decoding:
   - https://issues.apache.org/jira/browse/AVRO-3141
   - https://issues.apache.org/jira/browse/AVRO-3142
   
   I'm optimistic that I'll have some time to work on those.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on a change in pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on a change in pull request #1102:
URL: https://github.com/apache/avro/pull/1102#discussion_r632956368



##########
File path: lang/ruby/test/test_logical_types.rb
##########
@@ -99,6 +100,140 @@ def test_parse_fixed_duration
     assert_equal 'duration', schema.logical_type
   end
 
+  def test_bytes_decimal
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 6 }
+    SCHEMA
+
+    assert_equal 'decimal', schema.logical_type
+    assert_equal 9, schema.precision
+    assert_equal 6, schema.scale
+
+    assert_encode_and_decode BigDecimal('-3.4562'), schema
+    assert_encode_and_decode BigDecimal('3.4562'), schema
+    assert_encode_and_decode 15.123, schema
+    assert_encode_and_decode 15, schema
+    assert_encode_and_decode BigDecimal('0.123456'), schema
+    assert_encode_and_decode BigDecimal('0'), schema
+    assert_encode_and_decode BigDecimal('1'), schema
+    assert_encode_and_decode BigDecimal('-1'), schema
+
+    assert_raise ArgumentError do
+      type = Avro::LogicalTypes::BytesDecimal.new(schema)
+      type.encode('1.23')
+    end
+  end
+
+  def test_bytes_decimal_range_errors
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('345'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-100.2'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-99.991'))
+    end
+  end
+
+  def test_bytes_decimal_conversion
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    enc = "\xcb\x43\x38".dup.force_encoding('BINARY')
+    assert_equal enc, type.encode(BigDecimal('-3.4562'))
+    assert_equal BigDecimal('-3.4562'), type.decode(enc)
+
+    assert_equal "\x34\xbc\xc8".dup.force_encoding('BINARY'), type.encode(BigDecimal('3.4562'))
+    assert_equal BigDecimal('3.4562'), type.decode("\x34\xbc\xc8".dup.force_encoding('BINARY'))
+
+    assert_equal "\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'), type.encode(BigDecimal('456123.123456'))
+    assert_equal BigDecimal('456123.123456'), type.decode("\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'))
+  end
+
+  def test_logical_type_with_schema
+    exception = assert_raises(ArgumentError) do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(nil)
+    end
+    assert_equal exception.to_s, 'schema is required'
+
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    assert_nothing_raised do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema)
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).encode(BigDecimal('2'))
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).decode('foo')
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_encode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_value = BigDecimal('5.2')
+    negative_value = BigDecimal('-5.2')
+
+    [positive_value, negative_value].each do |value|
+      report = MemoryProfiler.report do
+        type.encode(value)
+      end
+
+      assert_equal 5, report.total_allocated
+      # Ruby 2.3, 2.4 and 2.7 do not retain anything. Ruby 2.5 and some 2.6 retain 1

Review comment:
       Updated comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841560909


   Almost forgot about https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55. This is not incorporated into the branch @tjwp.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on a change in pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on a change in pull request #1102:
URL: https://github.com/apache/avro/pull/1102#discussion_r632956360



##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -17,9 +17,193 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new instance of a logical type using the provided schema
+      #
+      # @param schema [Avro::Schema]
+      #     The schema to use with this instance
+      #
+      # @raise [ArgumentError]
+      #     If the provided schema is nil
+      def initialize(schema)
+        raise ArgumentError, 'schema is required' if schema.nil?
+
+        @schema = schema
+      end
+
+      ##
+      # Encode the provided datum
+      #
+      # @param datum [Object] The datum to encode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def encode(datum)
+        raise NotImplementedError
+      end
+
+      ##
+      # Decode the provided datum
+      #
+      # @param datum [Object] The datum to decode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def decode(datum)
+        raise NotImplementedError
+      end
+    end
+
+    ##
+    # Logical type to handle arbitrary-precision decimals using byte array.
+    #
+    # The byte array contains the two's-complement representation of the unscaled integer
+    # value in big-endian byte order.
+    class BytesDecimal < LogicalTypeWithSchema
+      # Messages for exceptions
+      ERROR_INSUFFICIENT_PRECISION = 'Precision is too small'
+      ERROR_ROUNDING_NECESSARY     = 'Rounding necessary'
+      ERROR_VALUE_MUST_BE_NUMERIC  = 'value must be numeric'
+
+      # The pattern used to pack up the byte array (8 bit unsigned integer/char)
+      PACK_UNSIGNED_CHARS = 'C*'
+
+      # The number 10 as BigDecimal
+      TEN = BigDecimal(10).freeze
+
+      ##
+      # @return [Integer] The number of total digits supported by the decimal
+      attr_reader :precision
+
+      ##
+      # @return [Integer] The number of fractional digits
+      attr_reader :scale
+
+      ##
+      # Build a new decimal logical type
+      #
+      # @param schema [Avro::Schema]
+      #     The schema defining precision and scale for the conversion
+      #
+      # @raise [ArgumentError]

Review comment:
       Removed yardoc comment




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer edited a comment on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer edited a comment on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841560909


   Almost forgot about https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55. This is now incorporated into the branch @tjwp.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841669243


   >  Ideally the additional validation can be addressed in future PR.
   
   👍🏻 
   
   > Would be great to get this PR shipped so we can stop using the forked version we have.
   
   Looks good! Merging now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on a change in pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on a change in pull request #1102:
URL: https://github.com/apache/avro/pull/1102#discussion_r632955700



##########
File path: lang/ruby/test/test_logical_types.rb
##########
@@ -99,6 +100,140 @@ def test_parse_fixed_duration
     assert_equal 'duration', schema.logical_type
   end
 
+  def test_bytes_decimal
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 6 }
+    SCHEMA
+
+    assert_equal 'decimal', schema.logical_type
+    assert_equal 9, schema.precision
+    assert_equal 6, schema.scale
+
+    assert_encode_and_decode BigDecimal('-3.4562'), schema
+    assert_encode_and_decode BigDecimal('3.4562'), schema
+    assert_encode_and_decode 15.123, schema
+    assert_encode_and_decode 15, schema
+    assert_encode_and_decode BigDecimal('0.123456'), schema
+    assert_encode_and_decode BigDecimal('0'), schema
+    assert_encode_and_decode BigDecimal('1'), schema
+    assert_encode_and_decode BigDecimal('-1'), schema
+
+    assert_raise ArgumentError do
+      type = Avro::LogicalTypes::BytesDecimal.new(schema)
+      type.encode('1.23')
+    end
+  end
+
+  def test_bytes_decimal_range_errors
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('345'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-100.2'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-99.991'))
+    end
+  end
+
+  def test_bytes_decimal_conversion
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    enc = "\xcb\x43\x38".dup.force_encoding('BINARY')
+    assert_equal enc, type.encode(BigDecimal('-3.4562'))
+    assert_equal BigDecimal('-3.4562'), type.decode(enc)
+
+    assert_equal "\x34\xbc\xc8".dup.force_encoding('BINARY'), type.encode(BigDecimal('3.4562'))
+    assert_equal BigDecimal('3.4562'), type.decode("\x34\xbc\xc8".dup.force_encoding('BINARY'))
+
+    assert_equal "\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'), type.encode(BigDecimal('456123.123456'))
+    assert_equal BigDecimal('456123.123456'), type.decode("\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'))
+  end
+
+  def test_logical_type_with_schema
+    exception = assert_raises(ArgumentError) do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(nil)
+    end
+    assert_equal exception.to_s, 'schema is required'
+
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    assert_nothing_raised do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema)
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).encode(BigDecimal('2'))
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).decode('foo')
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_encode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_value = BigDecimal('5.2')
+    negative_value = BigDecimal('-5.2')
+
+    [positive_value, negative_value].each do |value|
+      report = MemoryProfiler.report do
+        type.encode(value)
+      end
+
+      assert_equal 5, report.total_allocated
+      # Ruby 2.3, 2.4 and 2.7 do not retain anything. Ruby 2.5 and some 2.6 retain 1
+      assert_operator 1, :>=, report.total_retained
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_decode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_enc = "\x02\b".dup.force_encoding('BINARY')
+    negative_enc = "\xFD\xF8".dup.force_encoding('BINARY')
+
+    [positive_enc, negative_enc].each do |encoded|
+      report = MemoryProfiler.report do
+        type.decode(encoded)
+      end
+
+      assert_equal 5, report.total_allocated
+      assert_equal 0, report.total_retained

Review comment:
       I noticed this too, but I can't reproduce it locally (even with the same ruby 2.6.7). I didn't realize the details here. Happy to change that quick.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-787518478


   @tjwp - Thanks for looking at this and also working on the additional validation refactoring.
   
   It definitely seems more appropriate to handle the validation up front vs during encoding/decoding. https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55 seems cleaner as well imo. I haven't tested against that or #1082, but nothing stands out as concern in either. Are you thinking of getting both of those merged and before this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-808789199


   @andrewthauer My plan was to address the schema matching with fixed feedback in #1082, and get that merged first. 
   
   After that this pull request will need to be rebased. Then the changes from https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55 should be incorporated into this branch. It is possible some additional changes may also be required as a result of #1082. I may be able to work on this consolidation once #1082 is done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp merged pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp merged pull request #1102:
URL: https://github.com/apache/avro/pull/1102


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-841668176


   @tjwp - Updated as per your suggestions. Ideally the additional validation can be addressed in future PR. Would be great to get this PR shipped so we can stop using the forked version we have.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] andrewthauer commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
andrewthauer commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-799553858


   @twjp - Did you want me to try and incorporate the changes from #1082 and/or https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55 into this PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp commented on pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp commented on pull request #1102:
URL: https://github.com/apache/avro/pull/1102#issuecomment-787511122


   @andrewthauer Thank you for continuing to push this forward! I've taken a look at this branch, and I created my own branch to work through any additional changes that I would make. I have that commit here if you'd like to take a look: 
   
   https://github.com/tjwp/avro/commit/bdb64e658750e58ecdc8e659195cb39df41dbb55
   
   The main change is to bring the validation of the decimal logical type forward to when the schema is parsed. That way errors with a schema are caught prior to any encoding/decoding. Please take a look and let me know what you think.
   
   As for https://github.com/apache/avro/pull/1082, if you and @ziggythehamster have looked and have no major concerns with it, then I'm happy to move forward.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [avro] tjwp commented on a change in pull request #1102: AVRO-3054: Fix / support decimal logical type in Ruby

Posted by GitBox <gi...@apache.org>.
tjwp commented on a change in pull request #1102:
URL: https://github.com/apache/avro/pull/1102#discussion_r632953810



##########
File path: lang/ruby/test/test_logical_types.rb
##########
@@ -99,6 +100,140 @@ def test_parse_fixed_duration
     assert_equal 'duration', schema.logical_type
   end
 
+  def test_bytes_decimal
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 6 }
+    SCHEMA
+
+    assert_equal 'decimal', schema.logical_type
+    assert_equal 9, schema.precision
+    assert_equal 6, schema.scale
+
+    assert_encode_and_decode BigDecimal('-3.4562'), schema
+    assert_encode_and_decode BigDecimal('3.4562'), schema
+    assert_encode_and_decode 15.123, schema
+    assert_encode_and_decode 15, schema
+    assert_encode_and_decode BigDecimal('0.123456'), schema
+    assert_encode_and_decode BigDecimal('0'), schema
+    assert_encode_and_decode BigDecimal('1'), schema
+    assert_encode_and_decode BigDecimal('-1'), schema
+
+    assert_raise ArgumentError do
+      type = Avro::LogicalTypes::BytesDecimal.new(schema)
+      type.encode('1.23')
+    end
+  end
+
+  def test_bytes_decimal_range_errors
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('345'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-100.2'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-99.991'))
+    end
+  end
+
+  def test_bytes_decimal_conversion
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    enc = "\xcb\x43\x38".dup.force_encoding('BINARY')
+    assert_equal enc, type.encode(BigDecimal('-3.4562'))
+    assert_equal BigDecimal('-3.4562'), type.decode(enc)
+
+    assert_equal "\x34\xbc\xc8".dup.force_encoding('BINARY'), type.encode(BigDecimal('3.4562'))
+    assert_equal BigDecimal('3.4562'), type.decode("\x34\xbc\xc8".dup.force_encoding('BINARY'))
+
+    assert_equal "\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'), type.encode(BigDecimal('456123.123456'))
+    assert_equal BigDecimal('456123.123456'), type.decode("\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'))
+  end
+
+  def test_logical_type_with_schema
+    exception = assert_raises(ArgumentError) do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(nil)
+    end
+    assert_equal exception.to_s, 'schema is required'
+
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    assert_nothing_raised do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema)
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).encode(BigDecimal('2'))
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).decode('foo')
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_encode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_value = BigDecimal('5.2')
+    negative_value = BigDecimal('-5.2')
+
+    [positive_value, negative_value].each do |value|
+      report = MemoryProfiler.report do
+        type.encode(value)
+      end
+
+      assert_equal 5, report.total_allocated
+      # Ruby 2.3, 2.4 and 2.7 do not retain anything. Ruby 2.5 and some 2.6 retain 1
+      assert_operator 1, :>=, report.total_retained
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_decode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_enc = "\x02\b".dup.force_encoding('BINARY')
+    negative_enc = "\xFD\xF8".dup.force_encoding('BINARY')
+
+    [positive_enc, negative_enc].each do |encoded|
+      report = MemoryProfiler.report do
+        type.decode(encoded)
+      end
+
+      assert_equal 5, report.total_allocated
+      assert_equal 0, report.total_retained

Review comment:
       This line is failing with Ruby 2.6 where it is getting 1. This should probably be changed to match https://github.com/apache/avro/pull/1102/files#diff-5c5f6ae9be9e29a9812a5b49746e79d50c326fb32fc984064dc15c52a7b41385R212-R213 including a similar comment. 

##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -17,9 +17,193 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new instance of a logical type using the provided schema
+      #
+      # @param schema [Avro::Schema]
+      #     The schema to use with this instance
+      #
+      # @raise [ArgumentError]
+      #     If the provided schema is nil
+      def initialize(schema)
+        raise ArgumentError, 'schema is required' if schema.nil?
+
+        @schema = schema
+      end
+
+      ##
+      # Encode the provided datum
+      #
+      # @param datum [Object] The datum to encode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def encode(datum)
+        raise NotImplementedError
+      end
+
+      ##
+      # Decode the provided datum
+      #
+      # @param datum [Object] The datum to decode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def decode(datum)
+        raise NotImplementedError
+      end
+    end
+
+    ##
+    # Logical type to handle arbitrary-precision decimals using byte array.
+    #
+    # The byte array contains the two's-complement representation of the unscaled integer
+    # value in big-endian byte order.
+    class BytesDecimal < LogicalTypeWithSchema
+      # Messages for exceptions
+      ERROR_INSUFFICIENT_PRECISION = 'Precision is too small'
+      ERROR_ROUNDING_NECESSARY     = 'Rounding necessary'
+      ERROR_VALUE_MUST_BE_NUMERIC  = 'value must be numeric'
+
+      # The pattern used to pack up the byte array (8 bit unsigned integer/char)
+      PACK_UNSIGNED_CHARS = 'C*'
+
+      # The number 10 as BigDecimal
+      TEN = BigDecimal(10).freeze
+
+      ##
+      # @return [Integer] The number of total digits supported by the decimal
+      attr_reader :precision
+
+      ##
+      # @return [Integer] The number of fractional digits
+      attr_reader :scale
+
+      ##
+      # Build a new decimal logical type
+      #
+      # @param schema [Avro::Schema]
+      #     The schema defining precision and scale for the conversion
+      #
+      # @raise [ArgumentError]

Review comment:
       Minor, but this initialize method no longer raises since the validation was moved to the `BytesSchema` class.
   
   Now that parsing and matching has been added for fixed decimal types the same validation should be performed there, and it would be nice to share the implementation. I think that can happen separately.

##########
File path: lang/ruby/test/test_logical_types.rb
##########
@@ -99,6 +100,140 @@ def test_parse_fixed_duration
     assert_equal 'duration', schema.logical_type
   end
 
+  def test_bytes_decimal
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 9, "scale": 6 }
+    SCHEMA
+
+    assert_equal 'decimal', schema.logical_type
+    assert_equal 9, schema.precision
+    assert_equal 6, schema.scale
+
+    assert_encode_and_decode BigDecimal('-3.4562'), schema
+    assert_encode_and_decode BigDecimal('3.4562'), schema
+    assert_encode_and_decode 15.123, schema
+    assert_encode_and_decode 15, schema
+    assert_encode_and_decode BigDecimal('0.123456'), schema
+    assert_encode_and_decode BigDecimal('0'), schema
+    assert_encode_and_decode BigDecimal('1'), schema
+    assert_encode_and_decode BigDecimal('-1'), schema
+
+    assert_raise ArgumentError do
+      type = Avro::LogicalTypes::BytesDecimal.new(schema)
+      type.encode('1.23')
+    end
+  end
+
+  def test_bytes_decimal_range_errors
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('345'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-1.5342'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-100.2'))
+    end
+
+    assert_raises RangeError do
+      type.encode(BigDecimal('-99.991'))
+    end
+  end
+
+  def test_bytes_decimal_conversion
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    enc = "\xcb\x43\x38".dup.force_encoding('BINARY')
+    assert_equal enc, type.encode(BigDecimal('-3.4562'))
+    assert_equal BigDecimal('-3.4562'), type.decode(enc)
+
+    assert_equal "\x34\xbc\xc8".dup.force_encoding('BINARY'), type.encode(BigDecimal('3.4562'))
+    assert_equal BigDecimal('3.4562'), type.decode("\x34\xbc\xc8".dup.force_encoding('BINARY'))
+
+    assert_equal "\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'), type.encode(BigDecimal('456123.123456'))
+    assert_equal BigDecimal('456123.123456'), type.decode("\x6a\x33\x0e\x87\x00".dup.force_encoding('BINARY'))
+  end
+
+  def test_logical_type_with_schema
+    exception = assert_raises(ArgumentError) do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(nil)
+    end
+    assert_equal exception.to_s, 'schema is required'
+
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 12, "scale": 6 }
+    SCHEMA
+
+    assert_nothing_raised do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema)
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).encode(BigDecimal('2'))
+    end
+
+    assert_raises NotImplementedError do
+      Avro::LogicalTypes::LogicalTypeWithSchema.new(schema).decode('foo')
+    end
+  end
+
+  def test_bytes_decimal_object_allocations_encode
+    schema = Avro::Schema.parse <<-SCHEMA
+      { "type": "bytes", "logicalType": "decimal", "precision": 4, "scale": 2 }
+    SCHEMA
+
+    type = Avro::LogicalTypes::BytesDecimal.new(schema)
+
+    positive_value = BigDecimal('5.2')
+    negative_value = BigDecimal('-5.2')
+
+    [positive_value, negative_value].each do |value|
+      report = MemoryProfiler.report do
+        type.encode(value)
+      end
+
+      assert_equal 5, report.total_allocated
+      # Ruby 2.3, 2.4 and 2.7 do not retain anything. Ruby 2.5 and some 2.6 retain 1

Review comment:
       Only Ruby 2.6 and later are supported now, so references to earlier versions could be dropped here. (optional)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org