You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@avro.apache.org by GitBox <gi...@apache.org> on 2020/07/10 04:01:03 UTC

[GitHub] [avro] ziggythehamster commented on a change in pull request #918: AVRO-2677: Fully and correctly support decimal logical type in Ruby

ziggythehamster commented on a change in pull request #918:
URL: https://github.com/apache/avro/pull/918#discussion_r452608512



##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -16,9 +16,209 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new isntance of a logical type using the provided schema
+      #
+      # @param schema [Avro::Schema]
+      #     The schema to use with this instance
+      #
+      # @raise [ArgumentError]
+      #     If the provided schema is nil
+      def initialize(schema)
+        raise ArgumentError, 'schema is required' if schema.nil?
+
+        @schema = schema
+      end
+
+      ##
+      # Encode the provided datum
+      #
+      # @param datum [Object] The datum to encode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def encode(datum)
+        raise NotImplementedError
+      end
+
+      ##
+      # Decode the provided datum
+      #
+      # @param datum [Object] The datum to decode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def decode(datum)
+        raise NotImplementedError
+      end
+    end
+
+    ##
+    # Logical type to handle arbitrary-precision decimals using byte array.
+    #
+    # The byte array contains the two's-complement representation of the unscaled integer
+    # value in big-endian byte order.
+    class BytesDecimal < LogicalTypeWithSchema
+      # Messages for exceptions
+      ERROR_INSUFICIENT_PRECISION = 'Precision is too small'.freeze
+      ERROR_INVALID_SCALE         = 'Scale must be greater than or equal to 0'.freeze
+      ERROR_INVALID_PRECISION     = 'Precision must be positive'.freeze
+      ERROR_PRECISION_TOO_SMALL   = 'Precision must be greater than scale'.freeze
+      ERROR_ROUNDING_NECESSARY    = 'Rounding necessary'.freeze
+      ERROR_VALUE_MUST_BE_NUMERIC = 'value must be numeric'.freeze
+
+      # The pattern used to pack up the byte array (8 bit unsigned integer/char)
+      PACK_UNSIGNED_CHARS = 'C*'.freeze
+
+      # The number 10 as BigDecimal
+      TEN = BigDecimal(10).freeze
+
+      ##
+      # @return [Integer] The number of total digits supported by the decimal
+      attr_reader :precision
+
+      ##
+      # @return [Integer] The number of fractional digits
+      attr_reader :scale
+
+      ##
+      # Build a new decimal logical type
+      #
+      # @param schema [Avro::Schema]
+      #     The schema defining precision and scale for the conversion
+      #
+      # @raise [ArgumentError]
+      #     If precision or scale have invalid values. Precision must be
+      #     positive and greater or equal to scale. If scale is missing,
+      #     it defaults to 0.
+      def initialize(schema)

Review comment:
       In order to do that, there would need to be a cached lookup for Schema first. If you create two Schema instances, they are different even if representing the same thing, which makes Schema unsuitable as a lookup key. If you converted Schema to a String/Symbol/whatever that was consistent and used it as a lookup, that would work, but it seemed like the refactor of "cache Schema instances in a lookup" was out of scope.

##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -16,9 +16,209 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new isntance of a logical type using the provided schema

Review comment:
       ```suggestion
         # Build a new instance of a logical type using the provided schema
   ```

##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -16,9 +16,209 @@
 # limitations under the License.
 
 require 'date'
+require 'bigdecimal'
+require 'bigdecimal/util'
 
 module Avro
   module LogicalTypes
+
+    ##
+    # Base class for logical types requiring a schema to be present
+    class LogicalTypeWithSchema
+      ##
+      # @return [Avro::Schema] The schema this logical type is dealing with
+      attr_reader :schema
+
+      ##
+      # Build a new isntance of a logical type using the provided schema
+      #
+      # @param schema [Avro::Schema]
+      #     The schema to use with this instance
+      #
+      # @raise [ArgumentError]
+      #     If the provided schema is nil
+      def initialize(schema)
+        raise ArgumentError, 'schema is required' if schema.nil?
+
+        @schema = schema
+      end
+
+      ##
+      # Encode the provided datum
+      #
+      # @param datum [Object] The datum to encode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def encode(datum)
+        raise NotImplementedError
+      end
+
+      ##
+      # Decode the provided datum
+      #
+      # @param datum [Object] The datum to decode
+      #
+      # @raise [NotImplementedError]
+      #     Subclass will need to override this method
+      def decode(datum)
+        raise NotImplementedError
+      end
+    end
+
+    ##
+    # Logical type to handle arbitrary-precision decimals using byte array.
+    #
+    # The byte array contains the two's-complement representation of the unscaled integer
+    # value in big-endian byte order.
+    class BytesDecimal < LogicalTypeWithSchema
+      # Messages for exceptions
+      ERROR_INSUFICIENT_PRECISION = 'Precision is too small'.freeze
+      ERROR_INVALID_SCALE         = 'Scale must be greater than or equal to 0'.freeze
+      ERROR_INVALID_PRECISION     = 'Precision must be positive'.freeze
+      ERROR_PRECISION_TOO_SMALL   = 'Precision must be greater than scale'.freeze
+      ERROR_ROUNDING_NECESSARY    = 'Rounding necessary'.freeze
+      ERROR_VALUE_MUST_BE_NUMERIC = 'value must be numeric'.freeze
+
+      # The pattern used to pack up the byte array (8 bit unsigned integer/char)
+      PACK_UNSIGNED_CHARS = 'C*'.freeze
+
+      # The number 10 as BigDecimal
+      TEN = BigDecimal(10).freeze
+
+      ##
+      # @return [Integer] The number of total digits supported by the decimal
+      attr_reader :precision
+
+      ##
+      # @return [Integer] The number of fractional digits
+      attr_reader :scale
+
+      ##
+      # Build a new decimal logical type
+      #
+      # @param schema [Avro::Schema]
+      #     The schema defining precision and scale for the conversion
+      #
+      # @raise [ArgumentError]
+      #     If precision or scale have invalid values. Precision must be
+      #     positive and greater or equal to scale. If scale is missing,
+      #     it defaults to 0.
+      def initialize(schema)
+        super
+
+        @scale     = schema.scale.to_i
+        @precision = schema.precision.to_i
+        @factor    = TEN ** @scale
+
+        validate!
+      end
+
+      ##
+      # Encode the provided value into a byte array
+      #
+      # @param value [BigDecimal, Float, Integer]
+      #     The numeric value to encode
+      #
+      # @raise [ArgumentError]
+      #     If the provided value is not a numeric type
+      #
+      # @raise [RangeError]
+      #     If the provided value has a scale higher than the schema permits,
+      #     or does not fit into the schema's precision
+      def encode(value)
+        raise ArgumentError, ERROR_VALUE_MUST_BE_NUMERIC unless value.is_a?(Numeric)
+
+        to_byte_array(unscaled_value(value.to_d)).pack(PACK_UNSIGNED_CHARS).freeze
+      end
+
+      ##
+      # Decode a byte array (in form of a string) into a BigDecimal of the
+      # given precision and scale
+      #
+      # @param stream [String]
+      #     The byte array to decode
+      #
+      # @return [BigDecimal]
+      def decode(stream)
+        from_byte_array(stream) / @factor
+      end
+
+      private
+
+      ##
+      # Convert the provided stream of bytes into the unscaled value
+      #
+      # @param stream [String]
+      #     The stream of bytes to convert
+      #
+      # @return [Integer]
+      def from_byte_array(stream)
+        bytes    = stream.bytes
+        positive = bytes.first[7].zero?
+        total    = 0
+
+        bytes.each_with_index do |value, ix|
+          total += (positive ? value : (value ^ 0xff)) << (bytes.length - ix - 1) * 8
+        end
+
+        return total if positive
+
+        -(total + 1)
+      end
+
+      ##
+      # Convert the provided number into its two's complement representation
+      # in network order (big endian).
+      #
+      # @param number [Integer]
+      #     The number to convert
+      #
+      # @return [Array<Integer>]
+      #     The byte array in network order
+      def to_byte_array(number)
+        [].tap do |result|
+          loop do
+            result.unshift(number & 0xff)
+            number >>= 8
+
+            break if (number == 0 || number == -1) && (result.first[7] == number[7])
+          end
+        end
+      end
+
+      ##
+      # Get the unscaled value from a BigDecimal considering the schema's scale
+      #
+      # @param decimal [BigDecimal]
+      #     The decimal to get the unscaled value from
+      #
+      # @return [Integer]
+      def unscaled_value(decimal)
+        details = decimal.split
+        length  = details[1].length
+
+        fractional_part = length - details[3]
+        raise RangeError, ERROR_ROUNDING_NECESSARY if fractional_part > scale
+
+        if length > precision || (length - fractional_part) > (precision - scale)
+          raise RangeError, ERROR_INSUFICIENT_PRECISION
+        end
+
+        (decimal * @factor).to_i
+      end
+
+      ##
+      # Make sure the schema definition is legit
+      #
+      # @raise [ArgumentError] If scale or precision are not quite right
+      def validate!
+        raise ArgumentError, ERROR_INVALID_SCALE if scale.negative?
+        raise ArgumentError, ERROR_INVALID_PRECISION unless precision.positive?
+        raise ArgumentError, ERROR_PRECISION_TOO_SMALL if precision < scale
+      end

Review comment:
       Yes, but each encode/decode currently result in a new Schema for each item, so this seemed like a premature optimization. Looking at the code again, though, there's already some validation occurring. Moving this there would probably add some clarity.

##########
File path: lang/ruby/lib/avro/logical_types.rb
##########
@@ -81,10 +284,11 @@ def self.decode(datum)
       },
     }.freeze
 
-    def self.type_adapter(type, logical_type)
+    def self.type_adapter(type, logical_type, schema = nil)
       return unless logical_type
 
-      TYPES.fetch(type, {}.freeze).fetch(logical_type, Identity)
+      adapter = TYPES.fetch(type, {}.freeze).fetch(logical_type, Identity)
+      adapter.is_a?(Class) ? adapter.new(schema) : adapter

Review comment:
       That would probably make it more clear what's going on, but you can't make the module return `true` and expect it to function, so we decided on an identity check.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org