You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/26 20:57:04 UTC

[GitHub] [iceberg] cccs-eric opened a new pull request #3988: Python: Add Decimal type conversion

cccs-eric opened a new pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988


   python_legacy was missing support for Decimal conversion.  An error would be raised every time such a conversion happened:
   ```
    File "/home/.local/lib/python3.8/site-packages/iceberg/api/types/conversions.py", line 93, in from_byte_buffer
   [2022-01-25, 17:16:23 UTC] {custom_pod_launcher.py:53} INFO -     return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
   [2022-01-25, 17:16:23 UTC] {custom_pod_launcher.py:53} INFO -   File "/home/.local/lib/python3.8/site-packages/iceberg/api/types/conversions.py", line 98, in internal_from_byte_buffer
   [2022-01-25, 17:16:23 UTC] {custom_pod_launcher.py:53} INFO -     return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
   [2022-01-25, 17:16:23 UTC] {custom_pod_launcher.py:53} INFO - TypeError: 'NoneType' object is not callable
   ```
   This error occurs since the map doesn't contain a lambda function for the Decimal TypeID.
   
   I added such functions, but needed to change the argument to `from_byte_buff_mapping`.  Used to be TypeID, now Type.  The public interfaces do not change, so it should be backward compatible.  The `internal_from_byte_buffer` is not needed anymore, so I removed it.  This *could* break compatibility, but nobody should be using this function outside Iceberg...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r801905837



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,9 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())

Review comment:
       No, I think keeping all the tests is a good thing. If anything, we should probably add the Python tests to Java.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793812444



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       Ok, I have changed my mind after make the modifications.  The reason is that for some types like `struct`, the `__str__` method may spit out a very long string.  It goes and [loops over every field](https://github.com/apache/iceberg/blob/master/python_legacy/iceberg/api/types/types.py#L526)...  For this reason, I would refrain from using `type_var` in the exception message to keep logs clean.  Let me know if you diasgree.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r800916253



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,9 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())

Review comment:
       @rdblue Should I remove the "old" tests in python and only keep the ones in-sync with Java?
   I also added 3 tests in the Java implementation as requested.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793647114



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       I'm not sure what we want to express here.  Here's what I think:
   ```Python
   raise TypeError("Cannot deserialize Type: %s" % type_var)
   ```
   If we want to express that we could not deserialize a type, it makes sense that the error specifies which type it is by echoing what was passed to the function.  It could not deserialize **this** type and not a TypeID.  The dev never asked for a TypeID to be deserialized...
   
   ```Python
   raise TypeError("Cannot deserialize Type: %s" % type_var.type_id)
   ```
   If we want to use `type_id` in the message, we should say that we couldn't find a converter or a deserializer for this TypeID.  
   
   I think what is causing confusion is that `to_byte_buffer()` and `from_byte_buffer()` don't align...  But maybe it's just me 😄   That's my 2cents, let me know what you prefer.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r801990105



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,9 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())

Review comment:
       Yes, I added 3 tests to the Java implementation that I felt were not covered by the Java test suite.  If it's what you are talking about...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r796527010



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -53,48 +54,45 @@ class Conversions(object):
                                                                             & 0xFFFFFFFFFFFFFFFF),
                             TypeID.FIXED: lambda type_id, value: value,
                             TypeID.BINARY: lambda type_id, value: value,
-                            # TypeId.DECIMAL: lambda type_var, value: struct.pack(value.quantize(
-                            #     Decimal('.' + "".join(['0' for x in range(0, type_var.scale)]) + '1'))
+                            TypeID.DECIMAL: decimal_to_bytes
                             }
 
-    from_byte_buff_mapping = {TypeID.BOOLEAN: lambda type_id, value: struct.unpack('<?', value)[0] != 0,
-                              TypeID.INTEGER: lambda type_id, value: struct.unpack('<i', value)[0],
-                              TypeID.DATE: lambda type_id, value: struct.unpack('<i', value)[0],
-                              TypeID.LONG: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.TIME: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.TIMESTAMP: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.FLOAT: lambda type_id, value: struct.unpack('<f', value)[0],
-                              TypeID.DOUBLE: lambda type_id, value: struct.unpack('<d', value)[0],
-                              TypeID.STRING: lambda type_id, value: bytes(value).decode("utf-8"),
-                              TypeID.UUID: lambda type_id, value:
+    from_byte_buff_mapping = {TypeID.BOOLEAN: lambda type_var, value: struct.unpack('<?', value)[0] != 0,
+                              TypeID.INTEGER: lambda type_var, value: struct.unpack('<i', value)[0],
+                              TypeID.DATE: lambda type_var, value: struct.unpack('<i', value)[0],
+                              TypeID.LONG: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.TIME: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.TIMESTAMP: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.FLOAT: lambda type_var, value: struct.unpack('<f', value)[0],
+                              TypeID.DOUBLE: lambda type_var, value: struct.unpack('<d', value)[0],
+                              TypeID.STRING: lambda type_var, value: bytes(value).decode("utf-8"),
+                              TypeID.UUID: lambda type_var, value:
                               uuid.UUID(int=struct.unpack('>QQ', value)[0] << 64 | struct.unpack('>QQ', value)[1]),
-                              TypeID.FIXED: lambda type_id, value: value,
-                              TypeID.BINARY: lambda type_id, value: value}
+                              TypeID.FIXED: lambda type_var, value: value,
+                              TypeID.BINARY: lambda type_var, value: value,
+                              TypeID.DECIMAL: lambda type_var, value: Decimal(int.from_bytes(value, 'big', signed=True) * 10**-type_var.scale)
+                              }
 
     @staticmethod
     def from_partition_string(type_var, as_string):
         if as_string is None or Conversions.HIVE_NULL == as_string:
             return None
         part_func = Conversions.value_mapping.get(type_var.type_id)
         if part_func is None:
-            raise RuntimeError("Unsupported type for fromPartitionString: %s" % type_var)
+            raise RuntimeError(f"Unsupported type for from_partition_string: {type_var}")
 
         return part_func(as_string)
 
     @staticmethod
     def to_byte_buffer(type_id, value):

Review comment:
       @jun-he  Even though I fully agree, what would look like a simple change becomes very invasive.  I've looked into it and it's all good except for this [line](https://github.com/apache/iceberg/blob/master/python_legacy/iceberg/api/expressions/literals.py#L140):
   `self.byte_buffer = Conversions.to_byte_buffer(self.type_id, self.value)`
   
   `BaseLiteral` instances are built using a `type_id` and it becomes tricky to use a `type_var` for literal types like `TimestampType`, `DecimalType` and `FixedType` that do not offer a `get()` method.  Because you guys are refactoring the Python library, I would make that change there is leave python_legacy as-is.  What do you think?  Am I over-complexifying the task at hand?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793796699



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       Yes, it's a KeyError that we are catching, but we are raising a TypeError!  Maybe we should do a combination of both.  Let me try something and you can comment on it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r801966825



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,9 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())

Review comment:
       Oh, looks like you already did!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
samredai commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793769057



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       Ah I see, that makes sense. With the write `__str__` for `type_var` I think using `type_var` alone would make sense, assuming that the type is also visible. The main thing I think is that since it's a `KeyError`, the error should make it obvious what the key was. Even including both feels ok to me. `f"Cannot deserialize {type_var.type_id}: {type_var}"`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793636864



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)

Review comment:
       Should I apply the same fix for `to_byte_buffer()`:
   ` return Conversions.to_byte_buff_mapping.get(type_id)(type_id, value)`
   would become
   `return Conversions.to_byte_buff_mapping[type_id](type_id, value)`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#issuecomment-1025968006


   @rdblue Here is another PR related to python_legacy.  Sam requested your input earlier, but I think you missed it.  Would you mind having a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r800242361



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,9 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())

Review comment:
       Can you make sure the test cases are shared between the Java implementation and here? There's a test for Java in `TestConversions` but it would be good to add these cases and ensure that they match. I'd also like to have a case that requires an empty byte, like `Literal.of(0.11).to(DecimalType.of(10, 2)).to_byte_buffer()`, as well as one for negative numbers.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r800242408



##########
File path: python_legacy/iceberg/api/types/type_util.py
##########
@@ -105,6 +106,14 @@ def assign_fresh_ids(type_var, next_id):
                            .as_nested_type().fields))
 
 
+def decimal_to_bytes(_, value):
+    scale = abs(value.as_tuple().exponent)
+    quantized_value = value.quantize(Decimal("10")**-scale)
+    unscaled_value = int((quantized_value * 10**scale).to_integral_value())
+    min_num_bytes = (unscaled_value.bit_length() + 7) // 8
+    return unscaled_value.to_bytes(min_num_bytes, 'big', signed=True)

Review comment:
       This implementation seems fine to me, but I'd really like to have tests in sync between Java and Python to ensure it is correct.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#issuecomment-1022595801


   @samredai Can you take a look at this?  I've made it a draft to see what you think.  If you are ok with it, I'll convert it to an official PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jun-he commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
jun-he commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r796338920



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -53,48 +54,45 @@ class Conversions(object):
                                                                             & 0xFFFFFFFFFFFFFFFF),
                             TypeID.FIXED: lambda type_id, value: value,
                             TypeID.BINARY: lambda type_id, value: value,
-                            # TypeId.DECIMAL: lambda type_var, value: struct.pack(value.quantize(
-                            #     Decimal('.' + "".join(['0' for x in range(0, type_var.scale)]) + '1'))
+                            TypeID.DECIMAL: decimal_to_bytes
                             }
 
-    from_byte_buff_mapping = {TypeID.BOOLEAN: lambda type_id, value: struct.unpack('<?', value)[0] != 0,
-                              TypeID.INTEGER: lambda type_id, value: struct.unpack('<i', value)[0],
-                              TypeID.DATE: lambda type_id, value: struct.unpack('<i', value)[0],
-                              TypeID.LONG: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.TIME: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.TIMESTAMP: lambda type_id, value: struct.unpack('<q', value)[0],
-                              TypeID.FLOAT: lambda type_id, value: struct.unpack('<f', value)[0],
-                              TypeID.DOUBLE: lambda type_id, value: struct.unpack('<d', value)[0],
-                              TypeID.STRING: lambda type_id, value: bytes(value).decode("utf-8"),
-                              TypeID.UUID: lambda type_id, value:
+    from_byte_buff_mapping = {TypeID.BOOLEAN: lambda type_var, value: struct.unpack('<?', value)[0] != 0,
+                              TypeID.INTEGER: lambda type_var, value: struct.unpack('<i', value)[0],
+                              TypeID.DATE: lambda type_var, value: struct.unpack('<i', value)[0],
+                              TypeID.LONG: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.TIME: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.TIMESTAMP: lambda type_var, value: struct.unpack('<q', value)[0],
+                              TypeID.FLOAT: lambda type_var, value: struct.unpack('<f', value)[0],
+                              TypeID.DOUBLE: lambda type_var, value: struct.unpack('<d', value)[0],
+                              TypeID.STRING: lambda type_var, value: bytes(value).decode("utf-8"),
+                              TypeID.UUID: lambda type_var, value:
                               uuid.UUID(int=struct.unpack('>QQ', value)[0] << 64 | struct.unpack('>QQ', value)[1]),
-                              TypeID.FIXED: lambda type_id, value: value,
-                              TypeID.BINARY: lambda type_id, value: value}
+                              TypeID.FIXED: lambda type_var, value: value,
+                              TypeID.BINARY: lambda type_var, value: value,
+                              TypeID.DECIMAL: lambda type_var, value: Decimal(int.from_bytes(value, 'big', signed=True) * 10**-type_var.scale)
+                              }
 
     @staticmethod
     def from_partition_string(type_var, as_string):
         if as_string is None or Conversions.HIVE_NULL == as_string:
             return None
         part_func = Conversions.value_mapping.get(type_var.type_id)
         if part_func is None:
-            raise RuntimeError("Unsupported type for fromPartitionString: %s" % type_var)
+            raise RuntimeError(f"Unsupported type for from_partition_string: {type_var}")
 
         return part_func(as_string)
 
     @staticmethod
     def to_byte_buffer(type_id, value):

Review comment:
       nit: to be consistent to use `type_var` here as well.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] cccs-eric commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
cccs-eric commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793812444



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       Ok, I have changed my mind after make the modifications.  The reason is that for some types like `struct`, the `__str__` method may spit out a very long string.  It goes and [loops over every field](https://github.com/apache/iceberg/blob/master/python_legacy/iceberg/api/types/types.py#L526)...  For this reason, I would refrain from using `type_var` in the exception message to keep logs clean.  Let me know if you disagree.  I have proposed something in my latest push.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#issuecomment-1034367518


   Thanks, @cccs-eric! Looks great.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r801966615



##########
File path: python_legacy/tests/api/test_conversions.py
##########
@@ -79,3 +87,155 @@ def test_to_bytes(self):
                          Literal.of(uuid.UUID("f79c3e09-677c-4bbd-a479-3f349cb785e7")).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytes(b'foo')).to_byte_buffer())
         self.assertEqual(b'foo', Literal.of(bytearray(b'foo')).to_byte_buffer())
+        # Decimal on 2-bytes
+        self.assertEqual(b'\x30\x39', Literal.of(123.45).to(DecimalType.of(5, 2)).to_byte_buffer())
+        # Decimal on 3-bytes to test that we use the minimum number of bytes
+        self.assertEqual(b'\x12\xd6\x87', Literal.of(123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+        # Negative decimal to test two's complement
+        self.assertEqual(b'\xed\x29\x79', Literal.of(-123.4567).to(DecimalType.of(7, 4)).to_byte_buffer())
+
+    def test_byte_buffer_conversions(self):

Review comment:
       Nice work!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#issuecomment-1025971088


   @cccs-eric, will do.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
samredai commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793770074



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)

Review comment:
       Good catch, yes that should be changed too.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] samredai commented on a change in pull request #3988: Python: Add Decimal type conversion

Posted by GitBox <gi...@apache.org>.
samredai commented on a change in pull request #3988:
URL: https://github.com/apache/iceberg/pull/3988#discussion_r793193421



##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -23,6 +23,14 @@
 from .type import TypeID
 
 
+def decimal_to_bytes(type_id, value):

Review comment:
       There's a [type_util.py](https://github.com/apache/iceberg/blob/master/python_legacy/iceberg/api/types/type_util.py) file. It makes sense to put this in there and import it here.

##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -23,6 +23,14 @@
 from .type import TypeID
 
 
+def decimal_to_bytes(type_id, value):

Review comment:
       I see that you added `type_id` since you needed to match the other mappings, but since it's unused it might be helpful to signify that with an underscore. `def decimal_to_bytes(_, value):`.

##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)

Review comment:
       This will never hit the `KeyError` and will instead hit a similar error to what you originally saw that won't be caught by the except block. Instead of using `get`, I think it should be a normal dictionary lookup using square brackets:
   ```py
   return Conversions.from_byte_buff_mapping[type_var.type_id](type_var, buffer_var)
   ```

##########
File path: python_legacy/iceberg/api/types/conversions.py
##########
@@ -90,11 +99,7 @@ def to_byte_buffer(type_id, value):
 
     @staticmethod
     def from_byte_buffer(type_var, buffer_var):
-        return Conversions.internal_from_byte_buffer(type_var.type_id, buffer_var)
-
-    @staticmethod
-    def internal_from_byte_buffer(type_id, buffer_var):
         try:
-            return Conversions.from_byte_buff_mapping.get(type_id)(type_id, buffer_var)
+            return Conversions.from_byte_buff_mapping.get(type_var.type_id)(type_var, buffer_var)
         except KeyError:
-            raise TypeError("Cannot deserialize Type: %s" % type_id)
+            raise TypeError("Cannot deserialize Type: %s" % type_var)

Review comment:
       Should this be `type_var.type_id` instead of `type_var`? And maybe an f-string: `f"Cannot deserialize Type: {type_var.type_id}"`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org