Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/11/17 19:23:48 UTC

[GitHub] [iceberg] alec-heif opened a new issue, #6210: pyiceberg BinaryDecoder does not correctly read int values

alec-heif opened a new issue, #6210:
URL: https://github.com/apache/iceberg/issues/6210

   ### Apache Iceberg version
   
   _No response_
   
   ### Query engine
   
   _No response_
   
   ### Please describe the bug 🐞
   
   The logic in https://github.com/apache/iceberg/blob/master/python/pyiceberg/avro/decoder.py#L70 appears to be incorrect.
   
   The spec for a binary-encoded `int` in the manifest files is [as follows:](https://iceberg.apache.org/spec/#binary-single-value-serialization)
   
   ```
   int | Stored as 4-byte little-endian
   ```
   
   so, an example bytestring of `0xad4a0000` should be read as the decimal `19117`:
   
   1. lsb `0xad` is 173
   2. 2nd lsb `0x4a` is 74
   3. (74 * 256) + 173 == 19117
   
   however `BinaryDecoder` does not read this correctly:
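   ```
   import io
   
   from pyiceberg.avro.decoder import BinaryDecoder
   
   def as_fo(x):
       # Wrap the bytes of a hex string in a file-like object.
       return io.BytesIO(bytes.fromhex(x))
   
   assert as_fo('ad4a0000').read(4).hex() == 'ad4a0000'
   # Observed result, where 19117 was expected:
   assert BinaryDecoder(as_fo('ad4a0000')).read_int() == -4759
   ```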
   
   it is not obvious by inspection of `BinaryDecoder.read_int` where the bug is, but it is clearly a bug.
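
The fixed-width arithmetic in the report above can be checked with the standard library alone. This is only a sketch of the 4-byte little-endian reading that the report assumes; it does not call pyiceberg, and the variable names are illustrative.

```python
import struct

raw = bytes.fromhex("ad4a0000")

# Fixed-width reading: 4 bytes, little-endian, signed.
as_int = int.from_bytes(raw, byteorder="little", signed=True)

# Same reading via struct ("<i" is a little-endian 32-bit signed int).
(as_struct,) = struct.unpack("<i", raw)

# Manual check matching the steps in the report: (74 * 256) + 173.
manual = (raw[1] * 256) + raw[0]

print(as_int, as_struct, manual)
```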


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alec-heif commented on issue #6210: pyiceberg BinaryDecoder does not correctly read int values

Posted by GitBox <gi...@apache.org>.
alec-heif commented on issue #6210:
URL: https://github.com/apache/iceberg/issues/6210#issuecomment-1319098656

   fyi @Fokko since this logic appears to have been introduced by https://github.com/apache/iceberg/pull/4920




[GitHub] [iceberg] alec-heif commented on issue #6210: pyiceberg BinaryDecoder does not correctly read int values

Posted by GitBox <gi...@apache.org>.
alec-heif commented on issue #6210:
URL: https://github.com/apache/iceberg/issues/6210#issuecomment-1319134076

   somewhat disturbingly, the [unit test](https://github.com/apache/iceberg/blob/master/python/tests/avro/test_decoder.py#L166) coverage for this function also verifies the (incorrect) behavior:
   
   ```    
   mis = MemoryInputStream(b"\xBC\x7D")
   decoder = BinaryDecoder(mis)
   assert decoder.read_date_from_int() == date(1991, 12, 27)
   ```
   
   I believe that `0xBC7D0000` should not decode to 1991-12-27:
   
   1. lsb `0xbc == 188`
   2. 2nd lsb `0x7d == 125`
   3. `(125 * 256) + 188 == 32188` which is 32188 days, or `2058-02-16` 
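
The day-count arithmetic in this comment can be reproduced as follows. This is a sketch under the comment's own assumption that the bytes are a fixed-width little-endian int counting days from 1970-01-01; `days_to_date` is a hypothetical helper, not a pyiceberg function.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)


def days_to_date(days: int) -> date:
    """Convert a count of days since the Unix epoch to a calendar date."""
    return EPOCH + timedelta(days=days)


# Fixed-width reading of 0xBC7D0000 (4 bytes, little-endian, signed).
fixed_width = int.from_bytes(b"\xBC\x7D\x00\x00", "little", signed=True)

print(fixed_width, days_to_date(fixed_width))
```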
   




[GitHub] [iceberg] alec-heif closed issue #6210: pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values

Posted by GitBox <gi...@apache.org>.
alec-heif closed issue #6210: pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values
URL: https://github.com/apache/iceberg/issues/6210




[GitHub] [iceberg] Fokko commented on issue #6210: pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values

Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #6210:
URL: https://github.com/apache/iceberg/issues/6210#issuecomment-1319152002

   No problem at all, happy to help




[GitHub] [iceberg] alec-heif commented on issue #6210: pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values

Posted by GitBox <gi...@apache.org>.
alec-heif commented on issue #6210:
URL: https://github.com/apache/iceberg/issues/6210#issuecomment-1319146864

   hoooo wow ok that's embarrassing, thanks for correcting!! sorry for the false alarm here, i don't know how i missed this.




[GitHub] [iceberg] Fokko commented on issue #6210: pyiceberg BinaryDecoder does not correctly read 4-byte little-endian int values

Posted by GitBox <gi...@apache.org>.
Fokko commented on issue #6210:
URL: https://github.com/apache/iceberg/issues/6210#issuecomment-1319141897

   Hey @alec-heif, thanks for opening this issue. I think we're mixing up different file types. The decoder that you pointed out in the example is an Avro decoder that adheres to the [Avro spec](https://avro.apache.org/docs/1.11.1/specification/#primitive-types-1). int and long values are written using [variable-length](https://lucene.apache.org/java/3_5_0/fileformats.html#VInt) [zig-zag](https://code.google.com/apis/protocolbuffers/docs/encoding.html#types) coding.
   
   The decoding of the single values happens using the `_from_byte_buffer` method, which produces the same result as your example:
   ![image](https://user-images.githubusercontent.com/1134248/202548298-9d440258-e061-4447-b6a4-768add7b4f3d.png)
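
The variable-length zig-zag coding described in this comment can be sketched in a few lines. `read_avro_int` is a hypothetical stand-alone helper written for illustration, not pyiceberg's actual implementation. It assumes the Avro rule that each byte carries 7 payload bits, the high bit marks continuation, and the assembled value is zig-zag decoded.

```python
import io
from datetime import date, timedelta


def read_avro_int(stream) -> int:
    """Decode one Avro int/long: base-128 varint, then zig-zag."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        # The low 7 bits are payload, least-significant group first.
        accum |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the value.
        if not byte & 0x80:
            break
        shift += 7
    # Zig-zag decoding maps 0, 1, 2, 3, ... to 0, -1, 1, -2, ...
    return (accum >> 1) ^ -(accum & 1)


# Only the first two bytes are consumed; the trailing zeros stay unread.
from_report = read_avro_int(io.BytesIO(bytes.fromhex("ad4a0000")))

# The bytes used in the unit test quoted earlier in the thread.
from_test = read_avro_int(io.BytesIO(b"\xBC\x7D"))
test_date = date(1970, 1, 1) + timedelta(days=from_test)

print(from_report, from_test, test_date)
```

Under this reading the bytes from the report decode to -4759 and the unit-test bytes to 8030 days, which is 1991-12-27, so both `BinaryDecoder` results quoted in the thread match Avro's encoding rather than the 4-byte little-endian single-value serialization.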
   

