Posted to issues@arrow.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2017/07/26 15:52:00 UTC

[jira] [Comment Edited] (ARROW-786) [Format] In-memory format for 128-bit Decimals, handling of sign bit

    [ https://issues.apache.org/jira/browse/ARROW-786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16101836#comment-16101836 ] 

Jacques Nadeau edited comment on ARROW-786 at 7/26/17 3:51 PM:
---------------------------------------------------------------

The current format of the Java implementation uses an embedded sign bit. GCC, Clang, and the Intel compiler support __int128, which I believe is represented on x86-64 machines with the sign bit embedded ( ? ). I remember talking to [~nongli] about this years ago and (if I recall correctly), we chose the Parquet representation based on his experiments with GCC or Clang/LLVM. (Unfortunately, I'm unable to find the thread.)

The current Java implementation supports a 16-byte wide, sign-bit embedded, two's-complement, big-endian representation that is the same as the Parquet description here: 

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L81
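
To make the layout described above concrete, here is a minimal sketch of how an unscaled decimal value could be packed into 16 bytes as sign-embedded, two's-complement, big-endian data. It is not the code from the Arrow Java implementation; the class and method names (Decimal128Sketch, toFixed16, fromFixed16) are hypothetical and only java.math.BigInteger from the standard library is assumed.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical illustration only; not taken from the Arrow or Parquet code bases.
final class Decimal128Sketch {
    static final int WIDTH = 16; // 128 bits

    // Pack an unscaled value into 16 big-endian two's-complement bytes.
    static byte[] toFixed16(BigInteger unscaled) {
        // toByteArray() returns the minimal big-endian two's-complement form.
        byte[] minimal = unscaled.toByteArray();
        if (minimal.length > WIDTH) {
            throw new ArithmeticException("value does not fit in 128 bits");
        }
        byte[] out = new byte[WIDTH];
        // Sign-extend: pad the high-order bytes with 0xFF for negatives, 0x00 otherwise.
        byte pad = (byte) (unscaled.signum() < 0 ? 0xFF : 0x00);
        int offset = WIDTH - minimal.length;
        Arrays.fill(out, 0, offset, pad);
        System.arraycopy(minimal, 0, out, offset, minimal.length);
        return out;
    }

    // Read 16 big-endian two's-complement bytes back into a BigInteger.
    static BigInteger fromFixed16(byte[] bytes) {
        if (bytes.length != WIDTH) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        // This constructor interprets the array as big-endian two's complement.
        return new BigInteger(bytes);
    }
}
```

Because the sign lives in the most significant bit of the first byte, no separate sign bitmap is needed in this layout; the sign of a stored value can be read from the top bit of byte 0.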


was (Author: jnadeau):
The current format of the Java implementation uses an embedded sign bit. GCC, Clang, and the Intel compiler support __int128, which I believe is represented on x86-64 machines with the sign bit embedded (?). I remember talking to [~nongli] about this years ago and (if I recall correctly), we chose the Parquet representation based on his experiments with GCC or Clang/LLVM. (Unfortunately, I'm unable to find the thread.)

The current Java implementation supports a 16-byte wide, sign-bit embedded, two's-complement, big-endian representation that is the same as the Parquet description here: 

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L81

> [Format] In-memory format for 128-bit Decimals, handling of sign bit
> --------------------------------------------------------------------
>
>                 Key: ARROW-786
>                 URL: https://issues.apache.org/jira/browse/ARROW-786
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Format
>            Reporter: Wes McKinney
>             Fix For: 0.6.0
>
>
> cc [~cpcloud]
> We found in ARROW-655 that we needed to add an extra bit for signedness for decimals stored as 128-bit values to be able to use the Boost multiprecision libraries. This makes Decimal128 not fit completely neatly as a 16-byte fixed size binary value, and more of a {{struct<sign_bitmap: boolean, data: fixed_size_binary(16)>}}. What is the current format in the Java implementation? We will need to document the memory layout for decimals that maximizes compatibility across languages and eventually implement integration tests for IPC. 


