Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/04/24 21:37:55 UTC

[GitHub] [arrow] velvia commented on a change in pull request #4815: [DISCUSS] Add strawman proposal for sparseness and data integrity

velvia commented on a change in pull request #4815:
URL: https://github.com/apache/arrow/pull/4815#discussion_r414877852



##########
File path: format/Message.fbs
##########
@@ -21,10 +21,69 @@ include "Tensor.fbs";
 
 namespace org.apache.arrow.flatbuf;
 
+/// ------------------------------------------------------
+/// Buffer encoding schemes.
+/// -------------------------------------------------------
+
+/// Encoding for buffers representing integer as offsets from a reference value.
+/// This encoding uses less bits then the logical type indicates.
+/// It saves space when all values in the buffer can be represented with a
+/// small bit width (e.g. if all values in an int64 column are between -128
+/// and 127, then a bit-width of 8 can be be used) offset from the
+/// reference value.
+table FrameOfReferenceIntEncoding {
+  /// The value that all values in the buffer are relative to.
+  reference_value: long = 0;

Review comment:
       Depending on the size of your batch, a sloped representation would result in far smaller arrays, since the delta from a sloped line is typically much smaller than the delta from a single reference value and so fits into fewer bits.  You still get O(1) access to any element: just compute ax + b and add the stored delta.  This assumes the data is actually increasing, of course; otherwise a stepwise (constant) reference is fine.
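
To make the comparison concrete, here is a minimal Python sketch of the idea, not anything from the PR itself. The helper names (`encode_flat`, `encode_sloped`, `get_sloped`, `bits_needed`) are hypothetical, and the choice of `b` as the first value and `a` as the integer average step is just one assumed way to fit the line.

```python
from typing import List, Tuple


def bits_needed(deltas: List[int]) -> int:
    """Smallest signed (two's-complement) bit width that holds every delta."""
    width = 1
    for d in deltas:
        # For d >= 0 we need bit_length(d) + 1 bits; for d < 0, bit_length(~d) + 1.
        magnitude = d if d >= 0 else ~d
        width = max(width, magnitude.bit_length() + 1)
    return width


def encode_flat(values: List[int]) -> Tuple[int, List[int]]:
    """Plain frame of reference: deltas from one reference value.

    Assumption: the first element is used as the reference value.
    """
    reference = values[0]
    return reference, [v - reference for v in values]


def encode_sloped(values: List[int]) -> Tuple[int, int, List[int]]:
    """Sloped frame of reference: deltas from the line a * i + b.

    Assumption: b is the first value and a is the integer average step,
    so everything stays in exact integer arithmetic.
    """
    n = len(values)
    b = values[0]
    a = (values[-1] - values[0]) // (n - 1) if n > 1 else 0
    deltas = [v - (a * i + b) for i, v in enumerate(values)]
    return a, b, deltas


def get_sloped(a: int, b: int, deltas: List[int], i: int) -> int:
    """O(1) random access: evaluate the line at i and add the stored delta."""
    return a * i + b + deltas[i]
```

For steadily increasing data such as timestamps with small jitter, `bits_needed` on the sloped deltas should come out well below the flat case, because the flat deltas grow with the batch length while the sloped ones only capture the jitter. For data with no trend, `a` ends up near zero and the two encodings are roughly equivalent.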




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org