Posted to commits@datasketches.apache.org by GitBox <gi...@apache.org> on 2022/04/04 18:31:34 UTC

[GitHub] [datasketches-java] leerho commented on a diff in pull request #390: Direct kll double

leerho commented on code in PR #390:
URL: https://github.com/apache/datasketches-java/pull/390#discussion_r842029958


##########
src/main/java/org/apache/datasketches/kll/KllPreambleUtil.java:
##########
@@ -0,0 +1,372 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.datasketches.kll;
+
+import static org.apache.datasketches.Util.zeroPad;
+
+import org.apache.datasketches.Util;
+import org.apache.datasketches.memory.Memory;
+import org.apache.datasketches.memory.WritableMemory;
+
+//@formatter:off
+
+/**
+ * This class defines the serialized data structure and provides access methods for the key fields.
+ *
+ * <p>The intent of the design of this class was to isolate the detailed knowledge of the bit and
+ * byte layout of the serialized form of the sketches derived from the base sketch classes into one place.
+ * This allows the possibility of the introduction of different serialization
+ * schemes with minimal impact on the rest of the library.</p>
+ *
+ * <p>
+ * LAYOUT: The low significance bytes of this <i>long</i> based data structure are on the right.
+ * The multi-byte primitives are stored in native byte order.
+ * The single byte fields are treated as unsigned.</p>
+ *
+ * <p>An empty sketch requires only 8 bytes, which is only preamble.
+ * A serialized, non-empty KllDoublesSketch requires at least 16 bytes of preamble.
+ * A serialized, non-empty KllFloatsSketch requires at least 12 bytes of preamble.</p>
+ *
+ * <pre>{@code
+ * Serialized float sketch layout, more than one item:
+ *  Adr:
+ *      ||    7    |   6   |    5   |    4   |    3   |    2    |    1   |      0       |
+ *  0   || unused  |   M   |--------K--------|  Flags |  FamID  | SerVer | PreambleInts |
+ *      ||   15    |   14  |   13   |   12   |   11   |   10    |    9   |      8       |
+ *  1   ||---------------------------------N_LONG---------------------------------------|
+ *      ||         |       |        |   20   |   19   |    18   |   17   |      16      |
+ *  2   ||<-------Levels Arr Start----------]| unused |NumLevels|--Dynamic-Min K--------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||<-------Min/Max Arr Start---------]|[<----------Levels Arr End----------------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||<-----Float Items Arr Start-------]|[<---------Min/Max Arr End----------------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||         |       |        |        |[<-------Float Items Arr End--------------|
+ *
+ * Serialized float sketch layout, Empty (8 bytes) and Single Item (12 bytes):
+ *  Adr:
+ *      ||    7    |   6   |    5   |    4   |    3   |    2    |    1   |      0       |
+ *  0   || unused  |   M   |--------K--------|  Flags |  FamID  | SerVer | PreambleInts |
+ *      ||   15    |   14  |   13   |   12   |   11   |   10    |    9   |      8       |
+ *  1   ||                                   |-------------Single Item------------------|
+ *
+ *
+ *
+ * Serialized double sketch layout, more than one item:
+ *  Adr:
+ *      ||    7    |   6   |    5   |    4   |    3   |    2    |    1   |      0       |
+ *  0   || unused  |   M   |--------K--------|  Flags |  FamID  | SerVer | PreambleInts |
+ *      ||   15    |   14  |   13   |   12   |   11   |   10    |    9   |      8       |
+ *  1   ||---------------------------------N_LONG---------------------------------------|
+ *      ||   23    |   22  |   21   |   20   |   19   |    18   |   17   |      16      |
+ *  2   ||<-------Levels Arr Start----------]| unused |NumLevels|--Dynamic-Min K--------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||<-------Min/Max Arr Start---------]|[<----------Levels Arr End----------------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||<----Double Items Arr Start-------]|[<---------Min/Max Arr End----------------|
+ *      ||         |       |        |        |        |         |        |              |
+ *  ?   ||         |       |        |        |[<------Double Items Arr End--------------|
+ *
+ * Serialized double sketch layout, Empty (8 bytes) and Single Item (16 bytes):
+ *  Adr:
+ *      ||    7    |   6   |    5   |    4   |    3   |    2    |    1   |      0       |
+ *  0   || unused  |   M   |--------K--------|  Flags |  FamID  | SerVer | PreambleInts |
+ *      ||   15    |   14  |   13   |   12   |   11   |   10    |    9   |      8       |
+ *  1   ||------------------------------Single Item-------------------------------------|
+ *
+ * The structure of the data block depends on Layout:
+ *
+ *   For FLOAT_SINGLE_COMPACT or DOUBLE_SINGLE_COMPACT:
+ *     The single data item is at offset DATA_START_ADR_SINGLE_ITEM = 8
+ *
+ *   For FLOAT_FULL_COMPACT:
+ *     The int[] levels array starts at offset DATA_START_ADR_FLOAT = 20 with a length of numLevels integers;
+ *     Followed by Float Min_Value, then Float Max_Value
+ *     Followed by an array of Floats of length retainedItems()
+ *
+ *   For DOUBLE_FULL_COMPACT
+ *     The int[] levels array starts at offset DATA_START_ADR_DOUBLE = 20 with a length of numLevels integers;
+ *     Followed by Double Min_Value, then Double Max_Value
+ *     Followed by an array of Doubles of length retainedItems()
+ *
+ *   For FLOAT_UPDATABLE
+ *     The int[] levels array starts at offset DATA_START_ADR_FLOAT = 20 with a length of (numLevels + 1) integers;
+ *     Followed by Float Min_Value, then Float Max_Value
+ *     Followed by an array of Floats of length KllHelper.computeTotalItemCapacity(...).
+ *
+ *   For DOUBLE_UPDATABLE
+ *     The int[] levels array starts at offset DATA_START_ADR_DOUBLE = 20 with a length of (numLevels + 1) integers;
+ *     Followed by Double Min_Value, then Double Max_Value
+ *     Followed by an array of Doubles of length KllHelper.computeTotalItemCapacity(...).
+ *
+ * }</pre>
+ *
+ *  @author Lee Rhodes
+ */
+final class KllPreambleUtil {
+
+  private KllPreambleUtil() {}
+
+  static final String LS = System.getProperty("line.separator");
+
+  /**
+   * The default value of K
+   */
+  public static final int DEFAULT_K = 200;
+  public static final int DEFAULT_M = 8;
+  static final int MAX_K = (1 << 16) - 1; // serialized as an unsigned short
+
+  // Preamble byte addresses
+  static final int PREAMBLE_INTS_BYTE_ADR     = 0;
+  static final int SER_VER_BYTE_ADR           = 1;
+  static final int FAMILY_BYTE_ADR            = 2;
+  static final int FLAGS_BYTE_ADR             = 3;
+  static final int K_SHORT_ADR                = 4;  // to 5
+  static final int M_BYTE_ADR                 = 6;
+  //                                            7 is reserved for future use
+  // SINGLE ITEM ONLY
+  static final int DATA_START_ADR_SINGLE_ITEM = 8;
+
+  // MULTI-ITEM
+  static final int N_LONG_ADR                 = 8;  // to 15
+  static final int DY_MIN_K_SHORT_ADR         = 16; // to 17
+  static final int NUM_LEVELS_BYTE_ADR        = 18;
+
+  // FLOAT SKETCH                               19 is reserved for future use in float sketch
+  static final int DATA_START_ADR_FLOAT       = 20; // float sketch, not single item
+
+  // DOUBLE SKETCH                              19 to 23 is reserved for future use in double sketch
+  static final int DATA_START_ADR_DOUBLE      = 20; // double sketch, not single item //TODO??

Review Comment:
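   The comment on line 155 (the `// DOUBLE SKETCH` note saying bytes 19 to 23 are reserved) is obsolete, since DATA_START_ADR_DOUBLE is 20.  I will fix it.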
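
   For readers following the offsets, here is a rough sketch of how the first 8 preamble bytes could be written and read back using the byte addresses from the diff. It is illustrative only and not code from this PR: it uses `java.nio.ByteBuffer` as a stand-in for the `Memory`/`WritableMemory` accessors, the class name `PreambleSketch` and its helper methods are hypothetical, and the preamble-ints, serial-version and family-ID values passed in are placeholders rather than the real constants.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: a ByteBuffer stand-in for the Memory-based accessors.
final class PreambleSketch {
  // Byte addresses copied from the constants in the diff.
  static final int PREAMBLE_INTS_BYTE_ADR = 0;
  static final int SER_VER_BYTE_ADR       = 1;
  static final int FAMILY_BYTE_ADR        = 2;
  static final int FLAGS_BYTE_ADR         = 3;
  static final int K_SHORT_ADR            = 4; // to 5
  static final int M_BYTE_ADR             = 6; // byte 7 is unused

  private PreambleSketch() {}

  // Writes the 8 bytes common to every layout, in native byte order.
  static ByteBuffer write(int preInts, int serVer, int famId, int flags, int k, int m) {
    final ByteBuffer buf = ByteBuffer.allocate(8).order(ByteOrder.nativeOrder());
    buf.put(PREAMBLE_INTS_BYTE_ADR, (byte) preInts);
    buf.put(SER_VER_BYTE_ADR, (byte) serVer);
    buf.put(FAMILY_BYTE_ADR, (byte) famId);
    buf.put(FLAGS_BYTE_ADR, (byte) flags);
    buf.putShort(K_SHORT_ADR, (short) k); // K is stored as an unsigned short
    buf.put(M_BYTE_ADR, (byte) m);
    return buf;
  }

  // Single-byte fields are treated as unsigned, so mask after reading.
  static int unsignedByte(ByteBuffer buf, int adr) {
    return buf.get(adr) & 0xFF;
  }

  // K can be as large as MAX_K = 65535, which does not fit in a signed short.
  static int k(ByteBuffer buf) {
    return buf.getShort(K_SHORT_ADR) & 0xFFFF;
  }
}
```

   The masks are the point of the sketch: with K = 65535 the raw `getShort` value is -1, and `& 0xFFFF` recovers the intended unsigned value. The same applies to any single-byte field above 127.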



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@datasketches.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
