You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/06/22 22:31:18 UTC

[GitHub] [accumulo] ctubbsii commented on a change in pull request #2117: Make sorted recovery write to RFiles

ctubbsii commented on a change in pull request #2117:
URL: https://github.com/apache/accumulo/pull/2117#discussion_r656625243



##########
File path: server/tserver/src/main/java/org/apache/accumulo/tserver/logger/LogFileKey.java
##########
@@ -189,4 +198,162 @@ public String toString() {
     }
     throw new RuntimeException("Unknown type of entry: " + event);
   }
+
+  /**
+   * Converts LogFileKey to Key. Creates a Key containing all of the LogFileKey fields. The fields
+   * are stored so the Key sorts maintaining the legacy sort order. The row of the Key is composed
+   * of 3 fields: EventNum + tabletID + seq. The EventNum is the byte returned by eventType(). The
+   * column family is always the event. The column qualifier is dependent of the type of event and
+   * could be empty.
+   *
+   * <pre>
+   *     Key Schema:
+   *     Row = EventNum + tabletID + seq
+   *     Family = event
+   *     Qualifier = tserverSession OR filename OR KeyExtent
+   * </pre>
+   */
+  public Key toKey() throws IOException {
+    byte[] formattedRow;
+    String family = event.name();
+    var kb = Key.builder();
+    switch (event) {
+      case OPEN:
+        formattedRow = formatRow(0, 0);
+        return kb.row(formattedRow).family(family).qualifier(tserverSession).build();
+      case COMPACTION_START:
+        formattedRow = formatRow(tabletId, seq);
+        return kb.row(formattedRow).family(family).qualifier(filename).build();
+      case MUTATION:
+      case MANY_MUTATIONS:
+      case COMPACTION_FINISH:
+        return kb.row(formatRow(tabletId, seq)).family(family).build();
+      case DEFINE_TABLET:
+        formattedRow = formatRow(tabletId, seq);
+        DataOutputBuffer buffer = new DataOutputBuffer();
+        tablet.writeTo(buffer);
+        var q = copyOf(buffer.getData(), buffer.getLength());
+        buffer.close();
+        return kb.row(formattedRow).family(family).qualifier(q).build();
+      default:
+        throw new AssertionError("Invalid event type in LogFileKey: " + event);
+    }
+  }
+
+  /**
+   * Get the first byte for the event. The only possible values are 0-4. This is used as the highest
+   * byte in the row.
+   */
+  private byte getEventByte() {
+    int evenTypeInteger = eventType(event);
+    return (byte) (evenTypeInteger & 0xff);
+  }
+
+  /**
+   * Get the byte encoded row for this LogFileKey as a Text object.
+   */
+  private Text formatRow() {
+    return new Text(formatRow(tabletId, seq));
+  }
+
+  /**
+   * Format the row using 13 bytes encoded to allow proper sorting of the RFile Key. The highest
+   * byte is for the event number, 4 bytes for the tabletId and 8 bytes for the sequence long.
+   */
+  private byte[] formatRow(int tabletId, long seq) {
+    byte eventNum = getEventByte();
+    // These will not sort properly when encoded if negative. Negative is not expected currently,
+    // defending against future changes and/or bugs.
+    Preconditions.checkArgument(eventNum >= 0 && seq >= 0);
+    byte[] row = new byte[13];
+    // encode the signed integer so negatives will sort properly for tabletId
+    int encodedTabletId = tabletId ^ 0x80000000;
+
+    row[0] = eventNum;
+    row[1] = (byte) ((encodedTabletId >>> 24) & 0xff);
+    row[2] = (byte) ((encodedTabletId >>> 16) & 0xff);
+    row[3] = (byte) ((encodedTabletId >>> 8) & 0xff);
+    row[4] = (byte) (encodedTabletId & 0xff);
+    row[5] = (byte) (seq >>> 56);
+    row[6] = (byte) (seq >>> 48);
+    row[7] = (byte) (seq >>> 40);
+    row[8] = (byte) (seq >>> 32);
+    row[9] = (byte) (seq >>> 24);
+    row[10] = (byte) (seq >>> 16);
+    row[11] = (byte) (seq >>> 8);
+    row[12] = (byte) (seq); // >>> 0
+    return row;
+  }
+
+  /**
+   * Extract the tabletId integer from the byte encoded Row.
+   */
+  private static int getTabletId(byte[] row) {
+    int encoded = ((row[1] << 24) + (row[2] << 16) + (row[3] << 8) + row[4]);

Review comment:
       @milleruntime `(row[1] << 24)` puts the bits of `row[1]` in the leftmost byte of a 4-byte int, possibly making this a negative number, which, when added with `+`, will be incorrect. You should use bitwise `|` operators instead of `+` to ensure the combining these doesn't get messed up when one is negative.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org