You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/05/11 20:52:30 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3385: Encapsulate file metadata inside a new object

keith-turner commented on code in PR #3385:
URL: https://github.com/apache/accumulo/pull/3385#discussion_r1191660512


##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/TabletFileMetadataEntry.java:
##########
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata.schema;
+
+import java.util.Objects;
+
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.Text;
+
+/**
+ * Represents the exact string stored in the metadata table "file" column family. Currently the
+ * metadata only contains a file reference but could be expanded in the future.
+ */
+public class TabletFileMetadataEntry implements Comparable<TabletFileMetadataEntry> {
+
+  private final Path filePath;
+  private final String filePathString;
+
+  public TabletFileMetadataEntry(final Path filePath) {
+    this.filePath = Objects.requireNonNull(filePath);
+    // Cache the string value of the filePath, so we don't have to keep converting.
+    this.filePathString = filePath.toString();
+  }
+
+  /**
+   * The file path portion of the metadata
+   *
+   * @return The file path
+   */
+  public Path getFilePath() {
+    return filePath;
+  }
+
+  /**
+   * String representation of the file path
+   *
+   * @return file path string
+   */
+  public String getFilePathString() {
+    return filePathString;
+  }
+
+  /**
+   * Exact representation of what is in the Metadata table Currently this is just a file path but
+   * may expand
+   *
+   * @return The exact metadata string
+   */
+  public String getMetaString() {
+    return filePathString;
+  }
+
+  /**
+   * Exact {@link Text} representation of the metadat Generated from {@link #getMetaString()}
+   *
+   * @return The exact metadata as Text
+   */
+  public Text getMetaText() {
+    return new Text(filePathString);
+  }
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) {
+      return true;
+    }
+    if (o == null || getClass() != o.getClass()) {
+      return false;
+    }
+    TabletFileMetadataEntry that = (TabletFileMetadataEntry) o;
+    return Objects.equals(filePathString, that.filePathString);
+  }
+
+  @Override
+  public int hashCode() {
+    return Objects.hash(filePathString);
+  }
+
+  @Override
+  public int compareTo(TabletFileMetadataEntry o) {
+    return filePathString.compareTo(o.filePathString);
+  }
+
+  @Override
+  public String toString() {
+    return filePathString;
+  }
+
+  /**
+   * Utility to create a new TabletFileMetadataEntry from the exact String in the Metadata table
+   *
+   * @param metadataEntry Exact string for the metadataEntry
+   * @return A TabletFileMetadataEntry created from the metadata string
+   */
+  public static TabletFileMetadataEntry of(String metadataEntry) {
+    return new TabletFileMetadataEntry(new Path(Objects.requireNonNull(metadataEntry)));

Review Comment:
   Once the string passes through Path it may no longer equal the exact string stored in the metadata table because Path may normalize the string.  For the purposes of doing later metadata updates we must keep the original string that came from the metadata table without ever passing it through path.  For example the following little program
   
   ```java
       String metaEntry = "hdfs://localhost:1/a//b//c";
       System.out.println(TabletFileMetadataEntry.of(metaEntry).getMetaString());
   ``` 
   
   will print
   
   ```
   hdfs://localhost:1/a/b/c
   ```
   
   So if we have a column qualifier with `hdfs://localhost:1/a//b//c` and we later try to update the column qualifier `hdfs://localhost:1/a/b/c` then it will not update the same key in the metadata table.  This could result in duplicating a file or failing to delete a file.
   
   We currently normalize with path before writing to the metadata table, but its tricky to make any assumptions about this.  There are least two cases to consider. First, older Accumulo code may not have normalized paths in the same way before storing them.  Second, different versions of the Hadoop Path could possibly normalize in different ways.  This is why StoredTabletFile exists, it keeps that exact string that was read from the metadata table without doing any normalization on it.
   
   I am wondering if this new class is needed.  Could we not pass TabletFile and/or StoredTabletFile around in the Accumulo code instead of string?  The only code that needs StoredTabletFile is code that updating an existing metadata entry for a file, for all other code TableFile should suffice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org