You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/05/18 17:26:30 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3401: Start using TabletFile instead of String for referencing data files

keith-turner commented on code in PR #3401:
URL: https://github.com/apache/accumulo/pull/3401#discussion_r1197958044


##########
core/src/main/java/org/apache/accumulo/core/metadata/AbstractTabletFile.java:
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata;
+
+import java.util.Objects;
+
+import org.apache.hadoop.fs.Path;
+
+public abstract class AbstractTabletFile<T extends AbstractTabletFile<T>> implements Comparable<T> {

Review Comment:
   This is a half baked thought that I am posting just for discussion, not looking for any changes in this PR.
   
   Currently we can write possibly dubious code like the following because StoredTabletFile is a child type of TabletFile.
   
   ```java
   Set<TabletFile> aSet = ...
   aSet.add(new StoredTabletFile(..));
   aSet.containst(new StoredTabletFile(..));
   ```
   
   If we made StoredTabletFile a sibling type of TabletFile by making StoredTabletFile extend AbstractTabletFile, I am wondering if it would buy us anything.  It would make the code example above not compile (the add() call will not compile, the contains call would warn).  Wondering if that would help improve code correctness or just be a waste of time, not sure w/o experimenting with the change.
   
   I noticed UnassignedTabletFile was a sibling of TabletFile and that along with #2830 led me to wonder about the usefulness of making StoredTabletFile a sibling too.
   



##########
core/src/main/java/org/apache/accumulo/core/metadata/UnassignedTabletFile.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.metadata;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Objects;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+public class UnassignedTabletFile extends AbstractTabletFile<UnassignedTabletFile> {

Review Comment:
   You mentioned you were not sure about the name.  One thing I have found that helps me with naming sometimes is to write the documentation and see what words I use in the docs.  I went through that exercise and found I was using the word referenced, so that could be a candidate for the name.  This is also aligns with the terminology used in the accumulo GC source, it calls files in tablets references.
   
   ```suggestion
   /**
    *  A file that is not intended to be added to a tablet as a reference, within the scope of the code using this class, but needs to be passed to code that processes tablet files.  These files could be temp files or files directly created by a user for bulk import.  The file may ultimately be added to a tablet later as a new file reference, but within a different scope (process, thread, code block, method, etc) that uses a different class to represent the file in that scope.
    */
   public class UnreferencedTabletFile extends AbstractTabletFile<UnreferencedTabletFile> {
   ```



##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -94,12 +93,12 @@ public boolean equals(Object obj) {
 
     @Override
     public int hashCode() {
-      return fileName.hashCode();
+      return file.hashCode();
     }
   }
 
-  private Map<String,List<OpenReader>> openFiles;
-  private HashMap<FileSKVIterator,String> reservedReaders;
+  private Map<TabletFile,List<OpenReader>> openFiles;

Review Comment:
   Anytime we use TabletFile as key we have to be wary of #2830.  I think this is probably ok for now, but future changes to StoredTabletFile equals and hashcode could break this.



##########
core/src/main/java/org/apache/accumulo/core/metadata/TabletFile.java:
##########
@@ -155,4 +142,9 @@ public int hashCode() {
   public String toString() {
     return normalizedPath;
   }
+
+  public static TabletFile of(final Path path) {

Review Comment:
   What motivated this new method? Could we eventually make the constructor private?



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -548,22 +548,22 @@ public SortedMap<KeyExtent,Bulk.Files> computeFileToTabletMappings(FileSystem fs
         context.instanceOperations().getSystemConfiguration(), tableProps, tableId);
 
     for (FileStatus fileStatus : files) {
-      Path filePath = fileStatus.getPath();
+      UnassignedTabletFile file = UnassignedTabletFile.of(fs, fileStatus.getPath());
       CompletableFuture<Map<KeyExtent,Bulk.FileInfo>> future = CompletableFuture.supplyAsync(() -> {
         try {
           long t1 = System.currentTimeMillis();
           List<KeyExtent> extents =
-              findOverlappingTablets(context, extentCache, filePath, fs, fileLensCache, cs);
+              findOverlappingTablets(context, extentCache, file, fs, fileLensCache, cs);
           // make sure file isn't going to too many tablets
-          checkTabletCount(maxTablets, extents.size(), filePath.toString());
-          Map<KeyExtent,Long> estSizes = estimateSizes(context.getConfiguration(), filePath,
+          checkTabletCount(maxTablets, extents.size(), file.toString());
+          Map<KeyExtent,Long> estSizes = estimateSizes(context.getConfiguration(), file,
               fileStatus.getLen(), extents, fs, fileLensCache, cs);
           Map<KeyExtent,Bulk.FileInfo> pathLocations = new HashMap<>();
           for (KeyExtent ke : extents) {
-            pathLocations.put(ke, new Bulk.FileInfo(filePath, estSizes.getOrDefault(ke, 0L)));
+            pathLocations.put(ke, new Bulk.FileInfo(file.getPath(), estSizes.getOrDefault(ke, 0L)));

Review Comment:
   Looked into passing TabletFile to FileInfo.  FileInfo is used for serialization, so internally it needs a string.  Could possible pass the tablet file obj have it do the conversion in the constructor. 



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java:
##########
@@ -55,7 +55,7 @@ public class RFileOperations extends FileOperations {
 
   private static RFile.Reader getReader(FileOptions options) throws IOException {
     CachableBuilder cb = new CachableBuilder()
-        .fsPath(options.getFileSystem(), new Path(options.getFilename()), options.dropCacheBehind)
+        .fsPath(options.getFileSystem(), options.getFile().getPath(), options.dropCacheBehind)

Review Comment:
   Interesting how few changes there are to this class.



##########
core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/BulkImport.java:
##########
@@ -261,9 +262,9 @@ public MLong(long i) {
     long l;
   }
 
-  public static Map<KeyExtent,Long> estimateSizes(AccumuloConfiguration acuConf, Path dataFile,
-      long fileSize, Collection<KeyExtent> extents, FileSystem ns, Cache<String,Long> fileLenCache,
-      CryptoService cs) throws IOException {
+  public static Map<KeyExtent,Long> estimateSizes(AccumuloConfiguration acuConf,
+      UnassignedTabletFile dataFile, long fileSize, Collection<KeyExtent> extents, FileSystem ns,
+      Cache<String,Long> fileLenCache, CryptoService cs) throws IOException {

Review Comment:
   I was wondering if we could use a better type than String for `Cache<String,Long>`.  Looked into the code a bit an I am not sure if we could use TabletFile.  If we can not use TabletFile we may be able to created a specialized CachableBlockFile.CacheId type to use in lieu of String..  This would be a follow on issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org