You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/05/11 17:22:58 UTC

[GitHub] [accumulo] keith-turner opened a new pull request, #3392: fixes race condition in metadata consistency check

keith-turner opened a new pull request, #3392:
URL: https://github.com/apache/accumulo/pull/3392

   While looking into #3386 I noticed the Accumulo metadata consistency check was incrementing a counter in the incorrect place.  It should increment before writing to the metadata table, but it does not. This could cause the check to report false postives.  The false positive in the case would be transient and should not repeat on subsequent checks.
   
   Also noticed a redundant check when deciding if the file should be added to the set of in memory files.  AFICT this redundant check is harmless, but it could cause problems for future changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#issuecomment-1544422546

   I am trying to trigger the false positive with the following test.
   
   I set the following properties.
   
   ```
   tserver.compaction.minor.concurrent.max=20
   tserver.health.check.interval=1ms
   ```
   
   Then I run the following code in jshell
   
   ```java
       client.tableOperations().create("foo");
       
       var bw = client.createBatchWriter("foo");
   
       SortedSet<Text> splits = new TreeSet<>();
   
       IntStream.range(1, 100).mapToObj(i -> String.format("%03d", i)).map(Text::new)
           .forEach(splits::add);
   
       client.tableOperations().addSplits("foo", splits);
   
       while (true) {
         IntStream.range(1, 100).mapToObj(i -> String.format("%03d", i)).forEach(row -> {
           Mutation mutation = new Mutation(row);
           mutation.put("f1", "q1", "v1");
           try {
             bw.addMutation(mutation);
           } catch (Exception e) {
             throw new RuntimeException(e);
           }
         });
         bw.flush();
         client.tableOperations().flush("foo", null, null, true);
       }
   ```
   
   I am seeing around 40 minor compactions per second while this test is running against 2.1.0.  Not seeing it run into the race condition yet.  I think I need to more tservers doing more checks against more tablets to increase the probability of bumping into it.  I am running it under Uno.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191581405


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   How do you think we should handle an Exception being raised here? The `minc` thread would die and leave `DatafileManager.datafileSizes` in an inconsistent state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191515037


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   Analyzing the code, it seems like the following log message should trigger
   
   https://github.com/apache/accumulo/blob/fdabdc5133dd262316bac5d71a832f332831a61a/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactionTask.java#L135
   
   Also it rethrows the exception on the next line, so it seems like that should go to the uncaught exception handler in a thread pool.
   
   The exception takes an interesting path from being thrown till the log stmt referenced above.  Things catch it and rethrow it and do tracing stuff multiple times.  Something also wraps it another exception and adds the extent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191504746


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   What happens if this call fails, but fails after the metadata table is updated?  For example, this call updates the metadata table and the write succeeds in the remote tablet server, but something happens after that that throws an error. Won't it be the case that the metadata table is updated, but the in memory datafilesizes is not?
   
   Edit: This question is in the context of #3386, not this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191504746


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   What happens if this call fails, but fails after the metadata table is updated?  For example, this call updates the metadata table and the write succeeds in the remote tablet server, but something happens after that that throws an error. Won't it be the case that the metadata table is updated, but the in memory datafilesizes is not?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#issuecomment-1544501443

   Looking at the code I noticed the increment for the bulk import code is also in the wrong place.  Going to push a fix for that.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191506481


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   Yeah if that throws an exception after successfully making the metadata update then it would leave things in a bad state until the tablet is unloaded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191600481


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   The more I think about it not sure we want to do anything other than log at error.  Unexpected bugs can happen anywhere in the code an hopefully they are logged.  If there is an exception that we should be retrying on and are not then we should address that specific case when we encounter it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191602471


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   Based on the code path analysis, that should happen already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner merged pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner merged PR #3392:
URL: https://github.com/apache/accumulo/pull/3392


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#issuecomment-1544397128

   Best to ignore whitespace when looking at the diffs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191598433


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   Lower down in the code, there are already thing that retry writes on known thrift exceptions.   So should not need to worry about that at this level.  An exception in this code is likely a bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191596509


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   I noticed the write to the walog code a bit below catches an IOException and retries.  But that seems iffy, don't always get IOException (could be thrift exception or a wrapped exception).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191598433


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   Lower down in the code, there are already things that retry writes on known thrift exceptions.   So should not need to worry about that at this level.  An exception in this code is likely a bug.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3392: fixes race condition in metadata consistency check

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3392:
URL: https://github.com/apache/accumulo/pull/3392#discussion_r1191508166


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/DatafileManager.java:
##########
@@ -350,41 +350,43 @@ Optional<StoredTabletFile> bringMinorCompactionOnline(TabletFile tmpDatafile,
         logFileOnly.add(unusedWalLog);
       }
     }
-    try {
-      // the order of writing to metadata and walog is important in the face of machine/process
-      // failures need to write to metadata before writing to walog, when things are done in the
-      // reverse order data could be lost... the minor compaction start even should be written
-      // before the following metadata write is made
-      newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,
-          unusedWalLogs, flushId);
-
-      // Mark that we have data we want to replicate
-      // This WAL could still be in use by other Tablets *from the same table*, so we can only mark
-      // that there is data to replicate,
-      // but it is *not* closed. We know it is not closed by the fact that this MinC triggered. A
-      // MinC cannot happen unless the
-      // tablet is online and thus these WALs are referenced by that tablet. Therefore, the WAL
-      // replication status cannot be 'closed'.
-      if (replicate) {
-        if (log.isDebugEnabled()) {
-          log.debug("Recording that data has been ingested into {} using {}", tablet.getExtent(),
-              logFileOnly);
-        }
-        for (String logFile : logFileOnly) {
-          @SuppressWarnings("deprecation")
-          Status status = org.apache.accumulo.server.replication.StatusUtil.openWithUnknownLength();
-          ReplicationTableUtil.updateFiles(tablet.getContext(), tablet.getExtent(), logFile,
-              status);
-        }
-      }
-    } finally {
-      tablet.finishClearingUnusedLogs();
-    }
 
     // increment start count before metadata update AND updating in memory map of files
     metadataUpdateCount.updateAndGet(MetadataUpdateCount::incrementStart);
     // do not place any code here between above stmt and try{}finally
     try {
+      try {
+        // the order of writing to metadata and walog is important in the face of machine/process
+        // failures need to write to metadata before writing to walog, when things are done in the
+        // reverse order data could be lost... the minor compaction start even should be written
+        // before the following metadata write is made
+        newFile = tablet.updateTabletDataFile(commitSession.getMaxCommittedTime(), newDatafile, dfv,

Review Comment:
   So, in the context of #3386, maybe we need some error handling around this (and other like cases)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org