Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/11/03 18:36:26 UTC

[GitHub] [accumulo-testing] keith-turner commented on a change in pull request #166: Add deletes to continuous ingest

keith-turner commented on a change in pull request #166:
URL: https://github.com/apache/accumulo-testing/pull/166#discussion_r740493229



##########
File path: conf/accumulo-testing.properties
##########
@@ -82,6 +82,11 @@ test.ci.ingest.pause.wait.max=180
 test.ci.ingest.pause.duration.min=60
 # Maximum pause duration (in seconds)
 test.ci.ingest.pause.duration.max=120
+# Entries will be periodically deleted during ingest if set to true
+test.ci.ingest.delete.enabled=false
+# The probability (in percent) that a set of entries will be deleted during continuous ingest
+# Note: test.ci.ingest.delete must be set to true for this value to be used
+test.ci.ingest.delete.probability=10

Review comment:
       You could have only the probability prop and default it to zero to turn off deletes:
   
   ```suggestion
   # The probability (in percent) that a set of entries will be deleted during continuous ingest
   test.ci.ingest.delete.probability=0
   ```
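
    A minimal sketch of how the ingest code could then derive delete behavior from the probability alone (this assumes the `getDeleteProbability` helper and `TestProps.CI_INGEST_DELETE_PROBABILITY` key added in this PR; the exact shape is just illustrative):

    ```java
    // Sketch only: deletes are enabled whenever the configured probability is non-zero,
    // so the separate test.ci.ingest.delete.enabled property would no longer be needed.
    private static boolean deletesEnabled(Properties props) {
      return getDeleteProbability(props) > 0;
    }
    ```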

##########
File path: src/main/java/org/apache/accumulo/testing/continuous/ContinuousIngest.java
##########
@@ -64,6 +65,16 @@ private static boolean pauseEnabled(Properties props) {
     return Boolean.parseBoolean(value);
   }
 
+  private static boolean deletesEnabled(Properties props) {
+    String stringValue = props.getProperty(TestProps.CI_INGEST_DELETE_ENABLED);
+    return Boolean.parseBoolean(stringValue);
+  }
+
+  private static int getDeleteProbability(Properties props) {
+    String stringValue = props.getProperty(TestProps.CI_INGEST_DELETE_PROBABILITY);
+    return Integer.parseInt(stringValue);

Review comment:
       You could add some sort of validation, e.g. check that the probability is between 0 and 100:
   
   ```suggestion
    int prob = Integer.parseInt(stringValue);
    checkArgument(prob >= 0 && prob <= 100, "TODO err msg");
   ```
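
    For completeness, a hedged sketch of the validated getter (this assumes Guava's `Preconditions.checkArgument` is available via a static import; the error message is only illustrative):

    ```java
    import static com.google.common.base.Preconditions.checkArgument;

    private static int getDeleteProbability(Properties props) {
      String stringValue = props.getProperty(TestProps.CI_INGEST_DELETE_PROBABILITY);
      int prob = Integer.parseInt(stringValue);
      // Fail fast on misconfiguration rather than silently never (or always) deleting.
      checkArgument(prob >= 0 && prob <= 100, "%s must be between 0 and 100, but was %s",
          TestProps.CI_INGEST_DELETE_PROBABILITY, prob);
      return prob;
    }
    ```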

##########
File path: src/main/java/org/apache/accumulo/testing/continuous/ContinuousIngest.java
##########
@@ -175,19 +191,47 @@ public static void main(String[] args) throws Exception {
 
         // generate subsequent sets of nodes that link to previous set of nodes
         for (int depth = 1; depth < maxDepth; depth++) {
+
+          // random chance that the entries will be deleted
+          boolean deletePrevious = deletesEnabled && r.nextInt(100) < deleteProbability;
+
+          // stack to hold mutations. stack ensures they are deleted in reverse order
+          Stack<Mutation> mutationStack = new Stack<>();
+
           for (int index = 0; index < flushInterval; index++) {
             long rowLong = genLong(rowMin, rowMax, r);
             byte[] prevRow = genRow(prevRows[index]);
             prevRows[index] = rowLong;
-            Mutation m = genMutation(rowLong, r.nextInt(maxColF), r.nextInt(maxColQ), cv,
-                ingestInstanceId, count, prevRow, checksum);
+            int cfInt = r.nextInt(maxColF);
+            int cqInt = r.nextInt(maxColQ);
+            Mutation m = genMutation(rowLong, cfInt, cqInt, cv, ingestInstanceId, count, prevRow,
+                checksum);
             count++;
             bw.addMutation(m);
+
+            // add a new delete mutation to the stack when applicable
+            if (deletePrevious) {
+              Mutation mutation = new Mutation(genRow(rowLong));
+              mutation.putDelete(genCol(cfInt), genCol(cqInt), cv);
+              mutationStack.add(mutation);
+            }
           }
 
           lastFlushTime = flush(bw, count, flushInterval, lastFlushTime);
           if (count >= numEntries)
             break out;
+
+          // delete last set of entries in reverse order
+          if (deletePrevious) {
+            log.info("Deleting previous set of entries");
+            while (!mutationStack.empty()) {
+              Mutation m = mutationStack.pop();
+              count--;
+              bw.addMutation(m);

Review comment:
       We need to flush the batch writer between some of the mutations for safety. However, we do not want to flush between every mutation, as that would be too slow. When deleting, we can leverage the grouping that already happens during the write phase. When continuous ingest writes out mutations, it does the following.
   
    It writes out a group of nodes like the following (the actual program uses a group of 1 million by default, but it is kept small here for the example).
   
      1. write N1
      2. write N2
      3. write N3
      4. write N4
      5. flush batch writer
   
    After flushing the batch writer, another group is written that references the previous group. In the code, the prevRows array is used for this.
   
      1. write N5->N1
      2. write N6->N2
      3. write N7->N3
      4. write N8->N4
      5. flush batch writer
   
    Then another group is written:
   
      1. write N9->N5
      2. write NA->N6
      3. write NB->N7
      4. write NC->N8
      5. flush batch writer
   
    It keeps writing batches like this, by default creating 1 million linked lists of length 25. Then at the end it stitches all of these short linked lists together using the nodes written in the first and last batches, doing something similar to the following (see the toy sketch after this list).
   
      1. write N2->N9
      2. write N3->NA
      3. write N4->NB
      4. write N1->NC
      5. flush batch writer
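
    To make the pattern concrete, here is a toy, self-contained sketch of the grouped writes and the final stitch (all names and sizes here are illustrative; this is not the actual ContinuousIngest code):

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Toy model of the write pattern above: each group links back to the previous group,
    // the "batch writer" is flushed at every group boundary, and a final pass stitches
    // the first group onto the last group (printing the same N1..NC example).
    public class GroupedWriteSketch {
      public static void main(String[] args) {
        int groupSize = 4;  // real default is on the order of 1 million
        int numGroups = 3;  // real default is 25

        String[] prevRows = new String[groupSize];   // rows written in the previous group
        String[] firstRows = new String[groupSize];  // rows written in the very first group
        List<String> batch = new ArrayList<>();
        int node = 1;

        for (int depth = 0; depth < numGroups; depth++) {
          for (int index = 0; index < groupSize; index++) {
            String row = "N" + Integer.toHexString(node++).toUpperCase();
            // the first group has no previous group to reference
            batch.add(depth == 0 ? row : row + "->" + prevRows[index]);
            if (depth == 0) {
              firstRows[index] = row;
            }
            prevRows[index] = row;
          }
          flush(batch); // flush at every group boundary, so links only point at flushed data
        }

        // stitch the first group's nodes onto the last group's nodes, offset by one
        // to match the N2->N9, N3->NA, N4->NB, N1->NC example above
        for (int index = 0; index < groupSize; index++) {
          batch.add(firstRows[(index + 1) % groupSize] + "->" + prevRows[index]);
        }
        flush(batch);
      }

      private static void flush(List<String> batch) {
        System.out.println("flush: " + batch);
        batch.clear();
      }
    }
    ```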
     
    For the delete case, maybe we could skip this final stitching step. We could also delete nodes in batches, in reverse order, like the following.
   
      1. delete N9
      2. delete NA
      3. delete NB
      4. delete NC
      5. flush batch writer
      1. delete N5
      2. delete N6
      3. delete N7
      4. delete N8
      5. flush batch writer
      1. delete N1
      2. delete N2
      3. delete N3
      4. delete N4
      5. flush batch writer
   
    Deleting in batches is efficient. Doing it in reverse order ensures that if the client process dies at any point, it will never leave a node pointing to something that does not exist.
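
    A hedged sketch of what that reverse-order, batched delete could look like, assuming the written keys are remembered per group while ingesting (the per-group bookkeeping is illustrative; genRow, genCol, cv, and bw are the existing names from ContinuousIngest):

    ```java
    // Each inner list holds the {row, cf, cq} values written between two flushes.
    // It would be populated in the write loop above, one inner list per group.
    List<List<long[]>> groups = new ArrayList<>();

    // Delete group by group in reverse order, flushing between groups. If the client
    // dies part way through, only newer groups have been deleted, so no surviving
    // node can reference a node that no longer exists.
    for (int depth = groups.size() - 1; depth >= 0; depth--) {
      for (long[] key : groups.get(depth)) {
        Mutation m = new Mutation(genRow(key[0]));
        m.putDelete(genCol((int) key[1]), genCol((int) key[2]), cv);
        bw.addMutation(m);
      }
      bw.flush();
    }
    ```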
   
    We need to see if keeping everything to delete in memory takes too much memory. If it does, maybe we could delete a contiguous subset of the groups. We insert 25 groups; we could delete groups G4 through G9, for example. We make everything in G10 point to G3, then we delete G4 through G9 in reverse order.
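
    As a rough back-of-envelope check (assuming the default 25 groups of 1 million nodes and a few tens of bytes of bookkeeping per node, e.g. a long row plus two int column values with object overhead): 25,000,000 nodes times roughly 40 bytes is on the order of 1 GB, and keeping full Mutation objects would cost noticeably more. That is why deleting only a contiguous subset of groups, or keeping just the row/column values instead of Mutation objects, may be needed.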
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org