You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2022/12/23 12:23:10 UTC

[GitHub] [accumulo] alerman opened a new pull request, #3140: Add option to only skip trash for Import files

alerman opened a new pull request, #3140:
URL: https://github.com/apache/accumulo/pull/3140

   This MR adds an option to optionally skip trash for Import files only.
   On our clusters, we find that we have to skip trash due to pressure on the namenodes and the scale that we deal with.
   
   We have seen a few instances of files going missing. We are still in the process of investigating (we thought it would be fixed by #2792 but was not.). When this happens because of our use case we can typically track down the data that contributed to Import files. However, data that has been compacted by one or more tablet servers is more difficult to tack down. By enabling this new option, we will have time to catch this issue and recover the data before the trash is emptied while we determine the cause of the files going missing.
   
   Intention is to backport this to 1.10 once the approach is agreed upon


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ivakegg commented on pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
ivakegg commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1367910943

   Does a skipTrashRegex make sense?  Just wondering if something a little more generic would be useful.  The other thought I had was whether this could be made table based (may not be possible).  I am fine with this as is, just a few thoughts to consider.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
dlmarion commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063597186


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   ```
     GC_USE_TRASH("gc.trash.ignore", "true", PropertyType.BOOLEAN,
         "If trash is configured, move deleted non-bulk import files there instead of deleting them", "1.5.0"),
     GC_BULK_IMPORT_USE_TRASH("gc.trash.use.bulk.imports", "true", PropertyType.BOOLEAN,
         "If trash is configured, move deleted bulk import files there instead of deleting them", "3.0.0"),
   ```
   
   | Trash Enabled | GC_USE_TRASH | GC_BULK_IMPORT_USE_TRASH | Result |
   |----------------------|------------------------|-----------------------------------------------|-----------|
   | N | N/A | N/A | all files deleted |
   | Y | true | true | all files moved to trash |
   | Y | false | true | only bulk import files moved to trash |
   | Y | true | false | bulk import files deleted , others moved to trash |
   | Y | false | false | all files deleted | 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063591647


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   That's what I ran into too - but renaming the GC_TRASH_IGNORE property seemed something to be avoided. Maybe if the new property was `GC_TRASH_IGNORE_IMPORTS` - that would make having the usage I described above easier to communicate?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1102123727


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   In spite of the description, it's not intuitive how this new property will interact with the previous `gc.trash.ignore` property. Rather than have a new BOOLEAN that modifies the behavior of the previous BOOLEAN only when the previous one was "true", how about changing `gc.trash.ignore` to accept a third possible value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063581599


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   I find this confusing,  Just from the descriptions, if the system is configured to use trash - then setting GC_TRASH_IGNORE = true overrides the system setting and files are not moved to trash and are immediately deleted.
   
   With GC_TRASH_IGNORE=true, setting GC_TRASH_IGNORE_IMPORTS_ONLY=true - then bulk import files skip the trash, but all other files will use the trash?
   
   Would it be easier for users if these were more independent?
   
   - GC_TRASH_IGNORE true - skip trash, false - use trash if configured for the system.
   - GC_TRASH_IGNORE_IMPORTS_ONLY - true - skip trash for bulk import files, overrides any GC_TRASH_IGNORE setting for bulk import files. false - use GC_TRASH_IGNORE or the system setting.
   
   My thinking is along the lines of:
   
   | system        | GC_TRASH_IGNORE | GC_TRASH_IGNORE_IMPORTS_ONLY | impact                |
   |---------------|-----------------|------------------------------|-----------------------|
   | trash disable | n/a             | n/a                          | skip trash, all files |
   | trash enabled | N               | N                            | ues trash, all files  |
   | trash enabled | N               | Y  | skip trash, bulk only |
   | trash enabled | Y  | N | skip trash, all files |
   | trash enabled | Y  | Y  | skip trash, all files |
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1126590309


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   The table helps, but it also highlights how unintuitive having three separate properties are to control a desired outcome. I would much rather have a simple single property that had multiple options to select the specific policy, rather than leave it as an exercise for the user to try to determine the current policy by looking it up in a table like this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1132301179


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   Yes, that's what I was thinking when I said "third possible value" above. But, that solution is only good if that is the end of the requirements to expand this behavior and the number of configurable values stops at three. If the requirements expand to support additional use cases, then it would be better to make this behavior a pluggable policy than to enumerate a long list of possible values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ivakegg commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
ivakegg commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1059377354


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "2.1.0"),

Review Comment:
   I think this will be since 3.0.0 (or maybe 2.1.1 if backported).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] alerman commented on pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
alerman commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1371245299

   I don't want to go through the effort for a regex for this, as that brings up some other questions in my opinion. For now, we take the simple approach and we can expand later if we deem it necessary. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063600680


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   That's fine too - but that requires property upgrade code to translate the settings - and may have issues because we may need to deprecate, but keep GC_TRASH_IGNORE around for a while? Not sure...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] matthpeterson commented on pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
matthpeterson commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1363964831

   Recommendation: add a test for a path containing directories.  Otherwise it looks good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
dlmarion commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1061787935


##########
server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java:
##########
@@ -137,6 +137,21 @@ public void testMoveToTrash_NotUsingTrash() throws Exception {
     assertFalse(gc.moveToTrash(path));
   }
 
+  @Test
+  public void testMoveToTrash_NotUsingTrash_importsOnlyEnabled() throws Exception {
+    systemConfig.set(Property.GC_TRASH_IGNORE.getKey(), "true");
+    systemConfig.set(Property.GC_TRASH_IGNORE_IMPORTS_ONLY.getKey(), "true");
+    Path iFilePath = new Path("Ifile");

Review Comment:
   Please take a look at TabletFileTest.testValidPaths for some more realistic file and path names.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] EdColeman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
EdColeman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063600680


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   That's fine too - but that requires property upgrade code to translate the settings - and may have issues because we may need to deprecate, but keep GC_TRASH_IGNORE? Not sure...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cawaring commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "cawaring (via GitHub)" <gi...@apache.org>.
cawaring commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1132241113


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   Oh, I like the suggestion of one setting.  Something like gc_skip_trash = { ALL_FILES, BULK_FILES, NO_FILES}.  Is that more inline with what you were thinking? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3140: Add option to only skip trash for Import files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1563325480

   I created #3428 against 2.1 as a replacement for this PR. It incorporates some of the suggestions here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1563513486

   Closing this as superseded by #3428 . Ongoing design discussion in the code review there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] alerman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
alerman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1061870729


##########
server/gc/src/test/java/org/apache/accumulo/gc/SimpleGarbageCollectorTest.java:
##########
@@ -137,6 +137,21 @@ public void testMoveToTrash_NotUsingTrash() throws Exception {
     assertFalse(gc.moveToTrash(path));
   }
 
+  @Test
+  public void testMoveToTrash_NotUsingTrash_importsOnlyEnabled() throws Exception {
+    systemConfig.set(Property.GC_TRASH_IGNORE.getKey(), "true");
+    systemConfig.set(Property.GC_TRASH_IGNORE_IMPORTS_ONLY.getKey(), "true");
+    Path iFilePath = new Path("Ifile");

Review Comment:
   Updated the PR with paths similar to those in testValidPaths



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
dlmarion commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1063585546


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "3.0.0"),

Review Comment:
   @alerman and I had a discussion about this. He suggested renaming the properties to `GC_USE_TRASH`. It was very confusing to me working on this because things were not named in the affirmative, there were a bunch of negations in the code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] alerman commented on pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
alerman commented on PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#issuecomment-1363977070

   > Recommendation: add a test for an I-File using a path containing directories. Otherwise it looks good.
   
   Done. Nice catch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] alerman commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by GitBox <gi...@apache.org>.
alerman commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1061748306


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip trash only for Import files when gc.trash.ignore is true.", "2.1.0"),

Review Comment:
   Updated to show 3.0.0



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cawaring commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "cawaring (via GitHub)" <gi...@apache.org>.
cawaring commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1125727302


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   @ctubbsii,  I was actually thinking you were going to take the discussion in a different direction and also suggest that we  might consider doing this on a per table basis instead of as an all or nothing approach (all tables or no tables) as is now.  That it would be better to make this easier and less error prone for users instead of having them manage and configure something like regexs over Accumulo files in HDFS trash.  From a durability standpoint, we allow walogs and replication to be configured on a per table basis.  Trash is just another safety net that a user may wish to user.  Some tables can be more important than others and allowing that decision to be made on a per table basis to include bulk input files, seems like a useful feature.  
   
   I don't see this as Accumulo managing the trash.  I see this as more of giving the user explicit control over what Accumulo puts in the trash.    Once it is in the trash Accumulo isn't doing anything with it.  
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on a diff in pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on code in PR #3140:
URL: https://github.com/apache/accumulo/pull/3140#discussion_r1126585073


##########
core/src/main/java/org/apache/accumulo/core/conf/Property.java:
##########
@@ -746,6 +746,8 @@ public enum Property {
       "The number of threads used to delete RFiles and write-ahead logs", "1.3.5"),
   GC_TRASH_IGNORE("gc.trash.ignore", "false", PropertyType.BOOLEAN,
       "Do not use the Trash, even if it is configured.", "1.5.0"),
+  GC_TRASH_IGNORE_IMPORTS_ONLY("gc.trash.ignore.imports.only", "false", PropertyType.BOOLEAN,
+      "Skip Trash for Import files when the trash is in use.", "3.0.0"),

Review Comment:
   Having a per-table property might be useful, but the unintuitiveness of two separate properties to control three modes of operation would still be an issue that needs addressing. The per-table option wasn't what was being proposed, and it didn't occur to me. I can see the merits of it, but it's not so simple to implement, since the accumulo-gc operates asynchronously, and doesn't know the table from which a file came... it can only make educated guesses based on the file's path, but that's not reliable, since it could have been used by multiple tables, or manually placed through some import process that didn't include a table id in the path... which is only done by convention, not a guarantee.
   
   My view that Accumulo is managing the trash is based on our long-held desire to try to decouple ourselves from HDFS, and I'm recognizing the trash as an HDFS-specific feature. The more that filesystem-related behavior is handled within the filesystem config, and Accumulo-related behavior is handled within Accumulo's config, the better, I think. While this may be less convenient initially, I think it makes greater sense as the overall system architecture grows in complexity. Replication (number of block replicas) and WALOG durability (hsync, hflush) are also filesystem-specific behaviors, but those knobs already exist. I'm more inclined to scrutinize the knobs that further tightly couple us to HDFS when they are new, so we're not digging ourselves a deeper hole that is harder to get out of (to decouple). But this is just a heuristic I apply to help me think about the consequences of a change... it's not a fixed prerequisite for me to support a change or anything like that.
   
   I was also thinking about an "on-delete" hook or similar... but I don't think I'd be in favor of that, because I think if a user wanted that, they could just subclass an existing FileSystem and add that feature to its delete method. If every user did that, it would be a hassle... but making a new FineGrainedTrashPolicyFileSystem only needs to be done once. I'm not suggesting this is the best option... just one possible alternative solution to the property changes being proposed here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii closed pull request #3140: Add option to only skip trash for Import files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii closed pull request #3140: Add option to only skip trash for Import files
URL: https://github.com/apache/accumulo/pull/3140


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org