You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/04/05 18:21:19 UTC

[GitHub] [accumulo-examples] jmark99 opened a new pull request #74: Avoid checking Accumulo table exists before creation

jmark99 opened a new pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74


   Rather than checking if a table exists prior to creation, just create, catch and ignore TableExistsException instead.
   
   This ticket removes the explicit check where appropriate. During the process several other small changes were made to the affected classes as well.
   
   For the Bloom related examples, the table names were updated to use contants. Several methods were created to remove some redundant code.
   
   For the CharacterHistogram.java class, an unneeded import was also removed.
   
   Closes #13 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] jmark99 commented on pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
jmark99 commented on pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#issuecomment-813610625


   @EdColeman those are good questions to ask. I assume the initial intent of the ticket was to prevent checking for the existence of the table prior to every creation. The current state of the change keeps the examples working as before. That is, previously the code would check for the existence of the table and if it already existed would just keep chugging along without performing the actual creation. If errors arose or if the data was incorrect after a second consecuteve run, then the example would fail somewhere down the line or just have incorrect output (at least for most of the ones that I see). 
   
   Ideally, prior to running the examples you would want the table to not even exist. That way we would hopefully see consistent results between each run. I briefly considered updating the examples to delete  tables if they already existed prior to execution, but I was worried if someone ran an example and had a 'real' table with a name from the example that could cause a problem. I guess that could still occur if even if the examples run and names were identical. 
   
   Another possiblity is to update them to provide a message to the user to remove any existing tables before running and exit. That would allow the user to clear up existing tables and then run the example. I would probably prefer that path at the moment. What are your thoughts on that approach?
   
   I did run the integration tests back to back and they all still passed. I did not examine each example individually to see what would happen if run back to back. 
   
   So in summary, the current change keeps all of the examples running in the same manner as they were already running, but without the explicit table existence check. But I am open to updating to one of the aforementioned methods or to another route if you have other suggestions. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] EdColeman commented on a change in pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#discussion_r627697227



##########
File path: spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java
##########
@@ -76,8 +78,20 @@ private static void cleanupAndCreateTables(Properties props) throws Exception {
         client.tableOperations().delete(outputTable);
       }
       // Create tables
-      client.tableOperations().create(inputTable);
-      client.tableOperations().create(outputTable);
+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "
+            + "attempt!", inputTable);
+        return;
+      }
+      try {
+        client.tableOperations().create(outputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "

Review comment:
       Should this be log (instead of low)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] EdColeman commented on pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
EdColeman commented on pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#issuecomment-813580605


   A general question - could ignoring a table exists cause confusion if the results are examined by the user?  Say for an ingest, running the example twice might lead to doubling the ingest - if I expected 10 rows but instead had 20 after a second run would that be worse than having the second fail because the table was there?  Or are the scanners that are pulling the expect number of rows ingested and would only return partial results on multiple runs?  Would it help if the exception was logged instead of silently ignored?
   
   Another thought is if the examples are to serve as "recommended coding practices" is there benefit of showing the handling of table exists exceptions something we would recommend?
   
   Overall I have no issues with the changes, just mainly curious as to why TableExists exceptions would be ignored.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] jmark99 commented on pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
jmark99 commented on pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#issuecomment-813649725


   @EdColeman and I had some additional Slack conversations and came up with a few ideas to hopefully improve the examples a bit more. In regards to table creation a couple of additional changes will be made. Firstly, the table names are going to receive a prefix so that there will be no conflict with any existing tables that a user might already be using. Additionally, If a table already exists, this will be logged so that a user can make an informed decision as to whether they would like to clean up the tables prior to running or re-running the examples. The log messages will indicate whether any additional configuration has taken place in cases where the table already existed. Finally, if situations exist where the example should definitely not proceed if a table already exists, the example will be updated to reflect this.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] jmark99 merged pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
jmark99 merged pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] jmark99 commented on a change in pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
jmark99 commented on a change in pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#discussion_r627710963



##########
File path: spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java
##########
@@ -76,8 +78,20 @@ private static void cleanupAndCreateTables(Properties props) throws Exception {
         client.tableOperations().delete(outputTable);
       }
       // Create tables
-      client.tableOperations().create(inputTable);
-      client.tableOperations().create(outputTable);
+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "
+            + "attempt!", inputTable);
+        return;
+      }
+      try {
+        client.tableOperations().create(outputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "

Review comment:
       Yes :) Thanks for the catch. It was not caught by compile since this class is not in the src path of the examples. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] jmark99 commented on pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
jmark99 commented on pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#issuecomment-834301550


   @EdColeman, yes, you need to have a running instance and update the examples conf files accordingly. For a couple of the examples the examples jar file must be in the Accumulo lib directory or somewhere on the classpath. I use fluo-uno for running the examples. I've gone through them all and except for the couple of issues that are still listed  in GitHub, they are working as expected. Are you good with me pushing the changes at this time?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] EdColeman commented on a change in pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#discussion_r627696869



##########
File path: spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java
##########
@@ -76,8 +78,20 @@ private static void cleanupAndCreateTables(Properties props) throws Exception {
         client.tableOperations().delete(outputTable);
       }
       // Create tables
-      client.tableOperations().create(inputTable);
-      client.tableOperations().create(outputTable);
+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        low.error("Something went wrong. Table '{}' should have been deleted prior to creation "

Review comment:
       Should this be log?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo-examples] EdColeman commented on a change in pull request #74: Avoid checking Accumulo table exists before creation

Posted by GitBox <gi...@apache.org>.
EdColeman commented on a change in pull request #74:
URL: https://github.com/apache/accumulo-examples/pull/74#discussion_r607262218



##########
File path: spark/src/main/java/org/apache/accumulo/spark/CopyPlus5K.java
##########
@@ -76,8 +76,16 @@ private static void cleanupAndCreateTables(Properties props) throws Exception {
         client.tableOperations().delete(outputTable);
       }
       // Create tables
-      client.tableOperations().create(inputTable);
-      client.tableOperations().create(outputTable);
+      try {
+        client.tableOperations().create(inputTable);
+      } catch (TableExistsException e) {
+        // ignore

Review comment:
       Does this not represent an actual error condition - this implies that the proceeding table delete failed?  It shouldn't happen, but if it does then it seems something needs investigation.

##########
File path: src/main/java/org/apache/accumulo/examples/bloom/BloomFiltersNotFound.java
##########
@@ -33,22 +33,35 @@ public static void main(String[] args)
     ClientOpts opts = new ClientOpts();
     opts.parseArgs(BloomFiltersNotFound.class.getName(), args);
 
+    final String BLOOM_TEST3 = "bloom_test3";
+    final String BLOOM_TEST4 = "bloom_test4";
+
     try (AccumuloClient client = Accumulo.newClient().from(opts.getClientPropsPath()).build()) {
       try {
-        client.tableOperations().create("bloom_test3");
-        client.tableOperations().create("bloom_test4");
-        client.tableOperations().setProperty("bloom_test4", "table.bloom.enabled", "true");
+        client.tableOperations().create(BLOOM_TEST3);
+      } catch (TableExistsException e) {
+        // ignore

Review comment:
       If you run the test twice, do you get double the data? Does that matter?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org