You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/12/10 21:16:30 UTC

[GitHub] [accumulo] jmark99 opened a new pull request #2381: Refactor listSplits operation when using maxSplits

jmark99 opened a new pull request #2381:
URL: https://github.com/apache/accumulo/pull/2381


   Refactored listSplits method in TableOperationsImpl. This change affects the listSplits command which takes maxSplits as an option.
   
   * Renamed variable names to enhance readability.
   * Added documentation for the method.
   * Replaced while-loop with if-loop after determining while-loop was only run at most once each time.
   * Created IT test for method in ShellIT class.
   
   Closes #2371 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] milleruntime commented on a change in pull request #2381: Refactor listSplits operation when using maxSplits

Posted by GitBox <gi...@apache.org>.
milleruntime commented on a change in pull request #2381:
URL: https://github.com/apache/accumulo/pull/2381#discussion_r767819688



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,59 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.

Review comment:
       I think you meant if statement or code block.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2381: Refactor listSplits operation when using maxSplits

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2381:
URL: https://github.com/apache/accumulo/pull/2381#discussion_r767037425



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {
+      if (selectedSoFar >= maxSplits) {
+        break;
+      }
+      selectionTrigger += stepSize;
+      if (selectionTrigger > SELECTION_THRESHOLD) {
+        splitsSubset.add(exitingSplit);

Review comment:
       ```suggestion
           splitsSubset.add(existingSplit);
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {
+      if (selectedSoFar >= maxSplits) {

Review comment:
       May be able to do the following and drop the selectedSoFar variable.
   
   ```suggestion
         if (splitsSubset.size() >= maxSplits) {
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {

Review comment:
       ```suggestion
       for (Text existingSplit : existingSplits) {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] jmark99 merged pull request #2381: Refactor listSplits operation when using maxSplits

Posted by GitBox <gi...@apache.org>.
jmark99 merged pull request #2381:
URL: https://github.com/apache/accumulo/pull/2381


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [accumulo] keith-turner commented on a change in pull request #2381: Refactor listSplits operation when using maxSplits

Posted by GitBox <gi...@apache.org>.
keith-turner commented on a change in pull request #2381:
URL: https://github.com/apache/accumulo/pull/2381#discussion_r767037425



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {
+      if (selectedSoFar >= maxSplits) {
+        break;
+      }
+      selectionTrigger += stepSize;
+      if (selectionTrigger > SELECTION_THRESHOLD) {
+        splitsSubset.add(exitingSplit);

Review comment:
       ```suggestion
           splitsSubset.add(existingSplit);
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {
+      if (selectedSoFar >= maxSplits) {

Review comment:
       May be able to do the following and drop the selectedSoFar variable.
   
   ```suggestion
         if (splitsSubset.size() >= maxSplits) {
   ```

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
##########
@@ -689,28 +689,61 @@ public void deleteRows(String tableName, Text start, Text end)
 
   }
 
+  /**
+   * This version of listSplits is called when the maxSplits options is provided. If the value of
+   * maxSplits is greater than the number of existing splits, then all splits are returned and no
+   * additional processing is performed.
+   *
+   * But, if the value of maxSplits is less than the number of existing splits, maxSplit split
+   * values are returned. These split values are "evenly" selected from the existing splits based
+   * upon the algorithm implemented in the method.
+   *
+   * A stepSize is calculated based upon the number of splits requested and the total split count. A
+   * running sum adjusted by this stepSize is calculated as each split is parsed. Once this sum
+   * exceeds a value of 1, the current split point is selected to be returned. The sum is then
+   * decremented by 1 and the process continues until all existing splits have been parsed or
+   * maxSplits splits have been selected.
+   *
+   * @param tableName
+   *          the name of the table
+   * @param maxSplits
+   *          specifies the maximum number of splits to return
+   * @return a Collection containing a subset of evenly selected splits
+   */
   @Override
-  public Collection<Text> listSplits(String tableName, int maxSplits)
+  public Collection<Text> listSplits(final String tableName, final int maxSplits)
       throws TableNotFoundException, AccumuloSecurityException {
     // tableName is validated in _listSplits
-    List<Text> endRows = _listSplits(tableName);
-    if (endRows.size() <= maxSplits)
-      return endRows;
-
-    double r = (maxSplits + 1) / (double) (endRows.size());
-    double pos = 0;
-    ArrayList<Text> subset = new ArrayList<>(maxSplits);
-    int j = 0;
-    for (int i = 0; i < endRows.size() && j < maxSplits; i++) {
-      pos += r;
-      while (pos > 1) {
-        subset.add(endRows.get(i));
-        j++;
-        pos -= 1;
-      }
+    final List<Text> existingSplits = _listSplits(tableName);
+
+    // As long as maxSplits is equal to or larger than the number of current splits, the existing
+    // splits are returned and no additional processing is necessary.
+    if (existingSplits.size() <= maxSplits) {
+      return existingSplits;
     }
 
-    return subset;
+    // When the number of maxSplits requested is less than the number of existing splits, the
+    // following code populates the splitsSubset list 'evenly' from the existing splits
+    ArrayList<Text> splitsSubset = new ArrayList<>(maxSplits);
+    int selectedSoFar = 0;
+    final int SELECTION_THRESHOLD = 1;
+
+    // stepSize can never be greater than 1 due to the if-loop check above.
+    final double stepSize = (maxSplits + 1) / (double) (existingSplits.size());
+    double selectionTrigger = 0.0;
+
+    for (Text exitingSplit : existingSplits) {

Review comment:
       ```suggestion
       for (Text existingSplit : existingSplits) {
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org