You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/02/20 23:41:18 UTC

[GitHub] [druid] jon-wei opened a new pull request #9384: Add join prefix duplicate/shadowing check

jon-wei opened a new pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384
 
 
   This PR adds a sanity check for join clause prefix duplication and shadowing, addressing the followng PR comment: https://github.com/apache/druid/pull/9301#discussion_r375009323
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] ccaominh commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
ccaominh commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r384184065
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +122,45 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  private static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  /**
+   * Check if any prefixes in the provided list duplicate or shadow each other.
+   *
+   * @param prefixes A mutable list containing the prefixes to check. This list will be sorted by descending
+   *                 string length.
+   */
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    // this is a naive approach that assumes we'll typically handle only a small number of prefixes
+    prefixes.sort(DESCENDING_LENGTH_STRING_COMPARATOR);
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix = prefixes.get(i);
+      for (int k = i; k < prefixes.size(); k++) {
+        if (i != k) {
 
 Review comment:
   Can the check for `i != k` be removed if `k` is initialized to `i + 1`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383607918
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
 
 Review comment:
   I kept this public, it's used in tests and I think it could be useful as a general utility method

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382871144
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
 
 Review comment:
   ```suggestion
     private static void checkPrefixesForDuplicatesAndShadowing(
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382873381
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
 
 Review comment:
   I think this should use a Trie of some sort here so the complexity isn't `n + n^2 * k`
   
   I don't know the best data structure to use here off the top of my head, but I can search. How many clauses is reasonable - if it's just 5, this may not matter in the big picture?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383607739
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -56,25 +56,29 @@
   private final List<JoinableClause> clauses;
   private final boolean enableFilterPushDown;
 
+  /**
+   * @param baseAdapter A StorageAdapter for the left-hand side base segment
+   * @param clauses The right-hand side clauses. The caller is responsible for ensuring that there are no
+   *                duplicate prefixes or prefixes that shadow each other across the clauses
+   * @param enableFilterPushDown Whether to enable filter push down optimizations to the base segment
+   */
   HashJoinSegmentStorageAdapter(
       StorageAdapter baseAdapter,
-      List<JoinableClause> clauses
+      List<JoinableClause> clauses,
+      final boolean enableFilterPushDown
   )
   {
     this.baseAdapter = baseAdapter;
     this.clauses = clauses;
-    this.enableFilterPushDown = QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN;
+    this.enableFilterPushDown = enableFilterPushDown;
   }
 
   HashJoinSegmentStorageAdapter(
 
 Review comment:
   Added annotation

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r384132702
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -100,6 +112,9 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       final JoinableFactory joinableFactory
   )
   {
+    // Since building a JoinableClause can be expensive, check for prefix conflicts before building
 
 Review comment:
   ```
   I think this needs to be done ahead of time because checking while building the JoinableClauses is slower than checking ahead of time. Is that correct?
   ```
   
   Yes, that's correct, it's to fail fast before doing the potentially expensive `joinableFactory.build` calls.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382870885
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/HashJoinSegmentStorageAdapter.java
 ##########
 @@ -56,25 +56,29 @@
   private final List<JoinableClause> clauses;
   private final boolean enableFilterPushDown;
 
+  /**
+   * @param baseAdapter A StorageAdapter for the left-hand side base segment
+   * @param clauses The right-hand side clauses. The caller is responsible for ensuring that there are no
+   *                duplicate prefixes or prefixes that shadow each other across the clauses
+   * @param enableFilterPushDown Whether to enable filter push down optimizations to the base segment
+   */
   HashJoinSegmentStorageAdapter(
       StorageAdapter baseAdapter,
-      List<JoinableClause> clauses
+      List<JoinableClause> clauses,
+      final boolean enableFilterPushDown
   )
   {
     this.baseAdapter = baseAdapter;
     this.clauses = clauses;
-    this.enableFilterPushDown = QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN;
+    this.enableFilterPushDown = enableFilterPushDown;
   }
 
   HashJoinSegmentStorageAdapter(
 
 Review comment:
   `@VisibleForTesting` Looks like this is only used in tests now

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383972553
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -39,6 +40,16 @@
  */
 public class Joinables
 {
+  private static final Comparator<String> DESCENDING_LENGTH_STRING_COMPARATOR = (s1, s2) -> {
+    if (s1.length() > s2.length()) {
+      return -1;
+    } else if (s1.length() < s2.length()) {
+      return 1;
+    } else {
+      return 0;
+    }
+  };
 
 Review comment:
   nit: use `(s1, s2) -> Integer.compare(s2.length(), s1.length())` instead.
   
   Less code to read / maintain.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383979159
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -100,6 +112,9 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       final JoinableFactory joinableFactory
   )
   {
+    // Since building a JoinableClause can be expensive, check for prefix conflicts before building
 
 Review comment:
   I think this comment is throwing me off. If building the clause is expensive, checking for duplicates ahead of time only helps you fail faster at the cost of correct queries being slightly slower.
   
   I think this needs to be done ahead of time because checking while building the JoinableClauses is slower than checking ahead of time. Is that correct?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382871126
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
 
 Review comment:
   ```suggestion
     private static void checkPreJoinableClausesForDuplicatesAndShadowing(
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383608320
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
 
 Review comment:
   I think we'd typically have a small number of clauses (one per JOIN statement in the query), I did some brief benchmarks with a radix tree approach and found it was significantly slower on the 6 prefix test I tried (https://github.com/jon-wei/druid/tree/join_prefix_check2)
   
   I think this is fine for now, I added a comment about the naive approach

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382872068
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix1 = prefixes.get(i);
+      for (int k = 0; k < prefixes.size(); k++) {
+        if (i != k) {
+          String otherPrefix = prefixes.get(k);
+          if (prefix1.equals(otherPrefix)) {
+            throw new IAE("Detected duplicate prefix in join clauses: [%s]", prefix1);
+          }
+
+          if (isPrefixedBy(prefix1, otherPrefix)) {
 
 Review comment:
   this function already exists, but I think it's faster to change the order of the boolean expression in isPrefixedBy
   
   ```
   return columnName.length() > prefix.length() && columnName.startsWith(prefix);
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383607849
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
 
 Review comment:
   Made this private

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383607966
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix1 = prefixes.get(i);
+      for (int k = 0; k < prefixes.size(); k++) {
+        if (i != k) {
+          String otherPrefix = prefixes.get(k);
+          if (prefix1.equals(otherPrefix)) {
+            throw new IAE("Detected duplicate prefix in join clauses: [%s]", prefix1);
+          }
+
+          if (isPrefixedBy(prefix1, otherPrefix)) {
 
 Review comment:
   cool, changed the order there

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383608320
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
 
 Review comment:
   I think we'd typically have a small number of clauses and prefixes (one per JOIN statement in the query), I did some brief benchmarks with a radix tree approach and found it was significantly slower on the 6 prefix test I tried (https://github.com/jon-wei/druid/tree/join_prefix_check2)
   
   I think this is fine for now, I added a comment about the naive approach

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383973769
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
 
 Review comment:
   👍 for the benchmarks. In my experience, sorting can actually be harmful at such small numbers. Just curious if you ran benchmarks to see if sorting would be beneficial here. Again, since it's such small numbers, it probably doesn't matter. Sorry for sending you on a wild goose chase with my comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r384194877
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +122,45 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  private static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  /**
+   * Check if any prefixes in the provided list duplicate or shadow each other.
+   *
+   * @param prefixes A mutable list containing the prefixes to check. This list will be sorted by descending
+   *                 string length.
+   */
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    // this is a naive approach that assumes we'll typically handle only a small number of prefixes
+    prefixes.sort(DESCENDING_LENGTH_STRING_COMPARATOR);
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix = prefixes.get(i);
+      for (int k = i; k < prefixes.size(); k++) {
+        if (i != k) {
 
 Review comment:
   Good point, initializes to `i + 1` now

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r382873394
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix1 = prefixes.get(i);
+      for (int k = 0; k < prefixes.size(); k++) {
 
 Review comment:
   If the list is sorted by size (largest -> smallest) we can change this loop to `k = i` instead of `k = 0`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r384135173
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -39,6 +40,16 @@
  */
 public class Joinables
 {
+  private static final Comparator<String> DESCENDING_LENGTH_STRING_COMPARATOR = (s1, s2) -> {
+    if (s1.length() > s2.length()) {
+      return -1;
+    } else if (s1.length() < s2.length()) {
+      return 1;
+    } else {
+      return 0;
+    }
+  };
 
 Review comment:
   Changed the comparator

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9384: Add join prefix duplicate/shadowing check
URL: https://github.com/apache/druid/pull/9384#discussion_r383608448
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
 ##########
 @@ -114,4 +118,37 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
       );
     }).collect(Collectors.toList());
   }
+
+  public static void checkPreJoinableClausesForDuplicatesAndShadowing(
+      final List<PreJoinableClause> preJoinableClauses
+  )
+  {
+    List<String> prefixes = new ArrayList<>();
+    for (PreJoinableClause clause : preJoinableClauses) {
+      prefixes.add(clause.getPrefix());
+    }
+
+    checkPrefixesForDuplicatesAndShadowing(prefixes);
+  }
+
+  public static void checkPrefixesForDuplicatesAndShadowing(
+      final List<String> prefixes
+  )
+  {
+    for (int i = 0; i < prefixes.size(); i++) {
+      String prefix1 = prefixes.get(i);
+      for (int k = 0; k < prefixes.size(); k++) {
 
 Review comment:
   Added a sort and the `k = i` optimization

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org