You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@iotdb.apache.org by GitBox <gi...@apache.org> on 2022/03/09 07:52:58 UTC

[GitHub] [iotdb] JackieTien97 commented on a change in pull request #5184: [IOTDB-2602] Add the new feature of the null value filter to support filtering based on partial columns

JackieTien97 commented on a change in pull request #5184:
URL: https://github.com/apache/iotdb/pull/5184#discussion_r822246480



##########
File path: server/src/main/java/org/apache/iotdb/db/qp/strategy/optimizer/ConcatPathOptimizer.java
##########
@@ -137,6 +211,52 @@ private void removeWildcardsInSelectPaths(QueryOperator queryOperator)
     }
   }
 
+  private void removeWildcardsWithoutNullColumns(QueryOperator queryOperator)
+      throws LogicalOptimizeException {
+    if (queryOperator.getIndexType() != null) {
+      return;
+    }
+
+    if (queryOperator.getSpecialClauseComponent() == null) {
+      return;
+    }
+
+    List<Expression> expressions =
+        queryOperator.getSpecialClauseComponent().getWithoutNullColumns();
+    WildcardsRemover withoutNullWildcardsRemover = new WildcardsRemover(queryOperator);
+    List<Expression> filterExpressions = new ArrayList<>();
+    List<Expression> resultExpressions = new ArrayList<>();
+    for (Expression expression : expressions) {
+      if (queryOperator.getAliasSet().contains(expression.getExpressionString())) {

Review comment:
       Add comments here to explain why we need alias judgement again.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/strategy/optimizer/ConcatPathOptimizer.java
##########
@@ -68,8 +72,20 @@ public Operator transform(Operator operator)
     if (!optimizable(queryOperator)) {
       return queryOperator;
     }
+
+    // add aliasSet
+    Set<String> aliasSet = new HashSet<>();
+    for (ResultColumn resultColumn : queryOperator.getSelectComponent().getResultColumns()) {
+      if (resultColumn.hasAlias()) {
+        aliasSet.add(resultColumn.getAlias());
+      }
+    }
+    queryOperator.setAliasSet(aliasSet);

Review comment:
       Move it to `IoTDBSqlVisitor.parseSelectClause()` method.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/strategy/optimizer/ConcatPathOptimizer.java
##########
@@ -104,6 +120,64 @@ private void concatSelect(QueryOperator queryOperator) throws LogicalOptimizeExc
     queryOperator.getSelectComponent().setResultColumns(resultColumns);
   }
 
+  private void concatWithoutNullColumns(QueryOperator queryOperator)
+      throws LogicalOptimizeException {
+    List<PartialPath> prefixPaths = queryOperator.getFromComponent().getPrefixPaths();
+    // has without null columns
+    if (queryOperator.getSpecialClauseComponent() != null
+        && !queryOperator.getSpecialClauseComponent().getWithoutNullColumns().isEmpty()) {
+      List<Expression> withoutNullColumns = new ArrayList<>();
+      for (Expression expression :
+          queryOperator.getSpecialClauseComponent().getWithoutNullColumns()) {
+        concatWithoutNullColumns(
+            prefixPaths, expression, withoutNullColumns, queryOperator.getAliasSet());
+      }
+      queryOperator.getSpecialClauseComponent().setWithoutNullColumns(withoutNullColumns);
+    }
+  }
+
+  private void concatWithoutNullColumns(
+      List<PartialPath> prefixPaths,
+      Expression expression,
+      List<Expression> withoutNullColumns,
+      Set<String> aliasSet)
+      throws LogicalOptimizeException {
+    if (expression instanceof TimeSeriesOperand) {
+      TimeSeriesOperand timeSeriesOperand = (TimeSeriesOperand) expression;
+      if (timeSeriesOperand
+          .getPath()
+          .getFullPath()
+          .startsWith(SQLConstant.ROOT + ".")) { // start with "root." don't concat
+        if (((TimeSeriesOperand) expression).getPath().getNodeLength() == 1) { // no split

Review comment:
       Add comments here to explain that full path which starts with `root.` won't be split in `ioTDBSqlVisitor.parseExpression` method, so we should split here by ourselves.

##########
File path: docs/UserGuide/Query-Data/Without-Null.md
##########
@@ -21,17 +21,162 @@
 
 # Null Value Filter
 
-In IoTDB, the `WITHOUT NULL` clause can be used to filter null values in the result set. There are two filtering strategies:
+In practical application, users may want to filter some rows with null values in the query result set. In IoTDB, the `WITHOUT NULL` clause can be used to filter null values in the result set. There are two filtering strategies: `WITHOUT NULL ANY`和`WITHOUT NULL ALL`. In addition, the `WITHOUT NULL` clause supports specifying the corresponding columns for filtering.
 
-1. IoTDB will join all the sensor value by its time, and if some sensors don't have values in that timestamp, we will fill it with null. In some analysis scenarios, we only need the row if all the columns of it have value.
+> WITHOUT NULL ANY: if one of the columns in the specified column set is null, the conditions are met for filtering.
+> 
+> WITHOUT NULL ALL: if all columns in the specified column set are null, the conditions are met for filtering.
+
+## Don't specify columns
+
+> By default, it is effective for all columns in the result set. That is the specified column set includes all columns in the result set.
+
+1. In the following query, if any column of one row in the result set is null, the row will be filtered out. That is the result set obtained does not contain any null values.
 
 ```sql
 select * from root.ln.** where time <= 2017-11-01T00:01:00 WITHOUT NULL ANY
 ```
 
-2. In group by query, we will fill null for any group by interval if the columns don't have values in that group by interval. However, if all columns in that group by interval are null, maybe users don't need that RowRecord, so we can use `WITHOUT NULL ALL` to filter that row.
+2. In the following query, if all columns of one row in the result set are null, the row will be filtered out.

Review comment:
       Actually, in raw query, there won't be a row whose fields are all null. So `without null all` only take effect in group by query. The previous sql case is not right, you can correct the sql instead of the description. 

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/QueryPlan.java
##########
@@ -53,7 +58,11 @@
 
   private boolean ascending = true;
 
-  private Map<String, Integer> pathToIndex = new HashMap<>();
+  private Map<String, Integer> pathToIndex = new HashMap<>(); // align By time
+  private Set<String> alignByDeviceWithoutNullValidSet = new HashSet<>(); // align By Device

Review comment:
       Move it to AlignByDevicePlan.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/QueryPlan.java
##########
@@ -171,6 +188,14 @@ public void setColumnNameToDatasetOutputIndex(String columnName, Integer index)
     pathToIndex.put(columnName, index);
   }
 
+  public void addAlignByDeviceValidWithoutNullColumn(String columnName) {
+    alignByDeviceWithoutNullValidSet.add(columnName);
+  }
+
+  public boolean isValidWithoutNullColumnAlignByDevice(String columnName) {
+    return alignByDeviceWithoutNullValidSet.contains(columnName);
+  }
+

Review comment:
       Move these two methods to AlignByDevicePlan and add some java docs for thses two methods.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/AlignByDevicePlan.java
##########
@@ -48,6 +48,7 @@
 
   // to record result measurement columns, e.g. temperature, status, speed
   private List<String> measurements;
+  private Set<String> withoutNullColumns = new HashSet<>();

Review comment:
       add some comments to expplain what difference it has with `withoutNullValidSet`

##########
File path: server/src/main/java/org/apache/iotdb/db/query/dataset/RawQueryDataSetWithoutValueFilter.java
##########
@@ -509,30 +509,53 @@ public TSQueryDataSet fillBuffer(int fetchSize, WatermarkEncoder encoder)
     return tsQueryDataSet;
   }
 
-  /** if any column in the row record is null, we filter it. */
+  //  /** if any column in the row record is null, we filter it. */

Review comment:
       ```suggestion
   ```

##########
File path: server/src/main/java/org/apache/iotdb/db/utils/QueryDataSetUtils.java
##########
@@ -66,14 +67,57 @@ public static TSQueryDataSet convertQueryDataSetByFetchSize(
     for (int i = 0; i < fetchSize; i++) {
       if (queryDataSet.hasNext()) {
         RowRecord rowRecord = queryDataSet.next();
-        // filter rows whose columns are null according to the rule
-        if ((queryDataSet.isWithoutAllNull() && rowRecord.isAllNull())
-            || (queryDataSet.isWithoutAnyNull() && rowRecord.hasNullField())) {
-          // if the current RowRecord doesn't satisfy, we should also decrease AlreadyReturnedRowNum
-          queryDataSet.decreaseAlreadyReturnedRowNum();
-          i--;
-          continue;
+        Set<Integer> withoutNullColumnsIndex = queryDataSet.getWithoutNullColumnsIndex();
+        if (withoutNullColumnsIndex.isEmpty()) {
+          // filter rows whose columns are null according to the rule
+          if ((queryDataSet.isWithoutAllNull() && rowRecord.isAllNull())
+              || (queryDataSet.isWithoutAnyNull() && rowRecord.hasNullField())) {
+            // if the current RowRecord doesn't satisfy, we should also decrease
+            // AlreadyReturnedRowNum
+            queryDataSet.decreaseAlreadyReturnedRowNum();
+            i--;
+            continue;
+          }
+        } else {
+          boolean anyNullFlag = false, allNullFlag = true;
+          int index = 0;
+          for (Field field : rowRecord.getFields()) {
+            if (!withoutNullColumnsIndex.contains(index)) {

Review comment:
       Change the loop sequence

##########
File path: server/src/main/java/org/apache/iotdb/db/query/dataset/RawQueryDataSetWithoutValueFilter.java
##########
@@ -509,30 +509,53 @@ public TSQueryDataSet fillBuffer(int fetchSize, WatermarkEncoder encoder)
     return tsQueryDataSet;
   }
 
-  /** if any column in the row record is null, we filter it. */
+  //  /** if any column in the row record is null, we filter it. */
+  /** if columns in the row record match the condition of null value filter, we filter it. */
   private boolean filterRowRecord(int seriesNum, long minTime)
       throws IOException, InterruptedException {
-    boolean hasNull = false;
+    boolean hasNull = false, isAllNull = true;
     for (int seriesIndex = 0; seriesIndex < seriesNum; seriesIndex++) {
+      if (!withoutNullColumnsIndex.isEmpty() && !withoutNullColumnsIndex.contains(seriesIndex)) {

Review comment:
       We should consider aligned path here.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/logical/crud/QueryOperator.java
##########
@@ -356,14 +379,35 @@ private void checkDataTypeConsistency(TSDataType checkedDataType, MeasurementInf
     }
   }
 
-  protected void convertSpecialClauseValues(QueryPlan queryPlan) {
+  protected void convertSpecialClauseValues(QueryPlan queryPlan) throws QueryProcessException {
     if (specialClauseComponent != null) {
+      // meta consistence check
+      if (!queryPlan
+          .getPathToIndex()
+          .isEmpty()) { // align by device queryPlan.getPathToIndex() is empty
+        for (Expression expression : specialClauseComponent.getWithoutNullColumns()) {
+          if (queryPlan.getPathToIndex().containsKey(expression.getExpressionString())) {
+            queryPlan.addWithoutNullColumnIndex(
+                queryPlan.getPathToIndex().get(expression.getExpressionString()));
+          } else {
+            throw new QueryProcessException(QueryPlan.WITHOUT_NULL_FILTER_ERROR_MESSAGE);
+          }
+        }
+      }
       queryPlan.setWithoutAllNull(specialClauseComponent.isWithoutAllNull());
       queryPlan.setWithoutAnyNull(specialClauseComponent.isWithoutAnyNull());
       queryPlan.setRowLimit(specialClauseComponent.getRowLimit());
       queryPlan.setRowOffset(specialClauseComponent.getRowOffset());
       queryPlan.setAscending(specialClauseComponent.isAscending());
       queryPlan.setAlignByTime(specialClauseComponent.isAlignByTime());
+
+      // add without null column
+      if (queryPlan instanceof AlignByDevicePlan) {
+        AlignByDevicePlan alignByDevicePlan = (AlignByDevicePlan) queryPlan;
+        for (Expression expression : specialClauseComponent.getWithoutNullColumns()) {
+          alignByDevicePlan.addWithoutNullColumns(expression.getExpressionString());
+        }
+      }

Review comment:
       ```suggestion
         // record the without null column corresponding index if the plan is not AlignByDevicePlan
         // otherwise, we simply add all without null column names into AlignByDevicePlan, and then let itself do the logic of conversion in `checkWithoutNullColumnValidAlignByDevice` of `QueryOperator.generateAlignByDevicePlan`
         if (!queryPlan
             .getPathToIndex()
             .isEmpty()) { // align by device queryPlan.getPathToIndex() is empty
           for (Expression expression : specialClauseComponent.getWithoutNullColumns()) {
             if (queryPlan.getPathToIndex().containsKey(expression.getExpressionString())) {
               queryPlan.addWithoutNullColumnIndex(
                   queryPlan.getPathToIndex().get(expression.getExpressionString()));
             } else { // check whether current without null column exists in select clause, if not throw exception
               throw new QueryProcessException(QueryPlan.WITHOUT_NULL_FILTER_ERROR_MESSAGE);
             }
           }
         } else if (queryPlan instanceof AlignByDevicePlan) {
           AlignByDevicePlan alignByDevicePlan = (AlignByDevicePlan) queryPlan;
           for (Expression expression : specialClauseComponent.getWithoutNullColumns()) {
             alignByDevicePlan.addWithoutNullColumns(expression.getExpressionString());
           }
         }
         queryPlan.setWithoutAllNull(specialClauseComponent.isWithoutAllNull());
         queryPlan.setWithoutAnyNull(specialClauseComponent.isWithoutAnyNull());
         queryPlan.setRowLimit(specialClauseComponent.getRowLimit());
         queryPlan.setRowOffset(specialClauseComponent.getRowOffset());
         queryPlan.setAscending(specialClauseComponent.isAscending());
         queryPlan.setAlignByTime(specialClauseComponent.isAlignByTime());
       }
   ```

##########
File path: tsfile/src/main/java/org/apache/iotdb/tsfile/read/query/dataset/QueryDataSet.java
##########
@@ -107,17 +112,59 @@ protected void initQueryDataSetFields(
     }
   }
 
+  public Set<Integer> getWithoutNullColumnsIndex() {
+    return withoutNullColumnsIndex;
+  }
+
+  public void setWithoutNullColumnsIndex(Set<Integer> withoutNullColumnsIndex) {
+    this.withoutNullColumnsIndex = withoutNullColumnsIndex;
+  }
+
   public boolean hasNext() throws IOException {
     // proceed to the OFFSET row by skipping rows
     while (rowOffset > 0) {
       if (hasNextWithoutConstraint()) {
         RowRecord rowRecord = nextWithoutConstraint(); // DO NOT use next()
-        // filter rows whose columns are null according to the rule
-        if ((withoutAllNull && rowRecord.isAllNull())
-            || (withoutAnyNull && rowRecord.hasNullField())) {
-          continue;
+        if (withoutNullColumnsIndex.isEmpty()) {
+          // filter rows whose columns are null according to the rule
+          if ((withoutAllNull && rowRecord.isAllNull())
+              || (withoutAnyNull && rowRecord.hasNullField())) {
+            continue;
+          }
+          rowOffset--;
+        } else {
+          boolean anyNullFlag = false, allNullFlag = true;
+          int index = 0;
+          for (Field field : rowRecord.getFields()) {
+            if (!withoutNullColumnsIndex.contains(index)) {

Review comment:
       change the loop sequence

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/strategy/optimizer/ConcatPathOptimizer.java
##########
@@ -137,6 +211,52 @@ private void removeWildcardsInSelectPaths(QueryOperator queryOperator)
     }
   }
 
+  private void removeWildcardsWithoutNullColumns(QueryOperator queryOperator)
+      throws LogicalOptimizeException {
+    if (queryOperator.getIndexType() != null) {
+      return;
+    }
+
+    if (queryOperator.getSpecialClauseComponent() == null) {
+      return;
+    }
+
+    List<Expression> expressions =
+        queryOperator.getSpecialClauseComponent().getWithoutNullColumns();
+    WildcardsRemover withoutNullWildcardsRemover = new WildcardsRemover(queryOperator);
+    List<Expression> filterExpressions = new ArrayList<>();

Review comment:
       Change the field name to `actualExpressions`, `filterExpressions` is a little bit ambiguous, and also add comments to explain the field.

##########
File path: tsfile/src/main/java/org/apache/iotdb/tsfile/read/query/dataset/QueryDataSet.java
##########
@@ -107,17 +112,59 @@ protected void initQueryDataSetFields(
     }
   }
 
+  public Set<Integer> getWithoutNullColumnsIndex() {
+    return withoutNullColumnsIndex;
+  }
+
+  public void setWithoutNullColumnsIndex(Set<Integer> withoutNullColumnsIndex) {
+    this.withoutNullColumnsIndex = withoutNullColumnsIndex;
+  }
+
   public boolean hasNext() throws IOException {
     // proceed to the OFFSET row by skipping rows
     while (rowOffset > 0) {
       if (hasNextWithoutConstraint()) {
         RowRecord rowRecord = nextWithoutConstraint(); // DO NOT use next()
-        // filter rows whose columns are null according to the rule
-        if ((withoutAllNull && rowRecord.isAllNull())
-            || (withoutAnyNull && rowRecord.hasNullField())) {
-          continue;
+        if (withoutNullColumnsIndex.isEmpty()) {
+          // filter rows whose columns are null according to the rule
+          if ((withoutAllNull && rowRecord.isAllNull())
+              || (withoutAnyNull && rowRecord.hasNullField())) {
+            continue;
+          }
+          rowOffset--;

Review comment:
       extract this commone statement in `if` and `else` block.

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/physical/crud/AggregationPlan.java
##########
@@ -183,6 +183,9 @@ public String getColumnForDisplay(String columnForReader, int pathIndex) {
       String aggregatePath =
           groupByLevelController.getGroupedPath(
               String.format("%s(%s)", functionName, path.getFullPath()));
+      if (resultColumns.get(pathIndex).hasAlias()) {
+        return resultColumns.get(pathIndex).getAlias();
+      }

Review comment:
       move it to the first line of this fi block

##########
File path: server/src/main/java/org/apache/iotdb/db/qp/strategy/optimizer/ConcatPathOptimizer.java
##########
@@ -137,6 +211,52 @@ private void removeWildcardsInSelectPaths(QueryOperator queryOperator)
     }
   }
 
+  private void removeWildcardsWithoutNullColumns(QueryOperator queryOperator)
+      throws LogicalOptimizeException {
+    if (queryOperator.getIndexType() != null) {
+      return;
+    }
+
+    if (queryOperator.getSpecialClauseComponent() == null) {
+      return;
+    }
+
+    List<Expression> expressions =
+        queryOperator.getSpecialClauseComponent().getWithoutNullColumns();
+    WildcardsRemover withoutNullWildcardsRemover = new WildcardsRemover(queryOperator);
+    List<Expression> filterExpressions = new ArrayList<>();
+    List<Expression> resultExpressions = new ArrayList<>();
+    for (Expression expression : expressions) {
+      if (queryOperator.getAliasSet().contains(expression.getExpressionString())) {
+        resultExpressions.add(expression);
+        continue;
+      }
+      expression.removeWildcards(withoutNullWildcardsRemover, filterExpressions);
+    }
+
+    // group by level, use groupedPathMap
+    GroupByLevelController groupByLevelController =
+        queryOperator.getSpecialClauseComponent().getGroupByLevelController();
+    if (groupByLevelController != null) {
+      for (Expression expression : filterExpressions) {
+        String groupedPath =
+            groupByLevelController.getGroupedPath(expression.getExpressionString());
+        if (groupedPath != null) {
+          try {
+            resultExpressions.add(new TimeSeriesOperand(new PartialPath(groupedPath)));
+          } catch (IllegalPathException e) {
+            throw new LogicalOptimizeException(e.getMessage());
+          }
+        } else {
+          resultExpressions.add(expression);
+        }
+      }

Review comment:
       Add a IT for this case, like:
   `select sum(d1.s1), sum(d2.s1) from root.sg1 group by level = 1 without null any(sum(d1.s1), sum(d2.s1));`
   
   It will only contains one column: `sum(root.sg1.*.s1)`.

##########
File path: tsfile/src/main/java/org/apache/iotdb/tsfile/read/query/dataset/QueryDataSet.java
##########
@@ -107,17 +112,59 @@ protected void initQueryDataSetFields(
     }
   }
 
+  public Set<Integer> getWithoutNullColumnsIndex() {
+    return withoutNullColumnsIndex;
+  }
+
+  public void setWithoutNullColumnsIndex(Set<Integer> withoutNullColumnsIndex) {
+    this.withoutNullColumnsIndex = withoutNullColumnsIndex;
+  }
+
   public boolean hasNext() throws IOException {
     // proceed to the OFFSET row by skipping rows
     while (rowOffset > 0) {
       if (hasNextWithoutConstraint()) {
         RowRecord rowRecord = nextWithoutConstraint(); // DO NOT use next()
-        // filter rows whose columns are null according to the rule
-        if ((withoutAllNull && rowRecord.isAllNull())
-            || (withoutAnyNull && rowRecord.hasNullField())) {
-          continue;
+        if (withoutNullColumnsIndex.isEmpty()) {
+          // filter rows whose columns are null according to the rule
+          if ((withoutAllNull && rowRecord.isAllNull())
+              || (withoutAnyNull && rowRecord.hasNullField())) {
+            continue;
+          }
+          rowOffset--;
+        } else {
+          boolean anyNullFlag = false, allNullFlag = true;
+          int index = 0;
+          for (Field field : rowRecord.getFields()) {
+            if (!withoutNullColumnsIndex.contains(index)) {

Review comment:
       change the loop sequence

##########
File path: server/src/main/java/org/apache/iotdb/db/utils/QueryDataSetUtils.java
##########
@@ -66,14 +67,57 @@ public static TSQueryDataSet convertQueryDataSetByFetchSize(
     for (int i = 0; i < fetchSize; i++) {
       if (queryDataSet.hasNext()) {
         RowRecord rowRecord = queryDataSet.next();
-        // filter rows whose columns are null according to the rule
-        if ((queryDataSet.isWithoutAllNull() && rowRecord.isAllNull())
-            || (queryDataSet.isWithoutAnyNull() && rowRecord.hasNullField())) {
-          // if the current RowRecord doesn't satisfy, we should also decrease AlreadyReturnedRowNum
-          queryDataSet.decreaseAlreadyReturnedRowNum();
-          i--;
-          continue;
+        Set<Integer> withoutNullColumnsIndex = queryDataSet.getWithoutNullColumnsIndex();
+        if (withoutNullColumnsIndex.isEmpty()) {
+          // filter rows whose columns are null according to the rule
+          if ((queryDataSet.isWithoutAllNull() && rowRecord.isAllNull())
+              || (queryDataSet.isWithoutAnyNull() && rowRecord.hasNullField())) {
+            // if the current RowRecord doesn't satisfy, we should also decrease
+            // AlreadyReturnedRowNum
+            queryDataSet.decreaseAlreadyReturnedRowNum();
+            i--;
+            continue;
+          }
+        } else {
+          boolean anyNullFlag = false, allNullFlag = true;
+          int index = 0;
+          for (Field field : rowRecord.getFields()) {
+            if (!withoutNullColumnsIndex.contains(index)) {

Review comment:
       `withoutNullColumnsIndex` should not be very large, but the number of rowRecord's fields may be large. Exchange the loop sequence

##########
File path: server/src/main/java/org/apache/iotdb/db/query/dataset/UDTFAlignByTimeDataSet.java
##########
@@ -132,7 +137,10 @@ public TSQueryDataSet fillBuffer(int fetchSize, WatermarkEncoder encoder)
         // Here we get a timestamp first and then construct the row column by column.
         // We don't record this row when nullFieldsCnt > 0 and withoutAnyNull == true
         // or nullFieldsCnt == columnNum and withoutAllNull == true

Review comment:
       change the comments to current meaning




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@iotdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org