You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2020/12/16 01:23:01 UTC

[GitHub] [incubator-pinot] jackjlli opened a new pull request #6361: Detect invalid column names from query in Pinot server

jackjlli opened a new pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361


   ## Description
   This PR adds the logic of detecting invalid column names from query in Pinot server. 
   Since query context is already generated in pinot server, we don't need to parse an extra copy of query context for the same query.
   
   In pinot broker, once there exists non-null response metadata `invalidColumns` in any returned data table, a broker metric will be emitted. This helps us to detect whether there is any existing use cases that are querying with invalid columns right now.
   
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release.
   
   If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text
   
   ## Documentation
   If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r620714911



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -303,6 +299,31 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
     }
   }
 
+  /**
+   * If all the segments are pruned, check whether it's caused by invalid column name in the query.
+   * This is to keep the behavior consistent when new columns are added and not all the segments have the new columns in their metadata;
+   * old segments may contain stale schema until the table is reloaded.
+   */
+  private void detectInvalidColumnIfExists(QueryContext queryContext, DataTable dataTable) {
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    // Validate whether column names in the query are valid
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);
+      dataTable.getMetadata().put(MetadataKey.INVALID_COLUMN_IN_QUERY.getName(), columnNamesFromQuery.toString());

Review comment:
       Put only the diff of columnNamesFromQuery/columnNamesFromSchema into metadata?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-828093494


   > High level question: why do we perform this check on server side? We should be able to perform the check in `BrokerRequestHandler` using `PinotQuery` and `Schema`, and directly reject the query without populating the query to the servers.
   > You can implement the logic within the `BaseBrokerRequestHandler.validateRequest()`
   
   Good question. In fact, that's the first thing I tried by putting the validation logic in broker (https://github.com/apache/incubator-pinot/pull/6066). While this approach has a drawback that in order to get the column names appeared in the query, we have to parse the query from broker side. By doing so, the query would be parsed two times, one in broker and one in server. Given the fact that the query has to be parsed in server, we can just reuse the same queryContext object to provide the column names from the query.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli edited a comment on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli edited a comment on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-761220753


   > It would be good to verify the following before pushing the change:
   > 
   > * Are there any existing prod cases that might break? For example, if a new column was added to schema, but no backfill was performed, so only the newer segments have this the new column. The new behavior will bail out early, this might be different from existing behavior.
   > * Double check any performance impact on high throughput cases.
   
   If a new column was added, the `_tableColumnNamesMap` in `HelixInstanceDataManager` will be updated with the latest schema if there is a segment reload or a server restart. In this case, when the new column gets queried, it can go through and query the new segments. Old segments remained un-queried since they will be pruned.
   
   If there is no segment reload nor server restart, the table is in an inconsistent state, where old segments don't have new column, and new segments do. Without telling pinot-server what should be the correct schema, `_tableColumnNamesMap` remains holding its existing copy of column names. The server holding the old copy will return empty response with a `invalidColumns` flag in the payload. One workaround is similarly to update table config, where a refresh table config message will be sent to brokers. When a table schema gets updated, a refresh schema message will be sent to servers. But I think this is similar to (schema update + auto segment reload).
   
   In terms of the performance impact, I'll benchmark it and update it here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r546011413



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid

Review comment:
       We've already done the similar check in `DataSchemaSegmentPruner` class and the as I mentioned `containsAll` method is called for every indexed segment. 
   
   If the query does include invalid column names, those checks in `DataSchemaSegmentPruner` would be skipped. 
   If the query doesn't include any invalid columns, the effect here is like to add one more pruning check on one extra segment.
   
   I feel that `DataSchemaSegmentPruner` is still needed, in case if there is some inconsistence between the new and old segments and the table doesn't get reloaded.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r558581408



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java
##########
@@ -87,6 +87,7 @@
 
   GROUP_BY_SIZE("queries", false),
   TOTAL_SERVER_RESPONSE_SIZE("queries", false),
+  QUERY_COLUMN_NAME_MISMATCH("queries", false),

Review comment:
       Updated the meter name




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r562128492



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Ok, great. Please add tests to cover aliasing case (if they don't already exist).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r568165047



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
##########
@@ -623,8 +623,10 @@ public void testReload(boolean includeOfflineTable)
         JsonNode testQueryResponse = postQuery(testQuery);
         // Should not throw exception during reload
         assertEquals(testQueryResponse.get("exceptions").size(), 0);
-        // Total docs should not change during reload
-        assertEquals(testQueryResponse.get("totalDocs").asLong(), numTotalDocs);
+        // Total docs should eventually match
+        if (testQueryResponse.get("totalDocs").asLong() != numTotalDocs) {

Review comment:
       This is the unit test for testing default values. When reload request is received, it takes time for pinot servers to update the schema. 
   And this `TEST_DEFAULT_COLUMNS_QUERY = "SELECT COUNT(*) FROM mytable WHERE NewAddedIntDimension < 0"` in Line 640 queries a non-existing column in the new schema. That's why before the new schema gets fully reloaded in pinot-server, it'd return the totalDocs being not the same as the actual value of numTotalDocs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r620729664



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/reduce/BrokerReduceService.java
##########
@@ -208,6 +209,11 @@ public BrokerResponseNative reduceOnDataTable(BrokerRequest brokerRequest,
       }
       numGroupsLimitReached |= Boolean.parseBoolean(metadata.get(MetadataKey.NUM_GROUPS_LIMIT_REACHED.getName()));
 
+      String invalidColumnNamesString = metadata.get(MetadataKey.INVALID_COLUMN_IN_QUERY.getName());
+      if (invalidColumnNamesString != null) {
+        invalidColumnNames = invalidColumnNames == null ? invalidColumnNamesString : invalidColumnNames;
+      }

Review comment:
       The if statement should be `if (invalidColumnNames == null && invalidColumnNamesString != null)`. 
   Will update that in the next push. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r562132809



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Aliasing case has been covered in integration tests. We should be good here. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r568138689



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -326,6 +339,11 @@ public SegmentMetadata getSegmentMetadata(String tableNameWithType, String segme
     }
   }
 
+  @Override
+  public Set<String> getColumnNamesByTable(String tableNameWithType) {
+    return _tableColumnNamesMap.computeIfAbsent(tableNameWithType, k -> Collections.emptySet());

Review comment:
       I don't quite fully understand the meaning of testing the overhead of computing the map once per query. The table columns are fetched from the schema which is stored in Zookeeper, and they are used to detect the column name mismatch during query execution. If schema fetch is done once per query, the ZK access would be a bottleneck for query execution. Plus, it'd be good to stick to one behavior in all different circumstances. Having a cache for high qps usecase fabric while directly fetching once per query for large MT cluster would make the code more difficult to maintain.
   
   `_tableColumnNamesMap` would stay the same state with `_tableDataManagerMap`. This is to make sure that the behavior of these two object are always consistent. And I don't see `_tableDataManagerMap` drops any reference of dataManager, until the server is shutting down.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r568035246



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -326,6 +339,11 @@ public SegmentMetadata getSegmentMetadata(String tableNameWithType, String segme
     }
   }
 
+  @Override
+  public Set<String> getColumnNamesByTable(String tableNameWithType) {
+    return _tableColumnNamesMap.computeIfAbsent(tableNameWithType, k -> Collections.emptySet());

Review comment:
       Can this case memory issue if there are too many tables on an instance? We can check how the overhead of computing the map (if we can do once per query, not segment) compares to the overhead of caching (especially for large MT cluster).
   
   Also, what happens when a table is deleted, is this entry cleared?

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseClusterIntegrationTestSet.java
##########
@@ -623,8 +623,10 @@ public void testReload(boolean includeOfflineTable)
         JsonNode testQueryResponse = postQuery(testQuery);
         // Should not throw exception during reload
         assertEquals(testQueryResponse.get("exceptions").size(), 0);
-        // Total docs should not change during reload
-        assertEquals(testQueryResponse.get("totalDocs").asLong(), numTotalDocs);
+        // Total docs should eventually match
+        if (testQueryResponse.get("totalDocs").asLong() != numTotalDocs) {

Review comment:
       Why does the change in this PR need this change?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r558602288



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Aliasing has already been taken care of when generating the queryContext in pinot-server. So any queries with alias will 
    still work here. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-828087842


   High level question: why do we perform this check on server side? We should be able to perform the check in `BrokerRequestHandler` using `PinotQuery` and `Schema`, and directly reject the query without populating the query to the servers.
   You can implement the logic within the `BaseBrokerRequestHandler.validateRequest()`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r562128492



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Ok, great. Please add tests to cover aliasing case (if they don't already exist).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r558602462



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -326,6 +339,11 @@ public SegmentMetadata getSegmentMetadata(String tableNameWithType, String segme
     }
   }
 
+  @Override
+  public Set<String> getColumnNamesByTable(String tableNameWithType) {
+    return _tableColumnNamesMap.computeIfAbsent(tableNameWithType, k -> new HashSet<>());

Review comment:
       Adjusted.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r546000521



##########
File path: pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java
##########
@@ -87,6 +87,7 @@
 
   GROUP_BY_SIZE("queries", false),
   TOTAL_SERVER_RESPONSE_SIZE("queries", false),
+  QUERY_COLUMN_NAME_MISMATCH("queries", false),

Review comment:
       How about we rename it to `QUERY_COLUMN_NAME_MISSING_IN_SCHEMA`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-765640917


   @mayankshriv Regarding performance impact, I've benchmarked this PR with two type use cases. One is with high qps, one is with high number of columns in the schema. Neither of them showed performance degradation.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r545966725



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {

Review comment:
       Is the empty check warranted?
   
   Also, does this work with aliasing? For example, `select foo as bar from table`?

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java
##########
@@ -87,6 +87,7 @@
 
   GROUP_BY_SIZE("queries", false),
   TOTAL_SERVER_RESPONSE_SIZE("queries", false),
+  QUERY_COLUMN_NAME_MISMATCH("queries", false),

Review comment:
       It is more of a case of column name in query missing from schema, than a column name mismatch, right?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Using `containsAll()` and `removeAll` seems inefficient.

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid

Review comment:
       I am a bit hesitant to add additional check to penalize all queries for catching an infrequent error that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r545991126



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {

Review comment:
       The empty check is warranted. E.g. in local pinot cluster, there is no need to upload table schema before creating offline table config. Thus, ZK may not store any table schema for the table that is about to be queried.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r546005399



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       The `containsAll` method has already been used in `DataSchemaSegmentPruner` class as well. And it gets called for every selected segment. And here it is just done once. 
   The method `removeAll` here is invoked only when the if statement is satisfied, so no extra execution on querying actual segments. 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-745732477


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=h1) Report
   > Merging [#6361](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=desc) (f2f3529) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `1.48%`.
   > The diff coverage is `56.80%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6361/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6361      +/-   ##
   ==========================================
   - Coverage   66.44%   64.95%   -1.49%     
   ==========================================
     Files        1075     1332     +257     
     Lines       54773    65248   +10475     
     Branches     8168     9513    +1345     
   ==========================================
   + Hits        36396    42385    +5989     
   - Misses      15700    19834    +4134     
   - Partials     2677     3029     +352     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `64.95% <56.80%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <0.00%> (+9.52%)` | :arrow_up: |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <0.00%> (-13.29%)` | :arrow_down: |
   | [...inot/client/JsonAsyncHttpPinotClientTransport.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0pzb25Bc3luY0h0dHBQaW5vdENsaWVudFRyYW5zcG9ydC5qYXZh) | `10.90% <0.00%> (-51.10%)` | :arrow_down: |
   | [...not/common/assignment/InstancePartitionsUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vYXNzaWdubWVudC9JbnN0YW5jZVBhcnRpdGlvbnNVdGlscy5qYXZh) | `73.80% <ø> (+0.63%)` | :arrow_up: |
   | [...common/config/tuner/NoOpTableTableConfigTuner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL3R1bmVyL05vT3BUYWJsZVRhYmxlQ29uZmlnVHVuZXIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ot/common/config/tuner/RealTimeAutoIndexTuner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL3R1bmVyL1JlYWxUaW1lQXV0b0luZGV4VHVuZXIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | ... and [1167 more](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=footer). Last update [8d3d4d4...f2f3529](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-761220753


   > It would be good to verify the following before pushing the change:
   > 
   > * Are there any existing prod cases that might break? For example, if a new column was added to schema, but no backfill was performed, so only the newer segments have this the new column. The new behavior will bail out early, this might be different from existing behavior.
   > * Double check any performance impact on high throughput cases.
   
   If a new column was added, the `_tableColumnNamesMap` in `HelixInstanceDataManager` will be updated with the latest schema if there is a segment reload or a server restart. In this case, when the new column gets queried, it can go through and query the new segments. Old segments remained un-queried since it's been pruned.
   
   If there is no segment reload nor server restart, the table is in an inconsistent state, where old segments don't have new column, and new segments do. Without telling pinot-server what should be the correct schema, `_tableColumnNamesMap` remains holding its existing copy of column names. The server holding the old copy will return empty response with a `invalidColumns` flag in the payload. One workaround is similarly to update table config, where a refresh table config message will be sent to brokers. When a table schema gets updated, a refresh schema message will be sent to servers. But I think this is similar to (schema update + auto segment reload).
   
   In terms of the performance impact, I'll benchmark it and update it here.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r620722571



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/reduce/BrokerReduceService.java
##########
@@ -208,6 +209,11 @@ public BrokerResponseNative reduceOnDataTable(BrokerRequest brokerRequest,
       }
       numGroupsLimitReached |= Boolean.parseBoolean(metadata.get(MetadataKey.NUM_GROUPS_LIMIT_REACHED.getName()));
 
+      String invalidColumnNamesString = metadata.get(MetadataKey.INVALID_COLUMN_IN_QUERY.getName());
+      if (invalidColumnNamesString != null) {
+        invalidColumnNames = invalidColumnNames == null ? invalidColumnNamesString : invalidColumnNames;
+      }

Review comment:
       Can be changed as:
   ```
   if (invalidColumnNames != null && invalidColumnNamesString != null) {
           invalidColumnNames =  invalidColumnNamesString;
         }
   ```
   Which is a little bit more succinct. This is a minor comments, up to you to change or not. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r546011413



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid

Review comment:
       We've already done the similar check in `DataSchemaSegmentPruner` class and the as I mentioned `containsAll` method is called for every indexed segment. 
   
   If the query does include invalid column names, those check in `DataSchemaSegmentPruner` would be skipped. 
   If the query doesn't include any invalid columns, the effect here is like to add one more pruning check on one extra segment.
   
   I feel that `DataSchemaSegmentPruner` is still needed, in case if there is some inconsistence between the new and old segments and the table doesn't get reloaded.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r620719548



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -303,6 +299,31 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
     }
   }
 
+  /**
+   * If all the segments are pruned, check whether it's caused by invalid column name in the query.
+   * This is to keep the behavior consistent when new columns are added and not all the segments have the new columns in their metadata;
+   * old segments may contain stale schema until the table is reloaded.
+   */
+  private void detectInvalidColumnIfExists(QueryContext queryContext, DataTable dataTable) {
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    // Validate whether column names in the query are valid
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);
+      dataTable.getMetadata().put(MetadataKey.INVALID_COLUMN_IN_QUERY.getName(), columnNamesFromQuery.toString());

Review comment:
       Correct, that's exactly what we're doing in `detectInvalidColumnIfExists` method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r562132809



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       Aliasing case has been covered in integration tests. We should be good here. :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mayankshriv commented on a change in pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#discussion_r558484411



##########
File path: pinot-server/src/main/java/org/apache/pinot/server/starter/helix/HelixInstanceDataManager.java
##########
@@ -326,6 +339,11 @@ public SegmentMetadata getSegmentMetadata(String tableNameWithType, String segme
     }
   }
 
+  @Override
+  public Set<String> getColumnNamesByTable(String tableNameWithType) {
+    return _tableColumnNamesMap.computeIfAbsent(tableNameWithType, k -> new HashSet<>());

Review comment:
       Use `Collections.EmptySet`, instead of creating a new one.

##########
File path: pinot-common/src/main/java/org/apache/pinot/common/metrics/BrokerMeter.java
##########
@@ -87,6 +87,7 @@
 
   GROUP_BY_SIZE("queries", false),
   TOTAL_SERVER_RESPONSE_SIZE("queries", false),
+  QUERY_COLUMN_NAME_MISMATCH("queries", false),

Review comment:
       In other places (DataTable), you have used INVALID_COLUMN, so may be `INVALID_COLUMN_NAME_IN_QUERY`?

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java
##########
@@ -259,6 +260,18 @@ private DataTable processQuery(List<IndexSegment> indexSegments, QueryContext qu
       ExecutorService executorService, @Nullable StreamObserver<Server.ServerResponse> responseObserver, long endTimeMs,
       boolean enableStreaming)
       throws Exception {
+
+    // Validate whether column names in the query are valid
+    Set<String> columnNamesFromSchema = _instanceDataManager.getColumnNamesByTable(queryContext.getTableName());
+    Set<String> columnNamesFromQuery = queryContext.getColumns();
+    if (!columnNamesFromSchema.isEmpty() && !columnNamesFromSchema.containsAll(columnNamesFromQuery)) {
+      columnNamesFromQuery.removeAll(columnNamesFromSchema);

Review comment:
       How about aliasing? Will it work for `select foo as bar from table`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-745732477


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=h1) Report
   > Merging [#6361](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=desc) (0740f10) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `1.13%`.
   > The diff coverage is `45.12%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6361/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #6361      +/-   ##
   ==========================================
   - Coverage   66.44%   65.31%   -1.14%     
   ==========================================
     Files        1075     1291     +216     
     Lines       54773    62185    +7412     
     Branches     8168     9021     +853     
   ==========================================
   + Hits        36396    40617    +4221     
   - Misses      15700    18669    +2969     
   - Partials     2677     2899     +222     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | unittests | `65.31% <45.12%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...e/pinot/broker/api/resources/PinotBrokerDebug.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYXBpL3Jlc291cmNlcy9QaW5vdEJyb2tlckRlYnVnLmphdmE=) | `0.00% <0.00%> (-79.32%)` | :arrow_down: |
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `71.42% <ø> (-28.58%)` | :arrow_down: |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `33.96% <0.00%> (-32.71%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ava/org/apache/pinot/client/AbstractResultSet.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Fic3RyYWN0UmVzdWx0U2V0LmphdmE=) | `66.66% <0.00%> (+9.52%)` | :arrow_up: |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `35.55% <0.00%> (-13.29%)` | :arrow_down: |
   | [...inot/client/JsonAsyncHttpPinotClientTransport.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0pzb25Bc3luY0h0dHBQaW5vdENsaWVudFRyYW5zcG9ydC5qYXZh) | `10.90% <0.00%> (-51.10%)` | :arrow_down: |
   | [...not/common/assignment/InstancePartitionsUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vYXNzaWdubWVudC9JbnN0YW5jZVBhcnRpdGlvbnNVdGlscy5qYXZh) | `73.80% <ø> (+0.63%)` | :arrow_up: |
   | [...common/config/tuner/NoOpTableTableConfigTuner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL3R1bmVyL05vT3BUYWJsZVRhYmxlQ29uZmlnVHVuZXIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ot/common/config/tuner/RealTimeAutoIndexTuner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL3R1bmVyL1JlYWxUaW1lQXV0b0luZGV4VHVuZXIuamF2YQ==) | `100.00% <ø> (ø)` | |
   | ... and [1141 more](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=footer). Last update [4183ffe...4957261](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-io edited a comment on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-745732477


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=h1) Report
   > Merging [#6361](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=desc) (0f4b692) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/1beaab59b73f26c4e35f3b9bc856b03806cddf5a?el=desc) (1beaab5) will **decrease** coverage by `21.04%`.
   > The diff coverage is `52.01%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6361/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz)](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #6361       +/-   ##
   ===========================================
   - Coverage   66.44%   45.40%   -21.05%     
   ===========================================
     Files        1075     1291      +216     
     Lines       54773    62183     +7410     
     Branches     8168     9021      +853     
   ===========================================
   - Hits        36396    28234     -8162     
   - Misses      15700    31635    +15935     
   + Partials     2677     2314      -363     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `45.40% <52.01%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [...ot/broker/broker/AllowAllAccessControlFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL0FsbG93QWxsQWNjZXNzQ29udHJvbEZhY3RvcnkuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [.../helix/BrokerUserDefinedMessageHandlerFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0Jyb2tlclVzZXJEZWZpbmVkTWVzc2FnZUhhbmRsZXJGYWN0b3J5LmphdmE=) | `52.83% <0.00%> (-13.84%)` | :arrow_down: |
   | [...org/apache/pinot/broker/queryquota/HitCounter.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9IaXRDb3VudGVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...che/pinot/broker/queryquota/MaxHitRateTracker.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9NYXhIaXRSYXRlVHJhY2tlci5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...ache/pinot/broker/queryquota/QueryQuotaEntity.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcXVlcnlxdW90YS9RdWVyeVF1b3RhRW50aXR5LmphdmE=) | `0.00% <0.00%> (-50.00%)` | :arrow_down: |
   | [...ker/routing/instanceselector/InstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL0luc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `100.00% <ø> (ø)` | |
   | [...ceselector/StrictReplicaGroupInstanceSelector.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvcm91dGluZy9pbnN0YW5jZXNlbGVjdG9yL1N0cmljdFJlcGxpY2FHcm91cEluc3RhbmNlU2VsZWN0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [.../main/java/org/apache/pinot/client/Connection.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY2xpZW50cy9waW5vdC1qYXZhLWNsaWVudC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY2xpZW50L0Nvbm5lY3Rpb24uamF2YQ==) | `22.22% <0.00%> (-26.62%)` | :arrow_down: |
   | [...not/common/assignment/InstancePartitionsUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vYXNzaWdubWVudC9JbnN0YW5jZVBhcnRpdGlvbnNVdGlscy5qYXZh) | `64.28% <ø> (-8.89%)` | :arrow_down: |
   | [...common/config/tuner/NoOpTableTableConfigTuner.java](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vY29uZmlnL3R1bmVyL05vT3BUYWJsZVRhYmxlQ29uZmlnVHVuZXIuamF2YQ==) | `0.00% <ø> (ø)` | |
   | ... and [1289 more](https://codecov.io/gh/apache/incubator-pinot/pull/6361/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=footer). Last update [4183ffe...0f4b692](https://codecov.io/gh/apache/incubator-pinot/pull/6361?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on pull request #6361: Detect invalid column names from query in Pinot server

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #6361:
URL: https://github.com/apache/incubator-pinot/pull/6361#issuecomment-828099160


   > > High level question: why do we perform this check on server side? We should be able to perform the check in `BrokerRequestHandler` using `PinotQuery` and `Schema`, and directly reject the query without populating the query to the servers.
   > > You can implement the logic within the `BaseBrokerRequestHandler.validateRequest()`
   > 
   > Good question. In fact, that's the first thing I tried by putting the validation logic in broker (#6066). While this approach has a drawback that in order to get the column names appeared in the query, we have to parse the query from broker side. By doing so, the query would be parsed two times, one in broker and one in server. Given the fact that the query has to be parsed in server, we can just reuse the same queryContext object to provide the column names from the query.
   
   The problem in #6066 is the logic is not implemented in the correct function. The query is already parsed on broker side, we just need to collect all the identifiers. There are several similar operations on broker, and the overhead is very small. Routing the query to the server and adding a new metric on server side will add much more overhead.
   Long term wise, we will add more validations on broker side before even sending the query, and that's the reason why we made schema available on broker side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org