You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/06/13 11:55:48 UTC

[GitHub] [pinot] gortiz opened a new pull request, #8884: specify how many segments were pruned by each server segment pruner

gortiz opened a new pull request, #8884:
URL: https://github.com/apache/pinot/pull/8884

   This PR adds three new metrics that break down the older (and still supported) `numSegmentsPrunedByServer`.
   
   The three metrics are:
   - `numSegmentsPrunedInvalid`: Which include both segments that have been pruned because they are actually invalid (aka having docs) or their physical schema doesn't contain all the required columns (quite common when a new column is added but segments were not reloaded)
   - `numSegmentsPrunedByLimit`: The ones pruned by SelectionQuerySegmentPruner, for example: `select * from Table limit 10`
   - `numSegmentsPrunedByValue`: The ones pruned by ColumnValueSegmentPruner, for example when segments can be pruned by using bloom filters or max/min of each column.
   
   These metrics are shown when the controller is used. It may also be interesting to add a warning in the controller when `numSegmentsPrunedInvalid` is different than 0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
gortiz commented on PR #8884:
URL: https://github.com/apache/pinot/pull/8884#issuecomment-1162761909

   > My only concern is around the empty segment, which IMO shouldn't be count as invalid.
   
   Changed
   
   > is it possible to add a test for SegmentPruner for this behavior by mocking an invalid segment?
   
   I've added some tests _inspired_ by the other SegmentPruner tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr merged pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
walterddr merged PR #8884:
URL: https://github.com/apache/pinot/pull/8884


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #8884:
URL: https://github.com/apache/pinot/pull/8884#issuecomment-1154836072

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8884](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b169c83) into [master](https://codecov.io/gh/apache/pinot/commit/9c8a689c431c1aa663901d4aeb40b1ce642ba098?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9c8a689) will **decrease** coverage by `13.09%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8884       +/-   ##
   =============================================
   - Coverage     28.47%   15.38%   -13.10%     
   - Complexity       45      170      +125     
   =============================================
     Files          1794     1760       -34     
     Lines         93849    92226     -1623     
     Branches      14012    13823      -189     
   =============================================
   - Hits          26721    14185    -12536     
   - Misses        64639    77016    +12377     
   + Partials       2489     1025     -1464     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests2 | `15.38% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...a/org/apache/pinot/common/metrics/ServerMeter.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9TZXJ2ZXJNZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...t/common/response/broker/BrokerResponseNative.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vcmVzcG9uc2UvYnJva2VyL0Jyb2tlclJlc3BvbnNlTmF0aXZlLmphdmE=) | `0.00% <0.00%> (-89.94%)` | :arrow_down: |
   | [.../java/org/apache/pinot/common/utils/DataTable.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvRGF0YVRhYmxlLmphdmE=) | `0.00% <0.00%> (-95.13%)` | :arrow_down: |
   | [...core/query/executor/ServerQueryExecutorV1Impl.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9leGVjdXRvci9TZXJ2ZXJRdWVyeUV4ZWN1dG9yVjFJbXBsLmphdmE=) | `0.00% <0.00%> (-76.80%)` | :arrow_down: |
   | [...pinot/core/query/pruner/SegmentPrunerProvider.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9wcnVuZXIvU2VnbWVudFBydW5lclByb3ZpZGVyLmphdmE=) | `0.00% <0.00%> (-84.62%)` | :arrow_down: |
   | [.../pinot/core/query/pruner/SegmentPrunerService.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9wcnVuZXIvU2VnbWVudFBydW5lclNlcnZpY2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...not/core/query/pruner/SegmentPrunerStatistics.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9wcnVuZXIvU2VnbWVudFBydW5lclN0YXRpc3RpY3MuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...che/pinot/core/query/reduce/BaseReduceService.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9yZWR1Y2UvQmFzZVJlZHVjZVNlcnZpY2UuamF2YQ==) | `0.00% <0.00%> (-91.13%)` | :arrow_down: |
   | [...che/pinot/core/query/scheduler/QueryScheduler.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9zY2hlZHVsZXIvUXVlcnlTY2hlZHVsZXIuamF2YQ==) | `0.00% <0.00%> (-81.40%)` | :arrow_down: |
   | [...src/main/java/org/apache/pinot/sql/FilterKind.java](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvRmlsdGVyS2luZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [904 more](https://codecov.io/gh/apache/pinot/pull/8884/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [9c8a689...b169c83](https://codecov.io/gh/apache/pinot/pull/8884?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] walterddr commented on a diff in pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
walterddr commented on code in PR #8884:
URL: https://github.com/apache/pinot/pull/8884#discussion_r898106187


##########
pinot-controller/src/main/resources/app/pages/Query.tsx:
##########
@@ -295,10 +297,22 @@ const QueryPage = () => {
     setResultData(results.result || { columns: [], records: [] });
     setQueryStats(results.queryStats || { columns: responseStatCols, records: [] });
     setOutputResult(JSON.stringify(results.data, null, 2) || '');
+    setWarnings(extractWarnings(results));
     setQueryLoader(false);
     queryExecuted.current = false;
   };
 
+  const extractWarnings = (result) => {
+    const warnings: Array<string> = [];
+    const numSegmentsPrunedInvalid = result.data.numSegmentsPrunedInvalid;
+    if (numSegmentsPrunedInvalid) {
+      warnings.push(`There are ${numSegmentsPrunedInvalid} invalid segment/s. This usually means that they were `
+         + `created with an older schema. `
+         + `Data wasn't lost, but these segments will be ignored by some queries until the table is reloaded.`);

Review Comment:
   ```suggestion
            + `Please reload the table in order to refresh these segments to the new schema.`);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8884:
URL: https://github.com/apache/pinot/pull/8884#discussion_r905551783


##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/SegmentPrunerService.java:
##########
@@ -76,18 +115,40 @@ public List<IndexSegment> prune(List<IndexSegment> segments, QueryContext query)
     return segments;
   }
 
-  private static List<IndexSegment> removeInvalidSegments(List<IndexSegment> segments, QueryContext query) {
+  /**
+   * Filters the given list, returning a list that only contains the valid segments, modifying the list received as
+   * argument.
+   *
+   * <p>
+   * This is a destructive operation. The list received as arguments may be modified, so only the returned list should
+   * be used.
+   * </p>
+   *
+   * @param segments the list of segments to be pruned. This is a destructive operation that may modify this list in an
+   *                 undefined way. Therefore, this list should not be used after calling this method.
+   * @return the new list with filtered elements. This is the list that have to be used.
+   */
+  private static List<IndexSegment> removeInvalidSegments(List<IndexSegment> segments, QueryContext query,
+      SegmentPrunerStatistics stats) {
     int selected = 0;
+    int invalid = 0;
     for (IndexSegment segment : segments) {
-      if (!isInvalidSegment(segment, query)) {
+      boolean isInvalid = isInvalidSegment(segment, query);

Review Comment:
   (minor) We can skip the valid check if a segment is empty
   ```suggestion
         if (!isEmptySegment(segment)) {
           if (isInvalidSegment(segment, query) {
             invalid++
           } else {
             segments.set(selected++, segment);
           }
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8884:
URL: https://github.com/apache/pinot/pull/8884#discussion_r905529608


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -113,6 +113,9 @@ enum MetadataKey {
     SYSTEM_ACTIVITIES_CPU_TIME_NS("systemActivitiesCpuTimeNs", MetadataValueType.LONG),
     RESPONSE_SER_CPU_TIME_NS("responseSerializationCpuTimeNs", MetadataValueType.LONG),
     NUM_SEGMENTS_PRUNED_BY_SERVER("numSegmentsPrunedByServer", MetadataValueType.INT),
+    NUM_SEGMENTS_PRUNED_INVALID("numSegmentsPrunedByInvalid", MetadataValueType.INT),

Review Comment:
   (MAJOR) Please append the new keys to the end. See the javadoc for this enum.
   We should associate an id to each key instead of relying on the ordinal of the enum. That is out of the scope of this PR, and we need a newer version data table so that the change is backward compatible



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on code in PR #8884:
URL: https://github.com/apache/pinot/pull/8884#discussion_r905551783


##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/SegmentPrunerService.java:
##########
@@ -76,18 +115,40 @@ public List<IndexSegment> prune(List<IndexSegment> segments, QueryContext query)
     return segments;
   }
 
-  private static List<IndexSegment> removeInvalidSegments(List<IndexSegment> segments, QueryContext query) {
+  /**
+   * Filters the given list, returning a list that only contains the valid segments, modifying the list received as
+   * argument.
+   *
+   * <p>
+   * This is a destructive operation. The list received as arguments may be modified, so only the returned list should
+   * be used.
+   * </p>
+   *
+   * @param segments the list of segments to be pruned. This is a destructive operation that may modify this list in an
+   *                 undefined way. Therefore, this list should not be used after calling this method.
+   * @return the new list with filtered elements. This is the list that have to be used.
+   */
+  private static List<IndexSegment> removeInvalidSegments(List<IndexSegment> segments, QueryContext query,
+      SegmentPrunerStatistics stats) {
     int selected = 0;
+    int invalid = 0;
     for (IndexSegment segment : segments) {
-      if (!isInvalidSegment(segment, query)) {
+      boolean isInvalid = isInvalidSegment(segment, query);

Review Comment:
   (minor) We can skip the valid check if a segment is empty
   ```suggestion
         if (!isEmptySegment(segment)) {
           if (isInvalidSegment(segment, query)) {
             invalid++
           } else {
             segments.set(selected++, segment);
           }
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #8884: specify how many segments were pruned by each server segment pruner

Posted by GitBox <gi...@apache.org>.
gortiz commented on code in PR #8884:
URL: https://github.com/apache/pinot/pull/8884#discussion_r905781568


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java:
##########
@@ -113,6 +113,9 @@ enum MetadataKey {
     SYSTEM_ACTIVITIES_CPU_TIME_NS("systemActivitiesCpuTimeNs", MetadataValueType.LONG),
     RESPONSE_SER_CPU_TIME_NS("responseSerializationCpuTimeNs", MetadataValueType.LONG),
     NUM_SEGMENTS_PRUNED_BY_SERVER("numSegmentsPrunedByServer", MetadataValueType.INT),
+    NUM_SEGMENTS_PRUNED_INVALID("numSegmentsPrunedByInvalid", MetadataValueType.INT),

Review Comment:
   Nice catch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org