Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/02/04 15:31:36 UTC

[GitHub] [pinot] mneedham opened a new pull request #8133: Refresh ZK metadata when dimension table is updated

mneedham opened a new pull request #8133:
URL: https://github.com/apache/pinot/pull/8133


   This PR fixes a problem where, after you update a dimension table (e.g. by adding a new column and then uploading a new CSV file), the `lookup` function can't read the new column. You'll see an error like the following:
   
   ```
   [
     {
       "message": "QueryExecutionError:\norg.apache.pinot.spi.exception.BadQueryRequestException: Caught exception while initializing transform function: lookup\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:244)\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:239)\n\tat org.apache.pinot.core.operator.transform.TransformOperator.<init>(TransformOperator.java:59)\n\tat org.apache.pinot.core.plan.TransformPlanNode.run(TransformPlanNode.java:71)\n...\nCaused by: java.lang.IllegalArgumentException: Column does not exist in dimension table: courses_OFFLINE:startLocation\n\tat shaded.com.google.common.base.Preconditions.checkArgument(Preconditions.java:383)\n\tat org.apache.pinot.core.operator.transform.function.LookupTransformFunction.init(LookupTransformFunction.java:125)\n\tat org.apache.pinot.core.operator.transform.function.TransformFunctionFactory.get(TransformFunctionFactory.java:242)\n\t... 20 more",
       "errorCode": 200
     }
   ]
   ```
   
   This happens because `_propertyStore` and `_tableSchema` in `DimensionTableDataManager` are not refreshed when `addSegment` is called as part of refreshing the dimension table segment.
   
   This PR refreshes those fields along with the cached lookup table.
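
To illustrate the failure mode described above, here is a minimal, self-contained sketch rather than Pinot code. `SchemaStore` and `CachingManager` are hypothetical stand-ins for the ZooKeeper-backed schema and for `DimensionTableDataManager`, and the `refreshOnAdd` flag is an assumption introduced only to show the stale and the refreshed behaviour side by side.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the schema kept in ZooKeeper; not Pinot's Schema class.
final class SchemaStore {
  private volatile Set<String> _columns;

  SchemaStore(Set<String> columns) {
    _columns = new HashSet<>(columns);
  }

  // Simulates a schema update, e.g. a newly added column.
  void update(Set<String> columns) {
    _columns = new HashSet<>(columns);
  }

  Set<String> fetch() {
    return _columns;
  }
}

// Hypothetical manager that caches the schema next to its lookup data.
final class CachingManager {
  private final SchemaStore _store;
  private final boolean _refreshOnAdd;
  private Set<String> _cachedColumns;

  CachingManager(SchemaStore store, boolean refreshOnAdd) {
    _store = store;
    _refreshOnAdd = refreshOnAdd;
    _cachedColumns = store.fetch();
  }

  // Called when a refreshed segment arrives. Without the re-fetch,
  // the cached column set stays at whatever it was at construction time.
  void addSegment() {
    if (_refreshOnAdd) {
      _cachedColumns = _store.fetch();
    }
  }

  // Mirrors the kind of precondition that fails in the stack trace above.
  void checkColumn(String column) {
    if (!_cachedColumns.contains(column)) {
      throw new IllegalArgumentException("Column does not exist in dimension table: " + column);
    }
  }
}
```

With `refreshOnAdd` set to `false`, a column added to the store after construction is rejected by `checkColumn` even after `addSegment` runs; with `true`, the same call succeeds.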
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] richardstartin commented on a change in pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#discussion_r800527347



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
##########
@@ -137,10 +142,16 @@ private void populate(Map<PrimaryKey, GenericRow> map)
             indexSegment.getSegmentMetadata().getIndexDir())) {
           while (reader.hasNext()) {
             GenericRow row = reader.next();
-            map.put(row.getPrimaryKey(_primaryKeyColumns), row);
+            map.put(row.getPrimaryKey(_dimensionTable.getPrimaryKeyColumns()), row);
           }
         }
       }
+
+      ZkHelixPropertyStore<ZNRecord> propertyStore = _helixManager.getHelixPropertyStore();
+      Schema tableSchema = ZKMetadataProvider.getTableSchema(propertyStore, _tableNameWithType);
+      List<String> primaryKeyColumns = tableSchema.getPrimaryKeyColumns();
+      dimensionTable.populate(map, tableSchema, primaryKeyColumns);

Review comment:
       I would construct a new `DimensionTable` here rather than pass a new one in to be populated.






[GitHub] [pinot] richardstartin commented on a change in pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#discussion_r800526509



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTable.java
##########
@@ -0,0 +1,58 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.data.manager.offline;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import org.apache.pinot.spi.data.FieldSpec;
+import org.apache.pinot.spi.data.Schema;
+import org.apache.pinot.spi.data.readers.GenericRow;
+import org.apache.pinot.spi.data.readers.PrimaryKey;
+
+
+class DimensionTable {
+
+  private Map<PrimaryKey, GenericRow> _lookupTable = new HashMap<>();
+  private Schema _tableSchema;
+  private List<String> _primaryKeyColumns;

Review comment:
       I would make these final constructor parameters and remove `populate`
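
A minimal sketch of what this suggestion could look like, assuming simplified types: keys and rows are plain Java collections instead of Pinot's `PrimaryKey`, `GenericRow`, and `Schema`, and the class name `ImmutableDimensionTable` is hypothetical. It is not the code that was merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplification of a dimension table whose state is fixed at construction.
final class ImmutableDimensionTable {
  private final Map<List<Object>, Map<String, Object>> _lookupTable;
  private final List<String> _columnNames;
  private final List<String> _primaryKeyColumns;

  ImmutableDimensionTable(Map<List<Object>, Map<String, Object>> lookupTable,
      List<String> columnNames, List<String> primaryKeyColumns) {
    // Shallow defensive copies: later changes to the caller's collections do not leak in.
    // The row maps themselves are not copied.
    _lookupTable = Collections.unmodifiableMap(new HashMap<>(lookupTable));
    _columnNames = Collections.unmodifiableList(new ArrayList<>(columnNames));
    _primaryKeyColumns = Collections.unmodifiableList(new ArrayList<>(primaryKeyColumns));
  }

  // Returns the row for the given key, or null if there is none.
  Map<String, Object> get(List<Object> primaryKey) {
    return _lookupTable.get(primaryKey);
  }

  boolean hasColumn(String name) {
    return _columnNames.contains(name);
  }

  List<String> getPrimaryKeyColumns() {
    return _primaryKeyColumns;
  }
}
```

Because every field is `final` and there is no `populate` method, a refresh has to build a whole new instance, which keeps the rows, the schema, and the primary key columns in step with each other.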






[GitHub] [pinot] mneedham commented on pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
mneedham commented on pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#issuecomment-1031359607


   @richardstartin I think I've addressed all your suggestions. I'm just not sure whether I did the right thing for `DimensionTable`.




[GitHub] [pinot] codecov-commenter commented on pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#issuecomment-1031199059


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > :exclamation: No coverage uploaded for pull request base (`master@e59730a`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8133/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #8133   +/-   ##
   =========================================
     Coverage          ?   70.35%           
     Complexity        ?     4305           
   =========================================
     Files             ?     1625           
     Lines             ?    84215           
     Branches          ?    12602           
   =========================================
     Hits              ?    59247           
     Misses            ?    20891           
     Partials          ?     4077           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.76% <0.00%> (?)` | |
   | unittests1 | `67.97% <100.00%> (?)` | |
   | unittests2 | `14.20% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...inot/core/data/manager/offline/DimensionTable.java](https://codecov.io/gh/apache/pinot/pull/8133/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvb2ZmbGluZS9EaW1lbnNpb25UYWJsZS5qYXZh) | `100.00% <100.00%> (ø)` | |
   | [...ata/manager/offline/DimensionTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/8133/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvb2ZmbGluZS9EaW1lbnNpb25UYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `87.50% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [e59730a...1dc69e2](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] richardstartin merged pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin merged pull request #8133:
URL: https://github.com/apache/pinot/pull/8133


   




[GitHub] [pinot] richardstartin commented on a change in pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#discussion_r800528420



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
##########
@@ -118,17 +120,20 @@ public void removeSegment(String segmentName) {
    */
   private void loadLookupTable()
       throws Exception {
-    Map<PrimaryKey, GenericRow> snapshot;
-    Map<PrimaryKey, GenericRow> replacement;
+//    Map<PrimaryKey, GenericRow> snapshot;
+//    Map<PrimaryKey, GenericRow> replacement;

Review comment:
       Obviously, this can't be merged with these commented-out lines.






[GitHub] [pinot] richardstartin commented on pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#issuecomment-1030447085


   > Good catch!
   > 
   > We can override `reloadSegment()` method and set the schema to the one passed in. We also need to change `_tableSchema` and `_primaryKeyColumns` to volatile because they can be accessed from a different thread
   
   Actually, I think consistency needs to be maintained with the lookup table, so these fields should be updated iff the CAS succeeds. The best way to do this would be to keep a volatile reference to a wrapper object that holds all the fields: pass the schema and primary key columns into the CAS loop, construct the wrapper, then perform the CAS on the wrapper.
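
The CAS-on-a-wrapper idea can be sketched as follows. This is an illustration under stated assumptions, not the merged implementation: `TableSnapshot` and `SnapshotHolder` are hypothetical names, the lookup table is reduced to a `Map<String, String>`, and `AtomicReference` is used for the compare-and-set, whereas the actual class may use a different mechanism on a volatile field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical wrapper: one immutable object holding everything that must change together.
final class TableSnapshot {
  final Map<String, String> lookupTable;
  final List<String> columns;
  final List<String> primaryKeyColumns;

  TableSnapshot(Map<String, String> lookupTable, List<String> columns, List<String> primaryKeyColumns) {
    this.lookupTable = Collections.unmodifiableMap(new HashMap<>(lookupTable));
    this.columns = Collections.unmodifiableList(new ArrayList<>(columns));
    this.primaryKeyColumns = Collections.unmodifiableList(new ArrayList<>(primaryKeyColumns));
  }
}

// Hypothetical holder that publishes a new snapshot atomically.
final class SnapshotHolder {
  private final AtomicReference<TableSnapshot> _current;

  SnapshotHolder(TableSnapshot initial) {
    _current = new AtomicReference<>(initial);
  }

  TableSnapshot get() {
    return _current.get();
  }

  // Rebuilds the snapshot and publishes it only if no other thread replaced
  // the current one in the meantime; otherwise the rebuild is retried.
  void reload(Supplier<TableSnapshot> loader) {
    TableSnapshot snapshot;
    TableSnapshot replacement;
    do {
      snapshot = _current.get();
      replacement = loader.get();
    } while (!_current.compareAndSet(snapshot, replacement));
  }
}
```

A reader that calls `get()` once and uses that single snapshot for both the column check and the row lookup never sees a new schema paired with old rows, or the reverse.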




[GitHub] [pinot] codecov-commenter edited a comment on pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#issuecomment-1031199059


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > :exclamation: No coverage uploaded for pull request base (`master@e59730a`). [Click here to learn what that means](https://docs.codecov.io/docs/error-reference?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#section-missing-base-commit).
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8133/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff            @@
   ##             master    #8133   +/-   ##
   =========================================
     Coverage          ?   71.44%           
     Complexity        ?     4305           
   =========================================
     Files             ?     1625           
     Lines             ?    84215           
     Branches          ?    12602           
   =========================================
     Hits              ?    60167           
     Misses            ?    19952           
     Partials          ?     4096           
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.76% <0.00%> (?)` | |
   | integration2 | `27.74% <0.00%> (?)` | |
   | unittests1 | `67.97% <100.00%> (?)` | |
   | unittests2 | `14.20% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...inot/core/data/manager/offline/DimensionTable.java](https://codecov.io/gh/apache/pinot/pull/8133/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvb2ZmbGluZS9EaW1lbnNpb25UYWJsZS5qYXZh) | `100.00% <100.00%> (ø)` | |
   | [...ata/manager/offline/DimensionTableDataManager.java](https://codecov.io/gh/apache/pinot/pull/8133/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvb2ZmbGluZS9EaW1lbnNpb25UYWJsZURhdGFNYW5hZ2VyLmphdmE=) | `87.50% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [e59730a...1dc69e2](https://codecov.io/gh/apache/pinot/pull/8133?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] richardstartin commented on a change in pull request #8133: Refresh ZK metadata when dimension table is updated

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8133:
URL: https://github.com/apache/pinot/pull/8133#discussion_r800528012



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/data/manager/offline/DimensionTableDataManager.java
##########
@@ -118,17 +120,20 @@ public void removeSegment(String segmentName) {
    */
   private void loadLookupTable()
       throws Exception {
-    Map<PrimaryKey, GenericRow> snapshot;
-    Map<PrimaryKey, GenericRow> replacement;
+//    Map<PrimaryKey, GenericRow> snapshot;
+//    Map<PrimaryKey, GenericRow> replacement;
+    DimensionTable snapshot;
+    DimensionTable replacement;
     do {
-      snapshot = _lookupTable;
-      replacement = new HashMap<>(snapshot.size());
+      snapshot = _dimensionTable;
+      replacement = new DimensionTable();
       populate(replacement);

Review comment:
       Could be `replacement = populate();` then the class's data doesn't need to be mutable.



