You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/02/28 16:11:56 UTC

[GitHub] [pinot] gortiz opened a new pull request, #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

gortiz opened a new pull request, #10352:
URL: https://github.com/apache/pinot/pull/10352

   When we were working on #10183, two classes that were quite problematic were `IndexLoadingConfig` and `SegmentGeneratorConfig`. Both classes are usually build from a `TableConfig`, which they use to extract some information and do some simple conversions. This is one example of the friction between the user syntax model and the programmer model that was described in that issue, but that is not actually important in the context of this PR.
   
   What it is important is that these two classes are being heavily modified in tests. What I mean by that is that there are a lot of tests that instead of instanciating these classes by using a `TableConfig` or instead of create new instances each time a modification has to be done, there are tests that directly modify `IndexLoadingConfig` and `SegmentGeneratorConfig`. These modifications are done by directly setting some properties (for example, calling `setInvertedIndexColumns`)  but also by indirectly modifying the mutable collections (for example, calling `getInvertedIndexColumns` and then adding or removing some elements). This means that a `IndexLoadingConfig` may have a reference to a `TableConfig` that says there are no inverted indexes while `IndexLoadingConfig.getInvertedIndexColumns` return a non empty collection. Even worse, there are some situations where incompatible configurations are created programatically.
   
   Ideally, these classes should either not exist or at least should be immutable. That would in fact improve the coverage the tests and, if written correctly, that could even make tests easier to read. But it is not feasible to do that refactor right now.
   
   What this PR does, instead, is to modify `IndexLoadingConfig` and `SegmentGeneratorConfig` in such a way that they return unmodificable collections. By doing that, tests that were indirectly modifying these two objects have been changed to call older or new modification methods.
   
   These changes should not change anything in the production code, as this code should not be doing these tricks. Changes on the PR are very repetitive and most of the changes have been created replacing code patterns in files. In some places manual changes had to be done and these places are the ones where it is more probable to introduce an error. Luckily, given that changes mostly affect the tests, if the CI indicates that tests are correct, we can confidently approve the PR*.
   
   * At least semantically, obviously reviewers may have their own opinions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1449002253

   Trying to understand why making the `IndexLoadingConfig` fields immutable can help here. The fields within `IndexLoadingConfig` is already a copy of the original config, so it should be okay to modify them.
   Ideally we want to make `IndexLoadingConfig` directly read config from `TableConfig` and `Schema`. #10355 Shows the idea of that, but there are some tests need to be modified.
   The test failure should be caused by `HLRealtimeSegmentDataManager` line 146


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1448739077

   I will try to fix the integration test that is failing tomorrow morning (eu time). I think I already solve that issue in index-spi


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1449501930

   > Trying to understand why making the IndexLoadingConfig fields immutable can help here. The fields within IndexLoadingConfig is already a copy of the original config, so it should be okay to modify them.
   
   It is mandatory in index-spi, where we assume that index config is always going to be read from a `TableConfig`. What have to do there is different in `IndexLoadingConfig` and `SegmentGeneratorConfig` because they are used in a different way.
   
   In `IndexLoadingConfig` case we needed to add a new method that transform the internal state of `IndexLoadingConfig` into a config object. But to know when we need to call that method we introduced a boolean attribute called `dirty` (in index-spi). That attribute is changed whenever a mutator method is called in `IndexLoadingConfig`. This works... unless tests actually modify the mutable collections of private attributes in `IndexLoadingConfig`, breaking the encapsulation. Here we have two options:
   - To forbid external mutation of private attributes (which improves the encapsulation)
   - To add some _barriers_ in the code. These would be special points when we know that the `IndexLoadingClass` will not be modified again, so we can safely calculate there the index config object  there. This approach is less resilient. If we don't detect one of these _barriers_, then the result is not correct.
   
   I don't remember exactly why, but in `SegmentGeneratorConfig` we couldn't use these techniques, so instead of doing that, `SegmentGeneratorConfig` uses another technique where each time a mutator method is called, the index config is recalculated. This end up being a cleaner solution, but also a bit more expensive (as it needs to create several objects). I may try to implement this solution in `IndexLoadingConfig` (as suggested in the PEP) but both solutions require to remove the external mutation of internal attributes.
   
   > Ideally we want to make IndexLoadingConfig directly read config from TableConfig and Schema
   
   Yes. I know, but meanwhile we need a solution to the external mutation problem. This code is already included in index-spi, but I think that it would be better to add it in a different PR in order to:
   - Reduce the complexity of index-spi. This feature is something orthogonal to index-spi. We shouldn't break the encapsulation of our objects.
   - Forbid new tests created before index-spi is merged to use internal mutability (and therefore make easier to fix conflicts in index-spi).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1134059456


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -244,7 +244,7 @@ public boolean isMutableSegment() {
     _logger =
         LoggerFactory.getLogger(MutableSegmentImpl.class.getName() + "_" + _segmentName + "_" + config.getStreamName());
 
-    Set<String> noDictionaryColumns = config.getNoDictionaryColumns();
+    Set<String> noDictionaryColumns = new HashSet<>(config.getNoDictionaryColumns());

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1122197569


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -386,11 +387,15 @@ public void setSortedColumn(String sortedColumn) {
   }
 
   public Set<String> getInvertedIndexColumns() {
-    return _invertedIndexColumns;
+    return unmodifiable(_invertedIndexColumns);
   }
 
   public Set<String> getRangeIndexColumns() {
-    return _rangeIndexColumns;
+    return unmodifiable(_rangeIndexColumns);
+  }
+
+  public void addRangeIndexColumn(String... columns) {
+    _rangeIndexColumns.addAll(Arrays.asList(columns));

Review Comment:
   Suggest making at always taking one column, and return a boolean (whether it is added)
   ```suggestion
     public boolean addRangeIndexColumn(String column) {
       _rangeIndexColumns.add(column);
   ```



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -249,7 +250,13 @@ public SegmentGeneratorConfig(TableConfig tableConfig, Schema schema) {
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    HashMap<String, Map<String, String>> copy = new HashMap<>();

Review Comment:
   Making it immutable during read time can be quite expensive. We can make it immutable during construction time



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -410,35 +415,55 @@ public FSTType getFSTIndexType() {
    * @return a set containing names of text index columns
    */
   public Set<String> getTextIndexColumns() {
-    return _textIndexColumns;
+    return unmodifiable(_textIndexColumns);
   }
 
   public Set<String> getFSTIndexColumns() {
-    return _fstIndexColumns;
+    return unmodifiable(_fstIndexColumns);
   }
 
   public Map<String, JsonIndexConfig> getJsonIndexConfigs() {
-    return _jsonIndexConfigs;
+    return unmodifiable(_jsonIndexConfigs);
   }
 
   public Map<String, H3IndexConfig> getH3IndexConfigs() {
-    return _h3IndexConfigs;
+    return unmodifiable(_h3IndexConfigs);
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    return unmodifiable(_columnProperties);
   }
 
   public void setColumnProperties(Map<String, Map<String, String>> columnProperties) {
-    _columnProperties = columnProperties;
+    _columnProperties = new HashMap<>(columnProperties);
   }
 
   /**
    * For tests only.
    */
   @VisibleForTesting
   public void setInvertedIndexColumns(Set<String> invertedIndexColumns) {
-    _invertedIndexColumns = invertedIndexColumns;
+    _invertedIndexColumns = new HashSet<>(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(String... invertedIndexColumns) {
+    _invertedIndexColumns.addAll(Arrays.asList(invertedIndexColumns));
+  }

Review Comment:
   Same for other places. If user needs batch add, they can use the next method
   ```suggestion
     public boolean addInvertedIndexColumn(String column) {
       return _invertedIndexColumns.add(column);
     }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -410,35 +415,55 @@ public FSTType getFSTIndexType() {
    * @return a set containing names of text index columns
    */
   public Set<String> getTextIndexColumns() {
-    return _textIndexColumns;
+    return unmodifiable(_textIndexColumns);
   }
 
   public Set<String> getFSTIndexColumns() {
-    return _fstIndexColumns;
+    return unmodifiable(_fstIndexColumns);
   }
 
   public Map<String, JsonIndexConfig> getJsonIndexConfigs() {
-    return _jsonIndexConfigs;
+    return unmodifiable(_jsonIndexConfigs);
   }
 
   public Map<String, H3IndexConfig> getH3IndexConfigs() {
-    return _h3IndexConfigs;
+    return unmodifiable(_h3IndexConfigs);
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    return unmodifiable(_columnProperties);
   }
 
   public void setColumnProperties(Map<String, Map<String, String>> columnProperties) {
-    _columnProperties = columnProperties;
+    _columnProperties = new HashMap<>(columnProperties);
   }
 
   /**
    * For tests only.
    */
   @VisibleForTesting
   public void setInvertedIndexColumns(Set<String> invertedIndexColumns) {
-    _invertedIndexColumns = invertedIndexColumns;
+    _invertedIndexColumns = new HashSet<>(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(String... invertedIndexColumns) {
+    _invertedIndexColumns.addAll(Arrays.asList(invertedIndexColumns));
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(Collection<String> invertedIndexColumns) {
+    _invertedIndexColumns.addAll(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void removeInvertedIndexColumns(String... invertedIndexColumns) {
+    Arrays.asList(invertedIndexColumns).forEach(_invertedIndexColumns::remove);
+  }

Review Comment:
   Same for other places
   ```suggestion
     public boolean removeInvertedIndexColumn(String column) {
       return _invertedIndexColumns.remove(column);
     }
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -386,11 +387,15 @@ public void setSortedColumn(String sortedColumn) {
   }
 
   public Set<String> getInvertedIndexColumns() {
-    return _invertedIndexColumns;
+    return unmodifiable(_invertedIndexColumns);
   }
 
   public Set<String> getRangeIndexColumns() {
-    return _rangeIndexColumns;
+    return unmodifiable(_rangeIndexColumns);
+  }
+
+  public void addRangeIndexColumn(String... columns) {
+    _rangeIndexColumns.addAll(Arrays.asList(columns));

Review Comment:
   Seems we have another `addRangeIndexColumns` below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1122746290


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -249,7 +250,13 @@ public SegmentGeneratorConfig(TableConfig tableConfig, Schema schema) {
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    HashMap<String, Map<String, String>> copy = new HashMap<>();

Review Comment:
   I don't think this is expensive at all. This method is called a couple of times during the segment creation and not even in the hotpath.
   
   Although I can change it to make them immutable at construction time if needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1134060052


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -410,35 +415,55 @@ public FSTType getFSTIndexType() {
    * @return a set containing names of text index columns
    */
   public Set<String> getTextIndexColumns() {
-    return _textIndexColumns;
+    return unmodifiable(_textIndexColumns);
   }
 
   public Set<String> getFSTIndexColumns() {
-    return _fstIndexColumns;
+    return unmodifiable(_fstIndexColumns);
   }
 
   public Map<String, JsonIndexConfig> getJsonIndexConfigs() {
-    return _jsonIndexConfigs;
+    return unmodifiable(_jsonIndexConfigs);
   }
 
   public Map<String, H3IndexConfig> getH3IndexConfigs() {
-    return _h3IndexConfigs;
+    return unmodifiable(_h3IndexConfigs);
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    return unmodifiable(_columnProperties);
   }
 
   public void setColumnProperties(Map<String, Map<String, String>> columnProperties) {
-    _columnProperties = columnProperties;
+    _columnProperties = new HashMap<>(columnProperties);
   }
 
   /**
    * For tests only.
    */
   @VisibleForTesting
   public void setInvertedIndexColumns(Set<String> invertedIndexColumns) {
-    _invertedIndexColumns = invertedIndexColumns;
+    _invertedIndexColumns = new HashSet<>(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(String... invertedIndexColumns) {
+    _invertedIndexColumns.addAll(Arrays.asList(invertedIndexColumns));
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(Collection<String> invertedIndexColumns) {
+    _invertedIndexColumns.addAll(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void removeInvertedIndexColumns(String... invertedIndexColumns) {
+    Arrays.asList(invertedIndexColumns).forEach(_invertedIndexColumns::remove);

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1448463423

   @Jackie-Jiang please take a look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang merged pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang merged PR #10352:
URL: https://github.com/apache/pinot/pull/10352


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1462180991

   About the tests failing: I've execute them locally and they seem to work. I will do another fake push to force CI to run again, but I don't think these problems are related to this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] jackjlli commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "jackjlli (via GitHub)" <gi...@apache.org>.
jackjlli commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1128614697


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -410,35 +415,55 @@ public FSTType getFSTIndexType() {
    * @return a set containing names of text index columns
    */
   public Set<String> getTextIndexColumns() {
-    return _textIndexColumns;
+    return unmodifiable(_textIndexColumns);
   }
 
   public Set<String> getFSTIndexColumns() {
-    return _fstIndexColumns;
+    return unmodifiable(_fstIndexColumns);
   }
 
   public Map<String, JsonIndexConfig> getJsonIndexConfigs() {
-    return _jsonIndexConfigs;
+    return unmodifiable(_jsonIndexConfigs);
   }
 
   public Map<String, H3IndexConfig> getH3IndexConfigs() {
-    return _h3IndexConfigs;
+    return unmodifiable(_h3IndexConfigs);
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    return unmodifiable(_columnProperties);
   }
 
   public void setColumnProperties(Map<String, Map<String, String>> columnProperties) {
-    _columnProperties = columnProperties;
+    _columnProperties = new HashMap<>(columnProperties);
   }
 
   /**
    * For tests only.
    */
   @VisibleForTesting
   public void setInvertedIndexColumns(Set<String> invertedIndexColumns) {
-    _invertedIndexColumns = invertedIndexColumns;
+    _invertedIndexColumns = new HashSet<>(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(String... invertedIndexColumns) {
+    _invertedIndexColumns.addAll(Arrays.asList(invertedIndexColumns));
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(Collection<String> invertedIndexColumns) {
+    _invertedIndexColumns.addAll(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void removeInvertedIndexColumns(String... invertedIndexColumns) {
+    Arrays.asList(invertedIndexColumns).forEach(_invertedIndexColumns::remove);

Review Comment:
   Can this method call the one in Line 465 below? That can help reduce the duplicate code. Same for the other methods which contains two methods for the same purposes.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -244,7 +244,7 @@ public boolean isMutableSegment() {
     _logger =
         LoggerFactory.getLogger(MutableSegmentImpl.class.getName() + "_" + _segmentName + "_" + config.getStreamName());
 
-    Set<String> noDictionaryColumns = config.getNoDictionaryColumns();
+    Set<String> noDictionaryColumns = new HashSet<>(config.getNoDictionaryColumns());

Review Comment:
   nit: I know it's because the elements of this set can be removed down below but it'd be good to add some comments here to explain why the copy of `noDictionaryColumns` is made here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10352:
URL: https://github.com/apache/pinot/pull/10352#issuecomment-1448621898

   # [Codecov](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#10352](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0b675f3) into [master](https://codecov.io/gh/apache/pinot/commit/c5dd6df27c384b523c77adcb66956bf0e5c37251?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c5dd6df) will **decrease** coverage by `47.14%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master   #10352       +/-   ##
   =============================================
   - Coverage     60.86%   13.72%   -47.14%     
   + Complexity     5007      221     -4786     
   =============================================
     Files          2020     1978       -42     
     Lines        109752   107690     -2062     
     Branches      16693    16440      -253     
   =============================================
   - Hits          66804    14784    -52020     
   - Misses        37925    91730    +53805     
   + Partials       5023     1176     -3847     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `13.72% <0.00%> (?)` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...manager/realtime/LLRealtimeSegmentDataManager.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvTExSZWFsdGltZVNlZ21lbnREYXRhTWFuYWdlci5qYXZh) | `0.00% <0.00%> (-67.69%)` | :arrow_down: |
   | [...local/segment/index/loader/IndexLoadingConfig.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9JbmRleExvYWRpbmdDb25maWcuamF2YQ==) | `0.00% <0.00%> (-81.79%)` | :arrow_down: |
   | [...ot/segment/spi/creator/SegmentGeneratorConfig.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2NyZWF0b3IvU2VnbWVudEdlbmVyYXRvckNvbmZpZy5qYXZh) | `0.00% <0.00%> (-83.16%)` | :arrow_down: |
   | [...src/main/java/org/apache/pinot/sql/FilterKind.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcWwvRmlsdGVyS2luZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/java/org/apache/pinot/core/data/table/Key.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL0tleS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...in/java/org/apache/pinot/spi/utils/BytesUtils.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQnl0ZXNVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ain/java/org/apache/pinot/common/CustomObject.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vQ3VzdG9tT2JqZWN0LmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...n/java/org/apache/pinot/core/data/table/Table.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1RhYmxlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/data/table/Record.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1JlY29yZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/util/GroupByUtils.java](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS91dGlsL0dyb3VwQnlVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1766 more](https://codecov.io/gh/apache/pinot/pull/10352?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1122743896


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -386,11 +387,15 @@ public void setSortedColumn(String sortedColumn) {
   }
 
   public Set<String> getInvertedIndexColumns() {
-    return _invertedIndexColumns;
+    return unmodifiable(_invertedIndexColumns);
   }
 
   public Set<String> getRangeIndexColumns() {
-    return _rangeIndexColumns;
+    return unmodifiable(_rangeIndexColumns);
+  }
+
+  public void addRangeIndexColumn(String... columns) {
+    _rangeIndexColumns.addAll(Arrays.asList(columns));

Review Comment:
   I don't like the suggestion of having a single column as parameter. It is easier to use using `String...`. This is the pattern I used in all places because it let you either add one or multiple columns with the same syntax. The returned value is not useful, as it is not used anywhere.
   
   Usually using varargs is not the best because it implies to create an array. But this is not in the hotpath, these methods are only used in tests where we can afford the extra cost.
   
   About the repetition, I have to check it and remove them in case they are actually repeated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1122772410


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/IndexLoadingConfig.java:
##########
@@ -410,35 +415,55 @@ public FSTType getFSTIndexType() {
    * @return a set containing names of text index columns
    */
   public Set<String> getTextIndexColumns() {
-    return _textIndexColumns;
+    return unmodifiable(_textIndexColumns);
   }
 
   public Set<String> getFSTIndexColumns() {
-    return _fstIndexColumns;
+    return unmodifiable(_fstIndexColumns);
   }
 
   public Map<String, JsonIndexConfig> getJsonIndexConfigs() {
-    return _jsonIndexConfigs;
+    return unmodifiable(_jsonIndexConfigs);
   }
 
   public Map<String, H3IndexConfig> getH3IndexConfigs() {
-    return _h3IndexConfigs;
+    return unmodifiable(_h3IndexConfigs);
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    return unmodifiable(_columnProperties);
   }
 
   public void setColumnProperties(Map<String, Map<String, String>> columnProperties) {
-    _columnProperties = columnProperties;
+    _columnProperties = new HashMap<>(columnProperties);
   }
 
   /**
    * For tests only.
    */
   @VisibleForTesting
   public void setInvertedIndexColumns(Set<String> invertedIndexColumns) {
-    _invertedIndexColumns = invertedIndexColumns;
+    _invertedIndexColumns = new HashSet<>(invertedIndexColumns);
+  }
+
+  @VisibleForTesting
+  public void addInvertedIndexColumns(String... invertedIndexColumns) {
+    _invertedIndexColumns.addAll(Arrays.asList(invertedIndexColumns));
+  }

Review Comment:
   As said above, I don't like the suggestion. If user need to know whether this is going to add something or not, it can check it before calling `getWhateverColumns().contains(column)`. In fact, the caller code never wants to check that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] gortiz commented on a diff in pull request #10352: Encapsulate changes in IndexLoadingConfig and SegmentGeneratorConfig

Posted by "gortiz (via GitHub)" <gi...@apache.org>.
gortiz commented on code in PR #10352:
URL: https://github.com/apache/pinot/pull/10352#discussion_r1134058268


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/creator/SegmentGeneratorConfig.java:
##########
@@ -249,7 +250,13 @@ public SegmentGeneratorConfig(TableConfig tableConfig, Schema schema) {
   }
 
   public Map<String, Map<String, String>> getColumnProperties() {
-    return _columnProperties;
+    HashMap<String, Map<String, String>> copy = new HashMap<>();

Review Comment:
   I've applied this change



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org