Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2022/01/13 17:53:46 UTC

[GitHub] [pinot] atris opened a new pull request #8016: Implement Real Time Mutable FST

atris opened a new pull request #8016:
URL: https://github.com/apache/pinot/pull/8016


   This PR implements a real-time mutable FST, allowing Pinot to perform real-time full-text searches. A mutable FST can be written to one word at a time: it does not need the entire input set to be available beforehand, and it does not require the input to be sorted.
   
   The mutable FST supports concurrent reads and writes, i.e. a single thread can write to the FST while another thread reads from it.
   
   Mutable FST supports real time updates to searches, reflecting data as it is added. It does not require a flush operation.
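
The properties described above (one word at a time, no sort order required, one writer with concurrent readers, no flush) can be illustrated with a small sketch. This is not the implementation in this PR: `SketchMutableFst`, its `State` class, and the `lookup` method are hypothetical names, and the structure is a plain trie that does not share suffixes the way a real FST would. It assumes a single writer thread and relies on `ConcurrentHashMap` plus a `volatile` field for visibility to readers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative stand-in only: a trie-shaped structure, not the MutableFST added by this PR.
final class SketchMutableFst {
  // One node per prefix; arcs are keyed by the next character.
  private static final class State {
    final Map<Character, State> _arcs = new ConcurrentHashMap<>();
    // -1 means "no word ends here"; volatile so readers see the writer's update.
    volatile int _output = -1;
  }

  private final State _start = new State();

  // Intended for a single writer thread. Words may arrive in any order.
  void addPath(String word, int output) {
    State current = _start;
    for (int i = 0; i < word.length(); i++) {
      current = current._arcs.computeIfAbsent(word.charAt(i), c -> new State());
    }
    current._output = output;
  }

  // Safe to call from reader threads while the writer is adding words.
  // Returns the stored output, or -1 if the word has not been added.
  int lookup(String word) {
    State current = _start;
    for (int i = 0; i < word.length(); i++) {
      current = current._arcs.get(word.charAt(i));
      if (current == null) {
        return -1;
      }
    }
    return current._output;
  }
}
```

In this sketch a word becomes visible to `lookup` as soon as `addPath` returns, which mirrors the "no flush" property; adding "zebra" before "apple" works because nothing depends on insertion order.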
   
   Please see the design document at:
   https://docs.google.com/document/d/1O2ttsplFAflkM1Q-8-7yRNrD9EgCCWYm-W63NdC7ghE/edit?usp=sharing
   
   Follow-ups: convert the mutable FST to an immutable FST, and integrate with the FST index.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9524009) into [master](https://codecov.io/gh/apache/pinot/commit/77a706996099f9bb44b90ad506f5205b3c4d7a42?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (77a7069) will **decrease** coverage by `3.20%`.
   > The diff coverage is `62.62%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8016      +/-   ##
   ============================================
   - Coverage     71.31%   68.11%   -3.21%     
   - Complexity     4112     4185      +73     
   ============================================
     Files          1593     1208     -385     
     Lines         82365    60323   -22042     
     Branches      12270     9302    -2968     
   ============================================
   - Hits          58740    41089   -17651     
   + Misses        19660    16342    -3318     
   + Partials       3965     2892    -1073     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.11% <62.62%> (-0.34%)` | :arrow_down: |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `11.53% <11.53%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `50.00% <50.00%> (ø)` | |
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `54.54% <54.54%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `66.66% <66.66%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `92.30% <92.30%> (ø)` | |
   | [...a/org/apache/pinot/common/metrics/MinionMeter.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerMeter.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyTWV0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/BrokerQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJRdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/MinionQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25RdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/pinot/common/messages/SegmentReloadMessage.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvU2VnbWVudFJlbG9hZE1lc3NhZ2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [665 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [77a7069...9524009](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785781557



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)
+public class BenchmarkMutableFST {
+  @Param({"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"})
+  public String _regex;
+
+  private MutableFST _mutableFST;
+  private org.apache.lucene.util.fst.FST _fst;
+
+  @Setup
+  public void setUp()
+      throws IOException {
+    SortedMap<String, Integer> input = new TreeMap<>();
+    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
+        Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("data/words.txt"))))) {
+      String currentWord;
+      int i = 0;
+      while ((currentWord = bufferedReader.readLine()) != null) {
+        _mutableFST.addPath(currentWord, i);
+        input.put(currentWord, i++);
+      }
+    }
+
+    _fst = org.apache.pinot.segment.local.utils.fst.FSTBuilder.buildFST(input);
+  }
+
+  @Benchmark
+  public MutableRoaringBitmap testMutableRegex(Blackhole blackhole) {

Review comment:
       Doesn't need a `Blackhole` parameter: the method returns its result, and JMH already consumes return values.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784288174



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/utils/MutableFSTUtils.java
##########
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+public class MutableFSTUtils {
+
+  private MutableFSTUtils() {
+
+  }
+
+  public static boolean fstEquals(Object thisFstObj,
+                                  Object thatFstObj) {
+    if (thisFstObj == thatFstObj) {
+      return true;
+    }
+    if (thisFstObj == null || thatFstObj == null) {
+      return false;
+    }
+    if (!MutableFST.class.isAssignableFrom(thisFstObj.getClass())
+        || !MutableFST.class.isAssignableFrom(thatFstObj.getClass())) {
+      return false;
+    }
+
+    MutableFST thisMutableFST = (MutableFST) thisFstObj;
+    MutableFST thatMutableFST = (MutableFST) thatFstObj;
+
+
+    if (thisMutableFST.getStartState() != null ? (thisMutableFST.getStartState().getLabel()
+        != thatMutableFST.getStartState().getLabel()) : thatMutableFST.getStartState() != null) {
+      return false;
+    }
+
+    return true;
+  }
+
+  public static boolean arcEquals(Object thisArcObj, Object thatArcObj) {
+    if (thisArcObj == thatArcObj) {
+      return true;
+    }
+    if (thisArcObj == null || thatArcObj == null) {
+      return false;
+    }
+    if (!MutableArc.class.isAssignableFrom(thisArcObj.getClass())
+        || !MutableArc.class.isAssignableFrom(thatArcObj.getClass())) {
+      return false;
+    }
+    MutableArc thisArc = (MutableArc) thisArcObj;
+    MutableArc thatArc = (MutableArc) thatArcObj;
+
+    if (thisArc.getNextState().getLabel() != thatArc.getNextState().getLabel()) {
+        return false;
+    }
+
+    return true;
+  }
+
+  public static boolean stateEquals(
+      Object thisStateObj, Object thatStateObj) {
+    if (thisStateObj == thatStateObj) {
+      return true;
+    }
+    if (thisStateObj == null || thatStateObj == null) {
+      return false;
+    }
+    if (!MutableState.class.isAssignableFrom(thisStateObj.getClass())
+        || !MutableState.class.isAssignableFrom(thatStateObj.getClass())) {
+      return false;
+    }
+
+    MutableState thisState = (MutableState) thisStateObj;
+    MutableState thatState = (MutableState) thatStateObj;
+
+    if (thisState.getLabel() != thatState.getLabel()) {
+      return false;
+    }
+
+    if (thisState.getArcs().size() != thatState.getArcs().size()) {
+      return false;
+    }
+
+    return true;
+  }

Review comment:
       This condenses into 
   
   ```java
   public static boolean stateEquals(Object thisStateObj, Object thatStateObj) {
     if (thisStateObj == thatStateObj) {
       return true;
     }
     if (thisStateObj instanceof MutableState && thatStateObj instanceof MutableState) {
       MutableState thisState = (MutableState) thisStateObj;
       MutableState thatState = (MutableState) thatStateObj;
       return thisState.getLabel() == thatState.getLabel() && thisState.getArcs().size() == thatState.getArcs().size();
     }
     return false;
   }
   ```
   
   The extensive use of spaces doesn't make this easier to read.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784289254



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/utils/MutableFSTUtils.java
##########
@@ -0,0 +1,117 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+public class MutableFSTUtils {
+
+  private MutableFSTUtils() {
+
+  }
+
+  public static boolean fstEquals(Object thisFstObj,
+                                  Object thatFstObj) {
+    if (thisFstObj == thatFstObj) {
+      return true;
+    }
+    if (thisFstObj == null || thatFstObj == null) {
+      return false;
+    }
+    if (!MutableFST.class.isAssignableFrom(thisFstObj.getClass())
+        || !MutableFST.class.isAssignableFrom(thatFstObj.getClass())) {
+      return false;
+    }
+
+    MutableFST thisMutableFST = (MutableFST) thisFstObj;
+    MutableFST thatMutableFST = (MutableFST) thatFstObj;
+
+
+    if (thisMutableFST.getStartState() != null ? (thisMutableFST.getStartState().getLabel()
+        != thatMutableFST.getStartState().getLabel()) : thatMutableFST.getStartState() != null) {
+      return false;
+    }
+
+    return true;
+  }
+
+  public static boolean arcEquals(Object thisArcObj, Object thatArcObj) {
+    if (thisArcObj == thatArcObj) {
+      return true;
+    }
+    if (thisArcObj == null || thatArcObj == null) {
+      return false;
+    }
+    if (!MutableArc.class.isAssignableFrom(thisArcObj.getClass())
+        || !MutableArc.class.isAssignableFrom(thatArcObj.getClass())) {
+      return false;
+    }
+    MutableArc thisArc = (MutableArc) thisArcObj;
+    MutableArc thatArc = (MutableArc) thatArcObj;
+
+    if (thisArc.getNextState().getLabel() != thatArc.getNextState().getLabel()) {
+        return false;
+    }
+
+    return true;
+  }
+
+  public static boolean stateEquals(
+      Object thisStateObj, Object thatStateObj) {
+    if (thisStateObj == thatStateObj) {
+      return true;
+    }
+    if (thisStateObj == null || thatStateObj == null) {
+      return false;
+    }
+    if (!MutableState.class.isAssignableFrom(thisStateObj.getClass())
+        || !MutableState.class.isAssignableFrom(thatStateObj.getClass())) {
+      return false;
+    }
+
+    MutableState thisState = (MutableState) thisStateObj;
+    MutableState thatState = (MutableState) thatStateObj;
+
+    if (thisState.getLabel() != thatState.getLabel()) {
+      return false;
+    }
+
+    if (thisState.getArcs().size() != thatState.getArcs().size()) {
+      return false;
+    }
+
+    return true;
+  }

Review comment:
       This applies to the methods above too. `instanceof` evaluates to false when the reference is null, so the explicit null checks are not needed.
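
As an illustration of how the same condensation might look for `arcEquals`, here is a self-contained sketch. `SketchState`, `SketchArc`, and `SketchEquality` are hypothetical stand-ins reduced to what the comparison needs, and the `char` label type is an assumption; the real `MutableState` and `MutableArc` in the PR may differ.

```java
// Hypothetical minimal stand-ins for the PR's classes; the label type is assumed.
final class SketchState {
  private final char _label;

  SketchState(char label) {
    _label = label;
  }

  char getLabel() {
    return _label;
  }
}

final class SketchArc {
  private final SketchState _nextState;

  SketchArc(SketchState nextState) {
    _nextState = nextState;
  }

  SketchState getNextState() {
    return _nextState;
  }
}

final class SketchEquality {
  private SketchEquality() {
  }

  static boolean arcEquals(Object thisArcObj, Object thatArcObj) {
    if (thisArcObj == thatArcObj) {
      return true;
    }
    // instanceof is false for null, so no separate null check is needed.
    if (thisArcObj instanceof SketchArc && thatArcObj instanceof SketchArc) {
      SketchArc thisArc = (SketchArc) thisArcObj;
      SketchArc thatArc = (SketchArc) thatArcObj;
      return thisArc.getNextState().getLabel() == thatArc.getNextState().getLabel();
    }
    return false;
  }
}
```

For a non-null reference, `X.class.isAssignableFrom(obj.getClass())` and `obj instanceof X` give the same answer, so the shorter form keeps the original behaviour while also covering the null case.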






[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1014231281


   > I like the neatness of the algorithm. There are simple paths forward towards optimising it which could be made now if you want, or later if it proves necessary.
   > 
   A more optimised version is here: https://github.com/atris/thunderbolt/blob/main/src/main/java/io/github/atris/thunderbolt/MutableFst.java
   
   I do not believe we need this yet, but if needed, we can optimise later.
   
   > I have two general points about the layout of the code (I don't like focussing on this sort of thing, but feel it's necessary here):
   > 
   > 1. It needs to be formatted properly: https://github.com/apache/pinot/blob/master/config/codestyle-intellij.xml
   > 2. There are too many blank lines. In my humble opinion, these do not aid the reader at all. On many occasions I look at 15 lines of code and notice they compress to 5. It really obscures the logic by occupying screen space.
   
   Thanks for pointing that out. The problem is that my laptop is M1 Pro based, so a global build of Pinot fails on it. I depend on the linter checks to tell me whether checkstyle is failing.




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785790085



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,196 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Queue;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    Preconditions.checkState(_start != null, "Cannot override a start state");
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);

Review comment:
       formatting: should be on previous line






[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9524009) into [master](https://codecov.io/gh/apache/pinot/commit/77a706996099f9bb44b90ad506f5205b3c4d7a42?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (77a7069) will **decrease** coverage by `6.45%`.
   > The diff coverage is `62.62%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8016      +/-   ##
   ============================================
   - Coverage     71.31%   64.85%   -6.46%     
   - Complexity     4112     4266     +154     
   ============================================
     Files          1593     1559      -34     
     Lines         82365    81278    -1087     
     Branches      12270    12206      -64     
   ============================================
   - Hits          58740    52716    -6024     
   - Misses        19660    24808    +5148     
   + Partials       3965     3754     -211     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `68.11% <62.62%> (-0.34%)` | :arrow_down: |
   | unittests2 | `14.30% <0.00%> (-0.02%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `11.53% <11.53%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `50.00% <50.00%> (ø)` | |
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `54.54% <54.54%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `66.66% <66.66%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `92.30% <92.30%> (ø)` | |
   | [...a/org/apache/pinot/common/metrics/MinionMeter.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerMeter.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyTWV0ZXIuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/BrokerQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJRdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/MinionQueryPhase.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9NaW5pb25RdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...he/pinot/common/messages/SegmentReloadMessage.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWVzc2FnZXMvU2VnbWVudFJlbG9hZE1lc3NhZ2UuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [430 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [77a7069...9524009](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] mayankshriv commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786277106



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFST.java
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+
+/**
+ * A mutable FST represents a FST which can have arbitrary inputs added to it at
+ * any given point of time. Unlike a normal FST which is build once read many, mutable
+ * FST can be concurrently written to and read from. Mutable FST provides real time search
+ * i.e. search will see words as they are added without needing a flush.
+ *
+ * Unlike a normal FST, mutable FST does not require the entire input beforehand nor
+ * does it require the input to be sorted. Single word additions work well with
+ * mutable FST.
+ *
+ * The reason as to why normal FST and mutable FST have different interfaces is because

Review comment:
       Does this mean that there will have to be separate code to use mutable vs immutable FST? An abstraction that avoids that would be great.
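
One possible shape for such an abstraction is sketched below. This is only an illustration under stated assumptions, not the design used in the PR: the names `ReadableFst`, `WritableFst`, `ImmutableStandIn` and `MutableStandIn` are hypothetical, and plain maps stand in for the real FST structures. The point is that query code depends only on the read-side interface, so the same call works for either flavour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.IntConsumer;

public class FstAbstractionSketch {
  /** Hypothetical read-side contract shared by both FST flavours. */
  interface ReadableFst {
    void lookup(String key, IntConsumer dest);
  }

  /** Hypothetical write-side contract, implemented only by the mutable flavour. */
  interface WritableFst extends ReadableFst {
    void addPath(String key, int value);
  }

  /** Stand-in for a build-once, read-many FST: all entries are supplied up front. */
  static final class ImmutableStandIn implements ReadableFst {
    private final Map<String, Integer> _entries;

    ImmutableStandIn(Map<String, Integer> entries) {
      _entries = new TreeMap<>(entries);
    }

    @Override
    public void lookup(String key, IntConsumer dest) {
      Integer value = _entries.get(key);
      if (value != null) {
        dest.accept(value);
      }
    }
  }

  /** Stand-in for a mutable FST: entries arrive one at a time, in any order. */
  static final class MutableStandIn implements WritableFst {
    private final ConcurrentSkipListMap<String, Integer> _entries = new ConcurrentSkipListMap<>();

    @Override
    public void addPath(String key, int value) {
      _entries.put(key, value);
    }

    @Override
    public void lookup(String key, IntConsumer dest) {
      Integer value = _entries.get(key);
      if (value != null) {
        dest.accept(value);
      }
    }
  }

  /** Query code sees only ReadableFst, so it does not care which flavour it is given. */
  static List<Integer> query(ReadableFst fst, String key) {
    List<Integer> out = new ArrayList<>();
    fst.lookup(key, out::add);
    return out;
  }

  public static void main(String[] args) {
    MutableStandIn mutable = new MutableStandIn();
    mutable.addPath("still", 7);
    Map<String, Integer> sorted = new TreeMap<>();
    sorted.put("still", 7);
    System.out.println(query(mutable, "still"));
    System.out.println(query(new ImmutableStandIn(sorted), "still"));
  }
}
```

Whether the real mutable and immutable FSTs can share a read interface like this depends on their traversal APIs, which this sketch does not model.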






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784289923



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {
+    final List<Path> queue = new ArrayList<>();
+
+    if (_automaton.getNumberOfStates() == 0) {
+      return;
+    }
+
+    // Automaton start state and FST start node is added to the queue.
+    queue.add(new Path(_automaton.getInitialState(), (MutableState) _fst.getStartState(), null, new ArrayList<>()));
+
+    Set<State> acceptStates = _automaton.getAcceptStates();
+    while (queue.size() != 0) {

Review comment:
       `while (!queue.isEmpty())`
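
For context, the queue-driven traversal described in the Javadoc above can be sketched as follows, with the loop condition written as suggested. This is a minimal illustration under stated assumptions, not the PR's implementation: `TrieNode`, `DfaState` and `Pair` are hypothetical toy types standing in for the FST and automaton classes, and the DFA for `b.*` is built by hand rather than compiled from a regex. The sketch walks the trie's arcs and asks the DFA for the matching transition, and it keeps expanding a node even after reporting it so that longer words sharing the prefix are still found.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

public class ProductTraversalSketch {
  /** Toy trie node standing in for an FST state (hypothetical). */
  static final class TrieNode {
    final Map<Character, TrieNode> _arcs = new TreeMap<>();
    Integer _value; // non-null marks a final node and holds its output
  }

  /** Toy DFA state standing in for an automaton state (hypothetical). */
  static final class DfaState {
    final Map<Character, DfaState> _next = new HashMap<>();
    DfaState _any; // fallback transition for any other character, or null
    boolean _accept;

    DfaState step(char c) {
      DfaState explicit = _next.get(c);
      return explicit != null ? explicit : _any;
    }
  }

  /** One queue entry: a position in the DFA paired with a position in the trie. */
  static final class Pair {
    final DfaState _dfa;
    final TrieNode _node;

    Pair(DfaState dfa, TrieNode node) {
      _dfa = dfa;
      _node = node;
    }
  }

  static void insert(TrieNode root, String word, int value) {
    TrieNode node = root;
    for (char c : word.toCharArray()) {
      node = node._arcs.computeIfAbsent(c, k -> new TrieNode());
    }
    node._value = value;
  }

  static List<Integer> match(DfaState start, TrieNode root) {
    List<Integer> results = new ArrayList<>();
    Queue<Pair> queue = new ArrayDeque<>();
    queue.add(new Pair(start, root));
    while (!queue.isEmpty()) {
      Pair current = queue.remove();
      // Report the output when both machines are in an accepting/final position.
      if (current._dfa._accept && current._node._value != null) {
        results.add(current._node._value);
      }
      // Follow every trie arc that the DFA can also take.
      for (Map.Entry<Character, TrieNode> arc : current._node._arcs.entrySet()) {
        DfaState next = current._dfa.step(arc.getKey());
        if (next != null) {
          queue.add(new Pair(next, arc.getValue()));
        }
      }
    }
    return results;
  }

  /** Hand-built DFA for the pattern b.* (one 'b', then anything). */
  static DfaState prefixB() {
    DfaState start = new DfaState();
    DfaState rest = new DfaState();
    rest._accept = true;
    rest._any = rest;
    start._next.put('b', rest);
    return start;
  }

  public static void main(String[] args) {
    TrieNode root = new TrieNode();
    insert(root, "bat", 1);
    insert(root, "ba", 2);
    insert(root, "cat", 3);
    insert(root, "b", 4);
    System.out.println(match(prefixB(), root)); // prints [4, 2, 1]
  }
}
```

The traversal terminates because every queued pair sits one level deeper in the trie, which is acyclic, and `ArrayDeque` gives constant-time removal from the head where removing from the front of an `ArrayList` would shift the remaining elements.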






[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1020904236


   This PR will be merged 24 hours from now, barring objections.




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785788294



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+

Review comment:
       remove blank line






[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1014338481


   > > > I have two general points about the layout of the code (I don't like focussing on this sort of thing, but feel it's necessary here):
   > > > 
   > > > 1. It needs to be formatted properly: https://github.com/apache/pinot/blob/master/config/codestyle-intellij.xml
   > > > 2. There are too many blank lines. In my humble opinion, these do not aid the reader at all. On many occasions I look at 15 lines of code and notice they compress to 5. It really obscures the logic by occupying screen space.
   > > 
   > > 
   > > Thanks for pointing that out. The problem is that my laptop is M1 Pro based, so a global build of Pinot fails on it. I am dependent on Linter checks to tell me if check style is failing.
   > 
   > I would certainly prefer to automate style checking because I want code to be readable automatically without needing to review it for readability. If a simple method doesn't fit on a screen without scrolling, it's unreadable. I've proposed using checkstyle to ban blank lines within functions to @Jackie-Jiang to help specifically with this case, but he wasn't too keen on the idea and reasoned that the _occasional_ blank line can help readability.
   > 
   > Regarding M1s, I think @dongxiaoman uses M1 and managed to build Pinot, maybe he can help?
   
   Thanks, I will check with @dongxiaoman.
   
   I would argue that the use of blank lines is subjective, unless somebody puts three blank lines in a row (in which case our checkstyle does catch it).
   
   Beyond that, readability is a matter of taste (YMMV). As long as nothing is obviously and immediately broken, we should IMO not let subjectivity take control.




[GitHub] [pinot] walterddr commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786981509



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       Sorry for jumping into the discussion late. I also have a habit of adding many blank lines for readability, so I am glad to see this being discussed. In general I have found it hard to enforce readability through checkstyle, since I agree it is rather subjective (other than the obvious rules, such as those in https://google.github.io/styleguide/javaguide.html).
   
   My own opinion is that we should follow the existing OSS codebase standard as closely as possible, since that gives users and first-time contributors a sense of consistency. It should not incur too much overhead, though: I don't think we want to discourage contributions because of formatting in general.
   
   I also saw that Richard provided a lot of constructive feedback beyond the formatting. In my experience, people who review such a large PR extensively are also the ones who maintain it and collaborate on it going forward, so it would be great to reach a reasonable middle ground on readability.
   
   
   This is just my personal opinion, and I am also pretty new to Pinot, so please disregard it if it doesn't make sense. I will definitely use this as a standard for my own contributions going forward if we can reach a general guideline.
   






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r795395417



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       Fixed to the best of my knowledge.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r787207392



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)
+public class BenchmarkMutableFST {
+  @Param({"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"})
+  public String _regex;
+
+  private MutableFST _mutableFST;
+  private org.apache.lucene.util.fst.FST _fst;
+
+  @Setup
+  public void setUp()
+      throws IOException {
+    SortedMap<String, Integer> input = new TreeMap<>();
+    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
+        Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("data/words.txt"))))) {
+      String currentWord;
+      int i = 0;
+      while ((currentWord = bufferedReader.readLine()) != null) {
+        _mutableFST.addPath(currentWord, i);
+        input.put(currentWord, i++);
+      }
+    }
+
+    _fst = org.apache.pinot.segment.local.utils.fst.FSTBuilder.buildFST(input);

Review comment:
       Not resolved.






[GitHub] [pinot] siddharthteotia edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
siddharthteotia edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1016209462


   Hi @atris, I will review this PR tomorrow




[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785695708



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word, 0);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word, int currentPos) {
+    if (currentPos == word.length() - 1) {
+      return null;
+    }
+
+    if (mutableState.getLabel() != word.charAt(currentPos)) {
+      throw new IllegalStateException("Current state needs to be part of word path");
+    }
+
+    List<MutableArc> arcs = mutableState.getArcs();
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(currentPos + 1)) {
+        return findPointOfDiversion(arc.getNextState(), word, currentPos + 1);
+      }
+    }
+
+    return Pair.of(mutableState, currentPos);
+  }

Review comment:
       Moved to an iterative model
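
For readers following the thread, here is a minimal sketch of what an iterative `findPointOfDiversion` could look like. It is not the code that was actually pushed to the PR. `Node` and `Diversion` are hypothetical stand-ins for `MutableState` and the `Pair<MutableState, Integer>` in the quoted diff, and the word is assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.List;

final class DiversionSketch {
  /** Hypothetical stand-in for MutableState: a labelled node with outgoing successors. */
  static final class Node {
    final char label;
    final List<Node> next = new ArrayList<>();

    Node(char label) {
      this.label = label;
    }

    /** Returns the successor labelled c, or null if there is none. */
    Node child(char c) {
      for (Node n : next) {
        if (n.label == c) {
          return n;
        }
      }
      return null;
    }
  }

  /** Hypothetical stand-in for Pair<MutableState, Integer>. */
  static final class Diversion {
    final Node node;
    final int pos;

    Diversion(Node node, int pos) {
      this.node = node;
      this.pos = pos;
    }
  }

  /**
   * Walks the existing path for word, starting at the node that matches word.charAt(0).
   * Returns the last node on the shared prefix and its position in the word,
   * or null when every character of the word already has a node.
   */
  static Diversion findPointOfDiversion(Node start, String word) {
    if (start.label != word.charAt(0)) {
      throw new IllegalStateException("Current state needs to be part of word path");
    }
    Node current = start;
    int pos = 0;
    // A loop replaces the recursion, so stack depth no longer grows with word length.
    while (pos < word.length() - 1) {
      Node child = current.child(word.charAt(pos + 1));
      if (child == null) {
        return new Diversion(current, pos);
      }
      current = child;
      pos++;
    }
    return null;
  }
}
```

The return contract mirrors the recursive version in the diff: `null` means the whole word is already present as a path, otherwise the caller appends new states after `pos`.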






[GitHub] [pinot] richardstartin commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1014335321


   > > I have two general points about the layout of the code (I don't like focussing on this sort of thing, but feel it's necessary here):
   > > 
   > > 1. It needs to be formatted properly: https://github.com/apache/pinot/blob/master/config/codestyle-intellij.xml
   > > 2. There are too many blank lines. In my humble opinion, these do not aid the reader at all. On many occasions I look at 15 lines of code and notice they compress to 5. It really obscures the logic by occupying screen space.
   > 
   > Thanks for pointing that out. The problem is that my laptop is M1 Pro based, so a global build of Pinot fails on it. I am dependent on Linter checks to tell me if check style is failing.
   
   I would certainly prefer to automate style checking because I want code to be readable automatically without needing to review it for readability. If a simple method doesn't fit on a screen without scrolling, it's unreadable. I've proposed using checkstyle to ban blank lines within functions to @Jackie-Jiang to help specifically with this case, but he wasn't too keen on the idea and reasoned that the _occasional_ blank line can help readability. 
   
   Regarding M1s, I think @dongxiaoman uses M1 and managed to build Pinot, maybe he can help?




[GitHub] [pinot] mayankshriv commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
mayankshriv commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786280478



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       On the blank lines: while this is somewhat subjective, there are some practical implications:
   - Readability: code can look like a giant wall of text, or be so sparse that a method won't fit on a page.
   - The next person touching the code may alter them, leading to unnecessary diffs.
   
   My $0.02 would be to enforce this via the formatter, or via guidelines (if formatter rules cannot be customized).
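
As background to the quoted Javadoc on `regexMatchOnFST`, the queue-based traversal it describes can be sketched roughly as below. This is a simplified illustration, not the PR's implementation. `Node` and `Dfa` are hypothetical stand-ins for the FST node and the regex automaton, the automaton is hand-built for the pattern `b.*`, and the loop iterates over FST arcs rather than automaton transitions. Unlike step 2 of the Javadoc, which reads as an "else", this sketch keeps expanding after a match so that longer words sharing a matching prefix are also found.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.TreeMap;

final class QueueMatchSketch {
  /** Hypothetical stand-in for an FST node: a trie node with an optional output value. */
  static final class Node {
    final Map<Character, Node> children = new TreeMap<>();
    boolean terminal;
    int value;
  }

  /** Hypothetical stand-in for the regex automaton: a deterministic transition function. */
  interface Dfa {
    int start();

    int step(int state, char c); // returns -1 when no transition exists

    boolean isAccept(int state);
  }

  /** One queue entry: an automaton state paired with the FST node reached so far. */
  private static final class Entry {
    final int state;
    final Node node;

    Entry(int state, Node node) {
      this.state = state;
      this.node = node;
    }
  }

  /** Adds a word with its output value; input does not need to be sorted. */
  static void add(Node root, String word, int value) {
    Node node = root;
    for (char c : word.toCharArray()) {
      node = node.children.computeIfAbsent(c, k -> new Node());
    }
    node.terminal = true;
    node.value = value;
  }

  /** Walks automaton and trie in lockstep, collecting values of words the automaton accepts. */
  static List<Integer> match(Dfa dfa, Node root) {
    List<Integer> results = new ArrayList<>();
    Queue<Entry> queue = new ArrayDeque<>();
    queue.add(new Entry(dfa.start(), root));
    while (!queue.isEmpty()) {
      Entry e = queue.remove();
      if (dfa.isAccept(e.state) && e.node.terminal) {
        results.add(e.node.value);
      }
      // Follow only those arcs for which the automaton also has a transition.
      for (Map.Entry<Character, Node> arc : e.node.children.entrySet()) {
        int next = dfa.step(e.state, arc.getKey());
        if (next >= 0) {
          queue.add(new Entry(next, arc.getValue()));
        }
      }
    }
    return results;
  }

  /** Hand-built automaton for "b.*": state 0 requires 'b', state 1 accepts any suffix. */
  static Dfa startsWithB() {
    return new Dfa() {
      public int start() {
        return 0;
      }

      public int step(int state, char c) {
        return state == 0 ? (c == 'b' ? 1 : -1) : 1;
      }

      public boolean isAccept(int state) {
        return state == 1;
      }
    };
  }
}
```

The traversal terminates because each queue entry moves one level deeper into a finite, acyclic structure, which is the argument made in point 3 of the Javadoc.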






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785790611



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableState.java
##########
@@ -0,0 +1,143 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.collect.Lists;
+import com.google.common.collect.Sets;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * The fst's mutable state implementation.
+ *
+ * Holds its outgoing {@link MutableArc} objects in an ArrayList allowing additions/deletions
+ */
+public class MutableState {
+
+  protected char _label;
+
+  // Is terminal
+  protected boolean _isTerminal;
+
+  // Is first state
+  protected boolean _isStartState;
+
+  // Outgoing arcs
+  private final ArrayList<MutableArc> _arcs;
+
+  // Incoming arcs (at least states with arcs that are incoming to us)
+  private final Set<MutableState> _incomingStates = Sets.newIdentityHashSet();
+
+  /**
+   * Default Constructor
+   */
+  public MutableState() {
+    _arcs = Lists.newArrayList();
+  }
+
+  public MutableState(boolean isStartState) {
+    _isStartState = isStartState;
+    _arcs = Lists.newArrayList();
+  }
+
+  public boolean isTerminal() {
+    return _isTerminal;
+  }
+
+  public boolean isStartState() {
+    return _isStartState;
+  }
+
+  public char getLabel() {
+    return _label;
+  }
+
+  public void setLabel(char label) {
+    _label = label;
+  }
+
+  public void setIsTerminal(boolean isTerminal) {
+    _isTerminal = isTerminal;
+  }
+
+  /**
+   * Get the number of outgoing arcs
+   */
+  public int getArcCount() {
+    return _arcs.size();
+  }
+
+  /**
+   * Get an arc based on it's index the arcs ArrayList
+   *
+   * @param index the arc's index
+   * @return the arc
+   */
+  public MutableArc getArc(int index) {
+    return _arcs.get(index);
+  }
+
+  public List<MutableArc> getArcs() {
+    return _arcs;
+  }
+
+  @Override
+  public String toString() {
+    return "(" + _label + ")";
+  }
+
+  // adds an arc but should only be used by MutableFst
+  void addArc(MutableArc arc) {
+      _arcs.add(arc);
+    }
+
+  void addIncomingState(MutableState inState) {
+    if (inState == this) {
+      return;
+    }
+

Review comment:
       remove blank line






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794325117



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       This class still has several unnecessarily fully qualified class names. One of them has been fixed, but I expected you might generalise from the comment made on this class and fix the others.






[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1024023665


   > I saw some unsolved comments. Please resolve them and wait for all tests pass before merging.
   > You might need to rebase to the latest master if the compatibility test against `0.9.0` fails
   
   There was one unresolved comment about a class import; I have fixed that and merged master. Waiting for tests to complete.




[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794371555



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       The remaining ones are left fully qualified deliberately, since the method names are identical between the Lucene FST and the mutable FST. So that the reader can clearly distinguish between the two, I felt it was necessary to use fully qualified names when creating the references.
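
As a generic illustration of the trade-off under discussion, and not taken from the PR: Java has no import aliases, so when two types share a simple name, at most one can be imported and the other has to be written fully qualified. The sketch below uses `java.util.Date` and `java.sql.Date` from the JDK purely as an example pair.

```java
import java.util.Date;

final class QualifiedNameSketch {
  /** Returns the runtime class names of the two same-named types used side by side. */
  static String describe() {
    // Imported type: the simple name is enough.
    Date utilDate = new Date(0L);
    // Same simple name in another package: must be fully qualified at the use site.
    java.sql.Date sqlDate = java.sql.Date.valueOf("2022-01-13");
    return utilDate.getClass().getName() + " / " + sqlDate.getClass().getName();
  }
}
```

When the simple names already differ, as with `FST` and `MutableFST` in the quoted benchmark, qualification is not required by the compiler and becomes a readability choice, which is the point being debated in this thread.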






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786580043



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       @mayankshriv the formatter leaves this up to the author for a few reasons:
   
   1. It's difficult to make the tools enforce a guideline like this only within method bodies. The only tool available is RegexpMultiline, which is a blunt instrument.
   2. It's expensive (in terms of build time) to apply the numerous RegexpMultiline rules necessary.
   3. A lot of the time, a blank line _here and there_ makes code more readable (though it usually indicates that the enclosing scope is too large and needs to be broken down), but this PR consistently takes this grey area to the extreme. For instance, consider the constructor below: what do the blank lines mean? Are they intentional? Why is initialising `_pathState` treated differently from `_state`, `_node`, or `_fstArc`? As a peer reviewing this change set, I find it inconsistent and a waste of screen space.
   
   ```java
       public Path(State state, MutableState node, MutableArc fstArc, List<Character> pathState) {
          _state = state;
          _node = node;
          _fstArc = fstArc;
   
          _pathState = pathState;
   
          _pathState.add(node.getLabel());
        }
   ```
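
For comparison, here is a sketch of the same constructor with the blank lines removed. The `State`, `MutableArc` and `MutableState` classes are hypothetical stand-ins so the snippet compiles on its own; they are not the PR's classes, and only the constructor body mirrors the code quoted above.

```java
import java.util.ArrayList;
import java.util.List;

final class PathSketch {
  // Hypothetical stand-ins for the PR's State, MutableArc and MutableState types.
  static final class State {
  }

  static final class MutableArc {
  }

  static final class MutableState {
    private final char _label;

    MutableState(char label) {
      _label = label;
    }

    char getLabel() {
      return _label;
    }
  }

  static final class Path {
    final State _state;
    final MutableState _node;
    final MutableArc _fstArc;
    final List<Character> _pathState;

    // Same statements as the quoted constructor, with no blank lines between them.
    Path(State state, MutableState node, MutableArc fstArc, List<Character> pathState) {
      _state = state;
      _node = node;
      _fstArc = fstArc;
      _pathState = pathState;
      _pathState.add(node.getLabel());
    }
  }

  public static void main(String[] args) {
    List<Character> labels = new ArrayList<>();
    new Path(new State(), new MutableState('a'), new MutableArc(), labels);
    System.out.println(labels);
  }
}
```

With the four assignments grouped together, the one statement that does something different (appending the node's label to the caller's list) is easier to spot.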






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785786513



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,196 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Queue;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    Preconditions.checkState(_start != null, "Cannot override a start state");
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word) {
+    Queue<Pair<MutableState, Integer>> queue = new ArrayDeque<>();
+    MutableState currentState = mutableState;
+    int pos = 0;
+
+    queue.add(Pair.of(mutableState, 0));
+
+    while (!queue.isEmpty()) {
+      Pair<MutableState, Integer> pair = queue.remove();
+      currentState = pair.getLeft();
+      pos = pair.getRight();
+
+      if (pos == word.length() - 1) {
+        return null;
+      }
+
+      if (currentState.getLabel() != word.charAt(pos)) {
+        throw new IllegalStateException("Current state needs to be part of word path");
+      }
+
+      List<MutableArc> arcs = currentState.getArcs();
+
+      for (MutableArc arc : arcs) {
+        if (arc.getNextState().getLabel() == word.charAt(pos + 1)) {
+          queue.add(Pair.of(arc.getNextState(), pos + 1));
+        }
+      }
+    }
+
+    return Pair.of(currentState, pos);
+  }
+
+  static <T> void compactNulls(ArrayList<T> list) {
+    list.removeIf(Objects::isNull);
+  }
+
+  @Override
+  public String toString() {
+    StringBuilder sb = new StringBuilder();
+    sb.append("Fst(start=").append(_start).append(")");
+    List<MutableArc> arcs = _start.getArcs();
+

Review comment:
       Remove this blank line.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785787792



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/utils/MutableFSTUtils.java
##########
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+public class MutableFSTUtils {
+
+  private MutableFSTUtils() {
+
+  }
+
+  public static boolean fstEquals(Object thisFstObj,
+                                  Object thatFstObj) {
+    if (thisFstObj == thatFstObj) {
+      return true;
+    }
+

Review comment:
       Remove this blank line.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784673019



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)
+public class BenchmarkMutableFST {
+  @Param({"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"})
+  public String _regex;
+
+  private MutableFST _mutableFST;
+  private org.apache.lucene.util.fst.FST _fst;
+
+  @Setup
+  public void setUp()
+      throws IOException {
+    SortedMap<String, Integer> input = new TreeMap<>();
+    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
+        Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("data/words.txt"))))) {
+      String currentWord;
+      int i = 0;
+      while ((currentWord = bufferedReader.readLine()) != null) {
+        _mutableFST.addPath(currentWord, i);
+        input.put(currentWord, i++);
+      }
+    }
+
+    _fst = org.apache.pinot.segment.local.utils.fst.FSTBuilder.buildFST(input);
+  }
+
+  @Benchmark
+  public void testMutableRegex(Blackhole blackhole) {
+    RoaringBitmapWriter<MutableRoaringBitmap> writer = RoaringBitmapWriter.bufferWriter().get();
+    RealTimeRegexpMatcher.regexMatch(_regex, _mutableFST, writer::add);
+    blackhole.consume(writer.get());

Review comment:
       When a benchmark method produces only one value, just return it and let JMH consume it; there's no need to take a `Blackhole` parameter and call `consume` yourself.

##########
File path: pinot-segment-local/src/test/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTConcurrentTest.java
##########
@@ -0,0 +1,145 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import org.apache.commons.lang3.tuple.Pair;
+import org.testng.annotations.AfterClass;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils.regexQueryNrHitsForRealTimeFST;
+import static org.testng.AssertJUnit.assertEquals;
+
+
+public class MutableFSTConcurrentTest {
+  private ExecutorService _threadPool;
+
+  private CountDownLatch _countDownLatch;
+  private Set<String> _resultSet;
+
+  @BeforeClass
+  private void setup() {
+     _threadPool = Executors.newFixedThreadPool(2);
+    _countDownLatch = new CountDownLatch(2);
+    _resultSet = new HashSet<>();
+  }
+
+  @AfterClass
+  private void shutDown() {
+    _threadPool.shutdown();
+  }
+
+  @Test
+  public void testConcurrentWriteAndRead()
+      throws InterruptedException {
+    MutableFST mutableFST = new MutableFSTImpl();
+    List<String> words = new ArrayList<>();
+
+    words.add("ab");
+    words.add("abba");
+    words.add("aba");
+    words.add("bab");
+    words.add("cdd");
+    words.add("efg");
+
+    List<Pair<String, Integer>> wordsWithMetadata = new ArrayList<>();
+    int i = 1;
+
+    for (String currentWord : words) {
+      wordsWithMetadata.add(Pair.of(currentWord, i));
+      i++;
+    }
+
+    _threadPool.submit(() -> {
+      try {
+        performReads(mutableFST, words, 10, 200);
+      } catch (InterruptedException e) {
+        e.printStackTrace();
+      }
+    });
+
+    _threadPool.submit(() -> {
+      try {
+        performWrites(mutableFST, wordsWithMetadata, 10);
+      } catch (InterruptedException e) {
+        e.printStackTrace();
+      }
+    });
+
+    _countDownLatch.await();
+
+    assertEquals(_resultSet.size(), words.size());
+
+    assert (_resultSet.contains("ab"));
+    assert (_resultSet.contains("abba"));
+    assert (_resultSet.contains("aba"));
+    assert (_resultSet.contains("bab"));
+    assert (_resultSet.contains("cdd"));
+    assert (_resultSet.contains("efg"));

Review comment:
       Please use `Assert.assertTrue` instead. A message such as "ab" or "\"ab\" not in the result set" would make it easier for the next person to figure out what happened if someone breaks this code and this test catches it.
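
A minimal sketch of what the suggestion above could look like. To keep it runnable without TestNG on the classpath, `assertTrue` here is a hypothetical local stand-in for TestNG's `Assert.assertTrue(boolean, String)`, and `assertContainsAll` is an illustrative helper that does not exist in the PR. One further reason to prefer an assertion method: the Java `assert` keyword is only evaluated when the JVM runs with assertions enabled (`-ea`).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class AssertMessageSketch {
  // Hypothetical stand-in for TestNG's Assert.assertTrue(boolean condition, String message).
  static void assertTrue(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  // Illustrative helper: checks each expected word and names the missing one on failure.
  static void assertContainsAll(Set<String> resultSet, String... expectedWords) {
    for (String word : expectedWords) {
      assertTrue(resultSet.contains(word), "\"" + word + "\" not in the result set");
    }
  }

  public static void main(String[] args) {
    Set<String> resultSet = new HashSet<>(Arrays.asList("ab", "abba", "aba", "bab", "cdd", "efg"));
    assertContainsAll(resultSet, "ab", "abba", "aba", "bab", "cdd", "efg");
    System.out.println("all expected words found");
  }
}
```

If a word is missing, the failure message names it, so whoever breaks the test can see which write was lost without attaching a debugger.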

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,158 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {
+    final List<Path> queue = new ArrayList<>();
+
+    if (_automaton.getNumberOfStates() == 0) {
+      return;
+    }
+
+    // Automaton start state and FST start node is added to the queue.
+    queue.add(new Path(_automaton.getInitialState(), _fst.getStartState(), null, new ArrayList<>()));
+
+    Set<State> acceptStates = _automaton.getAcceptStates();
+    while (queue.size() != 0) {
+      final Path path = queue.remove(queue.size() - 1);

Review comment:
       When you change this to `ArrayDeque`, you can call `queue.removeLast()` instead. This is exactly the sort of thing `Deque` is for.
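
A simplified sketch of a LIFO work list built on `ArrayDeque`, assuming a hand-rolled trie in place of the PR's FST. `Node` and `Pending` are hypothetical stand-ins for `MutableState` and `Path`, and the regex automaton is left out entirely, so this only illustrates the `addLast`/`removeLast` pattern, not the matcher itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DequeTraversalSketch {
  // Hypothetical trie node standing in for the PR's MutableState.
  static final class Node {
    final Map<Character, Node> _children = new TreeMap<>();
    boolean _terminal;
  }

  // Hypothetical pending work item standing in for the matcher's Path.
  static final class Pending {
    final Node _node;
    final String _prefix;

    Pending(Node node, String prefix) {
      _node = node;
      _prefix = prefix;
    }
  }

  static void addWord(Node root, String word) {
    Node node = root;
    for (char c : word.toCharArray()) {
      node = node._children.computeIfAbsent(c, k -> new Node());
    }
    node._terminal = true;
  }

  // Collects every stored word using an explicit LIFO work list instead of recursion.
  static List<String> collectWords(Node root) {
    List<String> words = new ArrayList<>();
    Deque<Pending> queue = new ArrayDeque<>();
    queue.addLast(new Pending(root, ""));
    while (!queue.isEmpty()) {
      // removeLast() replaces the ArrayList idiom queue.remove(queue.size() - 1).
      Pending pending = queue.removeLast();
      if (pending._node._terminal) {
        words.add(pending._prefix);
      }
      for (Map.Entry<Character, Node> child : pending._node._children.entrySet()) {
        queue.addLast(new Pending(child.getValue(), pending._prefix + child.getKey()));
      }
    }
    return words;
  }

  public static void main(String[] args) {
    Node root = new Node();
    for (String word : new String[] {"ab", "abba", "aba", "bab"}) {
      addWord(root, word);
    }
    System.out.println(collectWords(root));
  }
}
```

Declaring the variable as `Deque` rather than `List` also documents the intent: entries are only ever added to and removed from the end.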






[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c20ca3c) into [master](https://codecov.io/gh/apache/pinot/commit/77a706996099f9bb44b90ad506f5205b3c4d7a42?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (77a7069) will **decrease** coverage by `0.95%`.
   > The diff coverage is `62.62%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8016      +/-   ##
   ============================================
   - Coverage     71.31%   70.36%   -0.96%     
   - Complexity     4112     4268     +156     
   ============================================
     Files          1593     1604      +11     
     Lines         82365    83155     +790     
     Branches      12270    12410     +140     
   ============================================
   - Hits          58740    58513     -227     
   - Misses        19660    20632     +972     
   - Partials       3965     4010      +45     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.96% <0.00%> (-0.06%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `68.12% <62.62%> (-0.33%)` | :arrow_down: |
   | unittests2 | `14.31% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `11.53% <11.53%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `50.00% <50.00%> (ø)` | |
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `54.54% <54.54%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `66.66% <66.66%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `92.30% <92.30%> (ø)` | |
   | [...t/core/plan/StreamingInstanceResponsePlanNode.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL1N0cmVhbWluZ0luc3RhbmNlUmVzcG9uc2VQbGFuTm9kZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ore/operator/streaming/StreamingResponseUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9zdHJlYW1pbmcvU3RyZWFtaW5nUmVzcG9uc2VVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ager/realtime/PeerSchemeSplitSegmentCommitter.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL21hbmFnZXIvcmVhbHRpbWUvUGVlclNjaGVtZVNwbGl0U2VnbWVudENvbW1pdHRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/pinot/common/utils/grpc/GrpcQueryClient.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvZ3JwYy9HcnBjUXVlcnlDbGllbnQuamF2YQ==) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
   | [...he/pinot/core/plan/StreamingSelectionPlanNode.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9wbGFuL1N0cmVhbWluZ1NlbGVjdGlvblBsYW5Ob2RlLmphdmE=) | `0.00% <0.00%> (-88.89%)` | :arrow_down: |
   | ... and [179 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [77a7069...c20ca3c](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785781305



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)
+public class BenchmarkMutableFST {
+  @Param({"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"})
+  public String _regex;
+
+  private MutableFST _mutableFST;
+  private org.apache.lucene.util.fst.FST _fst;
+
+  @Setup
+  public void setUp()
+      throws IOException {
+    SortedMap<String, Integer> input = new TreeMap<>();
+    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
+        Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("data/words.txt"))))) {
+      String currentWord;
+      int i = 0;
+      while ((currentWord = bufferedReader.readLine()) != null) {
+        _mutableFST.addPath(currentWord, i);
+        input.put(currentWord, i++);
+      }
+    }
+
+    _fst = org.apache.pinot.segment.local.utils.fst.FSTBuilder.buildFST(input);

Review comment:
       Just import `org.apache.pinot.segment.local.utils.fst.FSTBuilder`?






[GitHub] [pinot] richardstartin commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1014591577


   > > > > I have two general points about the layout of the code (I don't like focussing on this sort of thing, but feel it's necessary here):
   > > > > 
   > > > > 1. It needs to be formatted properly: https://github.com/apache/pinot/blob/master/config/codestyle-intellij.xml
   > > > > 2. There are too many blank lines. In my humble opinion, these do not aid the reader at all. On many occasions I look at 15 lines of code and notice they compress to 5. It really obscures the logic by occupying screen space.
   > > > 
   > > > 
   > > > Thanks for pointing that out. The problem is that my laptop is M1 Pro based, so a global build of Pinot fails on it. I am dependent on Linter checks to tell me if check style is failing.
   > > 
   > > 
   > > I would certainly prefer to automate style checking because I want code to be readable automatically without needing to review it for readability. If a simple method doesn't fit on a screen without scrolling, it's unreadable. I've proposed using checkstyle to ban blank lines within functions to @Jackie-Jiang to help specifically with this case, but he wasn't too keen on the idea and reasoned that the _occasional_ blank line can help readability.
   > > Regarding M1s, I think @dongxiaoman uses M1 and managed to build Pinot, maybe he can help?
   > 
   > Thanks, I will check with @dongxiaoman.
   > 
   > I would argue that the spacing around blanks is a subjective thing, unless somebody is putting 3 blanks consecutively (in which case, our check style does catch it).
   > 
   > Beyond that, the definition of readability is YMMV. As long as it is not obviously and immediately evident that something is broken, we should IMO not let subjectivity take control
   
   This is not a subjective matter but a question of how much code fits on screen. Objectively, given finite screen space, large numbers of blank lines will prevent methods from fitting on screen at some point. One method in this PR contains 15 blank lines.




[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1025516369


    
   
   > 👍
   
   Thank you for the review!




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794771583



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       That's not the case - are you looking at the same file as I am? 
   
   This benchmark measures `org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher` and `org.apache.pinot.segment.local.utils.fst.RegexpMatcher`. There is no name clash.
   
   `org.apache.lucene.util.fst.FST` does not clash with any other referenced type in this class, so does not need to be fully qualified either.
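
   As a minimal sketch of when a fully qualified name is actually required, the example below uses two JDK types that share a simple name. It is illustrative only: `NameClashSketch` and its methods are hypothetical and are not part of the Pinot code under review.

```java
import java.util.Date;

public class NameClashSketch {
  // java.util.Date is imported, so its simple name is enough here.
  static long toEpochMillis(Date utilDate) {
    return utilDate.getTime();
  }

  // java.sql.Date shares the simple name "Date" with the imported type,
  // so this second type is the only one that must be fully qualified.
  static java.sql.Date toSqlDate(Date utilDate) {
    return new java.sql.Date(utilDate.getTime());
  }

  public static void main(String[] args) {
    Date now = new Date();
    System.out.println(toEpochMillis(now) == toSqlDate(now).getTime());
  }
}
```

   When no second type with the same simple name is referenced in the file, a plain import is sufficient.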






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786703811



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       > However, I would also ask yourself to be a bit considerate for expression of a personal style. 
   ...
   > All I am requesting you (humbly) is to allow me a bit of room (within reasonable limits, of course)
   
   I don't believe there is room for personal expression in a shared codebase - it's engineering, not poetry. Ideally it shouldn't be possible to tell who wrote the code without looking at git blame. More time will be spent reading this code than was spent writing it, so it all needs to look the same for the sake of the person who may need to make changes in the future.
   
   > I request you to kindly not allege that I am being intentional by adding too many lines and trying to obfuscate the project (quoting your latest comment)
   
   I don't know what you are referring to - no allegations have been made about intent to obfuscate.
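
   For readers following the Javadoc quoted in the diff above, the queue-based traversal it describes can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `FstNode`, `DfaState`, `Pair`, and the hand-built automaton for `ab*` are hypothetical stand-ins for Pinot's `MutableState`, `State`, and `RegExp` classes, and the toy FST is a plain trie. Unlike the Javadoc's wording, the sketch iterates over FST arcs and looks each label up in the automaton, and it keeps exploring past a final node because one word can be a prefix of another.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class ProductTraversalSketch {
  /** Hypothetical FST node: a trie node where output >= 0 marks a final node. */
  static final class FstNode {
    final Map<Character, FstNode> arcs = new HashMap<>();
    int output = -1;
  }

  /** Hypothetical deterministic automaton state. */
  static final class DfaState {
    final Map<Character, DfaState> next = new HashMap<>();
    boolean accept;
  }

  /** A (automaton state, FST node) pair held in the work queue. */
  static final class Pair {
    final DfaState dfa;
    final FstNode fst;

    Pair(DfaState dfa, FstNode fst) {
      this.dfa = dfa;
      this.fst = fst;
    }
  }

  /** Adds one word at a time; the input does not need to be sorted. */
  static void addPath(FstNode root, String word, int output) {
    FstNode node = root;
    for (char c : word.toCharArray()) {
      node = node.arcs.computeIfAbsent(c, k -> new FstNode());
    }
    node.output = output;
  }

  /** Walks the automaton and the FST in lockstep, collecting outputs of matching words. */
  static List<Integer> match(DfaState dfaStart, FstNode fstStart) {
    List<Integer> results = new ArrayList<>();
    Queue<Pair> queue = new ArrayDeque<>();
    queue.add(new Pair(dfaStart, fstStart));
    while (!queue.isEmpty()) {
      Pair current = queue.remove();
      // Step 1: accepting automaton state plus final FST node yields a result.
      if (current.dfa.accept && current.fst.output >= 0) {
        results.add(current.fst.output);
      }
      // Step 2: follow every FST arc whose label the automaton can also consume.
      for (Map.Entry<Character, FstNode> arc : current.fst.arcs.entrySet()) {
        DfaState target = current.dfa.next.get(arc.getKey());
        if (target != null) {
          queue.add(new Pair(target, arc.getValue()));
        }
      }
    }
    // Step 3: the loop ends because every step moves deeper into the acyclic FST.
    return results;
  }

  /** Builds a small FST and a hand-written automaton for "ab*", then matches them. */
  static List<Integer> demo() {
    FstNode root = new FstNode();
    String[] words = {"ba", "abb", "a", "abc", "ab"};
    int[] outputs = {3, 2, 0, 4, 1};
    for (int i = 0; i < words.length; i++) {
      addPath(root, words[i], outputs[i]);
    }
    DfaState start = new DfaState();
    DfaState afterA = new DfaState();
    afterA.accept = true;
    start.next.put('a', afterA);
    afterA.next.put('b', afterA);
    return match(start, root);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

   In this toy setup the words `a`, `ab`, and `abb` match, while `ba` and `abc` are pruned as soon as the automaton has no transition for the next label, so only the part of the FST that can still match is visited.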






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785785733



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,196 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Queue;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    Preconditions.checkState(_start != null, "Cannot override a start state");
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {

Review comment:
       Formatting: this method should fit on a screen without scrolling, but contains 15 blank lines which make the code unreadable.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785787934



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/utils/MutableFSTUtils.java
##########
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+public class MutableFSTUtils {
+
+  private MutableFSTUtils() {
+
+  }
+
+  public static boolean fstEquals(Object thisFstObj,
+                                  Object thatFstObj) {
+    if (thisFstObj == thatFstObj) {
+      return true;
+    }
+
+    if (thisFstObj instanceof MutableFST && thatFstObj instanceof MutableFST) {
+      MutableFST thisFST = (MutableFST) thisFstObj;
+      MutableFST thatFST = (MutableFST) thatFstObj;
+      return thisFST.getStartState().getLabel() == thatFST.getStartState().getLabel()
+          && thisFST.getStartState().getArcs().size() == thatFST.getStartState().getArcs().size();
+    }
+    return false;
+  }
+
+  public static boolean arcEquals(Object thisArcObj, Object thatArcObj) {
+    if (thisArcObj == thatArcObj) {
+      return true;
+    }
+

Review comment:
       remove blank line






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786529909



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       So in this scenario, we are at an impasse because we have neither formatter rules nor guidelines. I will concede and remove all blank lines, per Richard's command.






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r795395708



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       Fixed






[GitHub] [pinot] atris merged pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris merged pull request #8016:
URL: https://github.com/apache/pinot/pull/8016


   




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784301448



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word, 0);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word, int currentPos) {
+    if (currentPos == word.length() - 1) {
+      return null;
+    }
+
+    if (mutableState.getLabel() != word.charAt(currentPos)) {
+      throw new IllegalStateException("Current state needs to be part of word path");
+    }
+
+    List<MutableArc> arcs = mutableState.getArcs();
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(currentPos + 1)) {
+        return findPointOfDiversion(arc.getNextState(), word, currentPos + 1);
+      }
+    }
+
+    return Pair.of(mutableState, currentPos);
+  }

Review comment:
       This recursion is quite dangerous: Java does not eliminate tail calls, so it could result in very deep stacks. What is the bound on the number of states?
   
   
   I wonder if you could rework this slightly so that, instead of matching a `List<MutableArc>` (each of which holds a `char`) against a `String`, you do a direct comparison between two `char[]` (convert the string to `char[]` at the edge, invert `List<MutableArc>`) so you can use `Arrays.mismatch`.
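
A minimal sketch of the two ideas in this comment, assuming a simplified trie. `Node`, `insert`, `matchedPrefixLength`, and `commonPrefixLength` are hypothetical names for illustration and are not part of the PR; the real `MutableState`/`MutableArc` classes may differ. The loop replaces the recursive walk so stack depth stays constant, and `Arrays.mismatch` (available since Java 9) finds the first differing index of two `char[]` values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class IterativeDiversionSketch {
  /** Hypothetical stand-in for MutableState: a labelled node with outgoing children. */
  static final class Node {
    final char label;
    final List<Node> children = new ArrayList<>();

    Node(char label) {
      this.label = label;
    }

    /** Returns the child carrying the given label, or null if there is none. */
    Node child(char c) {
      for (Node n : children) {
        if (n.label == c) {
          return n;
        }
      }
      return null;
    }
  }

  /** Adds word below root, reusing any prefix that already exists. */
  static void insert(Node root, String word) {
    Node current = root;
    for (int i = 0; i < word.length(); i++) {
      Node next = current.child(word.charAt(i));
      if (next == null) {
        next = new Node(word.charAt(i));
        current.children.add(next);
      }
      current = next;
    }
  }

  /**
   * Counts how many leading characters of word already have a path below root.
   * The walk is a loop, so stack depth stays constant regardless of word length.
   */
  static int matchedPrefixLength(Node root, String word) {
    Node current = root;
    int matched = 0;
    while (matched < word.length()) {
      Node next = current.child(word.charAt(matched));
      if (next == null) {
        break;
      }
      current = next;
      matched++;
    }
    return matched;
  }

  /** Common prefix length of two char arrays via Arrays.mismatch (Java 9+). */
  static int commonPrefixLength(char[] a, char[] b) {
    int idx = Arrays.mismatch(a, b);
    return idx < 0 ? a.length : idx; // -1 means the arrays are identical
  }

  public static void main(String[] args) {
    Node root = new Node('\0');
    insert(root, "cat");
    System.out.println(matchedPrefixLength(root, "car"));
    System.out.println(commonPrefixLength("cat".toCharArray(), "car".toCharArray()));
  }
}
```

In this sketch the point of diversion is simply the matched prefix length; new states would be appended from that index onward.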






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784297911



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;

Review comment:
       This can be 2 lines rather than 5:
   ```java
   Preconditions.checkState(_start == null, "Cannot override a start state");
   _start = start;
   ```
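
A small self-contained sketch of the guard, for illustration only. Guava's `Preconditions.checkState` throws `IllegalStateException` when its condition is false, so the condition passed in has to be the one that must hold (`_start == null`). The local `checkState` helper below is a stand-in so the example runs without Guava, and `StartStateGuardSketch` is a hypothetical class, not the PR's `MutableFSTImpl`.

```java
final class StartStateGuardSketch {
  private Object _start;

  /** Local stand-in mirroring the behaviour of Guava's Preconditions.checkState. */
  static void checkState(boolean expression, String message) {
    if (!expression) {
      throw new IllegalStateException(message);
    }
  }

  /** Sets the start state once; a second call is rejected. */
  void setStartState(Object start) {
    checkState(_start == null, "Cannot override a start state");
    _start = start;
  }

  Object getStartState() {
    return _start;
  }

  public static void main(String[] args) {
    StartStateGuardSketch fst = new StartStateGuardSketch();
    fst.setStartState("s0");
    try {
      fst.setStartState("s1");
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```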






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784289722



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {
+    final List<Path> queue = new ArrayList<>();

Review comment:
       Use `ArrayDeque` instead
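
To illustrate the suggestion: if entries are taken from the front of the work list, `ArrayList.remove(0)` shifts every remaining element, whereas `ArrayDeque.remove()` is constant time. The sketch below is a simplified queue-driven walk over a small trie; `Node`, `addChild`, and `collectOutputs` are hypothetical names, and the automaton side of the matcher is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

final class QueueTraversalSketch {
  /** Hypothetical stand-in for an FST state: label, terminal flag, output, children. */
  static final class Node {
    final char label;
    final boolean terminal;
    final int output;
    final List<Node> children = new ArrayList<>();

    Node(char label, boolean terminal, int output) {
      this.label = label;
      this.terminal = terminal;
      this.output = output;
    }

    /** Creates a child, attaches it to this node, and returns it. */
    Node addChild(char childLabel, boolean childTerminal, int childOutput) {
      Node child = new Node(childLabel, childTerminal, childOutput);
      children.add(child);
      return child;
    }
  }

  /** Breadth-first walk that collects the output of every terminal node. */
  static List<Integer> collectOutputs(Node root) {
    List<Integer> results = new ArrayList<>();
    Queue<Node> queue = new ArrayDeque<>();
    queue.add(root);
    while (!queue.isEmpty()) {
      Node node = queue.remove(); // constant-time removal from the head
      if (node.terminal) {
        results.add(node.output);
      }
      queue.addAll(node.children);
    }
    return results;
  }

  public static void main(String[] args) {
    Node root = new Node('\0', false, -1);
    Node a = root.addChild('a', true, 1);
    a.addChild('b', true, 2);
    root.addChild('c', true, 3);
    System.out.println(collectOutputs(root));
  }
}
```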






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784296987



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word, 0);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word, int currentPos) {
+    if (currentPos == word.length() - 1) {
+      return null;
+    }
+
+    if (mutableState.getLabel() != word.charAt(currentPos)) {
+      throw new IllegalStateException("Current state needs to be part of word path");
+    }
+
+    List<MutableArc> arcs = mutableState.getArcs();
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(currentPos + 1)) {
+        return findPointOfDiversion(arc.getNextState(), word, currentPos + 1);
+      }
+    }
+
+    return Pair.of(mutableState, currentPos);
+  }
+
+  static <T> void compactNulls(ArrayList<T> list) {
+    int nextGood = 0;
+    for (int i = 0; i < list.size(); i++) {
+      T ss = list.get(i);
+      if (ss != null) {
+        if (i != nextGood) {
+          list.set(nextGood, ss);
+        }
+        nextGood += 1;
+      }
+    }
+    // trim the end
+    while (list.size() > nextGood) {
+      list.remove(list.size() - 1);
+    }
+  }

Review comment:
       I think you could replace this with `list.removeIf(Objects::isNull)` without any performance impact, if it's written this way for performance reasons.
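
For reference, a minimal sketch of the suggested replacement. `CompactNullsSketch` is a hypothetical wrapper class used only for illustration; `removeIf` and `Objects::isNull` are standard library. The relative order of the remaining elements is preserved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

final class CompactNullsSketch {
  /** Removes every null element in place, keeping the order of the rest. */
  static <T> void compactNulls(List<T> list) {
    list.removeIf(Objects::isNull);
  }

  public static void main(String[] args) {
    List<String> states = new ArrayList<>(Arrays.asList("a", null, "b", null));
    compactNulls(states);
    System.out.println(states);
  }
}
```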






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794377855



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       Sorry to resurrect this thread, but none of the underlying issues (profuse blank lines) have been addressed in this class.






[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794876398



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       +1 on this; let's avoid fully qualified class names when there is no name clash.






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794316102



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.infra.Blackhole;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)
+public class BenchmarkMutableFST {
+  @Param({"q.[aeiou]c.*", ".*a", "b.*", ".*", ".*ated", ".*ba.*"})
+  public String _regex;
+
+  private MutableFST _mutableFST;
+  private org.apache.lucene.util.fst.FST _fst;
+
+  @Setup
+  public void setUp()
+      throws IOException {
+    SortedMap<String, Integer> input = new TreeMap<>();
+    try (BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(
+        Objects.requireNonNull(getClass().getClassLoader().getResourceAsStream("data/words.txt"))))) {
+      String currentWord;
+      int i = 0;
+      while ((currentWord = bufferedReader.readLine()) != null) {
+        _mutableFST.addPath(currentWord, i);
+        input.put(currentWord, i++);
+      }
+    }
+
+    _fst = org.apache.pinot.segment.local.utils.fst.FSTBuilder.buildFST(input);

Review comment:
       Fixed






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794375450



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       What about line 93? 
   
   ```java
   return org.apache.pinot.segment.local.utils.fst.RegexpMatcher.regexMatch(_regex, _fst);
   ```






[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c507893) into [master](https://codecov.io/gh/apache/pinot/commit/0fe7ef89127b9e920ef369cd9adad9c8b817dde9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0fe7ef8) will **decrease** coverage by `57.04%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head c507893 differs from pull request most recent head 0951c0c. Consider uploading reports for the commit 0951c0c to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8016       +/-   ##
   =============================================
   - Coverage     71.24%   14.20%   -57.05%     
   + Complexity     4262       81     -4181     
   =============================================
     Files          1607     1567       -40     
     Lines         83409    81727     -1682     
     Branches      12458    12289      -169     
   =============================================
   - Hits          59426    11610    -47816     
   - Misses        19941    69259    +49318     
   + Partials       4042      858     -3184     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `14.20% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ain/java/org/apache/pinot/core/data/table/Key.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL0tleS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/data/table/Record.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9kYXRhL3RhYmxlL1JlY29yZC5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../java/org/apache/pinot/core/util/GroupByUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS91dGlsL0dyb3VwQnlVdGlscy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1281 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0fe7ef8...0951c0c](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] walterddr commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786981509



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function matches the automaton built from regexQuery against the FST.
+   * The FST maps a key (string) to a value (Long). Both are state machines, and each state transition is
+   * driven by an input character.
+   *
+   * The algorithm starts with a queue containing the pair (automaton start state, FST start node).
+   * At each step an entry is popped from the queue:
+   *    1) If the automaton state is accepting and the FST node is final (i.e. an end node), the value stored
+   *       for that FST node is added to the result set.
+   *    2) Otherwise, the automaton's outgoing transitions are gathered; for each transition character, the
+   *       target FST node is looked up and the resulting (automaton state, FST node) pair is added to the queue.
+   *    3) The process is bound to terminate because every step makes progress through the FST (which is a
+   *       DAG) towards its final nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       Sorry for joining the discussion late. I also tend to add many blank lines for readability, so I'm glad to see this being discussed. In general I find it hard to enforce readability through checkstyle, since I agree it is rather subjective (apart from obvious rules such as those in https://google.github.io/styleguide/javaguide.html).
   
   My own opinion is that we should follow the existing OSS codebase conventions as closely as possible, which gives a sense of consistency to users as well as first-time contributors. That said, it should not incur too much overhead; I don't think we want to discourage contributions because of formatting in general.
   
   I also see that Richard has provided a lot of constructive feedback beyond formatting. In my experience, people who review such a large PR extensively will also be the ones who maintain it and collaborate on it going forward, so it would be great to reach a reasonable middle ground on readability.
   
   
   This is just my personal opinion. I am also pretty new to Pinot, so please disregard it if it doesn't make sense. I will definitely use this as a standard for my own contributions going forward.
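
For readers following the `regexMatchOnFST` Javadoc quoted in the diff above, the queue-based traversal it describes can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's implementation: `Node`, `Dfa`, `Pair` and `addPath` are hypothetical stand-ins (a plain trie and a hand-built DFA), not Pinot's `MutableFST`, `Automaton` or `RealTimeRegexpMatcher`. One deliberate choice in the sketch: it keeps expanding a pair after recording a match, so a key that extends another matching key is still found.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical stand-ins for illustration only; these are not the Pinot classes.
final class ProductTraversalSketch {

  /** Trie node standing in for an FST state; a terminal node carries a value. */
  static final class Node {
    final Map<Character, Node> next = new HashMap<>();
    boolean terminal;
    int value;
  }

  /** Tiny hand-built DFA: per state, a map from input character to target state. */
  static final class Dfa {
    final List<Map<Character, Integer>> transitions = new ArrayList<>();
    final List<Boolean> accepting = new ArrayList<>();

    int addState(boolean accept) {
      transitions.add(new HashMap<>());
      accepting.add(accept);
      return transitions.size() - 1;
    }

    void addTransition(int from, char c, int to) {
      transitions.get(from).put(c, to);
    }
  }

  /** One queue entry: a DFA state paired with a trie node. */
  static final class Pair {
    final int dfaState;
    final Node node;

    Pair(int dfaState, Node node) {
      this.dfaState = dfaState;
      this.node = node;
    }
  }

  /** Adds one word at a time; the input does not need to be sorted. */
  static void addPath(Node root, String word, int value) {
    Node current = root;
    for (char c : word.toCharArray()) {
      current = current.next.computeIfAbsent(c, k -> new Node());
    }
    current.terminal = true;
    current.value = value;
  }

  /** Walks the DFA and the trie in lockstep and collects the values of matching keys. */
  static List<Integer> match(Dfa dfa, int dfaStart, Node root) {
    List<Integer> result = new ArrayList<>();
    Queue<Pair> queue = new ArrayDeque<>();
    queue.add(new Pair(dfaStart, root));
    while (!queue.isEmpty()) {
      Pair pair = queue.remove();
      if (dfa.accepting.get(pair.dfaState) && pair.node.terminal) {
        result.add(pair.node.value);
      }
      // Follow only characters that both the DFA state and the trie node have edges for.
      for (Map.Entry<Character, Integer> t : dfa.transitions.get(pair.dfaState).entrySet()) {
        Node child = pair.node.next.get(t.getKey());
        if (child != null) {
          queue.add(new Pair(t.getValue(), child));
        }
      }
    }
    Collections.sort(result); // deterministic order for the caller
    return result;
  }
}
```

With a two-state DFA for `ab*` and the words `abb`, `a`, `ba`, `ab`, `abc` added in that (unsorted) order with values 0 to 4, `match` returns `[0, 1, 3]`, i.e. the values stored for `abb`, `a` and `ab`. The loop terminates because every queued pair sits one level deeper in the trie than the pair it came from.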
   






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786523280



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFST.java
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+
+/**
+ * A mutable FST is an FST to which arbitrary inputs can be added at any
+ * point in time. Unlike a normal FST, which is built once and read many times, a mutable
+ * FST can be written to and read from concurrently. A mutable FST provides real-time search,
+ * i.e. searches see words as they are added, without needing a flush.
+ *
+ * Unlike a normal FST, mutable FST does not require the entire input beforehand nor
+ * does it require the input to be sorted. Single word additions work well with
+ * mutable FST.
+ *
+ * The reason as to why normal FST and mutable FST have different interfaces is because

Review comment:
       > Thanks a lot for adding this powerful feature. It would be great to see a proposal on how this ties in end-to-end (as in, would the interfaces still hold).
   
   Added in the documentation






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786543070



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFST.java
##########
@@ -0,0 +1,66 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+
+/**
+ * A mutable FST is an FST to which arbitrary inputs can be added at any
+ * point in time. Unlike a normal FST, which is built once and read many times, a mutable
+ * FST can be written to and read from concurrently. A mutable FST provides real-time search,
+ * i.e. searches see words as they are added, without needing a flush.
+ *
+ * Unlike a normal FST, mutable FST does not require the entire input beforehand nor
+ * does it require the input to be sorted. Single word additions work well with
+ * mutable FST.
+ *
+ * The reason as to why normal FST and mutable FST have different interfaces is because

Review comment:
       > Does this mean that there will have to be separate code to use mutable vs immutable FST? An abstraction that avoids that would be great.
   
   That's a great point. However, there is a catch: immutable and mutable FSTs provide the same functionality but are built on different data structures. It is hard to get them to agree on one interface without significantly compromising the immutable FST's serialised-state optimisations or the mutable FST's speed when adding new paths.
   
   However, no layer above pinot-segment will be aware of this difference. As described in the proposal document above, the integration path for mutable FST will include a native FST reader that propagates a query to both the mutable and the immutable FSTs (within a segment) at the same time.
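
To make the fan-out idea concrete, here is a rough sketch of a reader that sends one regex query to two index implementations and merges the results. Everything in it is an assumption made for illustration: `RegexIndex`, `MapBackedIndex` and `query` are hypothetical names and do not come from the PR or the design document, the map-backed index merely stands in for a real FST, and `java.util.regex` is used in place of Pinot's automaton-based matcher (the two regex dialects may differ).

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these types exist in Pinot under these names.
final class FanOutReaderSketch {

  /** Minimal common query contract; the two structures stay separate internally. */
  interface RegexIndex {
    void match(String regex, IntConsumer dest);
  }

  /** Stand-in for either FST flavour: a map from word to its stored value. */
  static final class MapBackedIndex implements RegexIndex {
    private final Map<String, Integer> entries = new ConcurrentHashMap<>();

    void add(String word, int value) {
      entries.put(word, value);
    }

    @Override
    public void match(String regex, IntConsumer dest) {
      Pattern pattern = Pattern.compile(regex);
      entries.forEach((word, value) -> {
        if (pattern.matcher(word).matches()) {
          dest.accept(value);
        }
      });
    }
  }

  /** Sends the same query to both indexes and returns the union of their matches. */
  static SortedSet<Integer> query(String regex, RegexIndex mutable, RegexIndex immutable) {
    SortedSet<Integer> merged = new TreeSet<>();
    immutable.match(regex, merged::add);
    mutable.match(regex, merged::add);
    return merged;
  }
}
```

In this sketch the caller only sees `query`, so code above the reader does not need to know which structure produced a given match. For example, with `hello` and `help` (values 1 and 2) in the immutable index and `helm` and `world` (values 7 and 8) in the mutable one, the query `hel.*` yields the set `[1, 2, 7]`.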






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r794762474



##########
File path: pinot-perf/src/main/java/org/apache/pinot/perf/BenchmarkMutableFST.java
##########
@@ -0,0 +1,100 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.perf;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.List;
+import java.util.Objects;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.TimeUnit;
+import org.apache.pinot.segment.local.utils.fst.FSTBuilder;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFSTImpl;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.openjdk.jmh.annotations.Benchmark;
+import org.openjdk.jmh.annotations.BenchmarkMode;
+import org.openjdk.jmh.annotations.Fork;
+import org.openjdk.jmh.annotations.Measurement;
+import org.openjdk.jmh.annotations.Mode;
+import org.openjdk.jmh.annotations.OutputTimeUnit;
+import org.openjdk.jmh.annotations.Param;
+import org.openjdk.jmh.annotations.Scope;
+import org.openjdk.jmh.annotations.Setup;
+import org.openjdk.jmh.annotations.State;
+import org.openjdk.jmh.annotations.Warmup;
+import org.openjdk.jmh.runner.Runner;
+import org.openjdk.jmh.runner.options.OptionsBuilder;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MICROSECONDS)
+@Warmup(iterations = 3, time = 30)
+@Measurement(iterations = 5, time = 30)
+@Fork(1)
+@State(Scope.Benchmark)

Review comment:
       Same reason: RegexpMatcher and RegexpMatcher.regexMatch exist for all three implementations (Lucene, native and mutable FST).






[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9f7aca3) into [master](https://codecov.io/gh/apache/pinot/commit/0fe7ef89127b9e920ef369cd9adad9c8b817dde9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0fe7ef8) will **decrease** coverage by `43.65%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8016       +/-   ##
   =============================================
   - Coverage     71.24%   27.58%   -43.66%     
   =============================================
     Files          1607     1603        -4     
     Lines         83409    83269      -140     
     Branches      12458    12457        -1     
   =============================================
   - Hits          59426    22972    -36454     
   - Misses        19941    58180    +38239     
   + Partials       4042     2117     -1925     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.58% <0.00%> (-0.09%)` | :arrow_down: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1153 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0fe7ef8...9f7aca3](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c507893) into [master](https://codecov.io/gh/apache/pinot/commit/0fe7ef89127b9e920ef369cd9adad9c8b817dde9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0fe7ef8) will **decrease** coverage by `1.26%`.
   > The diff coverage is `62.62%`.
   
   > :exclamation: Current head c507893 differs from pull request most recent head 0951c0c. Consider uploading reports for the commit 0951c0c to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8016      +/-   ##
   ============================================
   - Coverage     71.24%   69.98%   -1.27%     
   - Complexity     4262     4306      +44     
   ============================================
     Files          1607     1612       +5     
     Lines         83409    83607     +198     
     Branches      12458    12493      +35     
   ============================================
   - Hits          59426    58510     -916     
   - Misses        19941    21087    +1146     
   + Partials       4042     4010      -32     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.57% <0.00%> (-0.10%)` | :arrow_down: |
   | unittests1 | `67.90% <62.62%> (+0.03%)` | :arrow_up: |
   | unittests2 | `14.20% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `11.53% <11.53%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `50.00% <50.00%> (ø)` | |
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `54.54% <54.54%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `66.66% <66.66%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `92.30% <92.30%> (ø)` | |
   | [...pinot/minion/exception/TaskCancelledException.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXhjZXB0aW9uL1Rhc2tDYW5jZWxsZWRFeGNlcHRpb24uamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...nverttorawindex/ConvertToRawIndexTaskExecutor.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1taW5pb24tdGFza3MvcGlub3QtbWluaW9uLWJ1aWx0aW4tdGFza3Mvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3BsdWdpbi9taW5pb24vdGFza3MvY29udmVydHRvcmF3aW5kZXgvQ29udmVydFRvUmF3SW5kZXhUYXNrRXhlY3V0b3IuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...e/pinot/common/minion/MergeRollupTaskMetadata.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWluaW9uL01lcmdlUm9sbHVwVGFza01ldGFkYXRhLmphdmE=) | `0.00% <0.00%> (-94.74%)` | :arrow_down: |
   | [...plugin/segmentuploader/SegmentUploaderDefault.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1zZWdtZW50LXVwbG9hZGVyL3Bpbm90LXNlZ21lbnQtdXBsb2FkZXItZGVmYXVsdC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL3NlZ21lbnR1cGxvYWRlci9TZWdtZW50VXBsb2FkZXJEZWZhdWx0LmphdmE=) | `0.00% <0.00%> (-87.10%)` | :arrow_down: |
   | [.../transform/function/MapValueTransformFunction.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci90cmFuc2Zvcm0vZnVuY3Rpb24vTWFwVmFsdWVUcmFuc2Zvcm1GdW5jdGlvbi5qYXZh) | `0.00% <0.00%> (-85.30%)` | :arrow_down: |
   | ... and [109 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0fe7ef8...0951c0c](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c507893) into [master](https://codecov.io/gh/apache/pinot/commit/0fe7ef89127b9e920ef369cd9adad9c8b817dde9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0fe7ef8) will **decrease** coverage by `35.18%`.
   > The diff coverage is `0.00%`.
   
   > :exclamation: Current head c507893 differs from pull request most recent head 0951c0c. Consider uploading reports for the commit 0951c0c to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8016       +/-   ##
   =============================================
   - Coverage     71.24%   36.05%   -35.19%     
   + Complexity     4262       81     -4181     
   =============================================
     Files          1607     1612        +5     
     Lines         83409    83607      +198     
     Branches      12458    12493       +35     
   =============================================
   - Hits          59426    30148    -29278     
   - Misses        19941    51056    +31115     
   + Partials       4042     2403     -1639     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `?` | |
   | integration2 | `27.57% <0.00%> (-0.10%)` | :arrow_down: |
   | unittests1 | `?` | |
   | unittests2 | `14.20% <0.00%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [972 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0fe7ef8...0951c0c](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1025458439


   @richardstartin Would you have any further comments?




[GitHub] [pinot] Jackie-Jiang commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1021453476


   I saw some unresolved comments. Please resolve them and wait for all tests to pass before merging.
   You might need to rebase onto the latest master if the compatibility test against `0.9.0` fails.




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785788891



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function matches the automaton built from regexQuery against the FST.
+   * The FST maps a key (String) to a value (Long). Both are state machines, and each state transition is
+   * driven by an input character.
+   *
+   * The algorithm starts with a queue containing (automaton start state, FST start node).
+   * At each step an entry is popped from the queue:
+   *    1) If the automaton state is an accept state and the FST node is final (i.e. an end node), the value
+   *       stored for that FST node is added to the result set.
+   *    2) In either case, the outgoing automaton transitions are gathered; for each transition, the target FST
+   *       node for that character is looked up and the resulting (automaton state, FST node) pair is enqueued.
+   *    3) The process is bound to complete because every step makes progress through the FST (which is a DAG)
+   *       towards the final nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       remove 9 blank lines

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function matches the automaton built from regexQuery against the FST.
+   * The FST maps a key (String) to a value (Long). Both are state machines, and each state transition is
+   * driven by an input character.
+   *
+   * The algorithm starts with a queue containing (automaton start state, FST start node).
+   * At each step an entry is popped from the queue:
+   *    1) If the automaton state is an accept state and the FST node is final (i.e. an end node), the value
+   *       stored for that FST node is added to the result set.
+   *    2) In either case, the outgoing automaton transitions are gathered; for each transition, the target FST
+   *       node for that character is looked up and the resulting (automaton state, FST node) pair is enqueued.
+   *    3) The process is bound to complete because every step makes progress through the FST (which is a DAG)
+   *       towards the final nodes.
+   */
+  public void regexMatchOnFST() {
+    final Queue<Path> queue = new ArrayDeque<>();
+
+    if (_automaton.getNumberOfStates() == 0) {
+      return;
+    }
+
+    // Automaton start state and FST start node is added to the queue.
+    queue.add(new Path(_automaton.getInitialState(), _fst.getStartState(), null, new ArrayList<>()));
+
+    Set<State> acceptStates = _automaton.getAcceptStates();
+    while (!queue.isEmpty()) {
+      final Path path = queue.remove();
+
+      // If automaton is in accept state and the fstNode is final (i.e. end node) then add the entry to endNodes which
+      // contains the result set.
+      if (acceptStates.contains(path._state)) {
+        if (path._node.isTerminal()) {
+          //endNodes.add((long) _fst.getOutputSymbol(path._fstArc));
+          _dest.accept(path._fstArc.getOutputSymbol());
+        }
+      }
+
+      Set<Transition> stateTransitions = path._state.getTransitionSet();
+
+      for (Transition t : stateTransitions) {
+        final int min = t._min;
+        final int max = t._max;
+
+        if (min == max) {
+          MutableArc arc = getArcForLabel(path._node, t._min);
+
+          if (arc != null) {
+            queue.add(new Path(t._to, arc.getNextState(), arc, path._pathState));
+          }
+        } else {
+          List<MutableArc> arcs = path._node.getArcs();
+
+          for (MutableArc arc : arcs) {
+            char label = arc.getNextState().getLabel();
+            if (label >= min && label <= max) {
+              queue.add(new Path(t._to, arc.getNextState(), arc, path._pathState));
+            }
+          }
+        }
+      }
+    }
+  }
+
+  private MutableArc getArcForLabel(MutableState mutableState, char label) {

Review comment:
       Remove 2 blank lines.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784301448



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word, 0);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word, int currentPos) {
+    if (currentPos == word.length() - 1) {
+      return null;
+    }
+
+    if (mutableState.getLabel() != word.charAt(currentPos)) {
+      throw new IllegalStateException("Current state needs to be part of word path");
+    }
+
+    List<MutableArc> arcs = mutableState.getArcs();
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(currentPos + 1)) {
+        return findPointOfDiversion(arc.getNextState(), word, currentPos + 1);
+      }
+    }
+
+    return Pair.of(mutableState, currentPos);
+  }

Review comment:
       This non-tail recursion is quite dangerous because it could result in very deep stacks. What is the bound on the number of states?
   
   I wonder if you could rework this slightly: instead of matching a `List<MutableArc>` (each of which holds a `char`) against a `String`, do a direct comparison between two `char[]` (convert the string to `char[]` at the edge, invert `List<MutableArc>`) so you can use `Arrays.mismatch`.
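
A minimal sketch of one possible reading of this suggestion, not the PR's code: the walk to the point of diversion is written as a loop, so stack depth no longer grows with word length, and `Arrays.mismatch` (Java 9+) is shown separately on two `char[]`. `DiversionSketch`, `Node`, `insert`, `matchedPrefixLength` and `firstMismatch` are hypothetical names introduced only for illustration; `Node` is a simplified stand-in for `MutableState`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; none of these names come from the PR.
final class DiversionSketch {
  /** A labelled trie node, loosely playing the role of MutableState. */
  static final class Node {
    final char label;
    final List<Node> children = new ArrayList<>();

    Node(char label) {
      this.label = label;
    }

    /** Returns the child carrying the given label, or null if there is none. */
    Node child(char c) {
      for (Node n : children) {
        if (n.label == c) {
          return n;
        }
      }
      return null;
    }
  }

  /** Adds a word below root, reusing any prefix that already exists. */
  static void insert(Node root, String word) {
    Node node = root;
    for (int i = 0; i < word.length(); i++) {
      Node next = node.child(word.charAt(i));
      if (next == null) {
        next = new Node(word.charAt(i));
        node.children.add(next);
      }
      node = next;
    }
  }

  /** Iteratively counts how many leading characters of word already have a path. */
  static int matchedPrefixLength(Node root, String word) {
    Node node = root;
    int matched = 0;
    while (matched < word.length()) {
      Node next = node.child(word.charAt(matched));
      if (next == null) {
        break; // point of diversion: no child for this character
      }
      node = next;
      matched++;
    }
    return matched;
  }

  /** First index at which the two strings differ, or -1 if they are equal. */
  static int firstMismatch(String a, String b) {
    return Arrays.mismatch(a.toCharArray(), b.toCharArray());
  }

  public static void main(String[] args) {
    Node root = new Node('\0');
    insert(root, "stop");
    System.out.println(matchedPrefixLength(root, "still"));
    System.out.println(firstMismatch("stop", "still"));
  }
}
```

The loop keeps only the current node and an index, so the depth of the structure affects running time but not stack usage. Whether `Arrays.mismatch` applies directly depends on having the existing path available as a contiguous `char[]`, which this sketch does not attempt.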






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784290668



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,159 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {
+    final List<Path> queue = new ArrayList<>();
+
+    if (_automaton.getNumberOfStates() == 0) {
+      return;
+    }
+
+    // Automaton start state and FST start node is added to the queue.
+    queue.add(new Path(_automaton.getInitialState(), (MutableState) _fst.getStartState(), null, new ArrayList<>()));
+
+    Set<State> acceptStates = _automaton.getAcceptStates();
+    while (queue.size() != 0) {
+      final Path path = queue.remove(queue.size() - 1);
+
+      // If automaton is in accept state and the fstNode is final (i.e. end node) then add the entry to endNodes which
+      // contains the result set.
+      if (acceptStates.contains(path._state)) {
+        if (path._node.isTerminal()) {
+          //endNodes.add((long) _fst.getOutputSymbol(path._fstArc));
+          _dest.accept(path._fstArc.getOutputSymbol());
+        }
+      }
+
+      Set<Transition> stateTransitions = path._state.getTransitionSet();
+      Iterator<Transition> iterator = stateTransitions.iterator();
+
+      while (iterator.hasNext()) {
+        Transition t = iterator.next();

Review comment:
       ```java
   for (Transition transition : stateTransitions) {
     ...
   }
   ```
   is a lot easier to read.
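
For context, the paired traversal described in the Javadoc quoted above can be sketched with enhanced `for` loops throughout. This is a simplified illustration under stated assumptions, not the PR's implementation: `PairedTraversalSketch`, `DfaState`, `Range`, `TrieNode`, `Pair`, `add`, `match` and `demo` are hypothetical names, the automaton is built by hand rather than from `RegExp`, and the output value is stored on the terminal node instead of on the incoming arc.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified stand-ins; none of these names come from the PR.
final class PairedTraversalSketch {
  /** Automaton state with character-range transitions. */
  static final class DfaState {
    boolean accept;
    final List<Range> transitions = new ArrayList<>();
  }

  /** Transition taken for any character c with min <= c <= max. */
  static final class Range {
    final char min;
    final char max;
    final DfaState to;

    Range(char min, char max, DfaState to) {
      this.min = min;
      this.max = max;
      this.to = to;
    }
  }

  /** Labelled trie node; output is only meaningful when terminal is true. */
  static final class TrieNode {
    final char label;
    boolean terminal;
    int output = -1;
    final List<TrieNode> children = new ArrayList<>();

    TrieNode(char label) {
      this.label = label;
    }
  }

  /** One queue entry: an automaton state paired with a trie node. */
  static final class Pair {
    final DfaState state;
    final TrieNode node;

    Pair(DfaState state, TrieNode node) {
      this.state = state;
      this.node = node;
    }
  }

  /** Adds word to the trie and records output on its last node. */
  static void add(TrieNode root, String word, int output) {
    TrieNode node = root;
    for (char c : word.toCharArray()) {
      TrieNode next = null;
      for (TrieNode child : node.children) {
        if (child.label == c) {
          next = child;
          break;
        }
      }
      if (next == null) {
        next = new TrieNode(c);
        node.children.add(next);
      }
      node = next;
    }
    node.terminal = true;
    node.output = output;
  }

  /** Breadth-first walk over (automaton state, trie node) pairs. */
  static List<Integer> match(DfaState start, TrieNode root) {
    List<Integer> results = new ArrayList<>();
    Queue<Pair> queue = new ArrayDeque<>();
    queue.add(new Pair(start, root));
    while (!queue.isEmpty()) {
      Pair pair = queue.remove();
      if (pair.state.accept && pair.node.terminal) {
        results.add(pair.node.output);
      }
      for (Range range : pair.state.transitions) {
        for (TrieNode child : pair.node.children) {
          if (child.label >= range.min && child.label <= range.max) {
            queue.add(new Pair(range.to, child));
          }
        }
      }
    }
    return results;
  }

  /** Hand-built automaton for "ab" followed by anything, run over four words. */
  static List<Integer> demo() {
    DfaState s0 = new DfaState();
    DfaState s1 = new DfaState();
    DfaState s2 = new DfaState();
    s2.accept = true;
    s0.transitions.add(new Range('a', 'a', s1));
    s1.transitions.add(new Range('b', 'b', s2));
    s2.transitions.add(new Range(Character.MIN_VALUE, Character.MAX_VALUE, s2));

    TrieNode root = new TrieNode('\0');
    add(root, "abc", 1);
    add(root, "ab", 2);
    add(root, "ax", 3);
    add(root, "b", 4);
    return match(s0, root);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

Because every queued pair moves one level deeper into a finite trie, the loop terminates, which mirrors the termination argument in the Javadoc.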






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785786269



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,196 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.Queue;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial states
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    Preconditions.checkState(_start != null, "Cannot override a start state");
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,

Review comment:
       Please remove 7 blank lines.






[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (c20ca3c) into [master](https://codecov.io/gh/apache/pinot/commit/77a706996099f9bb44b90ad506f5205b3c4d7a42?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (77a7069) will **increase** coverage by `0.02%`.
   > The diff coverage is `62.62%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #8016      +/-   ##
   ============================================
   + Coverage     71.31%   71.33%   +0.02%     
   - Complexity     4112     4268     +156     
   ============================================
     Files          1593     1604      +11     
     Lines         82365    83155     +790     
     Branches      12270    12410     +140     
   ============================================
   + Hits          58740    59322     +582     
   - Misses        19660    19822     +162     
   - Partials       3965     4011      +46     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.96% <0.00%> (-0.06%)` | :arrow_down: |
   | integration2 | `27.65% <0.00%> (-0.07%)` | :arrow_down: |
   | unittests1 | `68.12% <62.62%> (-0.33%)` | :arrow_down: |
   | unittests2 | `14.31% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `11.53% <11.53%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `50.00% <50.00%> (ø)` | |
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `54.54% <54.54%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `66.66% <66.66%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `92.30% <92.30%> (ø)` | |
   | [...ry/optimizer/statement/JsonStatementOptimizer.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9xdWVyeS9vcHRpbWl6ZXIvc3RhdGVtZW50L0pzb25TdGF0ZW1lbnRPcHRpbWl6ZXIuamF2YQ==) | `0.00% <0.00%> (-77.64%)` | :arrow_down: |
   | [...elix/core/minion/generator/PinotTaskGenerator.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9nZW5lcmF0b3IvUGlub3RUYXNrR2VuZXJhdG9yLmphdmE=) | `33.33% <0.00%> (-66.67%)` | :arrow_down: |
   | [...readers/forward/BaseChunkSVForwardIndexReader.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L3JlYWRlcnMvZm9yd2FyZC9CYXNlQ2h1bmtTVkZvcndhcmRJbmRleFJlYWRlci5qYXZh) | `46.15% <0.00%> (-46.50%)` | :arrow_down: |
   | [.../pinot/spi/exception/BadQueryRequestException.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZXhjZXB0aW9uL0JhZFF1ZXJ5UmVxdWVzdEV4Y2VwdGlvbi5qYXZh) | `66.66% <0.00%> (-33.34%)` | :arrow_down: |
   | [...in/stream/kafka20/KafkaPartitionLevelConsumer.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1zdHJlYW0taW5nZXN0aW9uL3Bpbm90LWthZmthLTIuMC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvcGx1Z2luL3N0cmVhbS9rYWZrYTIwL0thZmthUGFydGl0aW9uTGV2ZWxDb25zdW1lci5qYXZh) | `66.66% <0.00%> (-20.00%)` | :arrow_down: |
   | ... and [121 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [77a7069...c20ca3c](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] siddharthteotia commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1016209462


   Hi @atris, I will take a look at this PR tomorrow.




[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1020904236


   This PR will be merged 24 hours from now, barring objections.




[GitHub] [pinot] codecov-commenter edited a comment on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9f7aca3) into [master](https://codecov.io/gh/apache/pinot/commit/0fe7ef89127b9e920ef369cd9adad9c8b817dde9?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0fe7ef8) will **decrease** coverage by `40.50%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8016       +/-   ##
   =============================================
   - Coverage     71.24%   30.73%   -40.51%     
   =============================================
     Files          1607     1603        -4     
     Lines         83409    83269      -140     
     Branches      12458    12457        -1     
   =============================================
   - Hits          59426    25595    -33831     
   - Misses        19941    55419    +35478     
   + Partials       4042     2255     -1787     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `29.01% <0.00%> (+0.04%)` | :arrow_up: |
   | integration2 | `27.58% <0.00%> (-0.09%)` | :arrow_down: |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1102 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0fe7ef8...9f7aca3](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] codecov-commenter commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1012898899


   # [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#8016](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (1f94e56) into [master](https://codecov.io/gh/apache/pinot/commit/77a706996099f9bb44b90ad506f5205b3c4d7a42?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (77a7069) will **decrease** coverage by `42.35%`.
   > The diff coverage is `0.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/pinot/pull/8016/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #8016       +/-   ##
   =============================================
   - Coverage     71.31%   28.95%   -42.36%     
   =============================================
     Files          1593     1592        -1     
     Lines         82365    82739      +374     
     Branches      12270    12368       +98     
   =============================================
   - Hits          58740    23960    -34780     
   - Misses        19660    56619    +36959     
   + Partials       3965     2160     -1805     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1 | `28.95% <0.00%> (-0.07%)` | :arrow_down: |
   | integration2 | `?` | |
   | unittests1 | `?` | |
   | unittests2 | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...t/local/utils/nativefst/mutablefst/MutableArc.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlQXJjLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...cal/utils/nativefst/mutablefst/MutableFSTImpl.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlRlNUSW1wbC5qYXZh) | `0.00% <0.00%> (ø)` | |
   | [...local/utils/nativefst/mutablefst/MutableState.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC9NdXRhYmxlU3RhdGUuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...ls/nativefst/mutablefst/utils/MutableFSTUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvbXV0YWJsZWZzdC91dGlscy9NdXRhYmxlRlNUVXRpbHMuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...l/utils/nativefst/utils/RealTimeRegexpMatcher.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC91dGlscy9uYXRpdmVmc3QvdXRpbHMvUmVhbFRpbWVSZWdleHBNYXRjaGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [.../java/org/apache/pinot/spi/utils/BooleanUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQm9vbGVhblV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/config/table/FSTType.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL0ZTVFR5cGUuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [1153 more](https://codecov.io/gh/apache/pinot/pull/8016/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [77a7069...1f94e56](https://codecov.io/gh/apache/pinot/pull/8016?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   




[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786636540



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * FST stores key (string) to a value (Long). Both are state machines and state transition is based on
+   * a input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       @richardstartin First off, I really appreciate your in-depth look into the PR. As usual, your comments are invaluable.
   I agree with you that the spacing in the PR was a bit too much -- and I have fixed it per your comments. However, I would also ask you to be a bit considerate of personal style. The example you gave above about inconsistent spacing makes perfect sense, and I am happy to fix that (and would have fixed it in a blink, had it been the primary comment that you made about spacing). However, you have commented on nearly every blank line that I added in this PR, including all the places where I was consistent. For example, I added a blank line after a variable declaration and before its usage, and this is a consistent pattern. You did not agree with it and asked me to revert it in almost every file in this PR. I accepted your comments and made the changes. All I am (humbly) requesting is that you allow me a bit of room (within reasonable limits, of course). I find a blank line soothing to my eyes, and while I understand that this PR went overboard with them, I have tried my best to fix it now. I request you to kindly not allege that I am intentionally adding too many lines and trying to obfuscate the project (quoting your latest comment). I sincerely apologise if I have crossed any boundaries or have been detrimental to the project in any form -- that is not my intention.
   
   And, as you must have noticed, I have followed your advice to not add spaces in my text anymore. This comment is a testament to that.
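
For readers following the quoted diff: the Javadoc on `regexMatchOnFST` above describes a queue-based walk over (automaton state, FST node) pairs. Below is a minimal sketch of that idea, not the PR's implementation. It does not use Pinot's `MutableFST`, `Automaton`, or `RealTimeRegexpMatcher`; the names `ProductWalkSketch`, `FstNode`, `Dfa`, and `Pair` are hypothetical stand-ins, the "FST" is a plain trie, and the automaton for `hel(p|d)` is written by hand rather than compiled from a regex. Two assumptions differ from the Javadoc wording: the sketch iterates the FST node's arcs and asks the automaton whether it can follow each one, and it keeps expanding a pair even after recording a match.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.TreeMap;

// Illustrative only: none of these types are Pinot classes.
public class ProductWalkSketch {
  // Hypothetical FST node: a plain trie node with an output value on terminal nodes.
  static final class FstNode {
    final Map<Character, FstNode> arcs = new TreeMap<>();
    boolean terminal;
    int value;
  }

  // Hypothetical deterministic automaton, standing in for a compiled regex.
  static final class Dfa {
    final Map<Integer, Map<Character, Integer>> transitions = new HashMap<>();
    final Set<Integer> acceptStates = new HashSet<>();

    void addTransition(int from, char symbol, int to) {
      transitions.computeIfAbsent(from, k -> new HashMap<>()).put(symbol, to);
    }

    // Returns the next state, or null when the automaton has no such transition.
    Integer step(int state, char symbol) {
      Map<Character, Integer> outgoing = transitions.get(state);
      return outgoing == null ? null : outgoing.get(symbol);
    }
  }

  // One queue entry: an automaton state paired with an FST node.
  static final class Pair {
    final int dfaState;
    final FstNode node;

    Pair(int dfaState, FstNode node) {
      this.dfaState = dfaState;
      this.node = node;
    }
  }

  static void insert(FstNode root, String word, int value) {
    FstNode current = root;
    for (char c : word.toCharArray()) {
      current = current.arcs.computeIfAbsent(c, k -> new FstNode());
    }
    current.terminal = true;
    current.value = value;
  }

  // Breadth-first walk over (automaton state, FST node) pairs, starting at state 0.
  static List<Integer> match(Dfa dfa, FstNode root) {
    List<Integer> results = new ArrayList<>();
    Queue<Pair> queue = new ArrayDeque<>();
    queue.add(new Pair(0, root));
    while (!queue.isEmpty()) {
      Pair pair = queue.remove();
      if (dfa.acceptStates.contains(pair.dfaState) && pair.node.terminal) {
        results.add(pair.node.value);
      }
      // Follow only the arcs that the automaton can also take.
      for (Map.Entry<Character, FstNode> arc : pair.node.arcs.entrySet()) {
        Integer next = dfa.step(pair.dfaState, arc.getKey());
        if (next != null) {
          queue.add(new Pair(next, arc.getValue()));
        }
      }
    }
    return results;
  }

  // Builds a four-word trie and a hand-written automaton for the regex hel(p|d).
  public static List<Integer> demo() {
    FstNode root = new FstNode();
    insert(root, "hello", 1);
    insert(root, "help", 2);
    insert(root, "world", 3);
    insert(root, "held", 4);

    Dfa dfa = new Dfa();
    dfa.addTransition(0, 'h', 1);
    dfa.addTransition(1, 'e', 2);
    dfa.addTransition(2, 'l', 3);
    dfa.addTransition(3, 'p', 4);
    dfa.addTransition(3, 'd', 4);
    dfa.acceptStates.add(4);
    return match(dfa, root);
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

In this toy run the walk collects the values stored for `held` and `help` (4 and 2). The branches for `hello` and `world` are dropped as soon as the automaton has no transition for the next character. Termination follows the same argument as the Javadoc: every step moves one arc deeper into an acyclic structure.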
   
   






[GitHub] [pinot] atris commented on pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#issuecomment-1014254854


   @richardstartin Updated the PR, please see




[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785782197



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableArc.java
##########
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable FST's arc
+ */
+public class MutableArc {
+
+  private int _outputSymbol;
+  private MutableState _nextState;
+
+  /**
+   * Arc Constructor
+   *
+   * @param nextState the arc's next state
+   */
+  public MutableArc(int outputSymbol, MutableState nextState) {
+    _outputSymbol = outputSymbol;
+    _nextState = nextState;
+  }
+
+  public int getOutputSymbol() {
+    return _outputSymbol;
+  }
+
+  /**
+   * Get the next state
+   */
+  public MutableState getNextState() {
+    return _nextState;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    return MutableFSTUtils.arcEquals(this, obj);
+  }
+
+  @Override
+  public int hashCode() {
+    int result = 1;
+
+    result = 31 * result + (_nextState != null ? _nextState.getLabel() : 0);
+    return result;
+  }
+
+  @Override
+  public String toString() {
+    return "(" + _nextState.toString()
+           + ")";

Review comment:
       Formatting: doesn't need a line break.

##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableArc.java
##########
@@ -0,0 +1,72 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable FST's arc
+ */
+public class MutableArc {
+
+  private int _outputSymbol;
+  private MutableState _nextState;
+
+  /**
+   * Arc Constructor
+   *
+   * @param nextState the arc's next state
+   */
+  public MutableArc(int outputSymbol, MutableState nextState) {
+    _outputSymbol = outputSymbol;
+    _nextState = nextState;
+  }
+
+  public int getOutputSymbol() {
+    return _outputSymbol;
+  }
+
+  /**
+   * Get the next state
+   */
+  public MutableState getNextState() {
+    return _nextState;
+  }
+
+  @Override
+  public boolean equals(Object obj) {
+    return MutableFSTUtils.arcEquals(this, obj);
+  }
+
+  @Override
+  public int hashCode() {
+    int result = 1;
+

Review comment:
       Formatting: doesn't need a line break.






[GitHub] [pinot] richardstartin commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r785788087



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/utils/MutableFSTUtils.java
##########
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils;
+
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.apache.pinot.segment.local.utils.nativefst.utils.RealTimeRegexpMatcher;
+import org.roaringbitmap.RoaringBitmapWriter;
+import org.roaringbitmap.buffer.MutableRoaringBitmap;
+
+public class MutableFSTUtils {
+
+  private MutableFSTUtils() {
+
+  }
+
+  public static boolean fstEquals(Object thisFstObj,
+                                  Object thatFstObj) {
+    if (thisFstObj == thatFstObj) {
+      return true;
+    }
+
+    if (thisFstObj instanceof MutableFST && thatFstObj instanceof MutableFST) {
+      MutableFST thisFST = (MutableFST) thisFstObj;
+      MutableFST thatFST = (MutableFST) thatFstObj;
+      return thisFST.getStartState().getLabel() == thatFST.getStartState().getLabel()
+          && thisFST.getStartState().getArcs().size() == thatFST.getStartState().getArcs().size();
+    }
+    return false;
+  }
+
+  public static boolean arcEquals(Object thisArcObj, Object thatArcObj) {
+    if (thisArcObj == thatArcObj) {
+      return true;
+    }
+
+    if (thisArcObj instanceof MutableArc && thatArcObj instanceof MutableArc) {
+      MutableArc thisArc = (MutableArc) thisArcObj;
+      MutableArc thatArc = (MutableArc) thatArcObj;
+      return thisArc.getNextState().getLabel() == thatArc.getNextState().getLabel()
+          && thisArc.getNextState().getArcs().size() == thatArc.getNextState().getArcs().size();
+    }
+    return false;
+  }
+
+  public static boolean stateEquals(Object thisStateObj, Object thatStateObj) {
+    if (thisStateObj == thatStateObj) {
+      return true;
+    }
+    if (thisStateObj instanceof MutableState && thatStateObj instanceof MutableState) {
+      MutableState thisState = (MutableState) thisStateObj;
+      MutableState thatState = (MutableState) thatStateObj;
+      return thisState.getLabel() == thatState.getLabel() && thisState.getArcs().size() == thatState.getArcs().size();
+    }
+    return false;
+  }
+
+  /**
+   * Return number of matches for given regex for realtime FST
+   */
+  public static long regexQueryNrHitsForRealTimeFST(String regex, MutableFST fst) {
+    RoaringBitmapWriter<MutableRoaringBitmap> writer = RoaringBitmapWriter.bufferWriter().get();
+    RealTimeRegexpMatcher.regexMatch(regex, fst, writer::add);
+

Review comment:
       remove blank line






[GitHub] [pinot] atris commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
atris commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r784645746



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/mutablefst/MutableFSTImpl.java
##########
@@ -0,0 +1,198 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.utils.nativefst.mutablefst;
+
+import com.google.common.base.Preconditions;
+
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.commons.lang3.tuple.Pair;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.utils.MutableFSTUtils;
+
+
+/**
+ * A mutable finite state transducer implementation that allows you to build mutable via the API.
+ * This is not thread safe; convert to an ImmutableFst if you need to share across multiple writer
+ * threads.
+ *
+ * Concurrently writing and reading to/from a mutable FST is supported.
+ */
+public class MutableFSTImpl implements MutableFST {
+  private MutableState _start;
+
+  public MutableFSTImpl() {
+    _start = new MutableState(true);
+  }
+
+  /**
+   * Get the initial state
+   */
+  @Override
+  public MutableState getStartState() {
+    return _start;
+  }
+
+  /**
+   * Set the initial state
+   *
+   * @param start the initial state
+   */
+  @Override
+  public void setStartState(MutableState start) {
+    if (_start != null) {
+      throw new IllegalStateException("Cannot override a start state");
+    }
+
+    _start = start;
+  }
+
+  public MutableState newStartState() {
+    return newStartState();
+  }
+
+  public MutableArc addArc(MutableState startState, int outputSymbol, MutableState endState) {
+    MutableArc newArc = new MutableArc(outputSymbol,
+                                        endState);
+    startState.addArc(newArc);
+    endState.addIncomingState(startState);
+    return newArc;
+  }
+
+  @Override
+  public void throwIfInvalid() {
+    Preconditions.checkNotNull(_start, "must have a start state");
+  }
+
+  @Override
+  public void addPath(String word, int outputSymbol) {
+    MutableState state = getStartState();
+
+    if (state == null) {
+      throw new IllegalStateException("Start state cannot be null");
+    }
+
+    List<MutableArc> arcs = state.getArcs();
+
+    boolean isFound = false;
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(0)) {
+        state = arc.getNextState();
+        isFound = true;
+        break;
+      }
+    }
+
+    int foundPos = -1;
+
+    if (isFound) {
+      Pair<MutableState, Integer> pair = findPointOfDiversion(state, word, 0);
+
+      if (pair == null) {
+        // Word already exists
+        return;
+      }
+
+      foundPos = pair.getRight();
+      state = pair.getLeft();
+    }
+
+    for (int i = foundPos + 1; i < word.length(); i++) {
+      MutableState nextState = new MutableState();
+
+      nextState.setLabel(word.charAt(i));
+
+      int currentOutputSymbol = -1;
+
+      if (i == word.length() - 1) {
+        currentOutputSymbol = outputSymbol;
+      }
+
+      MutableArc mutableArc = new MutableArc(currentOutputSymbol, nextState);
+      state.addArc(mutableArc);
+
+      state = nextState;
+    }
+
+    state.setIsTerminal(true);
+  }
+
+  private Pair<MutableState, Integer> findPointOfDiversion(MutableState mutableState,
+      String word, int currentPos) {
+    if (currentPos == word.length() - 1) {
+      return null;
+    }
+
+    if (mutableState.getLabel() != word.charAt(currentPos)) {
+      throw new IllegalStateException("Current state needs to be part of word path");
+    }
+
+    List<MutableArc> arcs = mutableState.getArcs();
+
+    for (MutableArc arc : arcs) {
+      if (arc.getNextState().getLabel() == word.charAt(currentPos + 1)) {
+        return findPointOfDiversion(arc.getNextState(), word, currentPos + 1);
+      }
+    }
+
+    return Pair.of(mutableState, currentPos);
+  }
+
+  static <T> void compactNulls(ArrayList<T> list) {
+    int nextGood = 0;
+    for (int i = 0; i < list.size(); i++) {
+      T ss = list.get(i);
+      if (ss != null) {
+        if (i != nextGood) {
+          list.set(nextGood, ss);
+        }
+        nextGood += 1;
+      }
+    }
+    // trim the end
+    while (list.size() > nextGood) {
+      list.remove(list.size() - 1);
+    }
+  }

Review comment:
       Yes, the objective was to maximise performance.
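
       For readers following the thread, the in-place compaction done by `compactNulls` in the hunk above can be tried in isolation. The following is only a minimal sketch, not code from the PR: the class name `CompactNullsSketch` is hypothetical, and the tail is trimmed with a single `subList(...).clear()` call instead of the element-by-element loop in the diff. The standard `list.removeIf(Objects::isNull)` would give the same result with less code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical standalone class; only the compaction idea comes from the quoted diff.
final class CompactNullsSketch {
  private CompactNullsSketch() {
  }

  // Moves non-null elements to the front, preserving their order, then drops the tail.
  static <T> void compactNulls(List<T> list) {
    int nextGood = 0;
    for (int i = 0; i < list.size(); i++) {
      T item = list.get(i);
      if (item != null) {
        if (i != nextGood) {
          list.set(nextGood, item);
        }
        nextGood++;
      }
    }
    // Trim everything past the last kept element in one range operation.
    list.subList(nextGood, list.size()).clear();
  }

  public static void main(String[] args) {
    List<String> states = new ArrayList<>(Arrays.asList("a", null, "b", null, null, "c"));
    compactNulls(states);
    System.out.println(states); // prints [a, b, c]
  }
}
```

       The two-index loop touches each element once and never shifts the backing array, which is the property the performance argument rests on.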






[GitHub] [pinot] walterddr commented on a change in pull request #8016: Implement Real Time Mutable FST

Posted by GitBox <gi...@apache.org>.
walterddr commented on a change in pull request #8016:
URL: https://github.com/apache/pinot/pull/8016#discussion_r786981509



##########
File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/nativefst/utils/RealTimeRegexpMatcher.java
##########
@@ -0,0 +1,156 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.utils.nativefst.utils;
+
+import java.util.ArrayDeque;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Queue;
+import java.util.Set;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Automaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.CharacterRunAutomaton;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.RegExp;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.State;
+import org.apache.pinot.segment.local.utils.nativefst.automaton.Transition;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableArc;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableFST;
+import org.apache.pinot.segment.local.utils.nativefst.mutablefst.MutableState;
+import org.roaringbitmap.IntConsumer;
+
+public class RealTimeRegexpMatcher {
+  private final String _regexQuery;
+  private final MutableFST _fst;
+  private final Automaton _automaton;
+  private final IntConsumer _dest;
+
+  public RealTimeRegexpMatcher(String regexQuery, MutableFST fst, IntConsumer dest) {
+    _regexQuery = regexQuery;
+    _fst = fst;
+    _dest = dest;
+
+    _automaton = new RegExp(_regexQuery).toAutomaton();
+  }
+
+  public static void regexMatch(String regexQuery, MutableFST fst, IntConsumer dest) {
+    RealTimeRegexpMatcher matcher = new RealTimeRegexpMatcher(regexQuery, fst, dest);
+    matcher.regexMatchOnFST();
+  }
+
+  // Matches "input" string with _regexQuery Automaton.
+  public boolean match(String input) {
+    CharacterRunAutomaton characterRunAutomaton = new CharacterRunAutomaton(_automaton);
+    return characterRunAutomaton.run(input);
+  }
+
+  /**
+   * This function runs matching on automaton built from regexQuery and the FST.
+   * The FST maps a key (string) to a value (Long). Both are state machines, and state transitions are based on
+   * an input character.
+   *
+   * This algorithm starts with Queue containing (Automaton Start Node, FST Start Node).
+   * Each step an entry is popped from the queue:
+   *    1) if the automaton state is accept and the FST Node is final (i.e. end node) then the value stored for that FST
+   *       is added to the set of result.
+   *    2) Else next set of transitions on automaton are gathered and for each transition target node for that character
+   *       is figured out in FST Node, resulting pair of (automaton state, fst node) are added to the queue.
+   *    3) This process is bound to complete since we are making progression on the FST (which is a DAG) towards final
+   *       nodes.
+   */
+  public void regexMatchOnFST() {

Review comment:
       Sorry for jumping into the discussion late. I also have the habit of adding many line separators for readability, so I'm glad to see some discussion about this. In general I have found it hard to enforce a checkstyle rule for readability because, as I agree, it is rather subjective (other than the obvious cases, such as those covered by https://google.github.io/styleguide/javaguide.html).
   
   My own opinion is to follow the existing codebase standard as closely as possible; this gives a sense of consistency to users as well as first-time contributors. For example, I don't see other code separating a variable definition from its first usage immediately after it. However, this should not incur too much overhead: I don't think we want to discourage contributions by requiring contributors to first read how others format their code.
   
   But I saw that Richard provided a lot of constructive feedback here besides the formatting. In my experience, people who review such a large PR extensively will also be the ones who maintain it and collaborate on it going forward. It would be great to reach a reasonable middle ground on readability. I don't think we want to discourage reviews over readability either.
   
   
   This is just my personal opinion, and I am also pretty new to Pinot, so please disregard it if it doesn't seem to make sense. But I will definitely use this as a standard for my contributions going forward if we can reach a general guideline.
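
   As a side note for readers of the hunk itself: the Javadoc on `regexMatchOnFST` describes a queue-driven walk over pairs of (automaton state, FST node). Below is a minimal sketch of that idea under stated assumptions. It uses a toy trie and a hand-written deterministic automaton instead of Pinot's `MutableFST`, `Automaton` and `RegExp` classes, and every name in it (`ProductTraversalSketch`, `TrieNode`, `Dfa`, `Pairing`) is hypothetical. It also keeps expanding after recording a hit so that longer keys sharing a matching prefix are still visited; whether the PR does the same is not visible in the quoted hunk.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical toy types; they only mimic the shape of the algorithm in the Javadoc.
final class ProductTraversalSketch {
  /** Toy trie node standing in for an FST state; _output < 0 means "not terminal". */
  static final class TrieNode {
    final Map<Character, TrieNode> _children = new HashMap<>();
    int _output = -1;
  }

  /** Toy deterministic automaton with explicit per-character transitions. */
  static final class Dfa {
    final Map<Integer, Map<Character, Integer>> _transitions = new HashMap<>();
    final Set<Integer> _accepting = new HashSet<>();

    void addTransition(int from, char c, int to) {
      _transitions.computeIfAbsent(from, k -> new HashMap<>()).put(c, to);
    }
  }

  /** One queue entry: an automaton state paired with a trie node. */
  static final class Pairing {
    final int _dfaState;
    final TrieNode _node;

    Pairing(int dfaState, TrieNode node) {
      _dfaState = dfaState;
      _node = node;
    }
  }

  static void addWord(TrieNode root, String word, int output) {
    TrieNode node = root;
    for (char c : word.toCharArray()) {
      node = node._children.computeIfAbsent(c, k -> new TrieNode());
    }
    node._output = output;
  }

  /** Breadth-first walk over (automaton state, trie node) pairs; returns sorted outputs. */
  static List<Integer> match(Dfa dfa, int startState, TrieNode root) {
    List<Integer> hits = new ArrayList<>();
    Queue<Pairing> queue = new ArrayDeque<>();
    queue.add(new Pairing(startState, root));
    while (!queue.isEmpty()) {
      Pairing current = queue.remove();
      // Record a hit when both machines are in a final position.
      if (dfa._accepting.contains(current._dfaState) && current._node._output >= 0) {
        hits.add(current._node._output);
      }
      Map<Character, Integer> transitions = dfa._transitions.get(current._dfaState);
      if (transitions == null) {
        continue;
      }
      // Follow only the characters that both the automaton and the trie can consume.
      for (Map.Entry<Character, Integer> t : transitions.entrySet()) {
        TrieNode child = current._node._children.get(t.getKey());
        if (child != null) {
          queue.add(new Pairing(t.getValue(), child));
        }
      }
    }
    Collections.sort(hits);
    return hits;
  }

  /** Hand-built automaton accepting exactly "still" and "stall" (no regex parser involved). */
  static Dfa exampleDfa() {
    Dfa dfa = new Dfa();
    dfa.addTransition(0, 's', 1);
    dfa.addTransition(1, 't', 2);
    dfa.addTransition(2, 'i', 3);
    dfa.addTransition(2, 'a', 3);
    dfa.addTransition(3, 'l', 4);
    dfa.addTransition(4, 'l', 5);
    dfa._accepting.add(5);
    return dfa;
  }

  public static void main(String[] args) {
    TrieNode root = new TrieNode();
    addWord(root, "still", 0);
    addWord(root, "stall", 1);
    addWord(root, "stop", 2);
    System.out.println(match(exampleDfa(), 0, root)); // prints [0, 1]
  }
}
```

   The walk terminates because every queue entry moves one level deeper in the trie, which mirrors the "FST is a DAG" argument in step 3 of the Javadoc.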
   



