You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "gortiz (via GitHub)" <gi...@apache.org> on 2023/05/03 07:51:48 UTC

[GitHub] [pinot] gortiz commented on a diff in pull request #10687: Implement mutable index using index SPI

gortiz commented on code in PR #10687:
URL: https://github.com/apache/pinot/pull/10687#discussion_r1183349882


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/mutable/MutableIndex.java:
##########
@@ -0,0 +1,48 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.spi.index.mutable;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.index.IndexReader;
+
+
+public interface MutableIndex extends IndexReader {

Review Comment:
   To be honest I don't understand the concern here. In the theoretical void it makes sense to make `MutableIndex` to extend `IndexReader` and `IndexCreator`. I mean, at high level we see a MutableIndex as something that is a reader and a creator at the same time. But at lower level it doesn't make sense. It is clear once we focus on the semantics these interfaces. 
   
   As suggested by Neha, I will try to add a javadoc in MutableIndex trying to summarize what is said in this (quite long) text.
   
   ## IndexReader
   
   Given that each specific IndexReader uses their own methods to read, we weren't able to generalize more methods in IndexReader, this interface is actually just a marker interface that extends Closeable. As you can expect, there is nothing in common in how a bloom filter and range index are used, for example.
   
   > does it have to extend IndexReader?
   
   That is a good question. Given that IndexReader is just a marker interface, they don't _have_ to extend IndexReader. What we would actually like to say is something like the following:
   ```
   public interface MutableIndex<IR extends IndexReader>  extends IR
   ```
   
   Then MutableInvertedIndex should be declared as:
   
   ```
   public interface MutableInvertedIndex  extends MutableIndex<InvertedIndexReader>
   ```
   
   If I correctly understood Neha, that is what she suggest when she said:
   
   > would it be a tall order to split that up and clean it up for all impls?
   
   But Java typesystem is not expressive enough to say that and the code above doesn't compile. In `public interface Whatever<E> extends E`, E must be an interface and the Java generic system cannot ensure that (there is no way to say that E has to be a interface and not a class).
   
   Another option would be to change IndexType interface from:
   
   ```
   public interface IndexType<C extends IndexConfig, IR extends IndexReader, IC extends IndexCreator> {
     ...
     MutableIndex createMutableIndex(MutableIndexContext context, C config);
     ...
   }
   ```
   
   to:
   
   ```
   public interface IndexType<C extends IndexConfig, IR extends IndexReader, IC extends IndexCreator, IM extends IR & IC> {
     ...
     IM createMutableIndex(MutableIndexContext context, C config);
     ...
   }
   ```
   
   That has two problems. First, it breaks compatibility. We should change all usages of `IndexType<?, ?, ?>` with `IndexType<?, ?, ?, ?>`. This is not desired, but acceptable. The main problem is that, again, it doesn't compile
   
   Therefore, as far as I know, the most we can do in MutableIndex in the reading part is to extend IndexReader. This means that each time we want to use a MutableWhateverIndex as a reader we actually need to cast it to the specific index reader we need to use. Do we _have_ to extend IndexReader? No, we don't. But as a marker interface, it marks some intention: MutableIndexes should implement some specific index reader. This is not true in the case of IndexCreators.
   
   ## About IndexCreator
   
   IndexCreator isn't just the interface used to create indexes. It is the interface used to create indexes _in offline tables_. MutableIndexes is the interface used to read and create indexes in _realtime segments_. That detail is important. Being focused on offline segments IndexCreator imposes some contraints that are not fulfilled in MutableIndexes. More specifically, rows must be added in docId order starting from docId 0. This is is a constraint we always had, but was just explicitly added in its javadoc in `index-spi`.
   
   On the other hand, MutableIndexes are used in realtime segments where the docId isn't just increasing each time a new row is added. TBH I don't fully understand how `MutableSegmentImpl` deals with docIds when they can change once segment is committed and therefore reordered if there is a sorted column, but it is clear that it isn't just an auto-increasing int, which is the IndexCreator constraint. Given that MutableIndexes cannot assume that constraint, they cannot implement IndexCreator.
   
   This isn't something new. Before this PR, each `MutableWhateverIndexes` implemented `WhateverIndexReader` but they did not implement `WhateverIndexCreator`.
   
   > I do see you need docId in mutable index's add call. I remember discussing passing docId to IndexCreator add too.. Can we unify both mutable and immutable creator to unify the add signature?
   
   I hope the text above makes it clear, but in order to be more explicit, when we discussed about docId in IndexCreator, the question was _don't we need to include the docId as a parameter?_ and the answer was _no, the docId is an auto-increasing value_. 
   
   We cannot remove the _docId_ in MutableIndex and we should not add the _docId_ in IndexCreator. The first is already reasoned here, the later was reasoned in the other PR, but I think it is worth to repeat it here. We should not add _docId_ as a parameter in IndexCreator because implementations of IndexCreator assume the docId is an auto-increasing value. In case we add the parameter we would need to add a runtime precondition in IndexCreators that verify the constraint we have at compilation level right now. This will make the code less table and more confusing. In that scenario a reader would think: _why do IndexCreators accept a parameter that is always deterministic but add a precondition to it?_. We would lose compile time checks and substitute them with a runtime check. And the only reason to do that is to make interfaces fit in our high level mental model where a MutableIndex is used to create and read indexes, when there is no actual semantical equivalent between these int
 erfaces.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org