Posted to commits@pinot.apache.org by "klsince (via GitHub)" <gi...@apache.org> on 2023/03/09 21:31:51 UTC

[GitHub] [pinot] klsince commented on a diff in pull request #10191: [Index SPI] IndexType

klsince commented on code in PR #10191:
URL: https://github.com/apache/pinot/pull/10191#discussion_r1131613995


##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java:
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.spi.index;
+
+import java.io.Closeable;
+import java.io.IOException;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+
+/**
+ * The interface used to create indexes.
+ *
+ * The lifecycle for an IndexCreator is to be created, receive one or more calls to either
+ * {@link #add(Object, int)} or {@link #add(Object[], int[])} (but not
+ * mix them),
+ * a call to {@link #seal()} and finally be closed. Calls to add cell methods must be done in document id order,

Review Comment:
   Looks like the comments were not formatted. Btw, should this read "Calls to add methods..." rather than "Calls to add cell methods..."?
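
For context, the lifecycle this Javadoc describes (create, add in docId order, seal, close) could be exercised roughly as follows. This is a minimal sketch, not the PR's code: `SketchIndexCreator` and `CountingCreator` are hypothetical stand-ins, only the single-value `add` path is shown, and the `seal()` signature is an assumption because the diff only links to it.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class IndexCreatorLifecycleSketch {

  // Hypothetical stand-in for the IndexCreator interface in the diff (single-value path only).
  interface SketchIndexCreator extends Closeable {
    void add(Object value, int dictId) throws IOException;

    // Assumed signature: the diff references #seal() but does not show it.
    void seal() throws IOException;
  }

  // Toy implementation that keeps values in memory and rejects add() after seal().
  static class CountingCreator implements SketchIndexCreator {
    final List<Object> values = new ArrayList<>();
    boolean sealed;
    boolean closed;

    @Override
    public void add(Object value, int dictId) {
      if (sealed) {
        throw new IllegalStateException("add called after seal");
      }
      values.add(value); // the list position plays the role of the docId, starting at 0
    }

    @Override
    public void seal() {
      sealed = true;
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Drives the lifecycle: create, add in docId order, seal, then close.
  static CountingCreator buildIndex(Object[] column) {
    CountingCreator creator = new CountingCreator();
    try (CountingCreator c = creator) {
      for (Object value : column) {
        c.add(value, -1); // -1 means "no dictionary", as the add(Object, int) Javadoc in this PR states
      }
      c.seal();
    }
    return creator;
  }

  public static void main(String[] args) {
    CountingCreator creator = buildIndex(new Object[] {"a", "b", "c"});
    System.out.println(creator.values.size() + " docs, sealed=" + creator.sealed + ", closed=" + creator.closed);
  }
}
```

The try-with-resources block is one way for the caller to honour the "finally be closed" part of the contract even if an `add` call fails.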



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexReaderConstraintException.java:
##########
@@ -0,0 +1,47 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.spi.index;
+
+public class IndexReaderConstraintException extends Exception {
+  public IndexReaderConstraintException() {
+  }
+
+  public IndexReaderConstraintException(String message) {
+    super(message);
+  }
+
+  public IndexReaderConstraintException(String columnName, IndexType<?, ?, ?> type, String constraintDesc,
+      Throwable cause) {
+    this("Cannot read an index of type " + type + " on column " + columnName + ". Reason: " + constraintDesc,

Review Comment:
   nit: use String.format(...) so this reads more easily
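
A minimal sketch of what this nit might look like in practice. The class and method names are hypothetical, and `type` is declared as a plain `Object` in place of `IndexType<?, ?, ?>` so the example compiles on its own. Only the message text is taken from the diff.

```java
public class ConstraintMessageSketch {

  // Message built with string concatenation, as in the diff.
  static String concatenated(String columnName, Object type, String constraintDesc) {
    return "Cannot read an index of type " + type + " on column " + columnName + ". Reason: " + constraintDesc;
  }

  // Same message built with String.format; each %s is replaced by the string form of its argument.
  static String formatted(String columnName, Object type, String constraintDesc) {
    return String.format("Cannot read an index of type %s on column %s. Reason: %s", type, columnName,
        constraintDesc);
  }

  public static void main(String[] args) {
    System.out.println(formatted("myColumn", "inverted_index", "column is not sorted"));
  }
}
```

With `String.format` the template stays in one piece, so the sentence can be read without mentally stitching the fragments together. The argument order has to match the placeholders, which is the main thing to double-check when converting.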



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java:
##########
@@ -0,0 +1,121 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index;
+
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.ColumnMetadata;
+import org.apache.pinot.segment.spi.creator.IndexCreationContext;
+import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer;
+import org.apache.pinot.segment.spi.store.SegmentDirectory;
+import org.apache.pinot.spi.config.table.IndexConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+
+
+/**
+ * TODO: implement mutable indexes.
+ * @param <C> the class that represents how this object is configured.
+ * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type.
+ * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type.
+ */
+public interface IndexType<C extends IndexConfig, IR extends IndexReader, IC extends IndexCreator> {
+
+  /**
+   * The unique id that identifies this index type.
+   * <p>The returned value for each index should be constant across different Pinot versions as it is used as:</p>
+   *
+   * <ul>
+   *   <li>They key used when the index is registered in IndexService.</li>

Review Comment:
   `s/They/The`



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexCreator.java:
##########
@@ -0,0 +1,62 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.spi.index;
+
+import java.io.Closeable;
+import java.io.IOException;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+
+
+/**
+ * The interface used to create indexes.
+ *
+ * The lifecycle for an IndexCreator is to be created, receive one or more calls to either
+ * {@link #add(Object, int)} or {@link #add(Object[], int[])} (but not
+ * mix them),
+ * a call to {@link #seal()} and finally be closed. Calls to add cell methods must be done in document id order,
+ * starting from the first document id.
+ */
+public interface IndexCreator extends Closeable {
+  /**
+   * Adds the given single value cell to the index.
+   *
+   * Rows will be added in docId order, starting with the one with docId 0.
+   *
+   * @param value The nonnull value of the cell. In case the cell was actually null, a default value is received instead
+   * @param dictId An optional dictionary value of the cell. If there is no dictionary, -1 is received
+   */
+  void add(@Nonnull Object value, int dictId)
+      throws IOException;
+
+  /**
+   * Adds the given multi value cell to the index
+   *
+   * Rows will be added in docId order, starting with the one with docId 0.
+   *
+   * @param values The nonnull value of the cell. In case the cell was actually null, an empty array is received instead
+   * @param dictIds An optional array of dictionary values. If there is no dictionary, null is received.
+   */
+  void add(@Nonnull Object[] values, @Nullable int[] dictIds)

Review Comment:
   IIRC, the convention is to annotate only `@Nullable` and leave `@Nonnull` as the implicit default.
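
A minimal sketch of that convention, under stated assumptions: the nested `Nullable` annotation is a local stand-in for `javax.annotation.Nullable` so the example needs no extra dependency, and `countWithDictIds` is a hypothetical method that only mirrors the parameter shape of `add(Object[], int[])`.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class NullableConventionSketch {

  // Local stand-in for javax.annotation.Nullable (assumption: used only to keep the sketch self-contained).
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.PARAMETER, ElementType.METHOD})
  @interface Nullable {
  }

  // Only the nullable parameter is annotated; 'values' is treated as non-null by convention.
  static int countWithDictIds(Object[] values, @Nullable int[] dictIds) {
    if (dictIds == null) {
      return 0; // no dictionary available for this cell
    }
    return Math.min(values.length, dictIds.length);
  }

  public static void main(String[] args) {
    System.out.println(countWithDictIds(new Object[] {"a", "b"}, null));
    System.out.println(countWithDictIds(new Object[] {"a", "b"}, new int[] {3, 7}));
  }
}
```

Under this convention an unannotated parameter reads as "must not be null", so the annotations that remain mark exactly the places where a null check is needed.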



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java:
##########
@@ -0,0 +1,121 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index;
+
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.ColumnMetadata;
+import org.apache.pinot.segment.spi.creator.IndexCreationContext;
+import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer;
+import org.apache.pinot.segment.spi.store.SegmentDirectory;
+import org.apache.pinot.spi.config.table.IndexConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+
+
+/**
+ * TODO: implement mutable indexes.
+ * @param <C> the class that represents how this object is configured.
+ * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type.
+ * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type.
+ */
+public interface IndexType<C extends IndexConfig, IR extends IndexReader, IC extends IndexCreator> {
+
+  /**
+   * The unique id that identifies this index type.
+   * <p>The returned value for each index should be constant across different Pinot versions as it is used as:</p>
+   *
+   * <ul>
+   *   <li>They key used when the index is registered in IndexService.</li>
+   *   <li>The internal identification in v1 files and metadata persisted on disk.</li>
+   *   <li>The default toString implementation.</li>
+   *   <li>The key that identifies the index config in the indexes section inside
+   *   {@link org.apache.pinot.spi.config.table.FieldConfig}, although specific index types may choose to read other
+   *   names (for example, <code>inverted_index</code> may read <code>inverted</code> key.</li>
+   * </ul>
+   */
+  String getId();
+
+  Class<C> getIndexConfigClass();
+
+  /**
+   * The default config when it is not explicitly defined by the user.
+   */
+  C getDefaultConfig();
+
+  C getConfig(TableConfig tableConfig, Schema schema);
+
+  /**
+   * Optional method that can be implemented to ignore the index creation.
+   *
+   * Sometimes it doesn't make sense to create an index, even when the user explicitly asked for it. For example, an
+   * inverted index shouldn't be created when the column is sorted.
+   *
+   * Apache Pinot will call this method once all index configurations have been parsed and it is included in the
+   * {@link FieldIndexConfigs} param.
+   *
+   * This method do not need to return false when the index type itself is not included in the {@link FieldIndexConfigs}
+   * param.
+   */
+  default boolean shouldBeCreated(IndexCreationContext context, FieldIndexConfigs configs) {
+    return true;
+  }
+
+  /**
+   * Returns the {@link IndexCreator} that can should be used to create an index of this type with the given context
+   * and configuration.
+   *
+   * The caller has the ownership of the creator and therefore it has to close it.
+   * @param context The object that stores all the contextual information related to the index creation. Like the
+   *                cardinality or the total number of documents.
+   * @param indexConfig The index specific configuration that should be used.
+   */
+  IC createIndexCreator(IndexCreationContext context, C indexConfig)
+      throws Exception;
+
+  /**
+   * Returns the {@link IndexReaderFactory} that should be used to return readers for this type.
+   */
+  IndexReaderFactory<IR> getReaderFactory();
+
+  /**
+   * This method is used to extract a compatible reader from a given ColumnIndexContainer.
+   *
+   * Most implementations just return {@link ColumnIndexContainer#getIndex(IndexType)}, but some may try to reuse other
+   * indexes. For example, InvertedIndexType delegates on the ForwardIndexReader when it is sorted.
+   */
+  @Nullable
+  default IR getIndexReader(ColumnIndexContainer indexContainer) {
+    throw new UnsupportedOperationException();
+  }
+
+  String getFileExtension(ColumnMetadata columnMetadata);
+
+  /**
+   * Returns whether the index is stored as a buffer or not.
+   *
+   * Most indexes are stored as a buffer, but for example TextIndexType is stored in a separate lucene file.
+   */
+  default boolean storedAsBuffer() {

Review Comment:
   Curious: how is this method going to be used?

   The StarTree index data is kept in a separate file (star_tree_index) as PinotDataBuffers. Would that break any assumption here?



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/index/IndexType.java:
##########
@@ -0,0 +1,121 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.spi.index;
+
+import java.util.Map;
+import javax.annotation.Nullable;
+import org.apache.pinot.segment.spi.ColumnMetadata;
+import org.apache.pinot.segment.spi.creator.IndexCreationContext;
+import org.apache.pinot.segment.spi.index.column.ColumnIndexContainer;
+import org.apache.pinot.segment.spi.store.SegmentDirectory;
+import org.apache.pinot.spi.config.table.IndexConfig;
+import org.apache.pinot.spi.config.table.TableConfig;
+import org.apache.pinot.spi.data.Schema;
+
+
+/**
+ * TODO: implement mutable indexes.
+ * @param <C> the class that represents how this object is configured.
+ * @param <IR> the {@link IndexReader} subclass that should be used to read indexes of this type.
+ * @param <IC> the {@link IndexCreator} subclass that should be used to create indexes of this type.
+ */
+public interface IndexType<C extends IndexConfig, IR extends IndexReader, IC extends IndexCreator> {
+
+  /**
+   * The unique id that identifies this index type.
+   * <p>The returned value for each index should be constant across different Pinot versions as it is used as:</p>
+   *
+   * <ul>
+   *   <li>They key used when the index is registered in IndexService.</li>
+   *   <li>The internal identification in v1 files and metadata persisted on disk.</li>
+   *   <li>The default toString implementation.</li>
+   *   <li>The key that identifies the index config in the indexes section inside
+   *   {@link org.apache.pinot.spi.config.table.FieldConfig}, although specific index types may choose to read other
+   *   names (for example, <code>inverted_index</code> may read <code>inverted</code> key.</li>
+   * </ul>
+   */
+  String getId();
+
+  Class<C> getIndexConfigClass();
+
+  /**
+   * The default config when it is not explicitly defined by the user.
+   */
+  C getDefaultConfig();
+
+  C getConfig(TableConfig tableConfig, Schema schema);
+
+  /**
+   * Optional method that can be implemented to ignore the index creation.
+   *
+   * Sometimes it doesn't make sense to create an index, even when the user explicitly asked for it. For example, an
+   * inverted index shouldn't be created when the column is sorted.
+   *
+   * Apache Pinot will call this method once all index configurations have been parsed and it is included in the

Review Comment:
   `Apache Pinot` feels too generic to me here. Perhaps list some example callers of this method, e.g. SegmentPreProcessor or SegmentCreator (iiuc).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

