You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@pinot.apache.org by si...@apache.org on 2020/05/31 01:04:37 UTC

[incubator-pinot] branch hotfix-0530 updated (54cac4a -> 5580af7)

This is an automated email from the ASF dual-hosted git repository.

siddteotia pushed a change to branch hotfix-0530
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git.


 discard 54cac4a  Two changes:
     new 9e35e15  Two changes:
     new 5580af7  Remove master branch restriction (#5467)

This update added new revisions after undoing existing revisions.
That is to say, some revisions that were in the old version of the
branch are not in the new version.  This situation occurs
when a user --force pushes a change and generates a repository
containing something like this:

 * -- * -- B -- O -- O -- O   (54cac4a)
            \
             N -- N -- N   refs/heads/hotfix-0530 (5580af7)

You should already have received notification emails for all of the O
revisions, and so the following emails describe only the N revisions
from the common base, B.

Any revisions marked "omit" are not gone; other references still
refer to them.  Any revisions marked "discard" are gone forever.

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .travis.yml                                                 |  8 ++++----
 .../pinot/core/indexsegment/mutable/MutableSegmentImpl.java | 13 +++++--------
 .../segment/creator/impl/SegmentColumnarIndexCreator.java   | 12 ++++++------
 .../creator/impl/fwd/SingleValueVarByteRawIndexCreator.java |  4 ++--
 .../loader/defaultcolumn/BaseDefaultColumnHandler.java      |  3 ++-
 .../java/org/apache/pinot/spi/config/table/FieldConfig.java |  2 +-
 6 files changed, 20 insertions(+), 22 deletions(-)


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[incubator-pinot] 01/02: Two changes:

Posted by si...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

siddteotia pushed a commit to branch hotfix-0530
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 9e35e157270e1d63ca9333e13f13951df30867ff
Author: Siddharth Teotia <st...@steotia-mn1.linkedin.biz>
AuthorDate: Sat May 30 14:14:38 2020 -0700

    Two changes:
    
    (1) PR https://github.com/apache/incubator-pinot/pull/5256
    added support for deriving num docs per chunk for var byte
    raw index create from column length. This was specifically
    done as part of supporting text blobs. For use cases that
    don't want this feature and are high QPS, see a negative
    impact since size of chunk increases (earlier value
    of numDocsPerChunk was hardcoded to 1000) and based on the
    access pattern we might end up uncompressing a bigger chunk to get values
    for a set of docIds. We have made this change configurable.
    So the default behaviour is same as old (1000 docs per chunk)
    
    (2) PR https://github.com/apache/incubator-pinot/pull/4791
    added support for noDict for STRING/BYTES in consuming segments.
    There is a particular impact of this change on the use cases
    that have set noDict on their STRING dimension columns for other performance
    reasons and also want metricsAggregation. These use cases don't get to
    aggregateMetrics because the new implementation was able to honor their
    table config setting of noDict on STRING/BYTES. Without metrics aggregation,
    memory pressure increases. So to continue aggregating metrics for such cases,
    we will create dictionary even if the column is part of noDictionary set
    from table config.
---
 .../generator/SegmentGeneratorConfig.java          | 15 +++++++++++
 .../indexsegment/mutable/MutableSegmentImpl.java   | 29 +++++++++++++++++++---
 .../creator/impl/SegmentColumnarIndexCreator.java  | 22 ++++++++++++++--
 .../fwd/SingleValueVarByteRawIndexCreator.java     | 10 +++++++-
 .../defaultcolumn/BaseDefaultColumnHandler.java    |  3 ++-
 .../apache/pinot/spi/config/table/FieldConfig.java |  1 +
 6 files changed, 72 insertions(+), 8 deletions(-)

diff --git a/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/generator/SegmentGeneratorConfig.java b/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/generator/SegmentGeneratorConfig.java
index 59531fe..9af9c16 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/generator/SegmentGeneratorConfig.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/generator/SegmentGeneratorConfig.java
@@ -102,6 +102,9 @@ public class SegmentGeneratorConfig {
   private boolean _skipTimeValueCheck = false;
   private boolean _nullHandlingEnabled = false;
 
+  // constructed from FieldConfig
+  private Map<String, Map<String, String>> _columnProperties = new HashMap<>();
+
   @Deprecated
   public SegmentGeneratorConfig() {
   }
@@ -174,12 +177,24 @@ public class SegmentGeneratorConfig {
         _invertedIndexCreationColumns = indexingConfig.getInvertedIndexColumns();
       }
 
+      List<FieldConfig> fieldConfigList = tableConfig.getFieldConfigList();
+      if (fieldConfigList != null) {
+        for (FieldConfig fieldConfig : fieldConfigList) {
+          _columnProperties.put(fieldConfig.getName(), fieldConfig.getProperties());
+        }
+      }
+
       extractTextIndexColumnsFromTableConfig(tableConfig);
 
       _nullHandlingEnabled = indexingConfig.isNullHandlingEnabled();
     }
   }
 
+  @Nonnull
+  public Map<String, Map<String, String>> getColumnProperties() {
+    return _columnProperties;
+  }
+
   /**
    * Set time column details using the given time column
    */
diff --git a/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java b/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
index 07e5ec9..0e24664 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/indexsegment/mutable/MutableSegmentImpl.java
@@ -219,7 +219,7 @@ public class MutableSegmentImpl implements MutableSegment {
       FieldSpec.DataType dataType = fieldSpec.getDataType();
       boolean isFixedWidthColumn = dataType.isFixedWidth();
       int forwardIndexColumnSize = -1;
-      if (isNoDictionaryColumn(noDictionaryColumns, invertedIndexColumns, textIndexColumns, fieldSpec, column)) {
+      if (isNoDictionaryColumn(noDictionaryColumns, invertedIndexColumns, fieldSpec, column)) {
         // no dictionary
         // each forward index entry will be equal to size of data for that row
         // For INT, LONG, FLOAT, DOUBLE it is equal to the number of fixed bytes used to store the value,
@@ -329,9 +329,30 @@ public class MutableSegmentImpl implements MutableSegment {
    * @return true if column is no-dictionary, false if dictionary encoded
    */
   private boolean isNoDictionaryColumn(Set<String> noDictionaryColumns, Set<String> invertedIndexColumns,
-      Set<String> textIndexColumns, FieldSpec fieldSpec, String column) {
-    return textIndexColumns.contains(column) || (noDictionaryColumns.contains(column) && fieldSpec.isSingleValueField()
-        && !invertedIndexColumns.contains(column));
+      FieldSpec fieldSpec, String column) {
+    FieldSpec.DataType dataType = fieldSpec.getDataType();
+    if (noDictionaryColumns.contains(column)) {
+      // Earlier we didn't support noDict in consuming segments for STRING and BYTES columns.
+      // So even if the user had the column in noDictionaryColumns set in table config, we still
+      // created dictionary in consuming segments.
+      // Later on we added this support. There is a particular impact of this change on the use cases
+      // that have set noDict on their STRING dimension columns for other performance
+      // reasons and also want metricsAggregation. These use cases don't get to
+      // aggregateMetrics because the new implementation is able to honor their table config setting
+      // of noDict on STRING/BYTES. Without metrics aggregation, memory pressure increases.
+      // So to continue aggregating metrics for such cases, we will create dictionary even
+      // if the column is part of noDictionary set from table config
+      if (fieldSpec instanceof DimensionFieldSpec && _aggregateMetrics && (dataType == FieldSpec.DataType.STRING ||
+          dataType == FieldSpec.DataType.BYTES)) {
+        _logger.info("Not creating dictionary in consuming segment for column {} of type {}", column, dataType.toString());
+        return false;
+      }
+      // So don't create dictionary if the column is member of noDictionary, is single-value
+      // and doesn't have an inverted index
+      return fieldSpec.isSingleValueField() && !invertedIndexColumns.contains(column);
+    }
+    // column is not a part of noDictionary set, so create dictionary
+    return false;
   }
 
   public SegmentPartitionConfig getSegmentPartitionConfig() {
diff --git a/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java b/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
index 12e6feb..9ff2864 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/SegmentColumnarIndexCreator.java
@@ -53,6 +53,7 @@ import org.apache.pinot.core.segment.creator.impl.inv.OffHeapBitmapInvertedIndex
 import org.apache.pinot.core.segment.creator.impl.inv.OnHeapBitmapInvertedIndexCreator;
 import org.apache.pinot.core.segment.creator.impl.inv.text.LuceneTextIndexCreator;
 import org.apache.pinot.core.segment.creator.impl.nullvalue.NullValueVectorCreator;
+import org.apache.pinot.spi.config.table.FieldConfig;
 import org.apache.pinot.spi.data.DateTimeFieldSpec;
 import org.apache.pinot.spi.data.FieldSpec;
 import org.apache.pinot.spi.data.FieldSpec.FieldType;
@@ -193,9 +194,10 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
             getColumnCompressionType(segmentCreationSpec, fieldSpec);
 
         // Initialize forward index creator
+        boolean deriveNumDocsPerChunk = shouldDeriveNumDocsPerChunk(columnName, segmentCreationSpec.getColumnProperties());
         _forwardIndexCreatorMap.put(columnName,
             getRawIndexCreatorForColumn(_indexDir, compressionType, columnName, fieldSpec.getDataType(), totalDocs,
-                indexCreationInfo.getLengthOfLongestEntry()));
+                indexCreationInfo.getLengthOfLongestEntry(), deriveNumDocsPerChunk));
 
         // Initialize text index creator
         if (_textIndexColumns.contains(columnName)) {
@@ -213,6 +215,14 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
     }
   }
 
+  public static boolean shouldDeriveNumDocsPerChunk(String columnName, Map<String, Map<String, String>> columnProperties) {
+    if (columnProperties != null) {
+      Map<String, String> properties = columnProperties.get(columnName);
+      return properties != null && Boolean.parseBoolean(properties.get(FieldConfig.DERIVE_NUM_DOCS_PER_CHUNK_RAW_INDEX_KEY));
+    }
+    return false;
+  }
+
   /**
    * Helper method that returns compression type to use based on segment creation spec and field type.
    * <ul>
@@ -539,6 +549,13 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
       ChunkCompressorFactory.CompressionType compressionType, String column, FieldSpec.DataType dataType, int totalDocs,
       int lengthOfLongestEntry)
       throws IOException {
+    return getRawIndexCreatorForColumn(file, compressionType, column, dataType, totalDocs, lengthOfLongestEntry, false);
+  }
+
+  public static SingleValueRawIndexCreator getRawIndexCreatorForColumn(File file,
+      ChunkCompressorFactory.CompressionType compressionType, String column, FieldSpec.DataType dataType, int totalDocs,
+      int lengthOfLongestEntry, boolean deriveNumDocsPerChunk)
+      throws IOException {
 
     SingleValueRawIndexCreator indexCreator;
     switch (dataType) {
@@ -561,7 +578,8 @@ public class SegmentColumnarIndexCreator implements SegmentCreator {
       case STRING:
       case BYTES:
         indexCreator =
-            new SingleValueVarByteRawIndexCreator(file, compressionType, column, totalDocs, lengthOfLongestEntry);
+            new SingleValueVarByteRawIndexCreator(file, compressionType, column, totalDocs, lengthOfLongestEntry,
+                deriveNumDocsPerChunk);
         break;
 
       default:
diff --git a/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java b/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java
index a8d7e6b..93652e5 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/segment/creator/impl/fwd/SingleValueVarByteRawIndexCreator.java
@@ -28,6 +28,7 @@ import org.apache.pinot.core.segment.creator.impl.V1Constants;
 
 
 public class SingleValueVarByteRawIndexCreator extends BaseSingleValueRawIndexCreator {
+  private static final int DEFAULT_NUM_DOCS_PER_CHUNK = 1000;
   private static final int TARGET_MAX_CHUNK_SIZE = 1024 * 1024;
 
   private final VarByteChunkSingleValueWriter _indexWriter;
@@ -35,8 +36,15 @@ public class SingleValueVarByteRawIndexCreator extends BaseSingleValueRawIndexCr
   public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressorFactory.CompressionType compressionType,
       String column, int totalDocs, int maxLength)
       throws IOException {
+    this(baseIndexDir, compressionType, column, totalDocs, maxLength, false);
+  }
+
+  public SingleValueVarByteRawIndexCreator(File baseIndexDir, ChunkCompressorFactory.CompressionType compressionType,
+      String column, int totalDocs, int maxLength, boolean deriveNumDocsPerChunk)
+      throws IOException {
     File file = new File(baseIndexDir, column + V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION);
-    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, getNumDocsPerChunk(maxLength), maxLength);
+    int numDocsPerChunk = deriveNumDocsPerChunk ? getNumDocsPerChunk(maxLength) : DEFAULT_NUM_DOCS_PER_CHUNK;
+    _indexWriter = new VarByteChunkSingleValueWriter(file, compressionType, totalDocs, numDocsPerChunk, maxLength);
   }
 
   @VisibleForTesting
diff --git a/pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java b/pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java
index d7277b9..12d1e9b 100644
--- a/pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java
+++ b/pinot-core/src/main/java/org/apache/pinot/core/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java
@@ -362,9 +362,10 @@ public abstract class BaseDefaultColumnHandler implements DefaultColumnHandler {
     int lengthOfLongestEntry = StringUtil.encodeUtf8(stringDefaultValue).length;
     int dictionaryElementSize = 0;
 
+    boolean deriveNumDocsPerChunk = SegmentColumnarIndexCreator.shouldDeriveNumDocsPerChunk(column, indexLoadingConfig.getColumnProperties());
     SingleValueVarByteRawIndexCreator rawIndexCreator =
         new SingleValueVarByteRawIndexCreator(_indexDir, ChunkCompressorFactory.CompressionType.SNAPPY, column,
-            totalDocs, lengthOfLongestEntry);
+            totalDocs, lengthOfLongestEntry, deriveNumDocsPerChunk);
 
     for (int docId = 0; docId < totalDocs; docId++) {
       rawIndexCreator.index(docId, defaultValue);
diff --git a/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java b/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
index 8e0d2e0..03d2590 100644
--- a/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
+++ b/pinot-spi/src/main/java/org/apache/pinot/spi/config/table/FieldConfig.java
@@ -36,6 +36,7 @@ public class FieldConfig extends BaseJsonConfig {
   public static String BLOOM_FILTER_COLUMN_KEY = "bloom.filter";
   public static String ON_HEAP_DICTIONARY_COLUMN_KEY = "onheap.dictionary";
   public static String VAR_LENGTH_DICTIONARY_COLUMN_KEY = "var.length.dictionary";
+  public static String DERIVE_NUM_DOCS_PER_CHUNK_RAW_INDEX_KEY = "derive.num.docs.per.chunk.raw.index";
 
   public static String TEXT_INDEX_REALTIME_READER_REFRESH_KEY = "text.index.realtime.reader.refresh";
   // Lucene creates a query result cache if this option is enabled


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org

[incubator-pinot] 02/02: Remove master branch restriction (#5467)

Posted by si...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

siddteotia pushed a commit to branch hotfix-0530
in repository https://gitbox.apache.org/repos/asf/incubator-pinot.git

commit 5580af7f253d66a3421971e1400c1bb919a9f115
Author: Jialiang Li <jl...@linkedin.com>
AuthorDate: Fri May 29 09:05:25 2020 -0700

    Remove master branch restriction (#5467)
    
    Co-authored-by: Jack Li(Analytics Engineering) <jl...@jlli-mn1.linkedin.biz>
---
 .travis.yml | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 84dfceb..76eb09b 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,14 +25,14 @@ addons:
 install:
   - ./.travis/.travis_install.sh
 
-branches:
-  only:
-    - master
+#branches:
+#  only:
+#    - master
 
 stages:
   - test
   - name: deploy
-    if: branch = master
+#    if: branch = master
 
 jobs:
   include:


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org