You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by "shounakmk219 (via GitHub)" <gi...@apache.org> on 2023/06/13 12:42:58 UTC

[GitHub] [pinot] shounakmk219 opened a new pull request, #10905: Startree index build enhancement

shounakmk219 opened a new pull request, #10905:
URL: https://github.com/apache/pinot/pull/10905

   Currently if you add a new StarTree index to the table we end up throwing all existing StarTree indexes and rebuild all from scratch, not the new one only. Same goes for a StarTree index deletion, it will rebuild the rest of the trees again.
   This in a scenario where data size is large and there are bunch of existing indexes can take multiple days to rebuild.
   
   This PR tries to enhance the process by providing a way to reuse the pre built startrees.
   
   **Logical flow**
   
   ```
   1. Separate and extract the existing startrees
       1.1. Separate out the combined index file into separate index files
       1.2. Extract the index map file
   2. Delete the startrees
   3. Iterate over the new startrees list
       3.1. If startree is present in the extracted existing startrees
           3.1.1. Copy the pre existing index files
           3.1.2. Update the metadata properties
       3.2. Else
           3.2.1. Build the startree from the configs
       3.3. Combine the reused/generated index files
   4. Write the files to segment directory
   5. Cleanup created temp files.
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1232927284


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java:
##########
@@ -243,10 +243,8 @@ private void processStarTrees(File indexDir)
       if (starTreeMetadataList != null) {
         // There are existing star-trees
         if (StarTreeBuilderUtils.shouldRemoveExistingStarTrees(starTreeBuilderConfigs, starTreeMetadataList)) {

Review Comment:
   (minor) Let's rename this method to something like `shouldModifyExistingStarTrees()` and also modify the documentation to reflect the change



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java:
##########
@@ -243,10 +243,8 @@ private void processStarTrees(File indexDir)
       if (starTreeMetadataList != null) {
         // There are existing star-trees
         if (StarTreeBuilderUtils.shouldRemoveExistingStarTrees(starTreeBuilderConfigs, starTreeMetadataList)) {
-          // Remove the existing star-trees
-          LOGGER.info("Removing star-trees from segment: {}", _segmentMetadata.getName());
-          StarTreeBuilderUtils.removeStarTrees(indexDir);
-          _segmentMetadata = new SegmentMetadataImpl(indexDir);
+          LOGGER.info("Change detected in star-trees for segment: {}", _segmentMetadata.getName());
+          shouldGenerateStarTree = true;

Review Comment:
   We can simply remove the existing star trees when `shouldGenerateStarTree` is false (user wants to remove all the star-trees)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +117,33 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private SeparatedStarTreesMetadata getExistingStarTrees()
+      throws Exception {
+    try {
+      if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+        // Extract existing startrees
+        // clean star-tree related files and configs once all the star-trees are separated and extracted
+        SeparatedStarTreesMetadata existingStarTrees = extractStarTrees();
+        StarTreeBuilderUtils.removeStarTrees(_segmentDirectory);

Review Comment:
   `removeStarTrees()` should take `indexDir` instead of `segmentDirectory` (even though it works if we pass in the `segmentDirectory`). `segmentDirectory` could be the `v3` sub-directory under the `indexDir`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1239394345


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -81,9 +84,10 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i
     _builderConfigs = builderConfigs;
     _buildMode = buildMode;
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
+    _indexDirectory = indexDir;
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _existingStarTrees = getExistingStarTrees();

Review Comment:
   Oh I see the point. Basically we need to first move the existing files into a temporary location, then delete it when the new trees are created



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238635297


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -61,8 +62,10 @@ public class MultipleTreesBuilder implements Closeable {
   private final List<StarTreeV2BuilderConfig> _builderConfigs;
   private final BuildMode _buildMode;
   private final File _segmentDirectory;
+  private final File _indexDirectory;

Review Comment:
   done
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238641093


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();
     }
 
-    LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime);
+    LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees);
+    LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime);
+  }
+
+  /**
+   * Extracts the individual star-trees from the combined index file and returns {@link SeparatedStarTreesMetadata}
+   * which contains the information about the output directory where the extracted star-trees are located and also
+   * has the builder configs of each of the star-tree.
+   */
+  private SeparatedStarTreesMetadata extractStarTrees()
+      throws Exception {
+    SeparatedStarTreesMetadata metadata = new SeparatedStarTreesMetadata(_segmentDirectory);
+    if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {

Review Comment:
   Nice catch, removed it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238636573


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();

Review Comment:
   makes sense, done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238648754


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -81,9 +84,10 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i
     _builderConfigs = builderConfigs;
     _buildMode = buildMode;
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
+    _indexDirectory = indexDir;
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _existingStarTrees = getExistingStarTrees();

Review Comment:
   Once all the trees are extracted I am deleting the existing star-tree index files so that new index can be built. Even if we just extract the required tree on demand instead of separating all trees at the start, we need to copy the existing index files to a temp folder for the separator to operate. So we can't avoid the temp folder, but I can surely update the logic to only separate the required tree on demand if current logic feels like an overhead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1237648011


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -61,8 +62,10 @@ public class MultipleTreesBuilder implements Closeable {
   private final List<StarTreeV2BuilderConfig> _builderConfigs;
   private final BuildMode _buildMode;
   private final File _segmentDirectory;
+  private final File _indexDirectory;

Review Comment:
   (minor) For convention, we usually call it `_indexDir` so that it is easier to map names



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();
     }
 
-    LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime);
+    LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees);
+    LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime);

Review Comment:
   (minor) Combine them into one single log line
   `Finished building {} star-trees ({} reused) in {}ms`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -81,9 +84,10 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i
     _builderConfigs = builderConfigs;
     _buildMode = buildMode;
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
+    _indexDirectory = indexDir;
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _existingStarTrees = getExistingStarTrees();

Review Comment:
   We might not want to always separate all the trees. Ideally the separator can track all the star-tree metadata (merge `SeparatedStarTreesMetadata` into the separator), and only extract the tree that can be reused into the target folder. Currently we always copy all the trees into separate files, even though they might not be reusable. This way we don't need the temp folder as well.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();
     }
 
-    LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime);
+    LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees);
+    LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime);
+  }
+
+  /**
+   * Extracts the individual star-trees from the combined index file and returns {@link SeparatedStarTreesMetadata}
+   * which contains the information about the output directory where the extracted star-trees are located and also
+   * has the builder configs of each of the star-tree.
+   */
+  private SeparatedStarTreesMetadata extractStarTrees()
+      throws Exception {
+    SeparatedStarTreesMetadata metadata = new SeparatedStarTreesMetadata(_segmentDirectory);
+    if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+      File indexFile = new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME);
+      File indexMapFile = new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME);
+      try (StarTreeIndexSeparator separator =
+          new StarTreeIndexSeparator(indexMapFile, indexFile,
+              _metadataProperties.getInt(MetadataKey.STAR_TREE_COUNT))) {
+        separator.separate(metadata.getOutputDirectory());
+        metadata.setBuilderConfigList(separator.extractBuilderConfigs(_metadataProperties));

Review Comment:
   Consider making these fields read-only (set them in the constructor). `SeparatedStarTreesMetadata` should be a read-only helper class to extract fields from the metadata



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();

Review Comment:
   Should we move this to `close()` so that it can be cleaned up when exception happens?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +120,33 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private SeparatedStarTreesMetadata getExistingStarTrees()
+      throws Exception {
+    try {
+      if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+        // Extract existing startrees
+        // clean star-tree related files and configs once all the star-trees are separated and extracted
+        SeparatedStarTreesMetadata existingStarTrees = extractStarTrees();
+        StarTreeBuilderUtils.removeStarTrees(_indexDirectory);
+        _metadataProperties.refresh();
+        return existingStarTrees;
+      } else {
+        return new SeparatedStarTreesMetadata(_segmentDirectory);

Review Comment:
   Consider returning `null` here to avoid overhead (`null` basically represents there is no existing star-tree)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -136,9 +161,16 @@ public void build()
       for (int i = 0; i < numStarTrees; i++) {
         StarTreeV2BuilderConfig builderConfig = _builderConfigs.get(i);
         Configuration metadataProperties = _metadataProperties.subset(MetadataKey.getStarTreePrefix(i));
-        try (SingleTreeBuilder singleTreeBuilder = getSingleTreeBuilder(builderConfig, starTreeIndexDir, _segment,
-            metadataProperties, _buildMode)) {
-          singleTreeBuilder.build();
+        if (_existingStarTrees.containsTree(builderConfig)) {

Review Comment:
   We can directly get index here to avoid one extra lookup



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();
     }
 
-    LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime);
+    LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees);
+    LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime);
+  }
+
+  /**
+   * Extracts the individual star-trees from the combined index file and returns {@link SeparatedStarTreesMetadata}
+   * which contains the information about the output directory where the extracted star-trees are located and also
+   * has the builder configs of each of the star-tree.
+   */
+  private SeparatedStarTreesMetadata extractStarTrees()
+      throws Exception {
+    SeparatedStarTreesMetadata metadata = new SeparatedStarTreesMetadata(_segmentDirectory);
+    if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {

Review Comment:
   (minor) This check is redundant



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on PR #10905:
URL: https://github.com/apache/pinot/pull/10905#issuecomment-1603769836

   Can you please take a look at the failed integration test? I think it is related to the changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 merged pull request #10905: Startree index build enhancement

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 merged PR #10905:
URL: https://github.com/apache/pinot/pull/10905


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238635714


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +120,33 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private SeparatedStarTreesMetadata getExistingStarTrees()
+      throws Exception {
+    try {
+      if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+        // Extract existing startrees
+        // clean star-tree related files and configs once all the star-trees are separated and extracted
+        SeparatedStarTreesMetadata existingStarTrees = extractStarTrees();
+        StarTreeBuilderUtils.removeStarTrees(_indexDirectory);
+        _metadataProperties.refresh();
+        return existingStarTrees;
+      } else {
+        return new SeparatedStarTreesMetadata(_segmentDirectory);

Review Comment:
   makes sense, done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1245121270


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -83,7 +86,11 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _separator = getSeparator();
+    if (_separator != null) {
+      StarTreeBuilderUtils.removeStarTrees(indexDir);
+      _metadataProperties.reload();

Review Comment:
   To ensure that the startree index props removed by `StarTreeBuilderUtils.removeStarTrees` are reflected in the `_metadataProperties`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1244331242


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -83,7 +86,11 @@ public MultipleTreesBuilder(List<StarTreeV2BuilderConfig> builderConfigs, File i
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _separator = getSeparator();
+    if (_separator != null) {
+      StarTreeBuilderUtils.removeStarTrees(indexDir);
+      _metadataProperties.reload();

Review Comment:
   Why do we need to reload the metadata?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -104,7 +111,11 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _separator = getSeparator();
+    if (_separator != null) {
+      StarTreeBuilderUtils.removeStarTrees(indexDir);
+      _metadataProperties.refresh();
+    }

Review Comment:
   We don't need this handling because this constructor is always used by the segment creator, at which moment the star-tree will never exist.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/StarTreeIndexSeparator.java:
##########
@@ -0,0 +1,154 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.segment.local.startree.v2.builder;
+
+import com.google.common.collect.Lists;
+import java.io.Closeable;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.RandomAccessFile;
+import java.nio.channels.FileChannel;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import org.apache.commons.configuration.Configuration;
+import org.apache.commons.configuration.PropertiesConfiguration;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.local.startree.v2.store.StarTreeIndexMapUtils;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.index.startree.AggregationFunctionColumnPair;
+import org.apache.pinot.segment.spi.index.startree.StarTreeV2Constants;
+import org.apache.pinot.spi.config.table.StarTreeIndexConfig;
+
+
+/**
+ * The {@code StarTreeIndexSeparator} pulls out the individual star-trees from the common star-tree index file
+ */
+public class StarTreeIndexSeparator implements Closeable {
+
+  private final FileChannel _indexFileChannel;
+  private final List<Map<StarTreeIndexMapUtils.IndexKey, StarTreeIndexMapUtils.IndexValue>> _indexMapList;
+  private final List<StarTreeV2BuilderConfig> _builderConfigList;
+  private final List<Integer> _totalDocsList;
+
+  public StarTreeIndexSeparator(File indexMapFile, File indexFile, PropertiesConfiguration metadataProperties)
+      throws IOException {
+    _indexMapList = extractIndexMap(indexMapFile,
+        metadataProperties.getInt(StarTreeV2Constants.MetadataKey.STAR_TREE_COUNT));
+    _indexFileChannel = new RandomAccessFile(indexFile, "r").getChannel();
+    _builderConfigList = extractBuilderConfigs(metadataProperties);
+    _totalDocsList = extractTotalDocsList(metadataProperties);
+  }
+
+  private List<Map<StarTreeIndexMapUtils.IndexKey, StarTreeIndexMapUtils.IndexValue>> extractIndexMap(File indexMapFile,
+      int numStarTrees) {
+    try (InputStream inputStream = new FileInputStream(indexMapFile)) {
+      return StarTreeIndexMapUtils.loadFromInputStream(inputStream, numStarTrees);
+    } catch (IOException e) {
+      throw new RuntimeException(e);
+    }
+  }
+
+  public List<Integer> extractTotalDocsList(PropertiesConfiguration metadataProperties) {
+    List<Integer> totalDocsList = new ArrayList<>(_indexMapList.size());
+    for (int i = 0; i < _indexMapList.size(); i++) {
+      Configuration metadata = metadataProperties.subset(StarTreeV2Constants.MetadataKey.getStarTreePrefix(i));
+      totalDocsList.add(i, metadata.getInt(StarTreeV2Constants.MetadataKey.TOTAL_DOCS));
+    }
+    return totalDocsList;
+  }
+
+  /**
+   * Extract the list of {@link StarTreeV2BuilderConfig} for each of the star-tree present in the given metadata
+   * properties.
+   * @param metadataProperties index metadata properties
+   * @return List of {@link StarTreeV2BuilderConfig}
+   */
+  public List<StarTreeV2BuilderConfig> extractBuilderConfigs(PropertiesConfiguration metadataProperties) {
+    List<StarTreeV2BuilderConfig> builderConfigList = new ArrayList<>(_indexMapList.size());
+    for (int i = 0; i < _indexMapList.size(); i++) {
+      Configuration metadata = metadataProperties.subset(StarTreeV2Constants.MetadataKey.getStarTreePrefix(i));
+      builderConfigList.add(i, StarTreeV2BuilderConfig.fromIndexConfig(new StarTreeIndexConfig(
+          Lists.newArrayList(metadata.getStringArray(StarTreeV2Constants.MetadataKey.DIMENSIONS_SPLIT_ORDER)),
+          Lists.newArrayList(
+              metadata.getStringArray(StarTreeV2Constants.MetadataKey.SKIP_STAR_NODE_CREATION_FOR_DIMENSIONS)),
+          Lists.newArrayList(metadata.getStringArray(StarTreeV2Constants.MetadataKey.FUNCTION_COLUMN_PAIRS)),
+          metadata.getInt(StarTreeV2Constants.MetadataKey.MAX_LEAF_RECORDS))));
+    }
+    return builderConfigList;
+  }
+
+  /**
+   * Extract star-tree index files of the star-tree represented by the builderConfig.
+   * The extracted star-tree index files are written to the provided starTreeOutputDir.
+   *
+   * @param starTreeOutputDir output directory for extracted star-trees
+   * @param builderConfig {@link StarTreeV2BuilderConfig} of the star-tree to separate
+   * @return if star-tree exist then total docs in the separated tree, else -1
+   * @throws IOException
+   */
+  public int separate(File starTreeOutputDir, StarTreeV2BuilderConfig builderConfig)
+      throws IOException {
+    int treeIndex = _builderConfigList.indexOf(builderConfig);
+    if (treeIndex == -1) {
+      return treeIndex;

Review Comment:
   (minor) this should be `return -1` (although the same effect, but logically it should return the docs)



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +126,38 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private StarTreeIndexSeparator getSeparator()
+      throws Exception {
+    if (!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+      return null;
+    }
+    try {
+      _separatorTempDir = new File(_segmentDirectory, StarTreeV2Constants.EXISTING_STAR_TREE_TEMP_DIR);
+      FileUtils.forceMkdir(_separatorTempDir);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME), _separatorTempDir, false);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME), _separatorTempDir, false);
+      return new StarTreeIndexSeparator(new File(_separatorTempDir, StarTreeV2Constants.INDEX_MAP_FILE_NAME),
+          new File(_separatorTempDir, StarTreeV2Constants.INDEX_FILE_NAME), _metadataProperties);
+    } catch (Exception e) {
+      try {
+        FileUtils.forceDelete(_separatorTempDir);
+      } catch (Exception e1) {
+        LOGGER.warn(e1.getMessage(), e1);

Review Comment:
   (minor) Put a more meaningful message, e.g. `Caught exception when deleting the separator tmp directory: {}`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +126,38 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private StarTreeIndexSeparator getSeparator()

Review Comment:
   (minor) Annotate the return as `@Nullable`
   ```suggestion
     @Nullable
     private StarTreeIndexSeparator getSeparator()
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +126,38 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private StarTreeIndexSeparator getSeparator()
+      throws Exception {
+    if (!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+      return null;
+    }
+    try {
+      _separatorTempDir = new File(_segmentDirectory, StarTreeV2Constants.EXISTING_STAR_TREE_TEMP_DIR);
+      FileUtils.forceMkdir(_separatorTempDir);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME), _separatorTempDir, false);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME), _separatorTempDir, false);

Review Comment:
   Can we move the file instead of copying? We can move the index files and only remove the metadata fields



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -165,6 +231,13 @@ private static SingleTreeBuilder getSingleTreeBuilder(StarTreeV2BuilderConfig bu
 
   @Override
   public void close() {
+    if (_separatorTempDir != null) {
+      try {
+        FileUtils.forceDelete(_separatorTempDir);
+      } catch (Exception e) {
+        LOGGER.warn(e.getMessage(), e);

Review Comment:
   (minor) Put a more meaningful message, e.g. `Caught exception when deleting the separator tmp directory: {}`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1229065216


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -104,7 +106,7 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _existingStarTrees = getExistingStarTrees();

Review Comment:
   Good catch. Will wrap it in a try catch block.  I think it's good to call _existingStarTrees.cleanOutputDirectory() just to be safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238640101


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -136,9 +161,16 @@ public void build()
       for (int i = 0; i < numStarTrees; i++) {
         StarTreeV2BuilderConfig builderConfig = _builderConfigs.get(i);
         Configuration metadataProperties = _metadataProperties.subset(MetadataKey.getStarTreePrefix(i));
-        try (SingleTreeBuilder singleTreeBuilder = getSingleTreeBuilder(builderConfig, starTreeIndexDir, _segment,
-            metadataProperties, _buildMode)) {
-          singleTreeBuilder.build();
+        if (_existingStarTrees.containsTree(builderConfig)) {

Review Comment:
   Updated the `handleExistingStarTreeAddition` to return boolean based on whether the tree was reused, and moved all the check inside this function itself.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1238636098


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -148,9 +180,54 @@ public void build()
       StarTreeIndexMapUtils
           .storeToFile(indexMaps, new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME));
       FileUtils.forceDelete(starTreeIndexDir);
+      _existingStarTrees.cleanOutputDirectory();
     }
 
-    LOGGER.info("Finished building {} star-trees in {}ms", numStarTrees, System.currentTimeMillis() - startTime);
+    LOGGER.info("Newly built start-trees {}.\n Reused star-trees {}.", numStarTrees - reusedStarTrees, reusedStarTrees);
+    LOGGER.info("Finished building star-trees in {}ms", System.currentTimeMillis() - startTime);

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1229017691


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -104,7 +106,7 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _existingStarTrees = getExistingStarTrees();

Review Comment:
   I see a try catch block on line 109. Will the directory _segment.destroy() handle the cleanup of the new dir we're creating, or should we also be calling _existingStarTrees.cleanupDir() in the catch block?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on pull request #10905: Startree index build enhancement

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on PR #10905:
URL: https://github.com/apache/pinot/pull/10905#issuecomment-1590517422

   Some minor comments but overall looks good to me. @shounakmk219 Please add the design doc to PR description.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1245119124


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -104,7 +111,11 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _separator = getSeparator();
+    if (_separator != null) {
+      StarTreeBuilderUtils.removeStarTrees(indexDir);
+      _metadataProperties.refresh();
+    }

Review Comment:
   Agree, will remove the handling.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] saurabhd336 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "saurabhd336 (via GitHub)" <gi...@apache.org>.
saurabhd336 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1229020395


##########
pinot-segment-local/src/test/java/org/apache/pinot/segment/local/startree/v2/builder/StarTreeIndexSeparatorTest.java:
##########
@@ -0,0 +1,103 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.startree.v2.builder;
+
+import com.google.common.collect.Lists;
+import java.io.File;
+import java.io.IOException;
+import java.net.URL;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Objects;
+import org.apache.commons.configuration.PropertiesConfiguration;
+import org.apache.commons.io.FileUtils;
+import org.apache.pinot.segment.spi.V1Constants;
+import org.apache.pinot.segment.spi.index.startree.StarTreeV2Constants;
+import org.apache.pinot.spi.config.table.StarTreeIndexConfig;
+import org.apache.pinot.spi.env.CommonsConfigurationUtils;
+import org.testng.annotations.BeforeClass;
+import org.testng.annotations.Test;
+
+import static org.apache.pinot.segment.local.startree.v2.builder.StarTreeIndexSeparator.STARTREE_SEPARATOR_PREFIX;
+import static org.apache.pinot.segment.spi.V1Constants.Indexes.RAW_SV_FORWARD_INDEX_FILE_EXTENSION;
+import static org.apache.pinot.segment.spi.V1Constants.Indexes.UNSORTED_SV_FORWARD_INDEX_FILE_EXTENSION;
+import static org.apache.pinot.segment.spi.index.startree.StarTreeV2Constants.STAR_TREE_INDEX_FILE_NAME;
+import static org.testng.AssertJUnit.assertEquals;
+import static org.testng.AssertJUnit.assertNotNull;
+import static org.testng.AssertJUnit.assertTrue;
+
+
+public class StarTreeIndexSeparatorTest {
+
+  private static final String SEGMENT_PATH = "data/startree/segment";

Review Comment:
   The resource files added seem to be rather huge, can the desired startree index files be generated on the fly as part of running the test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] codecov-commenter commented on pull request #10905: Startree index build enhancement

Posted by "codecov-commenter (via GitHub)" <gi...@apache.org>.
codecov-commenter commented on PR #10905:
URL: https://github.com/apache/pinot/pull/10905#issuecomment-1589367249

   ## [Codecov](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#10905](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (5d09cbd) into [master](https://app.codecov.io/gh/apache/pinot/commit/26e5952d75ff5accedd05c5eb9812cf900d3f413?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (26e5952) will **increase** coverage by `0.00%`.
   > The diff coverage is `0.00%`.
   
   ```diff
   @@            Coverage Diff            @@
   ##           master   #10905     +/-   ##
   =========================================
     Coverage    0.11%    0.11%             
   =========================================
     Files        2186     2133     -53     
     Lines      117358   115087   -2271     
     Branches    17732    17466    -266     
   =========================================
     Hits          137      137             
   + Misses     117201   114930   -2271     
     Partials       20       20             
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration1temurin11 | `?` | |
   | integration1temurin17 | `?` | |
   | unittests1temurin11 | `?` | |
   | unittests1temurin17 | `?` | |
   | unittests1temurin20 | `?` | |
   | unittests2temurin11 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin17 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   | unittests2temurin20 | `0.11% <0.00%> (-0.01%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [...ocal/segment/index/loader/SegmentPreProcessor.java](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zZWdtZW50L2luZGV4L2xvYWRlci9TZWdtZW50UHJlUHJvY2Vzc29yLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...ocal/startree/v2/builder/MultipleTreesBuilder.java](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zdGFydHJlZS92Mi9idWlsZGVyL011bHRpcGxlVHJlZXNCdWlsZGVyLmphdmE=) | `0.00% <0.00%> (ø)` | |
   | [...al/startree/v2/builder/StarTreeIndexSeparator.java](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9zdGFydHJlZS92Mi9idWlsZGVyL1N0YXJUcmVlSW5kZXhTZXBhcmF0b3IuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...egment/spi/index/startree/StarTreeV2Constants.java](https://app.codecov.io/gh/apache/pinot/pull/10905?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-cGlub3Qtc2VnbWVudC1zcGkvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3NlZ21lbnQvc3BpL2luZGV4L3N0YXJ0cmVlL1N0YXJUcmVlVjJDb25zdGFudHMuamF2YQ==) | `0.00% <ø> (ø)` | |
   
   ... and [84 files with indirect coverage changes](https://app.codecov.io/gh/apache/pinot/pull/10905/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1245114343


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +126,38 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private StarTreeIndexSeparator getSeparator()
+      throws Exception {
+    if (!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+      return null;
+    }
+    try {
+      _separatorTempDir = new File(_segmentDirectory, StarTreeV2Constants.EXISTING_STAR_TREE_TEMP_DIR);
+      FileUtils.forceMkdir(_separatorTempDir);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME), _separatorTempDir, false);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME), _separatorTempDir, false);

Review Comment:
   That's a valid point, but I didn't move the files as I wanted to use `StarTreeBuilderUtils.removeStarTrees` (to avoid duplicating the startree index cleanup logic), and that was throwing exception upon missing startree index files. Let me know your thoughts on this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1236965567


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java:
##########
@@ -243,10 +243,8 @@ private void processStarTrees(File indexDir)
       if (starTreeMetadataList != null) {
         // There are existing star-trees
         if (StarTreeBuilderUtils.shouldRemoveExistingStarTrees(starTreeBuilderConfigs, starTreeMetadataList)) {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1236968638


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +117,33 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private SeparatedStarTreesMetadata getExistingStarTrees()
+      throws Exception {
+    try {
+      if (_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+        // Extract existing startrees
+        // clean star-tree related files and configs once all the star-trees are separated and extracted
+        SeparatedStarTreesMetadata existingStarTrees = extractStarTrees();
+        StarTreeBuilderUtils.removeStarTrees(_segmentDirectory);

Review Comment:
   Nice catch! made the changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] shounakmk219 commented on a diff in pull request #10905: Startree index build enhancement

Posted by "shounakmk219 (via GitHub)" <gi...@apache.org>.
shounakmk219 commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1236967919


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/SegmentPreProcessor.java:
##########
@@ -243,10 +243,8 @@ private void processStarTrees(File indexDir)
       if (starTreeMetadataList != null) {
         // There are existing star-trees
         if (StarTreeBuilderUtils.shouldRemoveExistingStarTrees(starTreeBuilderConfigs, starTreeMetadataList)) {
-          // Remove the existing star-trees
-          LOGGER.info("Removing star-trees from segment: {}", _segmentMetadata.getName());
-          StarTreeBuilderUtils.removeStarTrees(indexDir);
-          _segmentMetadata = new SegmentMetadataImpl(indexDir);
+          LOGGER.info("Change detected in star-trees for segment: {}", _segmentMetadata.getName());
+          shouldGenerateStarTree = true;

Review Comment:
   Ah, you are right, made the changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1248326912


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -104,7 +107,7 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     _segmentDirectory = SegmentDirectoryPaths.findSegmentDirectory(indexDir);
     _metadataProperties =
         CommonsConfigurationUtils.fromFile(new File(_segmentDirectory, V1Constants.MetadataKeys.METADATA_FILE_NAME));
-    Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
+    _separator = getSeparator();

Review Comment:
   We should revert this line. When this constructor is used, star-tree should not exist
   ```suggestion
       Preconditions.checkState(!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT), "Star-tree already exists");
       _separator = null;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #10905: Startree index build enhancement

Posted by "Jackie-Jiang (via GitHub)" <gi...@apache.org>.
Jackie-Jiang commented on code in PR #10905:
URL: https://github.com/apache/pinot/pull/10905#discussion_r1247275361


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/startree/v2/builder/MultipleTreesBuilder.java:
##########
@@ -115,13 +126,38 @@ public MultipleTreesBuilder(@Nullable List<StarTreeIndexConfig> indexConfigs, bo
     }
   }
 
+  private StarTreeIndexSeparator getSeparator()
+      throws Exception {
+    if (!_metadataProperties.containsKey(MetadataKey.STAR_TREE_COUNT)) {
+      return null;
+    }
+    try {
+      _separatorTempDir = new File(_segmentDirectory, StarTreeV2Constants.EXISTING_STAR_TREE_TEMP_DIR);
+      FileUtils.forceMkdir(_separatorTempDir);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_FILE_NAME), _separatorTempDir, false);
+      FileUtils.copyFileToDirectory(
+          new File(_segmentDirectory, StarTreeV2Constants.INDEX_MAP_FILE_NAME), _separatorTempDir, false);

Review Comment:
   Moving file is much faster than copying file in most FS. Let's use move and directly clean up the fields in `_metadataProperties` by doing `_metadataProperties.subset(StarTreeV2Constants.MetadataKey.STAR_TREE_SUBSET).clear();`. This way we can also avoid reloading the `_metadataProperties`
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org