You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/27 18:13:23 UTC

[GitHub] [incubator-pinot] oker1 opened a new pull request #6994: Realtime to offline save space

oker1 opened a new pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994


   ## Description
   
   We saw high disk usage when using RealtimeToOfflineSegmentsTask, and as I looked into the code I've found that it keeps lots of temporary files around until the whole task is finished. As I understand the compressed files can be deleted immediately after they are extracted, and the extracted segment files don't need to be copied. It would be even better to download/extract/process the files one by one to reduce the disk space needed for the task, but this seems like a small change with nice immediate benefits.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] oker1 commented on a change in pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
oker1 commented on a change in pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#discussion_r641299148



##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtime_to_offline_segments/RealtimeToOfflineSegmentsTaskExecutor.java
##########
@@ -197,7 +197,7 @@ public void preProcess(PinotTaskConfig pinotTaskConfig) {
     Preconditions.checkState(inputSegmentsDir.mkdirs(), "Failed to create input directory: %s for task: %s",
         inputSegmentsDir.getAbsolutePath(), taskType);
     for (File indexDir : originalIndexDirs) {
-      FileUtils.copyDirectoryToDirectory(indexDir, inputSegmentsDir);
+      FileUtils.moveDirectoryToDirectory(indexDir, inputSegmentsDir, false);

Review comment:
       I can give it a shot in this PR, or would you prefer to do it separately?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#discussion_r641706697



##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtime_to_offline_segments/RealtimeToOfflineSegmentsTaskExecutor.java
##########
@@ -197,7 +197,7 @@ public void preProcess(PinotTaskConfig pinotTaskConfig) {
     Preconditions.checkState(inputSegmentsDir.mkdirs(), "Failed to create input directory: %s for task: %s",
         inputSegmentsDir.getAbsolutePath(), taskType);
     for (File indexDir : originalIndexDirs) {
-      FileUtils.copyDirectoryToDirectory(indexDir, inputSegmentsDir);
+      FileUtils.moveDirectoryToDirectory(indexDir, inputSegmentsDir, false);

Review comment:
       @oker1 `RealtimeToOfflineSegmentsTaskExecutorTest` failed. Please check what causes the failure.
   Making it in one PR is even better




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] oker1 commented on a change in pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
oker1 commented on a change in pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#discussion_r641299148



##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtime_to_offline_segments/RealtimeToOfflineSegmentsTaskExecutor.java
##########
@@ -197,7 +197,7 @@ public void preProcess(PinotTaskConfig pinotTaskConfig) {
     Preconditions.checkState(inputSegmentsDir.mkdirs(), "Failed to create input directory: %s for task: %s",
         inputSegmentsDir.getAbsolutePath(), taskType);
     for (File indexDir : originalIndexDirs) {
-      FileUtils.copyDirectoryToDirectory(indexDir, inputSegmentsDir);
+      FileUtils.moveDirectoryToDirectory(indexDir, inputSegmentsDir, false);

Review comment:
       I can give it a shot in this PR (to use the original dir), or would you prefer to do it separately?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#discussion_r643325326



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentProcessorFramework.java
##########
@@ -60,20 +61,19 @@
 
   /**
    * Initializes the Segment Processor framework with input segments, output path and processing config
-   * @param inputSegmentsDir directory containing the input segments. These can be tarred or untarred.
+   * @param inputSegments list of input segments. These can be tarred or untarred.
    * @param segmentProcessorConfig config for segment processing
    * @param outputSegmentsDir directory for placing the resulting segments. This should already exist.
    */
-  public SegmentProcessorFramework(File inputSegmentsDir, SegmentProcessorConfig segmentProcessorConfig,
-      File outputSegmentsDir) {
+  public SegmentProcessorFramework(List<File> inputSegments, SegmentProcessorConfig segmentProcessorConfig,
+                                   File outputSegmentsDir) {

Review comment:
       Please follow the code style: https://docs.pinot.apache.org/developers/developers-and-contributors/code-setup

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/segment/processing/framework/SegmentProcessorFramework.java
##########
@@ -107,15 +107,14 @@ public void processSegments()
       throws Exception {
 
     // Check for input segments
-    File[] segmentFiles = _inputSegmentsDir.listFiles();
-    if (segmentFiles.length == 0) {
-      throw new IllegalStateException("No segments found in input dir: " + _inputSegmentsDir.getAbsolutePath()
+    if (_inputSegments.size() == 0) {

Review comment:
       Move this check to the constructor




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] Jackie-Jiang commented on a change in pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
Jackie-Jiang commented on a change in pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#discussion_r641020167



##########
File path: pinot-plugins/pinot-minion-tasks/pinot-minion-builtin-tasks/src/main/java/org/apache/pinot/plugin/minion/tasks/realtime_to_offline_segments/RealtimeToOfflineSegmentsTaskExecutor.java
##########
@@ -197,7 +197,7 @@ public void preProcess(PinotTaskConfig pinotTaskConfig) {
     Preconditions.checkState(inputSegmentsDir.mkdirs(), "Failed to create input directory: %s for task: %s",
         inputSegmentsDir.getAbsolutePath(), taskType);
     for (File indexDir : originalIndexDirs) {
-      FileUtils.copyDirectoryToDirectory(indexDir, inputSegmentsDir);
+      FileUtils.moveDirectoryToDirectory(indexDir, inputSegmentsDir, false);

Review comment:
       Ideally we should not move the index directories because the caller might still need them. We can directly read from them instead of copying them to another dir.
   The current code should work because the parent class won't access the original index directories, but please at least add a TODO to describe the side effect




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #6994: Realtime to offline space saving

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #6994:
URL: https://github.com/apache/incubator-pinot/pull/6994#issuecomment-850013482


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#6994](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9004d19) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/0b5dcb7aed1b85ab26345ab6c1076ccb88358f4a?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (0b5dcb7) will **decrease** coverage by `31.42%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/6994/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #6994       +/-   ##
   =============================================
   - Coverage     73.39%   41.97%   -31.43%     
   + Complexity       12        7        -5     
   =============================================
     Files          1441     1441               
     Lines         71450    71451        +1     
     Branches      10354    10354               
   =============================================
   - Hits          52438    29988    -22450     
   - Misses        15487    38872    +23385     
   + Partials       3525     2591      -934     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `41.97% <100.00%> (-0.12%)` | :arrow_down: |
   | unittests | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../tasks/BaseMultipleSegmentsConversionExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1taW5pb24tdGFza3MvcGlub3QtbWluaW9uLWJ1aWx0aW4tdGFza3Mvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3BsdWdpbi9taW5pb24vdGFza3MvQmFzZU11bHRpcGxlU2VnbWVudHNDb252ZXJzaW9uRXhlY3V0b3IuamF2YQ==) | `90.00% <100.00%> (+0.16%)` | :arrow_up: |
   | [...egments/RealtimeToOfflineSegmentsTaskExecutor.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtcGx1Z2lucy9waW5vdC1taW5pb24tdGFza3MvcGlub3QtbWluaW9uLWJ1aWx0aW4tdGFza3Mvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL3Bpbm90L3BsdWdpbi9taW5pb24vdGFza3MvcmVhbHRpbWVfdG9fb2ZmbGluZV9zZWdtZW50cy9SZWFsdGltZVRvT2ZmbGluZVNlZ21lbnRzVGFza0V4ZWN1dG9yLmphdmE=) | `69.11% <100.00%> (-20.59%)` | :arrow_down: |
   | [...c/main/java/org/apache/pinot/common/tier/Tier.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/config/table/TableType.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1RhYmxlVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/data/DateTimeFieldSpec.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EYXRlVGltZUZpZWxkU3BlYy5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/pinot/spi/data/DimensionFieldSpec.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9EaW1lbnNpb25GaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../org/apache/pinot/spi/data/readers/FileFormat.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9yZWFkZXJzL0ZpbGVGb3JtYXQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...org/apache/pinot/spi/config/table/QuotaConfig.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1F1b3RhQ29uZmlnLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...org/apache/pinot/spi/config/tenant/TenantRole.java](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RlbmFudC9UZW5hbnRSb2xlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [915 more](https://codecov.io/gh/apache/incubator-pinot/pull/6994/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [0b5dcb7...9004d19](https://codecov.io/gh/apache/incubator-pinot/pull/6994?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org