Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/07/23 22:50:09 UTC

[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7193: Reduce the disk usage for segment conversion task

Jackie-Jiang commented on a change in pull request #7193:
URL: https://github.com/apache/pinot/pull/7193#discussion_r675900136



##########
File path: pinot-tools/src/main/java/org/apache/pinot/tools/admin/command/SegmentProcessorFrameworkCommand.java
##########
@@ -72,25 +80,42 @@ public String description() {
   @Override
   public boolean execute()
       throws Exception {
+    PluginManager.get().init();
 
     SegmentProcessorFrameworkSpec segmentProcessorFrameworkSpec =
         JsonUtils.fileToObject(new File(_segmentProcessorFrameworkSpec), SegmentProcessorFrameworkSpec.class);
 
     File inputSegmentsDir = new File(segmentProcessorFrameworkSpec.getInputSegmentsDir());
     File outputSegmentsDir = new File(segmentProcessorFrameworkSpec.getOutputSegmentsDir());
-    if (!outputSegmentsDir.exists()) {
-      if (!outputSegmentsDir.mkdirs()) {
-        throw new RuntimeException(
-            "Did not find output directory, and could not create it either: " + segmentProcessorFrameworkSpec
-                .getOutputSegmentsDir());
+    File workingDir = new File(outputSegmentsDir, "tmp-" + UUID.randomUUID());
+    File untarredSegmentsDir = new File(workingDir, "untarred_segments");
+    FileUtils.forceMkdir(untarredSegmentsDir);
+    File[] segmentDirs = inputSegmentsDir.listFiles();
+    Preconditions
+        .checkState(segmentDirs != null && segmentDirs.length > 0, "Failed to find files under input segments dir: %s",
+            inputSegmentsDir.getAbsolutePath());
+    List<RecordReader> recordReaders = new ArrayList<>(segmentDirs.length);
+    for (File segmentDir : segmentDirs) {
+      String fileName = segmentDir.getName();
+
+      // Untar the segments if needed
+      if (!segmentDir.isDirectory()) {
+        if (fileName.endsWith(".tar.gz") || fileName.endsWith(".tgz")) {
+          segmentDir = TarGzCompressionUtils.untar(segmentDir, untarredSegmentsDir).get(0);
+        } else {
+          throw new IllegalStateException("Unsupported segment format: " + segmentDir.getAbsolutePath());

Review comment:
      Good point. One workaround would be to blindly untar the file, assuming it is a tar.gz file.
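   A minimal sketch of the blind-untar workaround, in Java to match the file under review. It assumes an untar function shaped like the `TarGzCompressionUtils.untar(File, File)` call in the diff (input file and output directory in, extracted files out), and it assumes that function reports bad input as an `IOException`. `BlindUntarSketch`, `Untarrer` and `resolveSegmentDir` are hypothetical names. Directories are used as-is; any other file goes to the untar step regardless of its extension.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of the "blindly untar" workaround; not code from the PR. */
final class BlindUntarSketch {

  /** Hypothetical stand-in for an untar utility such as TarGzCompressionUtils.untar(File, File). */
  @FunctionalInterface
  interface Untarrer {
    List<File> untar(File tarFile, File outputDir) throws IOException;
  }

  /**
   * Resolves one entry of the input directory to a segment directory.
   * Directories are returned unchanged; every other file is untarred
   * without looking at its extension, on the assumption that it is a tar.gz file.
   */
  static File resolveSegmentDir(File input, File untarredSegmentsDir, Untarrer untarrer) {
    if (input.isDirectory()) {
      return input;
    }
    try {
      List<File> untarred = untarrer.untar(input, untarredSegmentsDir);
      if (untarred.isEmpty()) {
        throw new IllegalStateException("No entries found after untarring: " + input.getAbsolutePath());
      }
      // Like the diff, take the first extracted entry as the segment directory.
      return untarred.get(0);
    } catch (IOException e) {
      // The file was not a readable tar.gz archive (or an I/O error occurred); keep the cause.
      throw new IllegalStateException("Failed to untar input file: " + input.getAbsolutePath(), e);
    }
  }
}
```

   The trade-off is that an unsupported file no longer fails on a clear extension check; it fails with whatever error the untar step raises on non-tar input.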
   Ideally, we should add an extra config to indicate the file type. Then this command could process any data files, not just Pinot segments.
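   A sketch of the config-driven alternative, assuming the spec gains an explicit file-type field. `InputFormatConfigSketch`, `InputFileFormat` and its values are hypothetical and do not exist in `SegmentProcessorFrameworkSpec`. The point is that the untar decision, and the choice between reading a Pinot segment or a raw data file, come from the config rather than from file-name extensions.

```java
import java.util.Locale;

/**
 * Hypothetical sketch of an explicit input-format config; InputFileFormat and
 * its values are illustrative, not existing SegmentProcessorFrameworkSpec fields.
 */
final class InputFormatConfigSketch {

  enum InputFileFormat {
    PINOT_SEGMENT_DIR,    // untarred Pinot segment directory
    PINOT_SEGMENT_TAR_GZ, // tarred Pinot segment, untarred before reading
    CSV,
    JSON,
    AVRO;

    /** Parses the config value case-insensitively and fails fast on unknown values. */
    static InputFileFormat fromConfig(String value) {
      if (value == null || value.trim().isEmpty()) {
        throw new IllegalArgumentException("Input file format must be configured");
      }
      try {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
      } catch (IllegalArgumentException e) {
        throw new IllegalArgumentException("Unsupported input file format: " + value, e);
      }
    }

    /** Only tarred segments need the untar step; no extension check is involved. */
    boolean needsUntar() {
      return this == PINOT_SEGMENT_TAR_GZ;
    }

    /** Raw data files (CSV, JSON, Avro) would be read directly rather than as Pinot segments. */
    boolean isPinotSegment() {
      return this == PINOT_SEGMENT_DIR || this == PINOT_SEGMENT_TAR_GZ;
    }
  }
}
```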




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: commits-help@pinot.apache.org