Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/12/28 15:12:07 UTC

[GitHub] [carbondata] akashrn5 commented on a change in pull request #4242: [CARBONDATA-4318]Improve load overwrite performance for partition tables

akashrn5 commented on a change in pull request #4242:
URL: https://github.com/apache/carbondata/pull/4242#discussion_r775953047



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##########
@@ -316,31 +326,38 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
    * of all segment files.
    */
   private String overwritePartitions(CarbonLoadModel loadModel, LoadMetadataDetails newMetaEntry,
-      String uuid) throws IOException {
+      String uuid, List<String> partitionList, List<PartitionSpec> currentPartitionsOfTable)
+      throws IOException {
     CarbonTable table = loadModel.getCarbonDataLoadSchema().getCarbonTable();
-    SegmentFileStore fileStore = new SegmentFileStore(loadModel.getTablePath(),
-        loadModel.getSegmentId() + "_" + loadModel.getFactTimeStamp()
-            + CarbonTablePath.SEGMENT_EXT);
-    List<PartitionSpec> partitionSpecs = fileStore.getPartitionSpecs();
-
-    if (partitionSpecs != null && partitionSpecs.size() > 0) {
-      List<Segment> validSegments =
-          new SegmentStatusManager(table.getAbsoluteTableIdentifier())
-              .getValidAndInvalidSegments(table.isMV()).getValidSegments();
-      String uniqueId = String.valueOf(System.currentTimeMillis());
-      List<String> toBeUpdatedSegments = new ArrayList<>();
-      List<String> toBeDeletedSegments = new ArrayList<>();
-      // First drop the partitions from partition mapper files of each segment
-      for (Segment segment : validSegments) {
-        new SegmentFileStore(table.getTablePath(), segment.getSegmentFileName()).dropPartitions(
-            segment, partitionSpecs, uniqueId, toBeDeletedSegments, toBeUpdatedSegments);
+    if (partitionList != null && partitionList.size() > 0) {
+      // check if any partitions overlaps
+      List<String> overlappingPartitions = currentPartitionsOfTable.stream()
+          .map(partitionSpec -> partitionSpec.getLocation().toString())
+          .filter(partitionList::contains).collect(Collectors.toList());
+      if (!overlappingPartitions.isEmpty()) {
+        List<LoadMetadataDetails> validLoadMetadataDetails =
+            loadModel.getLoadMetadataDetails().stream().filter(
+                loadMetadataDetail -> !loadMetadataDetail.getLoadName()
+                    .equalsIgnoreCase(newMetaEntry.getLoadName())).collect(Collectors.toList());
+        String uniqueId = String.valueOf(System.currentTimeMillis());
+        List<String> toBeUpdatedSegments = new ArrayList<>();

Review comment:
       Anyway, this is old code that was only moved inside a condition, but yes, I have now initialized the list with an initial capacity equal to the size of `validLoadMetadataDetails`.
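
       For illustration only, the pre-sizing mentioned above might look like the sketch below. It is not the code from the PR: the class `PresizedListSketch` and the helpers `otherLoads` and `newSegmentList` are hypothetical, and plain strings stand in for `LoadMetadataDetails`. It assumes that at most one entry per valid load ends up in the list, so the size of the filtered list is a reasonable initial capacity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PresizedListSketch {

  // Hypothetical helper: keep every load name except the one currently being
  // written, mirroring the filter in the hunk above but on plain strings.
  static List<String> otherLoads(List<String> allLoadNames, String newLoadName) {
    return allLoadNames.stream()
        .filter(name -> !name.equalsIgnoreCase(newLoadName))
        .collect(Collectors.toList());
  }

  // Hypothetical helper: create the result list with an initial capacity equal
  // to the number of valid loads, so the backing array need not grow while the
  // list is filled (assumption: at most one entry is added per valid load).
  static List<String> newSegmentList(List<String> validLoads) {
    return new ArrayList<>(validLoads.size());
  }

  public static void main(String[] args) {
    List<String> valid = otherLoads(Arrays.asList("0", "1", "2"), "2");
    List<String> toBeUpdatedSegments = newSegmentList(valid);
    toBeUpdatedSegments.addAll(valid);
    System.out.println(toBeUpdatedSegments);
  }
}
```

       The capacity argument only affects the size of the backing array, not the list's contents or `size()`, so the change is purely a small allocation optimization.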



