Posted to dev@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/12/28 08:50:57 UTC

[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4242: [CARBONDATA-4318]Improve load overwrite performance for partition tables

vikramahuja1001 commented on a change in pull request #4242:
URL: https://github.com/apache/carbondata/pull/4242#discussion_r775794190



##########
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonOutputCommitter.java
##########
@@ -316,31 +326,38 @@ private void commitJobForPartition(JobContext context, boolean overwriteSet,
    * of all segment files.
    */
   private String overwritePartitions(CarbonLoadModel loadModel, LoadMetadataDetails newMetaEntry,
-      String uuid) throws IOException {
+      String uuid, List<String> partitionList, List<PartitionSpec> currentPartitionsOfTable)
+      throws IOException {
     CarbonTable table = loadModel.getCarbonDataLoadSchema().getCarbonTable();
-    SegmentFileStore fileStore = new SegmentFileStore(loadModel.getTablePath(),
-        loadModel.getSegmentId() + "_" + loadModel.getFactTimeStamp()
-            + CarbonTablePath.SEGMENT_EXT);
-    List<PartitionSpec> partitionSpecs = fileStore.getPartitionSpecs();
-
-    if (partitionSpecs != null && partitionSpecs.size() > 0) {
-      List<Segment> validSegments =
-          new SegmentStatusManager(table.getAbsoluteTableIdentifier())
-              .getValidAndInvalidSegments(table.isMV()).getValidSegments();
-      String uniqueId = String.valueOf(System.currentTimeMillis());
-      List<String> toBeUpdatedSegments = new ArrayList<>();
-      List<String> toBeDeletedSegments = new ArrayList<>();
-      // First drop the partitions from partition mapper files of each segment
-      for (Segment segment : validSegments) {
-        new SegmentFileStore(table.getTablePath(), segment.getSegmentFileName()).dropPartitions(
-            segment, partitionSpecs, uniqueId, toBeDeletedSegments, toBeUpdatedSegments);
+    if (partitionList != null && partitionList.size() > 0) {
+      // check if any partitions overlaps
+      List<String> overlappingPartitions = currentPartitionsOfTable.stream()
+          .map(partitionSpec -> partitionSpec.getLocation().toString())
+          .filter(partitionList::contains).collect(Collectors.toList());
+      if (!overlappingPartitions.isEmpty()) {
+        List<LoadMetadataDetails> validLoadMetadataDetails =
+            loadModel.getLoadMetadataDetails().stream().filter(
+                loadMetadataDetail -> !loadMetadataDetail.getLoadName()
+                    .equalsIgnoreCase(newMetaEntry.getLoadName())).collect(Collectors.toList());
+        String uniqueId = String.valueOf(System.currentTimeMillis());
+        List<String> toBeUpdatedSegments = new ArrayList<>();

Review comment:
       Maybe we can give this list an initial capacity equal to the number of segments when instantiating it, since it can never hold more entries than that.
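
       A minimal sketch of the idea, assuming the number of existing loads is a safe upper bound for the list. The class `PresizedListSketch` and the helper `segmentsToUpdate` are hypothetical names used only for illustration; they are not part of CarbonData, and the real method fills the list differently. Note that the `ArrayList(int)` constructor sets an initial capacity rather than a hard maximum, so the list can still grow if the estimate turns out to be too low.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the list handling in overwritePartitions.
class PresizedListSketch {

  // Collects every load name except the one being written right now.
  static List<String> segmentsToUpdate(List<String> loadNames, String newLoadName) {
    // The result holds at most loadNames.size() entries, so reserving that
    // capacity up front avoids repeated resizing of the backing array.
    List<String> toBeUpdatedSegments = new ArrayList<>(loadNames.size());
    for (String name : loadNames) {
      if (!name.equalsIgnoreCase(newLoadName)) {
        toBeUpdatedSegments.add(name);
      }
    }
    return toBeUpdatedSegments;
  }

  public static void main(String[] args) {
    List<String> result = segmentsToUpdate(Arrays.asList("0", "1", "2"), "2");
    System.out.println(result);
  }
}
```

       The gain is likely small for tables with few segments, so this is more of a minor cleanup than a required change.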



