You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2021/01/04 06:46:10 UTC

[GitHub] [carbondata] Karan980 opened a new pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Karan980 opened a new pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070


    ### Why is this PR needed?
   When a segment is added to a carbon table by alter table add segment query and that segment also have a deleteDelta file present in it, then on querying the carbon table the deleted rows are coming in the result.
    
    ### What changes were proposed in this PR?
   Updating the tableStatus and tableUpdateStatus files in correct way for the segments having delta delta files.
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-773154078


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-766552158


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/5054/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-757531715


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3532/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-766552158






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-759238242


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3546/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-757531962


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5292/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769939912


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3621/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772441209


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753973981


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5277/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r568530696



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment:
       can we pass a check like forcewrite in the updateSegmentStatus to avoid the validation of the segment from tablestaus file.. this flag would be true in addload command when delete delta is present. This way you can avoid writing twice.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753830863


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3511/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r551275838



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -82,6 +82,59 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     FileFactory.deleteAllFilesOfDir(new File(newPath))
   }
 
+  test("Test add segment for the segment having delete delta files") {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753829271


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5272/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772493677


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5419/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r553074562



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       how to know which deletedelta files are valid?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r569256604



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r561607408



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       Now, i'm taking just one delete delta file with highest timestamp for each block, by assuming that threshold for horizontal compaction is not changed from default. In case of SDK, all files are valid as horizontal compaction is not there in SDK




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-763757918


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3562/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r568530696



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment:
       can we pass a check like forcewrite in the updateSegmentStatus to avoid the validation of the segment from tablestaus file.. this flag would be true in addload command when delete delta is present. This way you can avoid writing twice.

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -369,5 +426,64 @@ case class CarbonAddLoadCommand(
     }
   }
 
+  /**
+   * If there are more than one deleteDelta File present  for a block. Then This method
+   * will pick the deltaFile with highest timestamp, because the default threshold for horizontal
+   * compaction is 1. It is assumed that threshold for horizontal compaction is not changed from
+   * default value. So there will always be only one valid delete delta file present for a block.
+   * It also sets the number of deleted rows for a segment.
+   */
+  def setValidDeltaFileAndDeletedRowCount(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+      ) : Unit = {
+    var maxDeltaStamp : Long = -1
+    var deletedRowsCount : Long = 0
+    var validDeltaFile : CarbonFile = null
+    deleteDeltaFiles.foreach { deltaFile =>
+      val currentFileTimestamp = CarbonTablePath.DataFileUtil
+        .getTimeStampFromDeleteDeltaFile(deltaFile._2)
+      if (currentFileTimestamp.toLong > maxDeltaStamp) {
+        maxDeltaStamp = currentFileTimestamp.toLong
+        validDeltaFile = deltaFile._1
+      }
+    }
+    val blockDetails =
+      new CarbonDeleteDeltaFileReaderImpl(validDeltaFile.getAbsolutePath).readJson()
+    blockDetails.getBlockletDetails.asScala.foreach { blocklet =>
+      deletedRowsCount = deletedRowsCount + blocklet.getDeletedRows.size()
+    }
+    segmentUpdateDetails.setDeleteDeltaStartTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeleteDeltaEndTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeletedRowsInBlock(deletedRowsCount.toString)
+  }
+
+  /**
+   * As horizontal compaction not supported for SDK segments. So all delta files are valid
+   */
+  def readAllDeltaFiles(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+  ) : Unit = {

Review comment:
       please fix this formatting.. move to above line. Check other code for the same as well

##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       Better to use CarbonFileFilter to list only the delete delta files




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] QiangCai commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
QiangCai commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r553073502



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment:
       can we write tablestatus file once during add load command?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r569256920



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769874496


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3620/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-766553774


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3296/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769878179


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5380/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772492037


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3658/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] asfgit closed pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769884076


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-759239290


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5306/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r569256476



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -369,5 +426,64 @@ case class CarbonAddLoadCommand(
     }
   }
 
+  /**
+   * If there are more than one deleteDelta File present  for a block. Then This method
+   * will pick the deltaFile with highest timestamp, because the default threshold for horizontal
+   * compaction is 1. It is assumed that threshold for horizontal compaction is not changed from
+   * default value. So there will always be only one valid delete delta file present for a block.
+   * It also sets the number of deleted rows for a segment.
+   */
+  def setValidDeltaFileAndDeletedRowCount(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+      ) : Unit = {
+    var maxDeltaStamp : Long = -1
+    var deletedRowsCount : Long = 0
+    var validDeltaFile : CarbonFile = null
+    deleteDeltaFiles.foreach { deltaFile =>
+      val currentFileTimestamp = CarbonTablePath.DataFileUtil
+        .getTimeStampFromDeleteDeltaFile(deltaFile._2)
+      if (currentFileTimestamp.toLong > maxDeltaStamp) {
+        maxDeltaStamp = currentFileTimestamp.toLong
+        validDeltaFile = deltaFile._1
+      }
+    }
+    val blockDetails =
+      new CarbonDeleteDeltaFileReaderImpl(validDeltaFile.getAbsolutePath).readJson()
+    blockDetails.getBlockletDetails.asScala.foreach { blocklet =>
+      deletedRowsCount = deletedRowsCount + blocklet.getDeletedRows.size()
+    }
+    segmentUpdateDetails.setDeleteDeltaStartTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeleteDeltaEndTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeletedRowsInBlock(deletedRowsCount.toString)
+  }
+
+  /**
+   * As horizontal compaction not supported for SDK segments. So all delta files are valid
+   */
+  def readAllDeltaFiles(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+  ) : Unit = {

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772366736


   Build Failed  with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5416/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] asfgit closed pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769939153


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5381/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772432804


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5417/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r561607408



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       Now, i'm taking just one delete delta file with highest timestamp for each block, by assuming that threshold for horizontal compaction is not changed from default. In case of SDK, all files are valid as horizontal compaction is not there in SDK




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772366116


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3655/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-763751165


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5322/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r566840116



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()
+      .filter(_.getName.endsWith(CarbonCommonConstants.DELETE_DELTA_FILE_EXT))
+    if (deltaFiles.length > 0) {
+      val blockNameToDeltaFilesMap =
+        collection.mutable.Map[String, collection.mutable.ListBuffer[(CarbonFile, String)]]()
+      deltaFiles.foreach { deltaFile =>
+        val tmpDeltaFilePath = deltaFile.getAbsolutePath
+          .replace(CarbonCommonConstants.WINDOWS_FILE_SEPARATOR,
+            CarbonCommonConstants.FILE_SEPARATOR)
+        val deltaFilePathElements = tmpDeltaFilePath.split(CarbonCommonConstants.FILE_SEPARATOR)
+        if (deltaFilePathElements != null && deltaFilePathElements.nonEmpty) {
+          val deltaFileName = deltaFilePathElements(deltaFilePathElements.length - 1)
+          val blockName = CarbonTablePath.DataFileUtil
+            .getBlockNameFromDeleteDeltaFile(deltaFileName)
+          if (blockNameToDeltaFilesMap.contains(blockName)) {
+            blockNameToDeltaFilesMap(blockName) += ((deltaFile, deltaFileName))
+          } else {
+            val deltaFileList = new ListBuffer[(CarbonFile, String)]()
+            deltaFileList += ((deltaFile, deltaFileName))
+            blockNameToDeltaFilesMap.put(blockName, deltaFileList)
+          }
+        }
+      }
+      val segmentUpdateDetails = new util.ArrayList[SegmentUpdateDetails]()
+      val columnCompressor = CompressorFactory.getInstance.getCompressor.getName
+      blockNameToDeltaFilesMap.foreach { entry =>
+        val segmentUpdateDetail = new SegmentUpdateDetails()
+        segmentUpdateDetail.setBlockName(entry._1)
+        segmentUpdateDetail.setActualBlockName(
+          entry._1 + CarbonCommonConstants.POINT + columnCompressor +
+            CarbonCommonConstants.FACT_FILE_EXT)
+        segmentUpdateDetail.setSegmentName(model.getSegmentId)
+        setMinMaxDeltaStampAndDeletedRowCount(entry._2, segmentUpdateDetail)
+        segmentUpdateDetails.add(segmentUpdateDetail)
+      }
+      val timestamp = System.currentTimeMillis().toString
+      val segmentDetails = new util.HashSet[Segment]()
+      segmentDetails.add(model.getSegment)
+      CarbonUpdateUtil.updateSegmentStatus(segmentUpdateDetails, carbonTable, timestamp, false)

Review comment:
       I have analyzed this thing and find that before writing the new tableUpdateStatus file, we look for segment entries in tableStatus file and then write data for only those segments in updateTableStatus file which are present in tableStatus file. So, if we don't have newly added segment or the segment which we are adding now entry in the tableStatus file, then it will not write its corresponding entry in new tableUpdateStatus file also. So, for this scenario it is required to write the tableStatus file twice.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
vikramahuja1001 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r551243624



##########
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##########
@@ -82,6 +82,59 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     FileFactory.deleteAllFilesOfDir(new File(newPath))
   }
 
+  test("Test add segment for the segment having delete delta files") {

Review comment:
       Add a test case for update too




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-772436769


   Build Failed  with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3656/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r568531096



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -369,5 +426,64 @@ case class CarbonAddLoadCommand(
     }
   }
 
+  /**
+   * If there are more than one deleteDelta File present  for a block. Then This method
+   * will pick the deltaFile with highest timestamp, because the default threshold for horizontal
+   * compaction is 1. It is assumed that threshold for horizontal compaction is not changed from
+   * default value. So there will always be only one valid delete delta file present for a block.
+   * It also sets the number of deleted rows for a segment.
+   */
+  def setValidDeltaFileAndDeletedRowCount(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+      ) : Unit = {
+    var maxDeltaStamp : Long = -1
+    var deletedRowsCount : Long = 0
+    var validDeltaFile : CarbonFile = null
+    deleteDeltaFiles.foreach { deltaFile =>
+      val currentFileTimestamp = CarbonTablePath.DataFileUtil
+        .getTimeStampFromDeleteDeltaFile(deltaFile._2)
+      if (currentFileTimestamp.toLong > maxDeltaStamp) {
+        maxDeltaStamp = currentFileTimestamp.toLong
+        validDeltaFile = deltaFile._1
+      }
+    }
+    val blockDetails =
+      new CarbonDeleteDeltaFileReaderImpl(validDeltaFile.getAbsolutePath).readJson()
+    blockDetails.getBlockletDetails.asScala.foreach { blocklet =>
+      deletedRowsCount = deletedRowsCount + blocklet.getDeletedRows.size()
+    }
+    segmentUpdateDetails.setDeleteDeltaStartTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeleteDeltaEndTimestamp(maxDeltaStamp.toString)
+    segmentUpdateDetails.setDeletedRowsInBlock(deletedRowsCount.toString)
+  }
+
+  /**
+   * As horizontal compaction not supported for SDK segments. So all delta files are valid
+   */
+  def readAllDeltaFiles(
+      deleteDeltaFiles : ListBuffer[(CarbonFile, String)],
+      segmentUpdateDetails : SegmentUpdateDetails
+  ) : Unit = {

Review comment:
       please fix this formatting.. move to above line. Check other code for the same as well




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-753976459


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3516/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762634833


   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5313/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762604221


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] Karan980 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
Karan980 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-769821724


   retest this please


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on a change in pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on a change in pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#discussion_r568534198



##########
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddLoadCommand.scala
##########
@@ -294,6 +297,49 @@ case class CarbonAddLoadCommand(
       OperationListenerBus.getInstance().fireEvent(loadTablePreStatusUpdateEvent, operationContext)
     }
 
+    val deltaFiles = FileFactory.getCarbonFile(segmentPath).listFiles()

Review comment:
       Better to use CarbonFileFilter to list only the delete delta files




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] kunal642 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
kunal642 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-773154078


   LGTM


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4070: [CARBONDATA-4082] Fix alter table add segment query on adding a segment having delete delta files.

Posted by GitBox <gi...@apache.org>.
CarbonDataQA2 commented on pull request #4070:
URL: https://github.com/apache/carbondata/pull/4070#issuecomment-762637383


   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3553/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org