Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/03/24 09:54:23 UTC

[GitHub] [carbondata] akashrn5 opened a new pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

akashrn5 opened a new pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676
 
 
    ### Why is this PR needed?
    
    
    ### What changes were proposed in this PR?
   
       
    ### Does this PR introduce any user interface change?
    - No
    - Yes. (please explain the change and update document)
   
    ### Is any new testcase added?
    - No
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [carbondata] kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610801004
 
 
   @akashrn5 Conflict again

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r397307874
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
 
 Review comment:
   Actually, we will have all the data files in splits, but we don't use index files for rebuilding, so I will check again whether there is any way to avoid this.
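The first part of the quoted diff deletes every data file listed in `splits`. Below is a minimal sketch of that step. It assumes the split file paths are local paths reachable through `java.nio.file`; the diff itself goes through `FileFactory.getCarbonFile(...).delete()` so that it also works on HDFS/S3. The object and method names are hypothetical. One difference from the diff: it returns the paths that could not be deleted, whereas the diff ignores the result of `delete()`.

```scala
import java.nio.file.{Files, Paths}
import scala.util.Try

// Hypothetical helper: `splitPaths` stands in for
// `splits.asScala.map(_.getFilePath)` so the sketch runs without CarbonData.
object OldDataFileCleanup {

  /** Deletes each old data file and returns the paths that could not be deleted. */
  def deleteOldDataFiles(splitPaths: Seq[String]): Seq[String] =
    splitPaths.filterNot { p =>
      // deleteIfExists returns false for an already-missing file, which counts
      // as success here; an I/O error (permissions, non-empty dir) is a failure.
      Try { Files.deleteIfExists(Paths.get(p)); true }.getOrElse(false)
    }
}
```

Returning the failures lets the caller decide whether to log a leftover file or fail the rebuild, which a silently ignored boolean does not allow.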

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-603207514
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2546/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610811726
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/961/
   

[GitHub] [carbondata] kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610755428
 
 
   @akashrn5 Please rebase

[GitHub] [carbondata] dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398470509
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+          override def accept(carbonFile: CarbonFile): Boolean = {
+            (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+             carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+            DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+            carbonLoadModelCopy.getFactTimeStamp
+          }
+        })
+        indexFiles.foreach { indexFile =>
+          indexFile.delete()
 
 Review comment:
   Please make sure to clear the cache for the index files (in case they are already loaded) that are to be deleted.
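The review asks that cached entries for the deleted index files be cleared. The sketch below shows one possible ordering, evict and then delete, against an in-memory cache. `IndexFileCache` is a hypothetical stand-in keyed by index file path. CarbonData's real index cache classes and keys are different and are not modeled here.

```scala
import java.nio.file.{Files, Paths}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical cache keyed by absolute index file path (not a CarbonData class).
final class IndexFileCache {
  private val entries = new ConcurrentHashMap[String, Array[Byte]]()
  def put(path: String, content: Array[Byte]): Unit = entries.put(path, content)
  def contains(path: String): Boolean = entries.containsKey(path)
  def invalidate(path: String): Unit = entries.remove(path)
}

object StaleIndexCleanup {
  /** Evicts each stale index file from the cache, then deletes it from storage. */
  def evictAndDelete(cache: IndexFileCache, staleIndexFiles: Seq[String]): Unit =
    staleIndexFiles.foreach { path =>
      // Evict first so the cache never keeps an entry for a file that is gone.
      // A concurrent reader could still reload the file between these two calls;
      // this sketch does not guard against that.
      cache.invalidate(path)
      Files.deleteIfExists(Paths.get(path))
    }
}
```

Deleting first and evicting afterwards would leave a window in which the cache still serves an entry whose backing file no longer exists. Evicting first narrows that window but does not remove the reload race noted in the comment.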

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610813883
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2673/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610878800
 
 
   Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2679/
   

[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r397303472
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
 
 Review comment:
   Is listFiles needed again? It is costly on S3A/OBS. It is better to keep the list of data files and index files that participated in the rebuild in memory and clean them up after the rebuild, without calling listFiles again.
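The suggestion is to remember the inputs of the rebuild and not list the segment directory again. A minimal sketch of that idea follows, assuming local paths and a hypothetical `RebuildInputs` collector that the rebuild task would fill while it reads files. CarbonData does not provide such a class.

```scala
import java.nio.file.{Files, Paths}
import scala.collection.mutable

// Hypothetical collector: records the files the rebuild actually used, so the
// cleanup needs no extra LIST call (comparatively expensive on S3A/OBS).
final class RebuildInputs {
  private val dataFiles = mutable.LinkedHashSet.empty[String]
  private val indexFiles = mutable.LinkedHashSet.empty[String]

  def recordDataFile(path: String): Unit = dataFiles += path
  def recordIndexFile(path: String): Unit = indexFiles += path

  /** Deletes every recorded file and returns how many were actually removed.
    * Call it only after the rebuilt files have been written successfully. */
  def cleanUp(): Int =
    (dataFiles ++ indexFiles).toSeq.count(p => Files.deleteIfExists(Paths.get(p)))
}
```

Files that were never recorded, such as the index file written by the rebuild itself, are left alone. A timestamp filter over a directory listing has to achieve the same thing by comparing names.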

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-603204485
 
 
   Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/839/
   

[GitHub] [carbondata] akashrn5 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610758369
 
 
   @kunal642 rebased, please check

[GitHub] [carbondata] asfgit closed pull request #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676
 
 
   

[GitHub] [carbondata] dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398470227
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+          override def accept(carbonFile: CarbonFile): Boolean = {
+            (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+             carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+            DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+            carbonLoadModelCopy.getFactTimeStamp
+          }
+        })
+        indexFiles.foreach { indexFile =>
+          indexFile.delete()
 
 Review comment:
   Please test the scenario where
   (1) the index files from before the rebuild have already been queried and cached, and
   (2) a rebuild and a query then run concurrently.
   In this scenario, the query will pick up the cached index file and proceed, but if the rebuild deletes it, the file will be unavailable and the query will either throw an exception or return an empty result.
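The race above can be reproduced deterministically in a single thread. A query resolves an index file path, the rebuild deletes that file, and the query's later read fails. This sketch uses local files and a hypothetical file name. It does not model CarbonData's query path.

```scala
import java.io.IOException
import java.nio.file.{Files, NoSuchFileException, Path}

object RebuildQueryRace {
  /** Returns Right(bytes) if the read succeeds, Left(message) otherwise. */
  def readIndex(path: Path): Either[String, Array[Byte]] =
    try Right(Files.readAllBytes(path))
    catch {
      case _: NoSuchFileException => Left(s"index file missing: ${path.getFileName}")
      case e: IOException         => Left(e.getMessage)
    }

  /** Plays the three steps in order inside `dir`. */
  def simulate(dir: Path): Either[String, Array[Byte]] = {
    val index = Files.write(dir.resolve("0-0-100.carbonindex"), Array[Byte](1, 2, 3))
    val resolvedByQuery = index // 1. the query resolves (and caches) the path
    Files.delete(index)         // 2. the rebuild deletes the old index file
    readIndex(resolvedByQuery)  // 3. the query reads it and fails
  }
}
```

In a real concurrent run, the failure only appears when step 2 falls between steps 1 and 3. This is why the scenario is hard to hit in tests without forcing the order as shown.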

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398489828
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+          override def accept(carbonFile: CarbonFile): Boolean = {
+            (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+             carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+            DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+            carbonLoadModelCopy.getFactTimeStamp
+          }
+        })
+        indexFiles.foreach { indexFile =>
+          indexFile.delete()
 
 Review comment:
   Clearing the cache after rebuild is already handled.

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398489554
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+          override def accept(carbonFile: CarbonFile): Boolean = {
+            (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+             carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+            DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+            carbonLoadModelCopy.getFactTimeStamp
+          }
+        })
+        indexFiles.foreach { indexFile =>
+          indexFile.delete()
 
 Review comment:
   This scenario is verified. If the rebuild finishes before the query caches the index files, it is not a problem, but if the query is already reading them, it might fail. The same is true in many other cases, like merge index and index server, because we don't have a lock mechanism for reads, so this should be OK.
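As the reply notes, CarbonData takes no read lock, so this race is accepted. For comparison only, the sketch below shows what a hypothetical per-segment read/write lock would look like. Queries hold the read lock while they use a segment's files, and the rebuild cleanup takes the write lock. `SegmentFileGuard` is not a CarbonData class, and a real design would also have to work across executors, which an in-JVM lock does not.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock

// Hypothetical guard, for comparison only: CarbonData has no such read lock.
final class SegmentFileGuard {
  private val lock = new ReentrantReadWriteLock()

  /** Queries run `body` while holding the shared read lock. */
  def withRead[T](body: => T): T = {
    lock.readLock().lock()
    try body finally lock.readLock().unlock()
  }

  /** Cleanup runs `body` under the exclusive write lock, after in-flight reads finish. */
  def withWrite[T](body: => T): T = {
    lock.writeLock().lock()
    try body finally lock.writeLock().unlock()
  }

  def readersActive: Int = lock.getReadLockCount
}
```

The trade-off is that a long-running query would delay the cleanup step of the rebuild. This cost, together with the need for a distributed lock, explains why skipping read locks is a reasonable choice here.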

[GitHub] [carbondata] kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610816939
 
 
   LGTM

[GitHub] [carbondata] akashrn5 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610802044
 
 
   > @akashrn5 Conflict again
   
   @kunal642 I don't see any conflict.

[GitHub] [carbondata] dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
dhatchayani commented on a change in pull request #3676: [WIP]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r398470509
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
+          override def accept(carbonFile: CarbonFile): Boolean = {
+            (carbonFile.getName.endsWith(CarbonTablePath.INDEX_FILE_EXT) ||
+             carbonFile.getName.endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)) &&
+            DataFileUtil.getTimeStampFromFileName(carbonFile.getAbsolutePath).toLong <
+            carbonLoadModelCopy.getFactTimeStamp
+          }
+        })
+        indexFiles.foreach { indexFile =>
+          indexFile.delete()
 
 Review comment:
   Please make sure to clear the cache for the index files which are to be deleted

[GitHub] [carbondata] kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
kunal642 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610803523
 
 
   retest this please

[GitHub] [carbondata] akashrn5 commented on a change in pull request #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
akashrn5 commented on a change in pull request #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#discussion_r405341211
 
 

 ##########
 File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/rdd/CarbonSIRebuildRDD.scala
 ##########
 @@ -321,6 +324,26 @@ class CarbonSIRebuildRDD[K, V](
           LOGGER.info("Closing compaction processor instance to clean up loading resources")
           processor.close()
         }
+
+        // delete all the old data files which are used for merging
+        splits.asScala.foreach { split =>
+          val carbonFile = FileFactory.getCarbonFile(split.getFilePath)
+          carbonFile.delete()
+        }
+
+        // delete the indexfile/merge index carbonFile of old data files
+        val segmentPath = FileFactory.getCarbonFile(indexTable.getSegmentPath(segmentId))
+        val indexFiles = segmentPath.listFiles(new CarbonFileFilter {
 
 Review comment:
   We will have the list of data files in the task, but not the index files. I will try to fix this in another PR.
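The index files are not among the task's inputs, so the diff finds them by listing the segment. It keeps only index or merge index files whose timestamp is older than the rebuild's fact timestamp. The sketch below reproduces that filter in plain Scala over file names. It assumes the extensions are `.carbonindex` and `.carbonindexmerge` and that the timestamp is the last `-`-separated token before the extension. Both the parser and the object are hypothetical. The diff delegates the parsing to `DataFileUtil.getTimeStampFromFileName`.

```scala
// Hypothetical filter over file names; mirrors the `accept` logic in the diff.
object StaleIndexFilter {
  private val IndexExt = ".carbonindex"
  private val MergeIndexExt = ".carbonindexmerge"

  /** Assumed naming: "...-<timestamp><ext>" or "<timestamp><ext>"; None if unparsable. */
  def timestampOf(fileName: String): Option[Long] = {
    val base = fileName.stripSuffix(MergeIndexExt).stripSuffix(IndexExt)
    scala.util.Try(base.substring(base.lastIndexOf('-') + 1).toLong).toOption
  }

  /** Keeps index/merge index files written strictly before `factTimestamp`. */
  def staleIndexFiles(names: Seq[String], factTimestamp: Long): Seq[String] =
    names.filter { n =>
      (n.endsWith(IndexExt) || n.endsWith(MergeIndexExt)) &&
        timestampOf(n).exists(_ < factTimestamp)
    }
}
```

A file whose name cannot be parsed is kept rather than deleted. For a cleanup step, that is the safer failure mode.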

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild

Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3676: [CARBONDATA-3754]Clean up the data file and index files after SI rebuild
URL: https://github.com/apache/carbondata/pull/3676#issuecomment-610886067
 
 
   Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/968/
   
