Posted to issues@carbondata.apache.org by rahulforallp <gi...@git.apache.org> on 2018/04/01 12:13:54 UTC
[GitHub] carbondata pull request #2128: [WIP] partition table clean files fixed
GitHub user rahulforallp opened a pull request:
https://github.com/apache/carbondata/pull/2128
[WIP] partition table clean files fixed
Be sure to complete all of the following checklist items to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rahulforallp/incubator-carbondata part_tab_cleanFile
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2128.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2128
----
commit 8044edb5afa858fa72ae7b2d0d1cf0685cf92597
Author: rahulforallp <ra...@...>
Date: 2018-04-01T12:08:51Z
partition table clean files fixed
----
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on the issue:
https://github.com/apache/carbondata/pull/2128
LGTM
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3564/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3727/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4222/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3486/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4359/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4220/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4401/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3728/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4947/
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180099123
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
@@ -156,6 +158,25 @@ public boolean delete() {
}
+ @Override
+ public CarbonFile[] listFiles(Boolean recurssive) {
+ if (!file.isDirectory()) {
+ return new CarbonFile[0];
+ }
+ String[] filter = null;
+ Collection<File> fileCollection = FileUtils.listFiles(file, null, true);
+ File[] files = fileCollection.toArray(new File[fileCollection.size()]);
+ if (files == null) {
+ return new CarbonFile[0];
+ }
+ CarbonFile[] carbonFiles = new CarbonFile[files.length];
--- End diff --
Copy directly into the CarbonFile array instead of first converting the collection to a File[].
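The reviewer's suggestion can be illustrated with a minimal sketch: rather than converting the `Collection<File>` to a `File[]` and then copying element by element into a second array, each file is wrapped and written straight into the result array. `SimpleCarbonFile`, `LocalFileWrapper` and `RecursiveLister` are hypothetical stand-ins, not CarbonData's `CarbonFile` API, and the sketch uses the JDK's `java.nio.file.Files.walk` in place of Commons IO's `FileUtils.listFiles`. Like the `FileUtils.listFiles(file, null, true)` call in the patch, it returns regular files only. The `files == null` check also becomes unnecessary, since neither `Collection.toArray` nor this stream pipeline returns null.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical, simplified stand-in for CarbonData's CarbonFile interface.
interface SimpleCarbonFile {
  String getName();
  String getPath();
}

// Hypothetical wrapper around java.io.File.
final class LocalFileWrapper implements SimpleCarbonFile {
  private final File file;

  LocalFileWrapper(File file) {
    this.file = file;
  }

  @Override
  public String getName() {
    return file.getName();
  }

  @Override
  public String getPath() {
    return file.getAbsolutePath();
  }
}

final class RecursiveLister {
  private RecursiveLister() {}

  /** Lists all regular files under dir, recursively, wrapping each one directly into the result array. */
  static SimpleCarbonFile[] listFilesRecursively(File dir) {
    if (!dir.isDirectory()) {
      return new SimpleCarbonFile[0];
    }
    // Files.walk yields dir itself plus every descendant; keep regular files only.
    try (Stream<Path> paths = Files.walk(dir.toPath())) {
      return paths
          .filter(Files::isRegularFile)
          .map(p -> new LocalFileWrapper(p.toFile()))
          .toArray(SimpleCarbonFile[]::new);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```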
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3556/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4949/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4733/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3577/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4225/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3506/
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180156407
--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/LocalCarbonFile.java ---
@@ -156,6 +158,25 @@ public boolean delete() {
}
+ @Override
+ public CarbonFile[] listFiles(Boolean recurssive) {
+ if (!file.isDirectory()) {
+ return new CarbonFile[0];
+ }
+ String[] filter = null;
+ Collection<File> fileCollection = FileUtils.listFiles(file, null, true);
+ File[] files = fileCollection.toArray(new File[fileCollection.size()]);
+ if (files == null) {
+ return new CarbonFile[0];
+ }
+ CarbonFile[] carbonFiles = new CarbonFile[files.length];
--- End diff --
done
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4347/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4258/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4716/
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180156365
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -151,13 +153,82 @@ object CarbonStore {
}
}
} finally {
+ if (currentTablePartitions.equals(None)) {
+ cleanUpPartitionFoldersRecurssively(carbonTable, List.empty[PartitionSpec])
+ } else {
+ cleanUpPartitionFoldersRecurssively(carbonTable, currentTablePartitions.get.toList)
+ }
+
if (carbonCleanFilesLock != null) {
CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
}
}
LOGGER.audit(s"Clean files operation is success for $dbName.$tableName.")
}
+ /**
+ * delete partition folders recurssively
+ *
+ * @param carbonTable
+ * @param partitionSpecList
+ */
+ def cleanUpPartitionFoldersRecurssively(carbonTable: CarbonTable,
+ partitionSpecList: List[PartitionSpec]): Unit = {
+ if (carbonTable != null) {
+ val loadMetadataDetails = SegmentStatusManager
+ .readLoadMetadata(carbonTable.getMetadataPath)
+
+ val fileType = FileFactory.getFileType(carbonTable.getTablePath)
+ val carbonFile = FileFactory.getCarbonFile(carbonTable.getTablePath, fileType)
+
+ // list all files from table path
+ val listOfDefaultPartFilesIterator = carbonFile.listFiles(true)
+ loadMetadataDetails.foreach { metadataDetail =>
+ if (metadataDetail.getSegmentStatus.equals(SegmentStatus.MARKED_FOR_DELETE) &&
+ metadataDetail.getSegmentFile == null) {
+ val loadStartTime: Long = metadataDetail.getLoadStartTime
+ // delete all files of @loadStartTime from tablepath
+ cleanPartitionFolder(listOfDefaultPartFilesIterator, loadStartTime)
+ partitionSpecList.foreach {
+ partitionSpec =>
+ val partitionLocation = partitionSpec.getLocation
+ // For partition folder outside the tablePath
+ if (!partitionLocation.toString.startsWith(carbonTable.getTablePath)) {
+ val fileType = FileFactory.getFileType(partitionLocation.toString)
+ val partitionCarbonFile = FileFactory
+ .getCarbonFile(partitionLocation.toString, fileType)
+ // list all files from partitionLoacation
+ val listOfExternalPartFilesIterator = partitionCarbonFile.listFiles(true)
+ // delete all files of @loadStartTime from externalPath
+ cleanPartitionFolder(listOfExternalPartFilesIterator, loadStartTime)
+ }
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ *
+ * @param carbonFiles
+ * @param timestamp
+ */
+ private def cleanPartitionFolder(carbonFiles: Array[CarbonFile],
+ timestamp: Long): Unit = {
+ carbonFiles.foreach {
+ carbonFile =>
+ val filePath = carbonFile.getPath
+ val fileName = carbonFile.getName
+ if (fileName.lastIndexOf("-") > 0 && fileName.lastIndexOf(".") > 0) {
+ if (fileName.substring(fileName.lastIndexOf("-") + 1, fileName.lastIndexOf("."))
--- End diff --
done
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180079486
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -151,13 +152,88 @@ object CarbonStore {
}
}
} finally {
+ if (currentTablePartitions.equals(None)) {
+ cleanUpPartitionFoldersRecurssively(carbonTable, List.empty[PartitionSpec])
+ } else {
+ cleanUpPartitionFoldersRecurssively(carbonTable, currentTablePartitions.get.toList)
+ }
+
if (carbonCleanFilesLock != null) {
CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
}
}
LOGGER.audit(s"Clean files operation is success for $dbName.$tableName.")
}
+ /**
+ * delete partition folders recurssively
+ *
+ * @param carbonTable
+ * @param partitionSpecList
+ */
+ def cleanUpPartitionFoldersRecurssively(carbonTable: CarbonTable,
+ partitionSpecList: List[PartitionSpec]): Unit = {
+ if (carbonTable != null) {
+ val loadMetadataDetails = SegmentStatusManager
--- End diff --
1. Partition folders cannot be deleted, as there is no way to check whether a new data load is using them. ==> Done
2. Should not take multiple snapshots of the file system during clean files. ==> Earlier we were not taking the snapshot recursively, so it is required here for partition folders.
3. Partition locations are also valid for partitions inside the table path; those folders should not be scanned twice. ==> Done
4. The CarbonFile interface should be used for file system operations. ==> Done
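Point 3 amounts to scanning only the partition locations that lie outside the table path, and each of them once. A minimal sketch (class and method names are hypothetical): it compares paths component-wise with `java.nio.file.Path.startsWith`, which, unlike the `String.startsWith` check in the patch, does not treat `/w/t10/p=2` as lying under `/w/t1`, and it removes duplicate locations with a `LinkedHashSet`. It assumes plain local paths; HDFS or S3 URIs would additionally need their scheme and authority normalized, which this sketch does not attempt.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ExternalPartitionSelector {
  private ExternalPartitionSelector() {}

  /**
   * Returns the distinct partition locations that lie outside tablePath.
   * Locations inside tablePath are already covered by the single table-path listing.
   */
  static List<String> externalLocations(String tablePath, List<String> partitionLocations) {
    Path table = Paths.get(tablePath).normalize();
    // LinkedHashSet drops duplicates while preserving the original order.
    Set<Path> external = new LinkedHashSet<>();
    for (String location : partitionLocations) {
      Path p = Paths.get(location).normalize();
      // Component-wise prefix check: "/w/t10" does not start with "/w/t1".
      if (!p.startsWith(table)) {
        external.add(p);
      }
    }
    List<String> result = new ArrayList<>();
    for (Path p : external) {
      result.add(p.toString());
    }
    return result;
  }
}
```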
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4404/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4719/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4889/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4340/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4364/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4871/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3655/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4788/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:
https://github.com/apache/carbondata/pull/2128
retest this please
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4226/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4403/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3525/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3672/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4895/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:
https://github.com/apache/carbondata/pull/2128
retest this please
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:
https://github.com/apache/carbondata/pull/2128
retest sdv please
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4284/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4223/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3649/
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2128
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3730/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3666/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4752/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4245/
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3492/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4878/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:
https://github.com/apache/carbondata/pull/2128
retest this please
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180109707
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -151,13 +153,82 @@ object CarbonStore {
}
}
} finally {
+ if (currentTablePartitions.equals(None)) {
+ cleanUpPartitionFoldersRecurssively(carbonTable, List.empty[PartitionSpec])
+ } else {
+ cleanUpPartitionFoldersRecurssively(carbonTable, currentTablePartitions.get.toList)
+ }
+
if (carbonCleanFilesLock != null) {
CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
}
}
LOGGER.audit(s"Clean files operation is success for $dbName.$tableName.")
}
+ /**
+ * delete partition folders recurssively
+ *
+ * @param carbonTable
+ * @param partitionSpecList
+ */
+ def cleanUpPartitionFoldersRecurssively(carbonTable: CarbonTable,
+ partitionSpecList: List[PartitionSpec]): Unit = {
+ if (carbonTable != null) {
+ val loadMetadataDetails = SegmentStatusManager
+ .readLoadMetadata(carbonTable.getMetadataPath)
+
+ val fileType = FileFactory.getFileType(carbonTable.getTablePath)
+ val carbonFile = FileFactory.getCarbonFile(carbonTable.getTablePath, fileType)
+
+ // list all files from table path
+ val listOfDefaultPartFilesIterator = carbonFile.listFiles(true)
+ loadMetadataDetails.foreach { metadataDetail =>
+ if (metadataDetail.getSegmentStatus.equals(SegmentStatus.MARKED_FOR_DELETE) &&
+ metadataDetail.getSegmentFile == null) {
+ val loadStartTime: Long = metadataDetail.getLoadStartTime
+ // delete all files of @loadStartTime from tablepath
+ cleanPartitionFolder(listOfDefaultPartFilesIterator, loadStartTime)
+ partitionSpecList.foreach {
+ partitionSpec =>
+ val partitionLocation = partitionSpec.getLocation
+ // For partition folder outside the tablePath
+ if (!partitionLocation.toString.startsWith(carbonTable.getTablePath)) {
+ val fileType = FileFactory.getFileType(partitionLocation.toString)
+ val partitionCarbonFile = FileFactory
+ .getCarbonFile(partitionLocation.toString, fileType)
+ // list all files from partitionLoacation
+ val listOfExternalPartFilesIterator = partitionCarbonFile.listFiles(true)
+ // delete all files of @loadStartTime from externalPath
+ cleanPartitionFolder(listOfExternalPartFilesIterator, loadStartTime)
+ }
+ }
+ }
+ }
+ }
+ }
+
+ /**
+ *
+ * @param carbonFiles
+ * @param timestamp
+ */
+ private def cleanPartitionFolder(carbonFiles: Array[CarbonFile],
+ timestamp: Long): Unit = {
+ carbonFiles.foreach {
+ carbonFile =>
+ val filePath = carbonFile.getPath
+ val fileName = carbonFile.getName
+ if (fileName.lastIndexOf("-") > 0 && fileName.lastIndexOf(".") > 0) {
+ if (fileName.substring(fileName.lastIndexOf("-") + 1, fileName.lastIndexOf("."))
--- End diff --
The getCarbonFileTimeStamp function can be moved to CarbonTablePath.
Change the function name to cleanCarbonFilesInFolder.
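The substring logic in `cleanPartitionFolder` could be factored out along the lines suggested here. A minimal sketch, assuming file names of the form `<prefix>-<timestamp>.<extension>` (the shape the patch's `lastIndexOf("-")`/`lastIndexOf(".")` check expects). The signatures below are hypothetical rather than CarbonData's API, and plain `java.io.File` stands in for `CarbonFile`. The sketch also guards a case the visible part of the patch's check does not: when the last `.` comes before the last `-` (e.g. `a.b-c`), `substring` would throw `StringIndexOutOfBoundsException`, so the helper returns an empty result instead, as it does for non-numeric timestamps.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

final class CarbonFileTimestamps {
  private CarbonFileTimestamps() {}

  /**
   * Hypothetical helper: extracts the number between the last '-' and the last '.'
   * of a file name such as "part-0-1522584531234.carbondata".
   */
  static OptionalLong getCarbonFileTimeStamp(String fileName) {
    int dash = fileName.lastIndexOf('-');
    int dot = fileName.lastIndexOf('.');
    // Require a non-empty segment between a dash and a later dot; otherwise substring would throw.
    if (dash <= 0 || dot <= dash + 1) {
      return OptionalLong.empty();
    }
    try {
      return OptionalLong.of(Long.parseLong(fileName.substring(dash + 1, dot)));
    } catch (NumberFormatException e) {
      return OptionalLong.empty();
    }
  }

  /** Deletes the files whose name carries the given load start timestamp; returns the deleted names. */
  static List<String> cleanCarbonFilesInFolder(File[] files, long timestamp) {
    List<String> deleted = new ArrayList<>();
    for (File f : files) {
      OptionalLong ts = getCarbonFileTimeStamp(f.getName());
      if (ts.isPresent() && ts.getAsLong() == timestamp && f.delete()) {
        deleted.add(f.getName());
      }
    }
    return deleted;
  }
}
```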
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4713/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4800/
---
[GitHub] carbondata pull request #2128: [CARBONDATA-2303] If dataload is failed for p...
Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2128#discussion_r180034323
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/api/CarbonStore.scala ---
@@ -151,13 +152,88 @@ object CarbonStore {
}
}
} finally {
+ if (currentTablePartitions.equals(None)) {
+ cleanUpPartitionFoldersRecurssively(carbonTable, List.empty[PartitionSpec])
+ } else {
+ cleanUpPartitionFoldersRecurssively(carbonTable, currentTablePartitions.get.toList)
+ }
+
if (carbonCleanFilesLock != null) {
CarbonLockUtil.fileUnlock(carbonCleanFilesLock, LockUsage.CLEAN_FILES_LOCK)
}
}
LOGGER.audit(s"Clean files operation is success for $dbName.$tableName.")
}
+ /**
+ * delete partition folders recurssively
+ *
+ * @param carbonTable
+ * @param partitionSpecList
+ */
+ def cleanUpPartitionFoldersRecurssively(carbonTable: CarbonTable,
+ partitionSpecList: List[PartitionSpec]): Unit = {
+ if (carbonTable != null) {
+ val loadMetadataDetails = SegmentStatusManager
--- End diff --
1. Partition folders cannot be deleted, as there is no way to check whether a new data load is using them.
2. Should not take multiple snapshots of the file system during clean files.
3. Partition locations are also valid for partitions inside the table path; those folders should not be scanned twice.
4. The CarbonFile interface should be used for file system operations.
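Point 2 can be illustrated as follows. In the version of `cleanUpPartitionFoldersRecurssively` quoted in this thread, the external partition locations are listed inside the loop over `loadMetadataDetails`, so every stale segment triggers a fresh listing, and the table-path listing is scanned once per stale segment. A minimal sketch of the alternative: collect the stale load start times first (using the patch's condition, `MARKED_FOR_DELETE` with no segment file), then filter a single listing in one pass. `LoadDetail` is a hypothetical simplification of `LoadMetadataDetails` that holds the segment status as a plain string instead of the `SegmentStatus` enum, and the file-name-to-timestamp function is passed in as a parameter.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.OptionalLong;
import java.util.Set;
import java.util.function.Function;

final class SinglePassCleanup {
  private SinglePassCleanup() {}

  /** Hypothetical, simplified view of one load metadata entry. */
  static final class LoadDetail {
    final String status;       // e.g. "MARKED_FOR_DELETE"; a string instead of the real enum
    final String segmentFile;  // null when no segment file was written
    final long loadStartTime;

    LoadDetail(String status, String segmentFile, long loadStartTime) {
      this.status = status;
      this.segmentFile = segmentFile;
      this.loadStartTime = loadStartTime;
    }
  }

  /** Collects the load start times of failed loads, using the same condition as the patch. */
  static Set<Long> staleTimestamps(List<LoadDetail> details) {
    Set<Long> stale = new HashSet<>();
    for (LoadDetail d : details) {
      if ("MARKED_FOR_DELETE".equals(d.status) && d.segmentFile == null) {
        stale.add(d.loadStartTime);
      }
    }
    return stale;
  }

  /** One pass over a single listing: selects every file name whose timestamp is stale. */
  static List<String> selectStaleFiles(List<String> snapshot, Set<Long> stale,
      Function<String, OptionalLong> timestampOf) {
    List<String> result = new ArrayList<>();
    for (String name : snapshot) {
      OptionalLong ts = timestampOf.apply(name);
      if (ts.isPresent() && stale.contains(ts.getAsLong())) {
        result.add(name);
      }
    }
    return result;
  }
}
```

Because the set lookup is constant time on average, the listing is traversed once regardless of how many segments are marked for delete.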
---
[GitHub] carbondata issue #2128: [WIP] partition table clean files fixed
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3489/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4946/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] [WIP] If dataload is failed for pa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2128
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4782/
---
[GitHub] carbondata issue #2128: [CARBONDATA-2303] If dataload is failed for parition...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2128
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4290/
---