You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2017/08/03 06:41:11 UTC

[GitHub] carbondata pull request #1227: Overwrite

GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/1227

    Overwrite

    When user issued insert overwrite command, it should delete the file immediately to avoid stale folders.
    This PR implements this behavior.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata overwrite

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1227.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1227
    
----
commit 72d63d7aec0fac8a86c375e70d20ac64ef26c9e7
Author: Jacky Li <ja...@qq.com>
Date:   2017-08-03T06:17:15Z

    wip

commit 9def5d906b4cc55060959c1fc23821414b7d5356
Author: Jacky Li <ja...@qq.com>
Date:   2017-08-03T06:37:39Z

    delete file when insert overwrite

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153248
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
    
    Because it is always safe to update the status file first before deleting anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131097020
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -907,25 +906,20 @@ object CarbonDataRDDFactory {
               throw new Exception("No Data to load")
             }
             val metadataDetails = status(0)._2._1
    -        if (!isAgg) {
    --- End diff --
    
    isAgg is always false


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    @jackylk it seems some tests are failing in 1.6


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131138030
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
    
    it seems we can simplify many methods by removing FileType arg


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3348/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131137501
  
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    --- End diff --
    
    as another comment mentioned...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131152011
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
    
    Yes, I think this can be done in another PR


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3340/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/88/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131294374
  
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    +              }
                 }
               }
               listOfLoadFolderDetails.set(indexToOverwriteNewMetaEntry, newMetaEntry);
             }
             SegmentStatusManager.writeLoadDetailsIntoFile(tableStatusPath, listOfLoadFolderDetails
                 .toArray(new LoadMetadataDetails[listOfLoadFolderDetails.size()]));
    +        // Delete all old stale segment folders
    +        for (CarbonFile staleFolder : staleFolders) {
    +          CarbonUtil.deleteFoldersAndFiles(staleFolder);
    +        }
    --- End diff --
    
    but it seems that we delete the folder without checking the operation status here,different from the behavior in another changes. Am I missing something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153192
  
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    --- End diff --
    
    The same reason


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131154027
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/impl/FileFactory.java ---
    @@ -87,6 +87,10 @@ else if (path.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
         return FileType.LOCAL;
       }
     
    +  public static CarbonFile getCarbonFile(String path) {
    --- End diff --
    
    I don't think it is better to change the existing caller methods, going forward user can use this if he needs it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/742/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/82/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Success with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/764/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3361/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3343/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed with Spark 1.6, Please check CI http://144.76.159.231:8080/job/ApacheCarbonPRBuilder/750/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131314507
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -273,7 +251,9 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         sql("LOAD DATA INPATH '" + resourcesPath + "/100_olap.csv' overwrite INTO table TCarbonSourceOverwrite options ('DELIMITER'=',', 'QUOTECHAR'='\', 'FILEHEADER'='imei,deviceInformationId,MAC,deviceColor,device_backColor,modelId,marketName,AMSize,ROMSize,CUPAudit,CPIClocked,series,productionDate,bomCode,internalModels,deliveryTime,channelsId,channelsName,deliveryAreaId,deliveryCountry,deliveryProvince,deliveryCity,deliveryDistrict,deliveryStreet,oxSingleNumber,ActiveCheckTime,ActiveAreaId,ActiveCountry,ActiveProvince,Activecity,ActiveDistrict,ActiveStreet,ActiveOperatorId,Active_releaseId,Active_EMUIVersion,Active_operaSysVersion,Active_BacVerNumber,Active_BacFlashVer,Active_webUIVersion,Active_webUITypeCarrVer,Active_webTypeDataVerNumber,Active_operatorsVersion,Active_phonePADPartitionedVersions,Latest_YEAR,Latest_MONTH,Latest_DAY,Latest_HOUR,Latest_areaId,Latest_country,Latest_province,Latest_city,Latest_district,Latest_street,Latest_releaseId,Latest_EMUIVersion,Latest_operaS
 ysVersion,Latest_BacVerNumber,Latest_BacFlashVer,Latest_webUIVersion,Latest_webUITypeCarrVer,Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,Latest_operatorId,gamePointDescription,gamePointId,contractNumber')")
         sql(s"LOAD DATA local INPATH '$resourcesPath/100_olap.csv' overwrite INTO TABLE HiveOverwrite")
         checkAnswer(sql("select count(*) from TCarbonSourceOverwrite"), sql("select count(*) from HiveOverwrite"))
    -    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, timeStampPropOrig)
    +    val folder = new File(s"$storeLocation/default/CarbonOverwrite/Fact/Part0/")
    --- End diff --
    
    Use tcarbonsourceoverwrite not CarbonOverwrite and also use lower case as folder, so use as below to avoid test failure, 
    ```
    val folder = new File(s"$storeLocation/default/tcarbonsourceoverwrite/Fact/Part0/")
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131295031
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
    
    yeah,I mistakenly thought it was `if (loadInprogressExist)` that matters


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1227


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131294865
  
    --- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java ---
    @@ -312,18 +313,28 @@ public static boolean recordLoadMetadata(LoadMetadataDetails newMetaEntry,
               }
               if (insertOverwrite) {
                 for (LoadMetadataDetails entry : listOfLoadFolderDetails) {
    -              entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +              if (!entry.getLoadStatus().equals(LoadStatusType.INSERT_OVERWRITE.getMessage())) {
    +                entry.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE);
    +                // For insert overwrite, we will delete the old segment folder immediately
    +                // So collect the old segments here
    +                String path = carbonTablePath.getCarbonDataDirectoryPath("0", entry.getLoadName());
    +                staleFolders.add(FileFactory.getCarbonFile(path));
    +              }
                 }
               }
               listOfLoadFolderDetails.set(indexToOverwriteNewMetaEntry, newMetaEntry);
             }
             SegmentStatusManager.writeLoadDetailsIntoFile(tableStatusPath, listOfLoadFolderDetails
                 .toArray(new LoadMetadataDetails[listOfLoadFolderDetails.size()]));
    +        // Delete all old stale segment folders
    +        for (CarbonFile staleFolder : staleFolders) {
    +          CarbonUtil.deleteFoldersAndFiles(staleFolder);
    +        }
    --- End diff --
    
    yeah,I get it~


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131137046
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
    
    why not delete the stale folder right now?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131099457
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -907,25 +906,20 @@ object CarbonDataRDDFactory {
               throw new Exception("No Data to load")
             }
             val metadataDetails = status(0)._2._1
    -        if (!isAgg) {
    --- End diff --
    
    yes, it is not required now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    SDV Build Success with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/83/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131314408
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/allqueries/InsertIntoCarbonTableTestCase.scala ---
    @@ -252,13 +231,12 @@ class InsertIntoCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
         sql("insert overwrite table CarbonOverwrite select * from THive")
         sql("insert overwrite table HiveOverwrite select * from THive")
         checkAnswer(sql("select count(*) from CarbonOverwrite"), sql("select count(*) from HiveOverwrite"))
    -    CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, timeStampPropOrig)
    +    val folder = new File(s"$storeLocation/default/CarbonOverwrite/Fact/Part0/")
    --- End diff --
    
    CarbonOverwrite is always writes in lower case as folder, so use as below to avoid test failure, 
    ```
    val folder = new File(s"$storeLocation/default/carbonoverwrite/Fact/Part0/")
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: Overwrite

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/3339/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata pull request #1227: [CARBONDATA-1356] Delete stale folder in inse...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1227#discussion_r131153070
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala ---
    @@ -713,16 +714,20 @@ object CommonUtil {
                             SegmentStatusManager.readLoadMetadata(
                               carbonTablePath.getMetadataDirectoryPath)
                           var loadInprogressExist = false
    +                      val staleFolders: Seq[CarbonFile] = Seq()
                           listOfLoadFolderDetailsArray.foreach { load =>
                             if (load.getLoadStatus.equals(LoadStatusType.IN_PROGRESS.getMessage) ||
                                 load.getLoadStatus.equals(LoadStatusType.INSERT_OVERWRITE.getMessage)) {
                               load.setLoadStatus(CarbonCommonConstants.MARKED_FOR_DELETE)
    +                          staleFolders :+ FileFactory.getCarbonFile(
    +                            carbonTablePath.getCarbonDataDirectoryPath("0", load.getLoadName))
    --- End diff --
    
    It is saver to delete after writing the new table status file (line 729), if we delete the folder here, it could result in inconsistent state if system crash between line 723 and line 729


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] carbondata issue #1227: [CARBONDATA-1356] Delete stale folder in insert over...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1227
  
    SDV Build Failed with Spark 2.1, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/95/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---