Posted to issues@carbondata.apache.org by zzcclp <gi...@git.apache.org> on 2018/03/22 17:41:31 UTC

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

GitHub user zzcclp opened a pull request:

    https://github.com/apache/carbondata/pull/2091

    [CARBONDATA-2258] Separate visible and invisible segments info into two files to reduce the size of tablestatus file.

    The tablestatus file keeps growing, and because many code paths scan it, its size affects read performance.
    Following the discussion in this [thread](http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/The-size-of-the-tablestatus-file-is-getting-larger-does-it-impact-the-performance-of-reading-this-fi-td41941.html), each time the command 'CLEAN FILES FOR TABLE' runs (in the method 'SegmentStatusManager.deleteLoadsAndUpdateMetadata'), the invisible segment list is *appended* to a file called 'tablestatus.history'.
    Visible and invisible segments are thus kept in two separate files (the tablestatus file and the tablestatus.history file).
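The append-only handling of `tablestatus.history` described above can be sketched with `java.nio.file.Files`. This is a simplified, hypothetical illustration rather than the PR's code: it assumes one `<segmentId>,<visible>` entry per line, whereas CarbonData's tablestatus file holds a JSON array of `LoadMetadataDetails`, and it omits the locking a real implementation needs. The class and method names are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: moves invisible entries out of "tablestatus" and
// appends them to "tablestatus.history". Assumes a simplified text format
// of one "<segmentId>,<visible>" entry per line (the real files are JSON).
class TableStatusHistorySketch {

  static void moveInvisibleToHistory(Path metadataDir) throws IOException {
    Path status = metadataDir.resolve("tablestatus");
    Path history = metadataDir.resolve("tablestatus.history");

    List<String> visible = new ArrayList<>();
    List<String> invisible = new ArrayList<>();
    for (String line : Files.readAllLines(status, StandardCharsets.UTF_8)) {
      // Segment 0 always stays in tablestatus, mirroring the rule in the PR's diff.
      boolean isSegmentZero = line.startsWith("0,");
      if (!isSegmentZero && line.endsWith(",false")) {
        invisible.add(line);
      } else {
        visible.add(line);
      }
    }
    if (invisible.isEmpty()) {
      return; // nothing to move
    }
    // Append to the history file (creating it if needed) instead of rewriting it.
    Files.write(history, invisible, StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    // Rewrite tablestatus with only the retained entries.
    Files.write(status, visible, StandardCharsets.UTF_8);
  }
}
```

Writing the history file before rewriting tablestatus means a failure between the two steps leaves duplicated entries rather than lost ones.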
    
    Be sure to complete all of the following checklist items to help us incorporate
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on:
            - Whether new unit test cases have been added, or why no new tests are required.
            - How was it tested? Please attach the test report.
            - Is it a performance-related change? Please attach the performance test report.
            - Any additional information to help reviewers test this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zzcclp/carbondata CARBONDATA-2258

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2091
    
----
commit 46301958107bd7ee2800eb891d19c885772b6a6c
Author: Zhang Zhichao <44...@...>
Date:   2018-03-22T17:39:36Z

    [CARBONDATA-2258] Separate visible and invisible segments info into two files to reduce the size of tablestatus file.
    
    The tablestatus file keeps growing, and because many code paths scan it, its size affects read performance.
    Following the discussion in this [thread|http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/The-size-of-the-tablestatus-file-is-getting-larger-does-it-impact-the-performance-of-reading-this-fi-td41941.html], each time the command 'CLEAN FILES FOR TABLE' runs (in the method 'SegmentStatusManager.deleteLoadsAndUpdateMetadata'), the invisible segment list is *appended* to a file called 'tablestatus.history'.
    Visible and invisible segments are thus kept in two separate files (the tablestatus file and the tablestatus.history file).

----


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3288/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3322/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4017/



---

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2091#discussion_r176907591
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ---
    @@ -913,4 +950,91 @@ public static void deleteLoadsAndUpdateMetadata(
         CarbonLockUtil.deleteExpiredSegmentLockFiles(carbonTable);
       }
     
    +  /**
    +   * Get the number of invisible segment info from segment info list.
    +   */
    +  public static int countInvisibleSegments(LoadMetadataDetails[] segmentList) {
    +    int invisibleSegmentCnt = 0;
    +    if (segmentList.length != 0) {
    +      for (LoadMetadataDetails eachSeg : segmentList) {
    +        // can not remove segment 0, there are some info will be used later
    +        // for example: updateStatusFileName
    +        if (!eachSeg.getLoadName().equalsIgnoreCase("0")
    +            && eachSeg.getVisibility().equalsIgnoreCase("false")) {
    +          invisibleSegmentCnt += 1;
    +        }
    +      }
    +    }
    +    return invisibleSegmentCnt;
    +  }
    +
    +  private static class TableStatusReturnTuple {
    +    LoadMetadataDetails[] arrayOfLoadDetails;
    +    LoadMetadataDetails[] arrayOfLoadHistoryDetails;
    +    TableStatusReturnTuple(LoadMetadataDetails[] arrayOfLoadDetails,
    +        LoadMetadataDetails[] arrayOfLoadHistoryDetails) {
    +      this.arrayOfLoadDetails = arrayOfLoadDetails;
    +      this.arrayOfLoadHistoryDetails = arrayOfLoadHistoryDetails;
    +    }
    +  }
    +
    +  /**
    +   * Separate visible and invisible segments into two array.
    +   */
    +  public static TableStatusReturnTuple separateVisibleAndInvisibleSegments(
    +      LoadMetadataDetails[] oldList,
    +      LoadMetadataDetails[] newList,
    +      int invisibleSegmentCnt) {
    +    int newSegmentsLength = newList.length;
    +    int visibleSegmentCnt = newSegmentsLength - invisibleSegmentCnt;
    +    LoadMetadataDetails[] arrayOfVisibleSegments = new LoadMetadataDetails[visibleSegmentCnt];
    +    LoadMetadataDetails[] arrayOfInvisibleSegments = new LoadMetadataDetails[invisibleSegmentCnt];
    +    int oldSegmentsLength = oldList.length;
    +    int visibleIdx = 0;
    +    int invisibleIdx = 0;
    +    for (int i = 0; i < newSegmentsLength; i++) {
    +      LoadMetadataDetails newSegment = newList[i];
    +      if (i < oldSegmentsLength) {
    +        LoadMetadataDetails oldSegment = oldList[i];
    +        if (newSegment.getLoadName().equalsIgnoreCase("0")) {
    +          newSegment.setVisibility(oldSegment.getVisibility());
    +          arrayOfVisibleSegments[visibleIdx] = newSegment;
    +          visibleIdx++;
    +        } else if ("false".equalsIgnoreCase(oldSegment.getVisibility())) {
    +          newSegment.setVisibility("false");
    +          arrayOfInvisibleSegments[invisibleIdx] = newSegment;
    +          invisibleIdx++;
    +        } else {
    +          arrayOfVisibleSegments[visibleIdx] = newSegment;
    +          visibleIdx++;
    +        }
    +      } else {
    +        arrayOfVisibleSegments[visibleIdx] = newSegment;
    +        visibleIdx++;
    +      }
    +    }
    +    return new TableStatusReturnTuple(arrayOfVisibleSegments, arrayOfInvisibleSegments);
    +  }
    +
    +  /**
    +   * Append new invisible segment info to old list.
    --- End diff --
    
    Done
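The two-pass approach in the diff (count first with `countInvisibleSegments`, then fill pre-sized arrays in `separateVisibleAndInvisibleSegments`) relies on both methods classifying every entry the same way; if they ever disagree, an index overruns one array or leaves `null` slots in the other. For comparison, here is a minimal list-based variant. `Segment` is a hypothetical stand-in for `LoadMetadataDetails`, and matching old and new entries by position is carried over from the diff as an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LoadMetadataDetails (only the fields used here).
class Segment {
  final String loadName;
  String visibility; // "true" or "false", as in the diff

  Segment(String loadName, String visibility) {
    this.loadName = loadName;
    this.visibility = visibility;
  }
}

class SegmentSplitSketch {
  final List<Segment> visible = new ArrayList<>();
  final List<Segment> invisible = new ArrayList<>();

  // Classifies each entry of newList using the visibility recorded at the
  // same position in oldList, like the diff. Entries beyond oldList are new
  // segments and stay visible; segment "0" always stays in tablestatus.
  static SegmentSplitSketch separate(List<Segment> oldList, List<Segment> newList) {
    SegmentSplitSketch result = new SegmentSplitSketch();
    for (int i = 0; i < newList.size(); i++) {
      Segment seg = newList.get(i);
      if (i >= oldList.size()) {
        result.visible.add(seg);
        continue;
      }
      String oldVisibility = oldList.get(i).visibility;
      if ("0".equalsIgnoreCase(seg.loadName)) {
        seg.visibility = oldVisibility; // keep segment 0's recorded visibility
        result.visible.add(seg);
      } else if ("false".equalsIgnoreCase(oldVisibility)) {
        seg.visibility = "false";
        result.invisible.add(seg);
      } else {
        result.visible.add(seg);
      }
    }
    return result;
  }
}
```

Growing `ArrayList`s costs a little extra allocation but removes the need to keep a separate counting predicate in sync with the separation logic.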


---

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2091


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    LGTM


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4549/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4044/



---

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2091#discussion_r176907586
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1602,8 +1602,20 @@
       // default value is 2 days
       public static final String CARBON_SEGMENT_LOCK_FILES_PRESERVE_HOURS_DEFAULT = "48";
     
    +  /**
    +   * The number of invisible segment info which will be preserved in tablestatus file,
    +   * if it exceeds this value, they will be removed and write to tablestatus.history file.
    +   */
    +  @CarbonProperty
    +  public static final String CARBON_INVISIBLE_SEGMENTS_PRESERVE_COUNT =
    +      "carbon.invisible.segments.preserve.count";
    +
    +  /**
    +   * default value is 20, it means that it will preserve 20 invisible segment info
    --- End diff --
    
    Done


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3303/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4031/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    retest sdv please.


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3318/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4565/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4043/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4542/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3308/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    @jackylk @ravipesala please help review, thanks.


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4061/



---

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2091#discussion_r176903880
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
    @@ -1602,8 +1602,20 @@
       // default value is 2 days
       public static final String CARBON_SEGMENT_LOCK_FILES_PRESERVE_HOURS_DEFAULT = "48";
     
    +  /**
    +   * The number of invisible segment info which will be preserved in tablestatus file,
    +   * if it exceeds this value, they will be removed and write to tablestatus.history file.
    +   */
    +  @CarbonProperty
    +  public static final String CARBON_INVISIBLE_SEGMENTS_PRESERVE_COUNT =
    +      "carbon.invisible.segments.preserve.count";
    +
    +  /**
    +   * default value is 20, it means that it will preserve 20 invisible segment info
    --- End diff --
    
    The default is 200, right? Please update the comment.
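How the `carbon.invisible.segments.preserve.count` threshold might gate the move to `tablestatus.history` can be sketched as follows. This is a hypothetical illustration: it reads the value from a plain `java.util.Properties` instead of CarbonData's own property handling, takes the default of 200 from the review comment above, and falls back to that default when the value is missing or not a non-negative integer.

```java
import java.util.Properties;

// Hypothetical sketch of applying "carbon.invisible.segments.preserve.count".
class PreserveCountSketch {
  static final String KEY = "carbon.invisible.segments.preserve.count";
  // Default taken from the review comment ("The default is 200").
  static final int DEFAULT_PRESERVE_COUNT = 200;

  // Returns the configured count, or the default if unset or invalid.
  static int preserveCount(Properties props) {
    String raw = props.getProperty(KEY);
    if (raw == null) {
      return DEFAULT_PRESERVE_COUNT;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value >= 0 ? value : DEFAULT_PRESERVE_COUNT;
    } catch (NumberFormatException e) {
      return DEFAULT_PRESERVE_COUNT;
    }
  }

  // Invisible entries move to tablestatus.history only once their number
  // exceeds the preserve count, as the constant's Javadoc describes.
  static boolean shouldMoveToHistory(int invisibleSegmentCnt, Properties props) {
    return invisibleSegmentCnt > preserveCount(props);
  }
}
```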


---

[GitHub] carbondata pull request #2091: [CARBONDATA-2258] Separate visible and invisi...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2091#discussion_r176903993
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ---
    @@ -913,4 +950,91 @@ public static void deleteLoadsAndUpdateMetadata(
         CarbonLockUtil.deleteExpiredSegmentLockFiles(carbonTable);
       }
     
    +  /**
    +   * Get the number of invisible segment info from segment info list.
    +   */
    +  public static int countInvisibleSegments(LoadMetadataDetails[] segmentList) {
    +    int invisibleSegmentCnt = 0;
    +    if (segmentList.length != 0) {
    +      for (LoadMetadataDetails eachSeg : segmentList) {
    +        // can not remove segment 0, there are some info will be used later
    +        // for example: updateStatusFileName
    +        if (!eachSeg.getLoadName().equalsIgnoreCase("0")
    +            && eachSeg.getVisibility().equalsIgnoreCase("false")) {
    +          invisibleSegmentCnt += 1;
    +        }
    +      }
    +    }
    +    return invisibleSegmentCnt;
    +  }
    +
    +  private static class TableStatusReturnTuple {
    +    LoadMetadataDetails[] arrayOfLoadDetails;
    +    LoadMetadataDetails[] arrayOfLoadHistoryDetails;
    +    TableStatusReturnTuple(LoadMetadataDetails[] arrayOfLoadDetails,
    +        LoadMetadataDetails[] arrayOfLoadHistoryDetails) {
    +      this.arrayOfLoadDetails = arrayOfLoadDetails;
    +      this.arrayOfLoadHistoryDetails = arrayOfLoadHistoryDetails;
    +    }
    +  }
    +
    +  /**
    +   * Separate visible and invisible segments into two array.
    +   */
    +  public static TableStatusReturnTuple separateVisibleAndInvisibleSegments(
    +      LoadMetadataDetails[] oldList,
    +      LoadMetadataDetails[] newList,
    +      int invisibleSegmentCnt) {
    +    int newSegmentsLength = newList.length;
    +    int visibleSegmentCnt = newSegmentsLength - invisibleSegmentCnt;
    +    LoadMetadataDetails[] arrayOfVisibleSegments = new LoadMetadataDetails[visibleSegmentCnt];
    +    LoadMetadataDetails[] arrayOfInvisibleSegments = new LoadMetadataDetails[invisibleSegmentCnt];
    +    int oldSegmentsLength = oldList.length;
    +    int visibleIdx = 0;
    +    int invisibleIdx = 0;
    +    for (int i = 0; i < newSegmentsLength; i++) {
    +      LoadMetadataDetails newSegment = newList[i];
    +      if (i < oldSegmentsLength) {
    +        LoadMetadataDetails oldSegment = oldList[i];
    +        if (newSegment.getLoadName().equalsIgnoreCase("0")) {
    +          newSegment.setVisibility(oldSegment.getVisibility());
    +          arrayOfVisibleSegments[visibleIdx] = newSegment;
    +          visibleIdx++;
    +        } else if ("false".equalsIgnoreCase(oldSegment.getVisibility())) {
    +          newSegment.setVisibility("false");
    +          arrayOfInvisibleSegments[invisibleIdx] = newSegment;
    +          invisibleIdx++;
    +        } else {
    +          arrayOfVisibleSegments[visibleIdx] = newSegment;
    +          visibleIdx++;
    +        }
    +      } else {
    +        arrayOfVisibleSegments[visibleIdx] = newSegment;
    +        visibleIdx++;
    +      }
    +    }
    +    return new TableStatusReturnTuple(arrayOfVisibleSegments, arrayOfInvisibleSegments);
    +  }
    +
    +  /**
    +   * Append new invisible segment info to old list.
    --- End diff --
    
    Change to `Pick all invisible segment entries in appendList and add them into historyList`.
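The behavior named in the suggested Javadoc can be sketched as a small helper. The method name `appendToHistory`, the `HistoryEntry` class, and the choice to return a new list rather than mutate `historyList` are illustrative assumptions, not the code that was merged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for LoadMetadataDetails.
class HistoryEntry {
  final String loadName;
  final String visibility; // "true" or "false"

  HistoryEntry(String loadName, String visibility) {
    this.loadName = loadName;
    this.visibility = visibility;
  }
}

class HistoryAppendSketch {
  // Picks all invisible entries in appendList and adds them after the
  // existing entries of historyList; neither input list is modified.
  static List<HistoryEntry> appendToHistory(List<HistoryEntry> historyList,
      List<HistoryEntry> appendList) {
    List<HistoryEntry> merged = new ArrayList<>(historyList);
    for (HistoryEntry e : appendList) {
      if ("false".equalsIgnoreCase(e.visibility)) {
        merged.add(e);
      }
    }
    return merged;
  }
}
```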


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4035/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3283/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4545/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4531/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4536/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4040/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4516/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    retest this please


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    retest this please


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    retest sdv please


---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3339/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4511/



---

[GitHub] carbondata issue #2091: [CARBONDATA-2258] Separate visible and invisible seg...

Posted by zzcclp <gi...@git.apache.org>.
Github user zzcclp commented on the issue:

    https://github.com/apache/carbondata/pull/2091
  
    retest this please


---