Posted to issues@carbondata.apache.org by Xaprice <gi...@git.apache.org> on 2018/01/16 10:03:52 UTC

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

GitHub user Xaprice opened a pull request:

    https://github.com/apache/carbondata/pull/1812

    [CARBONDATA-2033]support user specified segments in major compaction

    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
    **no**
     - [ ] Any backward compatibility impacted?
      **no**
     - [x] Document update required?
    **Yes, data-management-on-carbondata.md has been updated.**
     - [x] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
           **yes**
            - How it is tested? Please attach test report.
           **test on cluster with 7 nodes**
            - Is it a performance related change? Please attach the performance test report.
           **no**
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Xaprice/carbondata specified_segs_in_major_compact

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1812
    
----
commit 96bddafbc9edf48cbb427a75d267178cc1cef2f8
Author: Jin Zhou <xa...@...>
Date:   2018-01-16T09:02:51Z

    [CARBONDATA-2033]support user specified segments in major compaction

----


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4296/



---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184398253
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala ---
    @@ -212,13 +224,18 @@ case class CarbonAlterTableCompactionCommand(
         carbonLoadModel.setFactTimeStamp(loadStartTime)
     
         val isCompactionTriggerByDDl = true
    +    var segmentIds: Option[List[String]] = None
    +    if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
    +      segmentIds = alterTableModel.customSegmentIds
    +    }
         val compactionModel = CompactionModel(compactionSize,
           compactionType,
           carbonTable,
           isCompactionTriggerByDDl,
           CarbonFilters.getCurrentPartitions(sqlContext.sparkSession,
             TableIdentifier(carbonTable.getTableName,
    -        Some(carbonTable.getDatabaseName)))
    +        Some(carbonTable.getDatabaseName))),
    +      segmentIds
    --- End diff --
    
    Please check the code alignment here; it does not look correct.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @ravipesala  Compacting adjacent segments is certainly the best practice in most cases. But isn't it too inflexible to make that a mandatory rule?


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by chenliang613 <gi...@git.apache.org>.
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    retest this please


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2840/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5460/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @chenliang613  
    For question 1: I thought minor compaction was mainly used in auto-merging scenarios. But after reconsidering this feature, it may be better to support both major and minor compaction. I will add support for minor compaction soon.
    For question 2: I will follow your advice and change the syntax to be consistent with "query with specified segments".


---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184398845
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala ---
    @@ -124,11 +124,13 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
     
     
       protected lazy val alterTable: Parser[LogicalPlan] =
    -    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ (COMPACT ~ stringLit) <~ opt(";")  ^^ {
    -      case dbName ~ table ~ (compact ~ compactType) =>
    +    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ (COMPACT ~ stringLit) ~
    +      (WHERE ~> (SEGMENT ~ "." ~ ID) ~> IN ~> "(" ~> repsep(segmentId, ",") <~ ")").? <~
    +      opt(";") ^^ {
    +      case dbName ~ table ~ (compact ~ compactType) ~ segs =>
             val altertablemodel =
               AlterTableModel(convertDbNameToLowerCase(dbName), table, None, compactType,
    -          Some(System.currentTimeMillis()), null)
    +            Some(System.currentTimeMillis()), null, segs)
    --- End diff --
    
    Check for code alignment


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2778/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4524/



---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1812


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4222/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @manishgupta88, I've submitted some changes, have a look please.


---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184397997
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAlterTableCompactionCommand.scala ---
    @@ -212,13 +224,18 @@ case class CarbonAlterTableCompactionCommand(
         carbonLoadModel.setFactTimeStamp(loadStartTime)
     
         val isCompactionTriggerByDDl = true
    +    var segmentIds: Option[List[String]] = None
    +    if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
    +      segmentIds = alterTableModel.customSegmentIds
    +    }
    --- End diff --
    
    Modify the code to
    val segmentIds: Option[List[String]] =
      if (compactionType == CompactionType.CUSTOM && alterTableModel.customSegmentIds.isDefined) {
        alterTableModel.customSegmentIds
      } else {
        None
      }


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3972/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @Xaprice  Currently Minor and Major compaction have fixed meanings: minor is based on the frequency of segments and major is based on size. So it is better not to change the current meaning.
    Also, CARBON_INPUT_SEGMENTS affects only read queries and does not affect any other DDL/DML.
     
    So you can add a new compaction type CUSTOM and pass the required segments in the same command, so that it does not create any confusion.
    The command can be
     ALTER TABLE tablename compact 'CUSTOM' '1, 2, 3, 4'
    The documentation must also mention that it will not respect other features like preserve_segments, size, etc. Also, invalid segments in the list are ignored, and CUSTOM compacted segments will not participate in minor compaction triggered later.
    



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4527/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3049/



---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]support user specified segme...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r162245791
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ---
    @@ -441,6 +452,30 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) {
         });
       }
     
    +  /**
    +   * This method will return the list of loads which are specified by user in SQL.
    +   *
    +   * @param listOfSegmentsLoadedInSameDateInterval
    +   * @param segmentIds
    +   * @return
    +   */
    +  private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments(
    +          List<LoadMetadataDetails> listOfSegmentsLoadedInSameDateInterval,
    +          Set<String> segmentIds) {
    +    List<LoadMetadataDetails> listOfSegmentsSpecified =
    +            new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    if (segmentIds != null && segmentIds.size() != 0) {
    +      for (LoadMetadataDetails detail : listOfSegmentsLoadedInSameDateInterval) {
    +        if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) {
    --- End diff --
    
    If a specified segment is not valid, it is better to throw an exception about the invalid segments instead of ignoring it.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4085/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5182/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4188/



---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184401040
  
    --- Diff: processing/src/main/java/org/apache/carbondata/processing/merger/CarbonDataMergerUtil.java ---
    @@ -444,6 +450,26 @@ public int compare(LoadMetadataDetails seg1, LoadMetadataDetails seg2) {
         });
       }
     
    +  /**
    +   * This method will return the list of loads which are specified by user in SQL.
    +   *
    +   * @param listOfSegments
    +   * @param segmentIds
    +   * @return
    +   */
    +  private static List<LoadMetadataDetails> identitySegmentsToBeMergedBasedOnSpecifiedSegments(
    +          List<LoadMetadataDetails> listOfSegments,
    +          Set<String> segmentIds) {
    +    List<LoadMetadataDetails> listOfSegmentsSpecified =
    +            new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE);
    +    for (LoadMetadataDetails detail : listOfSegments) {
    +      if (isSegmentValid(detail) && segmentIds.contains(detail.getLoadName())) {
    +        listOfSegmentsSpecified.add(detail);
    +      }
    +    }
    +    return listOfSegmentsSpecified;
    --- End diff --
    
    In the case of custom compaction, the user is fully aware of the segments provided in the Alter SQL. Therefore, if any segment in the user-provided list is found to be invalid, we should throw an exception and abort the compaction process.
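
The strict behavior requested above (abort when any listed segment is invalid, rather than silently skipping it) could look roughly like the following. This is a minimal sketch, not the code that was merged: `StrictSegmentSelection`, `select`, and the `Map<String, Boolean>` that stands in for `LoadMetadataDetails` plus `isSegmentValid` are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the merger utility; not CarbonData's actual API.
final class StrictSegmentSelection {

  /**
   * Returns the requested segment ids in table order, or throws if any
   * requested id is unknown or not valid for compaction.
   *
   * @param segmentValidity segment id mapped to whether it is valid, in load order
   * @param requestedIds    segment ids named by the user in the ALTER statement
   */
  static List<String> select(Map<String, Boolean> segmentValidity, Set<String> requestedIds) {
    // Collect every offending id first so the error message is complete.
    List<String> invalid = new ArrayList<>();
    for (String id : requestedIds) {
      if (!Boolean.TRUE.equals(segmentValidity.get(id))) {
        invalid.add(id);
      }
    }
    if (!invalid.isEmpty()) {
      throw new IllegalArgumentException("Invalid segments for custom compaction: " + invalid);
    }
    // All requested ids are valid; return them in the table's load order.
    List<String> selected = new ArrayList<>();
    for (String id : segmentValidity.keySet()) {
      if (requestedIds.contains(id)) {
        selected.add(id);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    Map<String, Boolean> table = new LinkedHashMap<>();
    table.put("0", true);
    table.put("1", true);
    table.put("2", false); // e.g. already compacted or marked for delete
    table.put("3", true);

    System.out.println(select(table, new LinkedHashSet<>(Arrays.asList("1", "3"))));
    try {
      select(table, new LinkedHashSet<>(Arrays.asList("1", "2", "9")));
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Collecting all invalid ids before throwing is a design choice of this sketch, so that a user fixing the statement sees every problem at once.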


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2781/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4500/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4026/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by chenliang613 <gi...@git.apache.org>.
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    please change the title to : [CARBONDATA-2033] Support user specified segments in major compaction


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    retest this please


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @gvramana 
    I think ‘major’ and ‘minor’ are enough to describe compaction; there is no need to add another one. Also, 'custom' is somewhat ambiguous.
    
    As it is described in readme,
    ```
    In Major compaction, multiple segments can be merged into one large segment. User will specify the compaction size until which segments can be merged.
    ```
    The previous major compaction (the default, without a condition) is size-based: carbondata chooses the segments by size. For the new major compaction (with a condition), we specify the segments and let carbondata merge them into one large segment. The two are no different, so we don't need another compaction type.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by chenliang613 <gi...@git.apache.org>.
Github user chenliang613 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Two questions:
    1. Why consider only major compaction with specified segments? Is there no need to consider minor compaction?
    2. Can we keep the syntax consistent with "query with specified segments"?
    a. First set the segment ids: "SET carbon.input.segments.dbname.tablename=1,3"
    b. Then run compaction: ALTER TABLE tablename compact 'MAJOR' 



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    LGTM...can be merged once build is passed
    Please raise a sub-task under the same JIRA to track the custom compaction implementation for child tables/datamaps and add the JIRA link here, as we need to implement custom compaction for child tables/datamaps as well.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1690/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/1593/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4220/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @Xaprice I think we should validate the order of the segments to be merged. Suppose we have segments 1 to 8 and the user requests compaction on 1, 5, 8; this should not be valid, as it would affect the order in which the data was inserted.
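
One way to express the ordering check suggested above is to require that the requested segments occupy consecutive positions in the table's load order. The sketch below is illustrative only: `AdjacentSegmentCheck` and `isContiguous` are hypothetical names, the load-order list is assumed to contain no duplicates, and the thread does not confirm that such a check was adopted.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of CarbonData.
final class AdjacentSegmentCheck {

  /** True when every requested id exists and the ids are consecutive in load order. */
  static boolean isContiguous(List<String> segmentsInLoadOrder, Set<String> requestedIds) {
    int first = -1;
    int last = -1;
    int matched = 0;
    for (int i = 0; i < segmentsInLoadOrder.size(); i++) {
      if (requestedIds.contains(segmentsInLoadOrder.get(i))) {
        if (first < 0) {
          first = i;
        }
        last = i;
        matched++;
      }
    }
    // All requested ids must be present, and the covered span must have no gaps.
    return matched > 0 && matched == requestedIds.size() && last - first + 1 == matched;
  }

  public static void main(String[] args) {
    List<String> loadOrder = Arrays.asList("1", "2", "3", "4", "5", "6", "7", "8");
    System.out.println(isContiguous(loadOrder, new HashSet<>(Arrays.asList("1", "5", "8"))));
    System.out.println(isContiguous(loadOrder, new HashSet<>(Arrays.asList("2", "3", "4"))));
  }
}
```

With segments 1 to 8, the request 1, 5, 8 from the comment is rejected by this check, while 2, 3, 4 is accepted.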


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4022/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2828/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    retest sdv please


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4626/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2923/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by bill1208 <gi...@git.apache.org>.
Github user bill1208 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    I agree with @gvramana 


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4185/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    I agree with @gvramana 
    1. We should not use Major/Minor compaction type as they have a specific meaning and both are controlled by the system for taking decisions whether segment is valid to be compacted or not.
    2. We should not use carbon.input.segments.default.seg_compact to set the segments to be compacted.
    3. We should introduce a new compaction type in the DDL 'CUSTOM' as suggested above because it is something like force compaction for the given segments as it will not check for size and frequency of segments. We can work on using the below syntax for custom compaction.
    
    **ALTER TABLE [db_name.]table_name COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,5,8)**
    
    Once a table is compacted using Custom compaction, then minor compaction does not hold good for the custom compacted segment. Custom compacted segment should only participate during major compaction if it satisfies the major compaction size property.
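
To make the proposed statement shape concrete, here is a small standalone sketch that recognizes `ALTER TABLE [db_name.]table_name COMPACT '<type>' [WHERE SEGMENT.ID IN (...)]` with a regular expression. It is not the project's parser (the PR extends the Scala combinator parser in `CarbonSpark2SqlParser`); `CompactDdlSketch`, `Command`, and `parse` are hypothetical names, and identifiers are assumed to be plain word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only; the real grammar lives in CarbonSpark2SqlParser.
final class CompactDdlSketch {

  // Groups: 1 = optional database, 2 = table, 3 = compaction type, 4 = optional id list.
  private static final Pattern DDL = Pattern.compile(
      "^\\s*ALTER\\s+TABLE\\s+(?:(\\w+)\\.)?(\\w+)\\s+COMPACT\\s+'(\\w+)'"
          + "(?:\\s+WHERE\\s+SEGMENT\\.ID\\s+IN\\s*\\(([^)]*)\\))?\\s*;?\\s*$",
      Pattern.CASE_INSENSITIVE);

  static final class Command {
    final String database; // null when the statement omits db_name
    final String table;
    final String type; // upper-cased, e.g. "CUSTOM"
    final List<String> segmentIds; // empty when there is no WHERE clause

    Command(String database, String table, String type, List<String> segmentIds) {
      this.database = database;
      this.table = table;
      this.type = type;
      this.segmentIds = segmentIds;
    }
  }

  static Command parse(String sql) {
    Matcher m = DDL.matcher(sql);
    if (!m.matches()) {
      throw new IllegalArgumentException("Not a supported COMPACT statement: " + sql);
    }
    // Segment ids stay strings because compacted segments carry ids such as "1.1".
    List<String> ids = new ArrayList<>();
    if (m.group(4) != null) {
      for (String part : m.group(4).split(",")) {
        String id = part.trim();
        if (!id.isEmpty()) {
          ids.add(id);
        }
      }
    }
    return new Command(m.group(1), m.group(2), m.group(3).toUpperCase(Locale.ROOT), ids);
  }

  public static void main(String[] args) {
    Command c = parse("ALTER TABLE db1.t1 COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0,5,8)");
    System.out.println(c.database + "." + c.table + " " + c.type + " " + c.segmentIds);
  }
}
```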


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5362/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    @Xaprice @chenliang613 @ravipesala @gvramana 
    
    I think the syntax of segment compaction should be similar to that of other segment management commands.
    Currently in carbondata, we delete segments using the syntax:
    ```
    DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.ID IN (0,5,8)
    ```
    And
    ```
    DELETE FROM TABLE CarbonDatabase.CarbonTable WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' 
    ```
    
    So, we can imitate the above syntax and get the following:
    ```
    ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.ID IN (0,5,8)
    ```
    And
    ```
    ALTER TABLE [db_name.]table_name COMPACT 'MINOR/MAJOR' WHERE SEGMENT.STARTTIME BEFORE '2017-06-01 12:05:06' AND SEGMENT.STARTTIME AFTER '2017-05-01 12:05:06' 
    ```
    We can support compacting segments by specifying IDs and dates.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5387/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5389/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    I've raised a sub-task for custom compaction for child tables/datamaps:
    https://issues.apache.org/jira/browse/CARBONDATA-2412


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4030/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Hi @chenliang613 , can you please take a look?


---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184403180
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/preaaggregate/PreAggregateListeners.scala ---
    @@ -566,6 +567,11 @@ object AlterPreAggregateTableCompactionPostListener extends OperationEventListen
         val compactionType = compactionEvent.carbonMergerMapping.campactionType
         val carbonLoadModel = compactionEvent.carbonLoadModel
         val sparkSession = compactionEvent.sparkSession
    +    val segmentIds = if (compactionType == CompactionType.CUSTOM) {
    +      Some(compactionEvent.carbonMergerMapping.validSegments.map(x => x.getSegmentNo).toList)
    --- End diff --
    
    Custom compaction should be done only for the given table in the Alter SQL. Custom compaction executed on main table should not be applied on child tables/datamaps as one to one mapping of segments might not be there between main table and its child tables.
    Custom compaction can be done for child tables/datamaps directly like main table by specifying the child table/datamap name in the alter SQL.


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2923/



---

[GitHub] carbondata pull request #1812: [CARBONDATA-2033]Support user specified segme...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1812#discussion_r184386961
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datacompaction/CompactionSupportSpecifiedSegmentsTest.scala ---
    @@ -0,0 +1,141 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.carbondata.spark.testsuite.datacompaction
    +
    +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
    +import org.apache.carbondata.core.constants.CarbonCommonConstants
    +import org.apache.carbondata.core.util.CarbonProperties
    +import org.apache.spark.sql.test.util.QueryTest
    +import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
    +
    +
    +class CompactionSupportSpecifiedSegmentsTest
    +  extends QueryTest with BeforeAndAfterEach with BeforeAndAfterAll {
    +
    +  val filePath: String = resourcesPath + "/globalsort/sample1.csv"
    +
    +  override def beforeAll(): Unit = {
    +    super.beforeAll()
    +  }
    +
    +  override def afterAll(): Unit = {
    +    super.afterAll()
    +  }
    +
    +  override def beforeEach {
    +    resetConf()
    +    sql("DROP TABLE IF EXISTS seg_compact")
    +    sql(
    +      """
    +        |CREATE TABLE seg_compact
    +        |(id INT, name STRING, city STRING, age INT)
    +        |STORED BY 'org.apache.carbondata.format'
    +        |TBLPROPERTIES('SORT_COLUMNS'='city,name')
    +      """.stripMargin)
    +  }
    +
    +  override def afterEach {
    +    sql("DROP TABLE IF EXISTS seg_compact")
    +  }
    +
    +  private def resetConf() = {
    +    CarbonProperties.getInstance()
    +      .addProperty(CarbonCommonConstants.ENABLE_AUTO_LOAD_MERGE,
    +        CarbonCommonConstants.DEFAULT_ENABLE_AUTO_LOAD_MERGE)
    +  }
    +
    +  test("custom compaction") {
    +    for (i <- 0 until 5) {
    +      sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE seg_compact")
    +    }
    +    sql("ALTER TABLE seg_compact COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2,3)")
    +
    +    val segments = sql("SHOW SEGMENTS FOR TABLE seg_compact")
    +    val segInfos = segments.collect().map { each =>
    +      ((each.toSeq) (0).toString, (each.toSeq) (1).toString)
    +    }
    +    assert(segInfos.length == 6)
    +    assert(segInfos.contains(("0", "Success")))
    +    assert(segInfos.contains(("1", "Compacted")))
    +    assert(segInfos.contains(("2", "Compacted")))
    +    assert(segInfos.contains(("3", "Compacted")))
    +    assert(segInfos.contains(("1.1", "Success")))
    +    assert(segInfos.contains(("4", "Success")))
    +  }
    +
    +  test("custom compaction with preagg datamap"){
    +    sql(
    +      s"""create datamap preagg_sum on table seg_compact using 'preaggregate' as select id,sum(age) from seg_compact group by id"""
    +        .stripMargin)
    +    for (i <- 0 until 5) {
    +      sql(s"LOAD DATA LOCAL INPATH '$filePath' INTO TABLE seg_compact")
    +    }
    +    sql("ALTER TABLE seg_compact COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (1,2,3)")
    +    val segments = sql("SHOW SEGMENTS FOR TABLE seg_compact_preagg_sum")
    +    val segInfos = segments.collect().map { each =>
    +      ((each.toSeq) (0).toString, (each.toSeq) (1).toString)
    +    }
    --- End diff --
    
    What will the behavior of custom compaction be when a preaggregate datamap exists but segments 1, 2, 3 do not exist in the preaggregate datamap?
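
For the main-table case, the status transitions asserted by the first test in the diff above ("custom compaction") can be modeled in a few lines. This is only a toy model under stated assumptions: `CustomCompactionStatusSketch` and `compact` are hypothetical names, and the rule that the merged segment is named `<first requested id>.1` is taken from the test's expectation of segment `1.1`, not from CarbonData's naming logic. It does not answer the datamap question raised above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the segment statuses checked in the test; not CarbonData code.
final class CustomCompactionStatusSketch {

  /** Returns a copy of the status table after compacting the requested segments. */
  static Map<String, String> compact(Map<String, String> statuses, List<String> requestedIds) {
    if (requestedIds.isEmpty()) {
      throw new IllegalArgumentException("No segments specified");
    }
    Map<String, String> result = new LinkedHashMap<>(statuses);
    for (String id : requestedIds) {
      if (!"Success".equals(result.get(id))) {
        throw new IllegalArgumentException("Segment not compactable: " + id);
      }
      result.put(id, "Compacted");
    }
    // Assumption from the test expectation: merging 1, 2, 3 yields segment "1.1".
    result.put(requestedIds.get(0) + ".1", "Success");
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> statuses = new LinkedHashMap<>();
    for (int i = 0; i < 5; i++) {
      statuses.put(String.valueOf(i), "Success"); // five loads, as in the test
    }
    System.out.println(compact(statuses, Arrays.asList("1", "2", "3")));
  }
}
```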


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4465/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4293/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2786/



---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by Xaprice <gi...@git.apache.org>.
Github user Xaprice commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    retest this please


---

[GitHub] carbondata issue #1812: [CARBONDATA-2033]Support user specified segments in ...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/1812
  
    LGTM


---