You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2018/05/01 09:21:34 UTC

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

GitHub user jackylk opened a pull request:

    https://github.com/apache/carbondata/pull/2255

    [CARBONDATA-2416] Support DEFERRED REBUILD when creating DataMap

    This PR is on top on #2254 
    1. REFRESH DATAMAP is changed to REBUILD DATAMAP command
    2. When creating datamap, user can choose to load the datamap immediately or later by manually trigger REBUILD DATAMAP command
    
     - [X] Any interfaces changed?
     No
     - [X] Any backward compatibility impacted?
     No
     - [X] Document update required?
    Yes
     - [X] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           Test case added
     - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    NA

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jackylk/incubator-carbondata deferred-rebuild

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2255.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2255
    
----
commit 84f5e13edd6a5ac0cbc786d9df3fc400c4bd0774
Author: Jacky Li <ja...@...>
Date:   2018-05-01T06:51:08Z

    support refresh datamap for all index datamap

commit 81ed2cb6159a90b6eeba9bec866e946cba3cac63
Author: Jacky Li <ja...@...>
Date:   2018-05-01T09:17:33Z

    support deferred rebuild

----


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4602/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5663/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5556/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4657/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5761/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294368
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -447,18 +450,22 @@ protected Expression getFilterPredicates(Configuration configuration) {
               cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
         }
         // Now try to prune with FG DataMap.
    -    dataMapExprWrapper = DataMapChooser.get()
    +    DataMapExprWrapper fgDataMapExprWrapper = DataMapChooser.get()
             .chooseFGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (dataMapExprWrapper != null && dataMapExprWrapper.getDataMapLevel() == DataMapLevel.FG
    -        && isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
    +    if (fgDataMapExprWrapper != null &&
    --- End diff --
    
    Same as Line435


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4587/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4804/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294417
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -695,7 +695,29 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           sql(s"select * from datamap_test5 where name='n10'"))
         checkAnswer(sql("SELECT * FROM datamap_test5 WHERE TEXT_MATCH('city:c020')"),
           sql(s"SELECT * FROM datamap_test5 WHERE city='c020'"))
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +  }
     
    +  test("test lucene fine grain datamap rebuild") {
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +    sql(
    +      """
    +        | CREATE TABLE datamap_test5(id INT, name STRING, city STRING, age INT)
    +        | STORED BY 'carbondata'
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT')
    +      """.stripMargin)
    +    sql(
    +      s"""
    +         | CREATE DATAMAP dm ON TABLE datamap_test5
    +         | USING 'lucene'
    +         | WITH DEFERRED REBUILD
    +         | DMProperties('INDEX_COLUMNS'='city')
    +      """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE datamap_test5 OPTIONS('header'='false')")
    +    sql("REBUILD DATAMAP dm ON TABLE datamap_test5")
    --- End diff --
    
    Before this statement, what's the status of the datamap? Can we add a case to test it here?


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294354
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    --- End diff --
    
    I think we can do this judgement earlier.
    Currently, we 
    1. iterate all the datamaps,
    2. then filter out the visible ones,
    3. then filter out the corse/fine ones,
    4. then apply the filter-expression to filter out the datamaps,
    5. then judge whether the datamap is enabled...
    We can migrate step5 with step2 and skip step3~4 if the datamap is disabled.


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    @xuchuanyin With `DEFERED REBUILD` syntax, it means the datamap loading will be lazy. If there are new load to the fact table, this datamap will be disabled until user trigger `REBUILD DATAMAP`.



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4747/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4503/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294696
  
    --- Diff: integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala ---
    @@ -41,25 +42,34 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
         sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
       }
     
    -  private def checkQuery = {
    +  override def afterEach(): Unit = {
    +    sql(s"DROP TABLE IF EXISTS $normalTable")
    +    sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
    +  }
    +
    +  private def dmSql(sqlText: String, dataMapName: String, shouldHit: Boolean): DataFrame = {
    --- End diff --
    
    better to optimize the method name


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5669/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4393/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349815
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
           LOGGER.error("Dataload failed due to failure in table status updation.")
           throw new Exception(errorMessage)
    -    } else {
    -      DataMapStatusManager.disableDataMapsOfTable(carbonTable)
    --- End diff --
    
    yes


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186350055
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4533/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5742/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5557/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186348978
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    --- End diff --
    
    ok, fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4812/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5750/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4750/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4653/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5693/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349004
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -447,18 +450,22 @@ protected Expression getFilterPredicates(Configuration configuration) {
               cgDataMapExprWrapper.getDataMapSchema(), prunedBlocklets.size());
         }
         // Now try to prune with FG DataMap.
    -    dataMapExprWrapper = DataMapChooser.get()
    +    DataMapExprWrapper fgDataMapExprWrapper = DataMapChooser.get()
             .chooseFGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (dataMapExprWrapper != null && dataMapExprWrapper.getDataMapLevel() == DataMapLevel.FG
    -        && isFgDataMapPruningEnable(job.getConfiguration()) && dataMapJob != null) {
    +    if (fgDataMapExprWrapper != null &&
    --- End diff --
    
    ok, fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    LGTM


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Can you explain the meaning of `deferred rebuild`?
    The data of datamap will be updated only after we fire the `rebuild` statement?
    Do we need to do it only once or have to do it multiple times?
    For a datamap that is not deferred, what will happen if we call `rebuild` on it?


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5673/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294566
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
    --- End diff --
    
    Just come across the style problems. Delete the write spaces after '{' and before '}'.


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4652/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299489
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    +        DataMapStatusManager.isDataMapEnabled(
    +            cgDataMapExprWrapper.getDataMapSchema().getDataMapName())) {
    --- End diff --
    
    This logic should be moved to DataMapStoreManager and DataMapChooser. There when it gets all datamaps it should only enabled datamaps. There should not be any changes here.


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186350293
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4591/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186355275
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/status/DataMapStatusManager.java ---
    @@ -51,6 +51,16 @@ private DataMapStatusManager() {
         return storageProvider.getDataMapStatusDetails();
       }
     
    +  public static boolean isDataMapEnabled(String dataMapName) throws IOException {
    --- End diff --
    
    ok. will be fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4744/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186349611
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala ---
    @@ -695,7 +695,29 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
           sql(s"select * from datamap_test5 where name='n10'"))
         checkAnswer(sql("SELECT * FROM datamap_test5 WHERE TEXT_MATCH('city:c020')"),
           sql(s"SELECT * FROM datamap_test5 WHERE city='c020'"))
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +  }
     
    +  test("test lucene fine grain datamap rebuild") {
    +    sql("DROP TABLE IF EXISTS datamap_test5")
    +    sql(
    +      """
    +        | CREATE TABLE datamap_test5(id INT, name STRING, city STRING, age INT)
    +        | STORED BY 'carbondata'
    +        | TBLPROPERTIES('SORT_COLUMNS'='city,name', 'SORT_SCOPE'='LOCAL_SORT')
    +      """.stripMargin)
    +    sql(
    +      s"""
    +         | CREATE DATAMAP dm ON TABLE datamap_test5
    +         | USING 'lucene'
    +         | WITH DEFERRED REBUILD
    +         | DMProperties('INDEX_COLUMNS'='city')
    +      """.stripMargin)
    +    sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE datamap_test5 OPTIONS('header'='false')")
    +    sql("REBUILD DATAMAP dm ON TABLE datamap_test5")
    --- End diff --
    
    added an assert


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294094
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java ---
    @@ -153,8 +153,8 @@ public DataMapWriter createWriter(Segment segment, String shardName) {
       }
     
       @Override
    -  public DataMapRefresher createRefresher(Segment segment, String shardName) {
    -    return new LuceneDataMapRefresher(getCarbonTable().getTablePath(), dataMapName,
    +  public DataMapBuilder createBuilder(Segment segment, String shardName) {
    --- End diff --
    
    A question that does not relate to this PR: what's the relationship between a `segment` and a `shard`? (in traditional carbondata table storage and in other table storage, such as external table/ hive-partition-like table...)


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    all comment are handled


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186311641
  
    --- Diff: datamap/examples/src/minmaxdatamap/main/java/org/apache/carbondata/datamap/examples/MinMaxIndexDataMapFactory.java ---
    @@ -96,7 +96,7 @@ public MinMaxIndexDataMapFactory(CarbonTable carbonTable) {
             dataMapMeta.getIndexedColumns());
       }
     
    -  @Override public DataMapRefresher createRefresher(Segment segment, String shardName)
    +  @Override public DataMapRefresher createBuilder(Segment segment, String shardName)
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4513/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186351682
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---
    @@ -431,7 +432,9 @@ protected Expression getFilterPredicates(Configuration configuration) {
         // Get the available CG datamaps and prune further.
         DataMapExprWrapper cgDataMapExprWrapper = DataMapChooser.get()
             .chooseCGDataMap(getOrCreateCarbonTable(job.getConfiguration()), resolver);
    -    if (cgDataMapExprWrapper != null) {
    +    if (cgDataMapExprWrapper != null &&
    +        DataMapStatusManager.isDataMapEnabled(
    +            cgDataMapExprWrapper.getDataMapSchema().getDataMapName())) {
    --- End diff --
    
    logic modified, please check again


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4771/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4822/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4394/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186312034
  
    --- Diff: datamap/lucene/src/main/java/org/apache/carbondata/datamap/lucene/LuceneDataMapFactoryBase.java ---
    @@ -153,8 +153,8 @@ public DataMapWriter createWriter(Segment segment, String shardName) {
       }
     
       @Override
    -  public DataMapRefresher createRefresher(Segment segment, String shardName) {
    -    return new LuceneDataMapRefresher(getCarbonTable().getTablePath(), dataMapName,
    +  public DataMapBuilder createBuilder(Segment segment, String shardName) {
    --- End diff --
    
    For a managed table (traditional carbondata table) a shard is an index generated by same write task while loading, inside one segment. 
    For external table, shard concept is the same, there are segments and shards inside each partition folder. There is a segment file in meta folder to indicate the segment and shard path.


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294632
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
    
    Is this method only for tests? If so, better to limit its usage, such as adding annotation 'forTests' or other restictions


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4573/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299532
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
    
    Test method move to testcase


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186355295
  
    --- Diff: integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala ---
    @@ -41,25 +42,34 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
         sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
       }
     
    -  private def checkQuery = {
    +  override def afterEach(): Unit = {
    +    sql(s"DROP TABLE IF EXISTS $normalTable")
    +    sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable")
    +  }
    +
    +  private def dmSql(sqlText: String, dataMapName: String, shouldHit: Boolean): DataFrame = {
    --- End diff --
    
    fixed


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5561/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4802/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186351820
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala ---
    @@ -110,6 +110,14 @@ class CarbonSession(@transient val sc: SparkContext,
         )
       }
     
    +  /**
    +   * Return true if the specified sql statement will hit the datamap
    +   */
    +  def isDataMapHit(sqlStatement: String, dataMapName: String): Boolean = {
    --- End diff --
    
    I have added an annotation for it, I think it is useful in multiple test suite, so add it here


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4811/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294474
  
    --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala ---
    @@ -181,7 +181,8 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser {
       protected val ON = carbonKeyWord("ON")
       protected val DMPROPERTIES = carbonKeyWord("DMPROPERTIES")
       protected val SELECT = carbonKeyWord("SELECT")
    -  protected val REFRESH = carbonKeyWord("REFRESH")
    +  protected val REBUILD = carbonKeyWord("REBUILD")
    +  protected val DEFERRED = carbonKeyWord("DEFERRED")
    --- End diff --
    
    Now, Carbondata has its own reserved keywords, so we'd better add an document to list them in one place as reference.


---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4510/



---

[GitHub] carbondata issue #2255: [CARBONDATA-2416] Support DEFERRED REBUILD when crea...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2255
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4400/



---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186294517
  
    --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala ---
    @@ -837,8 +837,6 @@ object CarbonDataRDDFactory {
                        s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }")
           LOGGER.error("Dataload failed due to failure in table status updation.")
           throw new Exception(errorMessage)
    -    } else {
    -      DataMapStatusManager.disableDataMapsOfTable(carbonTable)
    --- End diff --
    
    Is it a bug in previous implementation?


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2255


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186299596
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/status/DataMapStatusManager.java ---
    @@ -51,6 +51,16 @@ private DataMapStatusManager() {
         return storageProvider.getDataMapStatusDetails();
       }
     
    +  public static boolean isDataMapEnabled(String dataMapName) throws IOException {
    --- End diff --
    
    By default all datamaps will be disabled when doing loading, but indexdatamap should be taken special case as it will be updated online along with load so  it shouldn't be disabled after load of table. But deffered build it should be disabled after load.


---

[GitHub] carbondata pull request #2255: [CARBONDATA-2416] Support DEFERRED REBUILD wh...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2255#discussion_r186293948
  
    --- Diff: datamap/examples/src/minmaxdatamap/main/java/org/apache/carbondata/datamap/examples/MinMaxIndexDataMapFactory.java ---
    @@ -96,7 +96,7 @@ public MinMaxIndexDataMapFactory(CarbonTable carbonTable) {
             dataMapMeta.getIndexedColumns());
       }
     
    -  @Override public DataMapRefresher createRefresher(Segment segment, String shardName)
    +  @Override public DataMapRefresher createBuilder(Segment segment, String shardName)
    --- End diff --
    
    Move override to the previous line 😄 


---