You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2017/09/21 17:47:48 UTC

[GitHub] carbondata pull request #1377: [WIP] Fixed refresh of segments in datamap fo...

GitHub user ravipesala opened a pull request:

    https://github.com/apache/carbondata/pull/1377

    [WIP] Fixed refresh of segments in datamap for update and partition

    Currently datamap scans complete segment every time for query execution to get all the carbonindex files. It will be slow when data/number of files are big. 
    So this PR caches the content of segment and refreshes when any updation or store changes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata datamap-refresh

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1377.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1377
    
----
commit 1f0e834353946dc53ed2f5686e223951fd8c9707
Author: Ravindra Pesala <ra...@gmail.com>
Date:   2017-09-21T13:33:06Z

    Fixed refersh of segments in datamap for update and partition

----


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141355777
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
       public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
           String factoryClassName, String dataMapName) {
         String table = identifier.uniqueName();
    +    getTableSegmentRefresher(identifier);
    --- End diff --
    
    ignoring the return object?


---

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/263/



---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528239
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -120,15 +121,17 @@ public void clear(String segmentId) {
         if (blockIndexes != null) {
           for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
             DataMap dataMap = cache.getIfPresent(blockIndex);
    -        dataMap.clear();
    -        cache.invalidate(blockIndex);
    +        if (dataMap != null) {
    +          cache.invalidate(blockIndex);
    +          dataMap.clear();
    +        }
           }
         }
       }
     
       @Override
       public void clear() {
    -    for (String segmentId: segmentMap.keySet()) {
    +    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
    --- End diff --
    
    It is needed as we cannot we cannot iterate on set and remove an entry at the same time. So here it is converted to array so that we can remove an entry from map in clear method


---

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/139/



---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528559
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
           }
         }
     
    +    // Clean the updated segments from memory if the update happens on segments
    --- End diff --
    
    Do we need send the `SegmentUpdateDetails` detail information to `TableDataMap` ? I think it is better finding out the refreshed segments can be kept here.


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141357648
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -120,15 +121,17 @@ public void clear(String segmentId) {
         if (blockIndexes != null) {
           for (TableBlockIndexUniqueIdentifier blockIndex : blockIndexes) {
             DataMap dataMap = cache.getIfPresent(blockIndex);
    -        dataMap.clear();
    -        cache.invalidate(blockIndex);
    +        if (dataMap != null) {
    +          cache.invalidate(blockIndex);
    +          dataMap.clear();
    +        }
           }
         }
       }
     
       @Override
       public void clear() {
    -    for (String segmentId: segmentMap.keySet()) {
    +    for (String segmentId: segmentMap.keySet().toArray(new String[segmentMap.size()])) {
    --- End diff --
    
    Why this is needed? 


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141356113
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
       /**
        * Clear the datamap/datamaps of a mentioned datamap name and table from memory
    --- End diff --
    
    please modify the comment also, datamap name is removed


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141528126
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
         return instance;
       }
     
    +  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
    +    String uniqueName = identifier.uniqueName();
    +    if (segmentRefreshMap.get(uniqueName) == null) {
    +      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
    +    }
    +    return segmentRefreshMap.get(uniqueName);
    +  }
    +
    +  /**
    +   * Keep track of the segment refresh time.
    +   */
    +  public static class TableSegmentRefresher {
    +
    +    private Map<String, Long> segmentRefreshTime = new HashMap<>();
    --- End diff --
    
    ok


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141527297
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -126,20 +133,20 @@ private TableDataMap getTableDataMap(String dataMapName,
       /**
        * Clear the datamap/datamaps of a mentioned datamap name and table from memory
    --- End diff --
    
    ok


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141527158
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -87,6 +93,7 @@ public TableDataMap getDataMap(AbsoluteTableIdentifier identifier,
       public TableDataMap createAndRegisterDataMap(AbsoluteTableIdentifier identifier,
           String factoryClassName, String dataMapName) {
         String table = identifier.uniqueName();
    +    getTableSegmentRefresher(identifier);
    --- End diff --
    
    Yes, not required as we want only update segmentRefreshMap here at first time.


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141356751
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java ---
    @@ -151,4 +158,61 @@ public static DataMapStoreManager getInstance() {
         return instance;
       }
     
    +  public TableSegmentRefresher getTableSegmentRefresher(AbsoluteTableIdentifier identifier) {
    +    String uniqueName = identifier.uniqueName();
    +    if (segmentRefreshMap.get(uniqueName) == null) {
    +      segmentRefreshMap.put(uniqueName, new TableSegmentRefresher(identifier));
    +    }
    +    return segmentRefreshMap.get(uniqueName);
    +  }
    +
    +  /**
    +   * Keep track of the segment refresh time.
    +   */
    +  public static class TableSegmentRefresher {
    +
    +    private Map<String, Long> segmentRefreshTime = new HashMap<>();
    --- End diff --
    
    add comment to describe the content, like map segmentId to last updated timestamp


---

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    LGTM


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141359606
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -292,6 +291,26 @@ private AbsoluteTableIdentifier getAbsoluteTableIdentifier(Configuration configu
           }
         }
     
    +    // Clean the updated segments from memory if the update happens on segments
    --- End diff --
    
    Suggest to extract these logic as a method to refresh the datamap, and put this method in TableDataMap


---

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/313/



---

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/944/



---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r140658131
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
    @@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
     
       override def fireEvent(event: ChangeEvent[_]): Unit = ???
     
    -  override def clear(segmentId: String): Unit = ???
    +  override def clear(segmentId: String): Unit = {}
    --- End diff --
    
    why change this?


---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/1377


---

[GitHub] carbondata issue #1377: [CARBONDATA-1504] Fixed refresh of segments in datam...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    Build Success with Spark 1.6, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/188/



---

[GitHub] carbondata pull request #1377: [CARBONDATA-1504] Fixed refresh of segments i...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1377#discussion_r141526946
  
    --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datamap/DataMapWriterSuite.scala ---
    @@ -41,9 +41,9 @@ class C2DataMapFactory() extends DataMapFactory {
     
       override def fireEvent(event: ChangeEvent[_]): Unit = ???
     
    -  override def clear(segmentId: String): Unit = ???
    +  override def clear(segmentId: String): Unit = {}
    --- End diff --
    
    Earlier clear was not calling, but not it is called so implementation has to be provided otherwise testcase fails.


---

[GitHub] carbondata issue #1377: [WIP] Fixed refresh of segments in datamap for updat...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1377
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/894/



---