Posted to issues@carbondata.apache.org by manishgupta88 <gi...@git.apache.org> on 2018/07/05 14:16:02 UTC

[GitHub] carbondata pull request #2454: [WIP] [CARBONDATA-2701] Refactor code to stor...

GitHub user manishgupta88 opened a pull request:

    https://github.com/apache/carbondata/pull/2454

    [WIP] [CARBONDATA-2701] Refactor code to store minimal required info in Block and Blocklet Cache

    Things done as part of this PR
    1. Refactored code to keep only minimal information in block and blocklet cache.
    2. Introduced a JVM-level segment properties holder to hold the segment properties. Because a segment properties object is heavy, a new one is created only when the schema or cardinality of a table changes.
    This PR depends on PR #2437 
    
     - [ ] Any interfaces changed?
     No
     - [ ] Any backward compatibility impacted?
     NA
     - [ ] Document update required?
     No
     - [ ] Testing done
     Yes
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
     NA
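
The holder described in item 2 can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the class name, the method names, and the use of column names plus a cardinality array as the cache key are hypothetical and are not taken from the PR. The idea shown is only that caches keep a small integer index while the heavy object is created once per distinct schema and cardinality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical JVM-level holder: one heavy object per distinct (table, schema, cardinality). */
public final class SegmentPropertiesHolderSketch {

  /** Stand-in for the heavy segment properties object (not the real class). */
  public static final class HeavySegmentProperties {
    final List<String> columnNames;
    final int[] cardinality;

    HeavySegmentProperties(List<String> columnNames, int[] cardinality) {
      this.columnNames = columnNames;
      this.cardinality = cardinality;
    }
  }

  /** Cache key: two keys are equal when table, schema and cardinality all match. */
  private static final class Key {
    private final String tableName;
    private final List<String> columnNames;
    private final int[] cardinality;

    Key(String tableName, List<String> columnNames, int[] cardinality) {
      this.tableName = tableName;
      // Defensive copies so later changes by the caller cannot alter the key.
      this.columnNames = Collections.unmodifiableList(new ArrayList<>(columnNames));
      this.cardinality = cardinality.clone();
    }

    @Override public boolean equals(Object other) {
      if (this == other) {
        return true;
      }
      if (!(other instanceof Key)) {
        return false;
      }
      Key that = (Key) other;
      return tableName.equals(that.tableName)
          && columnNames.equals(that.columnNames)
          && Arrays.equals(cardinality, that.cardinality);
    }

    @Override public int hashCode() {
      int result = tableName.hashCode();
      result = 31 * result + columnNames.hashCode();
      result = 31 * result + Arrays.hashCode(cardinality);
      return result;
    }
  }

  private static final SegmentPropertiesHolderSketch INSTANCE =
      new SegmentPropertiesHolderSketch();

  private final ConcurrentHashMap<Key, Integer> keyToIndex = new ConcurrentHashMap<>();
  private final ConcurrentHashMap<Integer, HeavySegmentProperties> indexToProperties =
      new ConcurrentHashMap<>();
  private final AtomicInteger nextIndex = new AtomicInteger();

  private SegmentPropertiesHolderSketch() {
  }

  public static SegmentPropertiesHolderSketch getInstance() {
    return INSTANCE;
  }

  /** Returns a small index; the heavy object is built only for an unseen key. */
  public int addSegmentProperties(String tableName, List<String> columnNames,
      int[] cardinality) {
    Key key = new Key(tableName, columnNames, cardinality);
    return keyToIndex.computeIfAbsent(key, k -> {
      int index = nextIndex.getAndIncrement();
      indexToProperties.put(index, new HeavySegmentProperties(k.columnNames, k.cardinality));
      return index;
    });
  }

  /** Resolves an index stored in a cache entry back to the shared heavy object. */
  public HeavySegmentProperties getSegmentProperties(int index) {
    return indexToProperties.get(index);
  }
}
```

In this sketch, two segments with the same schema and cardinality receive the same index, so a cache entry only needs to store an int. A real holder would also need a way to release entries when a segment is invalidated, which is omitted here.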


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/carbondata refactor_segmentproperties

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2454.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2454
    
----
commit c06de06046da4efe6dc606f410686dcea256d46f
Author: manishgupta88 <to...@...>
Date:   2018-06-25T06:43:00Z

    segregate block and blocklet cache

commit a5017751f45a43ce75a98610214049e1c894e1e7
Author: manishgupta88 <to...@...>
Date:   2018-07-04T15:30:54Z

    Refactor Block and Blocklet DataMap to store only segmentProperties Index instead of segmentProperties

----


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6890/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870385
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -321,4 +328,43 @@ private static boolean isSameColumnSchemaList(List<ColumnSchema> indexFileColumn
         }
         return updatedValues;
       }
    +
    +  /**
    +   * Convert schema to binary
    +   */
    +  public static byte[] convertSchemaToBinary(List<ColumnSchema> columnSchemas) throws IOException {
    +    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    +    DataOutput dataOutput = new DataOutputStream(stream);
    +    dataOutput.writeShort(columnSchemas.size());
    +    for (ColumnSchema columnSchema : columnSchemas) {
    +      if (columnSchema.getColumnReferenceId() == null) {
    +        columnSchema.setColumnReferenceId(columnSchema.getColumnUniqueId());
    +      }
    +      columnSchema.write(dataOutput);
    +    }
    +    byte[] byteArray = stream.toByteArray();
    +    // Compress with snappy to reduce the size of schema
    +    return Snappy.rawCompress(byteArray, byteArray.length);
    --- End diff --
    
    Use compressor factory.
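
The reviewer presumably refers to the project's existing compressor factory, whose API is not shown in this thread. As a rough illustration of the design point only, the sketch below hides the codec behind a hypothetical Compressor interface obtained from a hypothetical factory, so the calling code does not name a codec library directly. The class name, interface, and method names are assumptions, and java.util.zip stands in for Snappy so that the example has no external dependency.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical factory: callers ask for a Compressor instead of calling a codec directly. */
public final class CompressorFactorySketch {

  /** Hypothetical codec interface (not the project's real one). */
  public interface Compressor {
    byte[] compress(byte[] input) throws IOException;

    byte[] uncompress(byte[] input) throws IOException;
  }

  /** Stand-in codec built on java.util.zip; a real factory would return the configured codec. */
  private static final class DeflateCompressor implements Compressor {

    @Override public byte[] compress(byte[] input) {
      Deflater deflater = new Deflater();
      try {
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
          int written = deflater.deflate(buffer);
          out.write(buffer, 0, written);
        }
        return out.toByteArray();
      } finally {
        // Release native resources held by the deflater.
        deflater.end();
      }
    }

    @Override public byte[] uncompress(byte[] input) throws IOException {
      Inflater inflater = new Inflater();
      try {
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
          int read = inflater.inflate(buffer);
          if (read == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
            // No progress is possible: the input is truncated or not valid deflate data.
            throw new IOException("Truncated or invalid compressed data");
          }
          out.write(buffer, 0, read);
        }
        return out.toByteArray();
      } catch (DataFormatException e) {
        throw new IOException(e);
      } finally {
        inflater.end();
      }
    }
  }

  private static final Compressor DEFAULT = new DeflateCompressor();

  private CompressorFactorySketch() {
  }

  /** Single place where the codec choice is made. */
  public static Compressor getCompressor() {
    return DEFAULT;
  }
}
```

With this shape, a method such as convertSchemaToBinary would end with a call like CompressorFactorySketch.getCompressor().compress(byteArray), and the matching read path would call uncompress, so changing the codec touches only the factory.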


---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5636/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6927/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5701/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200965517
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java ---
    @@ -184,6 +185,23 @@ public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper id
        */
       @Override public void invalidate(
           TableBlockIndexUniqueIdentifierWrapper tableSegmentUniqueIdentifierWrapper) {
    +    BlockletDataMapIndexWrapper blockletDataMapIndexWrapper =
    +        getIfPresent(tableSegmentUniqueIdentifierWrapper);
    +    if (null != blockletDataMapIndexWrapper) {
    +      // clear the segmentProperties cache
    +      List<BlockDataMap> dataMaps = blockletDataMapIndexWrapper.getDataMaps();
    +      if (null != dataMaps) {
    +        String segmentId =
    +            tableSegmentUniqueIdentifierWrapper.getTableBlockIndexUniqueIdentifier().getSegmentId();
    +        for (BlockDataMap dataMap : dataMaps) {
    --- End diff --
    
    This for loop will run only once; change this to dataMaps.get(0).
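
The body of the loop is cut off in the quoted diff, so the following is only a sketch of the shape the reviewer suggests, under the reviewer's premise that one element is enough. The helper name applyToFirst and the use of a Consumer for the per-element work are hypothetical.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: act on the first element instead of looping over the list. */
public final class FirstDataMapSketch {

  private FirstDataMapSketch() {
  }

  /**
   * Applies the action to the first element only.
   * Returns false when the list is null or empty, so the caller can skip cleanup.
   */
  public static <T> boolean applyToFirst(List<T> dataMaps, Consumer<T> action) {
    if (dataMaps == null || dataMaps.isEmpty()) {
      return false;
    }
    action.accept(dataMaps.get(0));
    return true;
  }
}
```

The emptiness check matters here: unlike a for-each loop, get(0) throws on an empty list.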


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5713/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5712/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6843/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6934/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5670/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5718/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2454


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    LGTM except for a few minor comments.
    @manishgupta88 Please check.


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5700/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5638/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870697
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java ---
    @@ -17,7 +17,11 @@
     package org.apache.carbondata.core.indexstore.blockletindex;
     
     import java.io.IOException;
    -import java.util.*;
    +import java.util.ArrayList;
    --- End diff --
    
    Remove unnecessary changes 


---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6928/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5706/



---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200966124
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java ---
    @@ -16,7 +16,9 @@
      */
     package org.apache.carbondata.core.indexstore.blockletindex;
     
    -import java.io.*;
    +import java.io.IOException;
    --- End diff --
    
    Remove this change 


---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5637/



---

[GitHub] carbondata issue #2454: [WIP] [CARBONDATA-2701] Refactor code to store minim...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6845/



---

[GitHub] carbondata issue #2454: [CARBONDATA-2701] Refactor code to store minimal req...

Posted by manishgupta88 <gi...@git.apache.org>.
Github user manishgupta88 commented on the issue:

    https://github.com/apache/carbondata/pull/2454
  
    @kumarvishal09 As discussed with you, I have handled these comments as part of PR https://github.com/apache/carbondata/pull/2467


---

[GitHub] carbondata pull request #2454: [CARBONDATA-2701] Refactor code to store mini...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2454#discussion_r200870442
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java ---
    @@ -321,4 +328,43 @@ private static boolean isSameColumnSchemaList(List<ColumnSchema> indexFileColumn
         }
         return updatedValues;
       }
    +
    +  /**
    +   * Convert schema to binary
    +   */
    +  public static byte[] convertSchemaToBinary(List<ColumnSchema> columnSchemas) throws IOException {
    +    ByteArrayOutputStream stream = new ByteArrayOutputStream();
    +    DataOutput dataOutput = new DataOutputStream(stream);
    +    dataOutput.writeShort(columnSchemas.size());
    +    for (ColumnSchema columnSchema : columnSchemas) {
    +      if (columnSchema.getColumnReferenceId() == null) {
    +        columnSchema.setColumnReferenceId(columnSchema.getColumnUniqueId());
    +      }
    +      columnSchema.write(dataOutput);
    +    }
    +    byte[] byteArray = stream.toByteArray();
    +    // Compress with snappy to reduce the size of schema
    +    return Snappy.rawCompress(byteArray, byteArray.length);
    +  }
    +
    +  /**
    +   * Read column schema from binary
    +   *
    +   * @param schemaArray
    +   * @throws IOException
    +   */
    +  public static List<ColumnSchema> readColumnSchema(byte[] schemaArray) throws IOException {
    +    // uncompress it.
    +    schemaArray = Snappy.uncompress(schemaArray);
    --- End diff --
    
    Same as above.


---