Posted to issues@carbondata.apache.org by kumarvishal09 <gi...@git.apache.org> on 2017/03/10 11:16:41 UTC

[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756]Fixed RLE Encoding I...

GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/642

    [CARBONDATA-756]Fixed RLE Encoding Issue

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata FixedRLEEncodingIssue

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/642.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #642
    
----
commit c899d729a79062bcdc14e932d79e2913c92d9ea4
Author: kumarvishal <ku...@gmail.com>
Date:   2017-03-10T11:13:24Z

    Fixed RLE Encoding Issue

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1095/




[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1111/




[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    @jackylk This PR concerns RLE encoding of data. RLE is not worthwhile when the encoded data is more than 70% of the actual data size, because it wastes processing. So we enable RLE only when the data compresses to 70% or less of its actual size.
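
As a rough illustration of the rule described in this comment, the following is a minimal sketch and not the CarbonData implementation. The class name, method name, and parameter names are hypothetical; only the 70% cutoff and the "(encoded values + run map entries) * 100 / original values" shape come from the diff quoted in this thread.

```java
// Hypothetical helper; names are illustrative and not part of CarbonData.
final class RleThresholdSketch {
  // Cutoff taken from the 70% figure mentioned in the comment above.
  static final int MAX_ENCODED_PERCENT = 70;

  /**
   * Decides whether run-length encoding is worth keeping.
   *
   * @param encodedValues  number of values left after collapsing runs
   * @param runMapEntries  number of entries in the (start, count) run map
   * @param originalValues number of values before encoding
   */
  static boolean useRle(int encodedValues, int runMapEntries, int originalValues) {
    if (originalValues == 0) {
      return false; // nothing to encode, and avoids division by zero
    }
    // long arithmetic avoids int overflow in the multiplication
    long percent = ((long) encodedValues + runMapEntries) * 100L / originalValues;
    return percent <= MAX_ENCODED_PERCENT;
  }

  public static void main(String[] args) {
    System.out.println(useRle(10, 4, 100));  // encoded form is 14% of the original
    System.out.println(useRle(90, 20, 100)); // encoded form is 110% of the original
  }
}
```

The zero-length guard and the widening to long are additions made for this sketch; the quoted diff uses plain int arithmetic.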



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614538
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    -    if (indexes.length == keyBlock.length) {
    -      dataIndexMap = new short[0];
    -    } else {
    +    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
    +    if (useRle) {
    +      this.keyBlock = convertToKeyArray(list);
           dataIndexMap = convertToArray(map);
    +    } else {
    +      this.keyBlock = convertToKeyArray(indexes);
    +      dataIndexMap = new short[0];
    --- End diff --
    
    Yes. If the array is empty, we will not add the RLE encoder in the data chunk.
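
One way to picture this reply is the sketch below. It is an assumption-laden illustration, not the project's writer code: the `Encoding` enum, the class name, and the method name are all hypothetical. The only idea taken from the thread is that an empty `dataIndexMap` means no RLE encoding is recorded for the chunk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types; CarbonData's real metadata classes are not shown in this thread.
final class EncodingListSketch {
  enum Encoding { RLE }

  // Records RLE for a data chunk only when a run map was actually produced.
  static List<Encoding> encodingsFor(short[] dataIndexMap) {
    List<Encoding> encodings = new ArrayList<>();
    if (dataIndexMap.length > 0) {
      encodings.add(Encoding.RLE);
    }
    return encodings;
  }

  public static void main(String[] args) {
    System.out.println(encodingsFor(new short[0]));        // no run map, so RLE is not recorded
    System.out.println(encodingsFor(new short[] {0, 3}));  // run map present, so RLE is recorded
  }
}
```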



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105575706
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    -    if (indexes.length == keyBlock.length) {
    -      dataIndexMap = new short[0];
    -    } else {
    +    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
    --- End diff --
    
    Suggest using:
    `(((list.size() + map.size()) * 100) / indexes.length) < 70`
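
It may be worth noting how the suggested form compares with the ternary in the diff. The sketch below is illustrative only (the class and method names are hypothetical) and evaluates the original condition, the suggested `< 70`, and `<= 70` at the boundary. The original ternary and `< 70` appear to disagree only when the computed percentage is exactly 70.

```java
// Hypothetical comparison harness; not part of the pull request.
final class ThresholdComparisonSketch {
  // Condition as written in the diff.
  static boolean original(int percent) {
    return percent > 70 ? false : true;
  }

  // Condition as suggested in the review comment.
  static boolean lessThan(int percent) {
    return percent < 70;
  }

  // Direct simplification of the ternary.
  static boolean lessOrEqual(int percent) {
    return percent <= 70;
  }

  public static void main(String[] args) {
    for (int percent : new int[] {69, 70, 71}) {
      System.out.println(percent + ": original=" + original(percent)
          + " lessThan=" + lessThan(percent)
          + " lessOrEqual=" + lessOrEqual(percent));
    }
  }
}
```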



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105578013
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    -    if (indexes.length == keyBlock.length) {
    -      dataIndexMap = new short[0];
    -    } else {
    +    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
    --- End diff --
    
    Can we decide this in a more heuristic way? For example, if we find that more than 5 pages did not use RLE, then stop paying the cost of trying to compress in all future blocklets.
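
A minimal sketch of the heuristic proposed here could look like the following. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the counter tracks the total number of pages that skipped RLE (the comment does not say whether they must be consecutive), and nothing in the thread indicates that this was implemented.

```java
// Hypothetical tracker; not part of CarbonData.
final class RleAttemptTrackerSketch {
  private final int maxSkippedPages;
  private int skippedPages;

  RleAttemptTrackerSketch(int maxSkippedPages) {
    this.maxSkippedPages = maxSkippedPages;
  }

  // True while the number of pages that skipped RLE has not exceeded the limit.
  boolean shouldTryRle() {
    return skippedPages <= maxSkippedPages;
  }

  // Call once per page with the outcome of the RLE decision for that page.
  void recordPage(boolean rleUsed) {
    if (!rleUsed) {
      skippedPages++;
    }
  }

  public static void main(String[] args) {
    RleAttemptTrackerSketch tracker = new RleAttemptTrackerSketch(5);
    for (int page = 1; page <= 6; page++) {
      tracker.recordPage(false); // pretend RLE did not pay off on this page
      System.out.println("after page " + page + ": try RLE = " + tracker.shouldTryRle());
    }
  }
}
```

A caller would check `shouldTryRle()` before attempting to encode a page and fall back to the plain key block when it returns false.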



[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    LGTM



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105577915
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    --- End diff --
    
    This is a comment on the `compressDataMyOwnWay` function: I suggest using `indexes.length / 2` as the initial capacity when allocating the ArrayList, instead of 10, which is too small and will cause repeated ArrayList expansion.
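
The pre-sizing suggestion can be sketched as below. This is a hedged illustration rather than the method under review: the class name, the method name, and the run-collapsing loop are hypothetical, and only the idea of passing half the input length to the `ArrayList` constructor comes from the comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; it does not reproduce the CarbonData method under review.
final class PresizedListSketch {
  // Keeps one entry per run of equal adjacent values.
  static List<Short> distinctRuns(short[] values) {
    // Start at half the input length instead of a small fixed capacity such as 10,
    // so the backing array is resized less often for large inputs.
    List<Short> runs = new ArrayList<>(values.length / 2);
    for (int i = 0; i < values.length; i++) {
      if (i == 0 || values[i] != values[i - 1]) {
        runs.add(values[i]);
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    System.out.println(distinctRuns(new short[] {1, 1, 2, 2, 2, 3}));
  }
}
```

The capacity is only a starting hint; the list still grows if more than half of the values begin a new run.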



[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1074/




[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105614647
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    --- End diff --
    
    This is old code; I will update it.



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105576295
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,23 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    -    if (indexes.length == keyBlock.length) {
    -      dataIndexMap = new short[0];
    -    } else {
    +    boolean useRle = (((list.size() + map.size()) * 100) / indexes.length) > 70 ? false : true;
    +    if (useRle) {
    +      this.keyBlock = convertToKeyArray(list);
           dataIndexMap = convertToArray(map);
    +    } else {
    +      this.keyBlock = convertToKeyArray(indexes);
    +      dataIndexMap = new short[0];
    --- End diff --
    
    So when reading, we decide based on whether the array is empty?



[GitHub] incubator-carbondata issue #642: [CARBONDATA-756] Fixed RLE Encoding Issue

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1108/




[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Posted by chenliang613 <gi...@git.apache.org>.
Github user chenliang613 commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Please change the title to follow the format: [CARBONDATA-<issue number>] Title of the pull request (a space needs to be added after the closing bracket)



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756]Fixed RLE Encoding I...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/642#discussion_r105408221
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/columnar/BlockIndexerStorageForShort.java ---
    @@ -192,12 +192,24 @@ private void compressDataMyOwnWay(ColumnWithShortIndex[] indexes) {
         }
         map.add(start);
         map.add(counter);
    -    this.keyBlock = convertToKeyArray(list);
    -    if (indexes.length == keyBlock.length) {
    -      dataIndexMap = new short[0];
    -    } else {
    +    boolean useRle = (list.size() > indexes.length
    --- End diff --
    
    I guess you can simplify it as below.
    ```
    boolean useRle = !((((list.size() + map.size()) * 100) / indexes.length) > 70);
    ```
    I think the `list.size() > indexes.length` check is not required, as the percentage calculation covers this case as well.



[GitHub] incubator-carbondata pull request #642: [CARBONDATA-756] Fixed RLE Encoding ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-carbondata/pull/642



[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Please describe this PR



[GitHub] incubator-carbondata issue #642: [CARBONDATA-756]Fixed RLE Encoding Issue

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/incubator-carbondata/pull/642
  
    Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1077/


