Posted to issues@carbondata.apache.org by simafengyun <gi...@git.apache.org> on 2017/03/09 09:11:12 UTC
[GitHub] incubator-carbondata pull request #638: Carbondata 748
GitHub user simafengyun opened a pull request:
https://github.com/apache/incubator-carbondata/pull/638
Carbondata 748
Use binary search to improve performance, relying on the order of the filter values.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/638.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #638
----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T05:13:22Z
use binary search to improve the performance in method
setFilterdIndexToBitSet
commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T07:51:50Z
add binary range search and add test case
commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T08:08:21Z
revert previous change
commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T08:38:29Z
format changed code
commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T09:09:06Z
change code format to pass check style
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] incubator-carbondata issue #638: Carbondata 748
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1055/
---
[GitHub] incubator-carbondata issue #638: Carbondata 748
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1056/
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by mayunSaicmotor <gi...@git.apache.org>.
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105416528
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
return -(low + 1);
}
+ public static int[] getRangeIndexUsingBinarySearch(
--- End diff --
You are right: I previously did use binary search even for getting the ranges, but yesterday I ran a performance test and found that it performs no better than the current logic. The binary range search has an advantage only when the data array is very long and contains many repeated values. The data array size is usually 12000 for a chunk, which is not very long, so the binary range search has no advantage and I have decided to keep the current logic.
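The trade-off described above can be sketched as follows. The thread does not show the body of either method, so this is only an illustration under assumptions: the class and method names are hypothetical (they are not CarbonData APIs), and plain `int` values stand in for the chunk data. One variant finds both ends of a run of equal values with two binary searches; the other finds the first match and scans forward, which costs extra steps only when the run of repeated values is long.

```java
import java.util.Arrays;

/** Illustrative only: the names here are hypothetical, not CarbonData APIs. */
final class RangeSearchSketch {

  /** Index of the first element equal to key in a sorted array, or -1 if absent. */
  static int firstIndex(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length - 1;
    int found = -1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] < key) {
        low = mid + 1;
      } else {
        if (sorted[mid] == key) {
          found = mid; // remember the match, keep looking further left
        }
        high = mid - 1;
      }
    }
    return found;
  }

  /** Index of the last element equal to key in a sorted array, or -1 if absent. */
  static int lastIndex(int[] sorted, int key) {
    int low = 0;
    int high = sorted.length - 1;
    int found = -1;
    while (low <= high) {
      int mid = (low + high) >>> 1;
      if (sorted[mid] > key) {
        high = mid - 1;
      } else {
        if (sorted[mid] == key) {
          found = mid; // remember the match, keep looking further right
        }
        low = mid + 1;
      }
    }
    return found;
  }

  /** Both ends by binary search: O(log n) however long the run is. */
  static int[] rangeByBinarySearch(int[] sorted, int key) {
    int first = firstIndex(sorted, key);
    if (first < 0) {
      return new int[0];
    }
    return new int[] {first, lastIndex(sorted, key)};
  }

  /** First match by binary search, then a forward scan: O(log n + run length). */
  static int[] rangeByScan(int[] sorted, int key) {
    int first = firstIndex(sorted, key);
    if (first < 0) {
      return new int[0];
    }
    int last = first;
    while (last + 1 < sorted.length && sorted[last + 1] == key) {
      last++;
    }
    return new int[] {first, last};
  }

  public static void main(String[] args) {
    int[] data = {1, 3, 3, 3, 7, 9};
    System.out.println(Arrays.toString(rangeByBinarySearch(data, 3)));
    System.out.println(Arrays.toString(rangeByScan(data, 3)));
  }
}
```

Both variants return the same inclusive range; they differ only in how the upper end is located, which is why the gain shows up only for long arrays with long runs of duplicates.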
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105404228
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
return -(low + 1);
}
+ public static int[] getRangeIndexUsingBinarySearch(
--- End diff --
Please add comments to this method.
---
[GitHub] incubator-carbondata issue #638: [CARBONDATA-748] use binary search improve ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
Build Success with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1085/
---
[GitHub] incubator-carbondata issue #638: [CARBONDATA-748] use binary search improve ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
LGTM
---
[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Posted by mayunSaicmotor <gi...@git.apache.org>.
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105429564
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
return -(low + 1);
}
+ public static int[] getRangeIndexUsingBinarySearch(
--- End diff --
Comments have been added. Is there anything else that needs to change?
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by mayunSaicmotor <gi...@git.apache.org>.
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105422550
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
BitSet bitSet = new BitSet(numerOfRows);
if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
- for (int k = 0; k < filterValues.length; k++) {
- for (int j = 0; j < numerOfRows; j++) {
- if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
- bitSet.set(j);
- }
+ for (int i = 0; i < numerOfRows; i++) {
+
+ int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --
Is the version below OK?
private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
    int numerOfRows) {
  BitSet bitSet = new BitSet(numerOfRows);
  if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
    byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    for (int i = 0; i < numerOfRows; i++) {
      if (filterValues.length > 1) {
        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
            dimensionColumnDataChunk.getChunkData(i));
        if (index >= 0) {
          bitSet.set(i);
        }
      } else if (filterValues.length == 1) {
        if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
          bitSet.set(i);
        }
      } else {
        break;
      }
    }
  }
  return bitSet;
}
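The per-row lookup in the proposal above can be tried in isolation with a minimal, self-contained sketch. Everything in it is an assumption made for illustration: the class, `compareUnsigned`, `binarySearch`, and `matchRows` are hypothetical stand-ins rather than the real `CarbonUtil` or chunk APIs; the filter values are assumed to be sorted in unsigned byte order; and the upper bound is treated as inclusive, which is why the call passes `length - 1`. The thread does not show whether `CarbonUtil.binarySearch` uses the same convention.

```java
import java.util.BitSet;

/** Illustrative only: the names here are hypothetical, not CarbonData APIs. */
final class FilterBitSetSketch {

  /** Unsigned lexicographic comparison of two byte arrays. */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xFF;
      int y = b[i] & 0xFF;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  /**
   * Binary search over sortedKeys[low..high], both bounds inclusive.
   * Returns the index of a match, or a negative value if the key is absent.
   */
  static int binarySearch(byte[][] sortedKeys, int low, int high, byte[] key) {
    while (low <= high) {
      int mid = (low + high) >>> 1;
      int cmp = compareUnsigned(sortedKeys[mid], key);
      if (cmp < 0) {
        low = mid + 1;
      } else if (cmp > 0) {
        high = mid - 1;
      } else {
        return mid;
      }
    }
    return -(low + 1);
  }

  /** Sets bit i when rows[i] equals one of the sorted filter values. */
  static BitSet matchRows(byte[][] rows, byte[][] sortedFilterValues) {
    BitSet bitSet = new BitSet(rows.length);
    for (int i = 0; i < rows.length; i++) {
      int index = binarySearch(sortedFilterValues, 0, sortedFilterValues.length - 1, rows[i]);
      if (index >= 0) {
        bitSet.set(i);
      }
    }
    return bitSet;
  }

  public static void main(String[] args) {
    byte[][] rows = {{1}, {5}, {2}, {9}, {5}};
    byte[][] filterValues = {{2}, {5}}; // must already be sorted
    System.out.println(matchRows(rows, filterValues));
  }
}
```

With an inclusive upper bound, an empty filter array gives `high = -1`, so the loop body never runs and no bit is set.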
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105405605
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
BitSet bitSet = new BitSet(numerOfRows);
if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
- for (int k = 0; k < filterValues.length; k++) {
- for (int j = 0; j < numerOfRows; j++) {
- if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
- bitSet.set(j);
- }
+ for (int i = 0; i < numerOfRows; i++) {
+
+ int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --
If `filterValues` has only one element, we had better skip the binary search; a single comparison is enough.
---
[GitHub] incubator-carbondata issue #638: [Carbondata 748] use binary search improve ...
Posted by chenliang613 <gi...@git.apache.org>.
Github user chenliang613 commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
@mayunSaicmotor please change "[Carbondata 748] " to "[CARBONDATA-748]" for PR's title.
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105424505
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
BitSet bitSet = new BitSet(numerOfRows);
if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
- for (int k = 0; k < filterValues.length; k++) {
- for (int j = 0; j < numerOfRows; j++) {
- if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
- bitSet.set(j);
- }
+ for (int i = 0; i < numerOfRows; i++) {
+
+ int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --
looks fine
---
[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Posted by mayunSaicmotor <gi...@git.apache.org>.
Github user mayunSaicmotor commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105522667
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java ---
@@ -150,12 +138,15 @@ private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnD
BitSet bitSet = new BitSet(numerOfRows);
if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
- for (int k = 0; k < filterValues.length; k++) {
- for (int j = 0; j < numerOfRows; j++) {
- if (dimensionColumnDataChunk.compareTo(j, filterValues[k]) == 0) {
- bitSet.set(j);
- }
+ for (int i = 0; i < numerOfRows; i++) {
+
+ int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length,
--- End diff --
@ravipesala, would it be better to move the if clause outside the for loop?
private BitSet setFilterdIndexToBitSet(DimensionColumnDataChunk dimensionColumnDataChunk,
    int numerOfRows) {
  BitSet bitSet = new BitSet(numerOfRows);
  if (dimensionColumnDataChunk instanceof FixedLengthDimensionDataChunk) {
    byte[][] filterValues = dimColumnExecuterInfo.getFilterKeys();
    if (filterValues.length > 1) {
      for (int i = 0; i < numerOfRows; i++) {
        int index = CarbonUtil.binarySearch(filterValues, 0, filterValues.length - 1,
            dimensionColumnDataChunk.getChunkData(i));
        if (index >= 0) {
          bitSet.set(i);
        }
      }
    } else if (filterValues.length == 1) {
      for (int i = 0; i < numerOfRows; i++) {
        if (dimensionColumnDataChunk.compareTo(i, filterValues[0]) == 0) {
          bitSet.set(i);
        }
      }
    }
  }
  return bitSet;
}
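For a rough sense of scale, the comparison counts of the removed nested loop and of the per-row binary search can be estimated with a back-of-the-envelope sketch. It rests on assumptions: the class and method names are hypothetical, the binary search is charged its worst case of floor(log2 k) + 1 probes for k filter values, and the row count of 12000 is the typical chunk size mentioned elsewhere in this thread. It counts comparisons only and is not a benchmark.

```java
/** Illustrative only: a comparison-count estimate, not a measurement. */
final class ComparisonCountSketch {

  /** The removed nested loop compares every row with every filter value. */
  static long nestedLoopComparisons(int rows, int filterValues) {
    return (long) rows * filterValues;
  }

  /**
   * Per-row binary search over k sorted filter values (k >= 1).
   * Worst case is floor(log2 k) + 1 probes, which equals the bit length of k.
   */
  static long binarySearchComparisons(int rows, int filterValues) {
    int probes = 32 - Integer.numberOfLeadingZeros(filterValues);
    return (long) rows * probes;
  }

  public static void main(String[] args) {
    int rows = 12000; // typical chunk size mentioned in this thread
    for (int k : new int[] {1, 10, 100}) {
      System.out.println(k + " filter values: nested=" + nestedLoopComparisons(rows, k)
          + " binarySearch=" + binarySearchComparisons(rows, k));
    }
  }
}
```

With a single filter value both counts are equal, which is consistent with handling that case by a direct comparison and reserving the binary search for larger filter lists.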
---
[GitHub] incubator-carbondata issue #638: Carbondata 748
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/incubator-carbondata/pull/638
Build Failed with Spark 1.6.2, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1054/
---
[GitHub] incubator-carbondata pull request #638: [CARBONDATA-748] use binary search i...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/638
---
[GitHub] incubator-carbondata pull request #638: [Carbondata 748] use binary search i...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on a diff in the pull request:
https://github.com/apache/incubator-carbondata/pull/638#discussion_r105406369
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -419,6 +419,94 @@ public static int getFirstIndexUsingBinarySearch(FixedLengthDimensionDataChunk d
return -(low + 1);
}
+ public static int[] getRangeIndexUsingBinarySearch(
--- End diff --
There is not much difference between `getFirstIndexUsingBinarySearch` and this method. I remember that in your last PR you used binary search even for getting the ranges. What happened to that? Did you run into any functional or performance issues?
---
[GitHub] incubator-carbondata pull request #638: Carbondata 748
Posted by simafengyun <gi...@git.apache.org>.
Github user simafengyun closed the pull request at:
https://github.com/apache/incubator-carbondata/pull/638
---
[GitHub] incubator-carbondata pull request #638: Carbondata 748
Posted by simafengyun <gi...@git.apache.org>.
GitHub user simafengyun reopened a pull request:
https://github.com/apache/incubator-carbondata/pull/638
Carbondata 748
Use binary search to improve performance, relying on the order of the filter values.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/simafengyun/incubator-carbondata CARBONDATA-748
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-carbondata/pull/638.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #638
----
commit 252649eecee6a7b74eef5a7b7f17d58a363c09ea
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T05:13:22Z
use binary search to improve the performance in method
setFilterdIndexToBitSet
commit c50054fa519cc1004b78941cf88541f7ad838976
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T07:51:50Z
add binary range search and add test case
commit 25839b1425986cc95275b5e628e03d3fa8d19103
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T08:08:21Z
revert previous change
commit 0644946a8bb9877ccdafd96420b091364d126669
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T08:38:29Z
format changed code
commit 516c5541722f12dffe5c709238bbb8a2f64e65dc
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T09:09:06Z
change code format to pass check style
commit 141e26425ed7296b661a5382a4fe168e33fb71d1
Author: mayun <ma...@10.100.56.61>
Date: 2017-03-09T09:51:22Z
revert the code to use inverted index
----
---