Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2016/01/15 13:15:24 UTC
[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...
GitHub user eminency opened a pull request:
https://github.com/apache/tajo/pull/945
TAJO-2059: Binary search in BST reader does compare too frequently
A simple test based on the unit test was run with 1M tuples and 1M searches.
Two columns, of types long and double, are used as the sort key.
The count below is the number of compare() invocations.
##### Current
find : 44 sec
count : 42001207
##### Patch
find : 43 sec
count : 20141495
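For readers who want to reproduce a count like the one above, the following is a minimal sketch of one way to count compare() invocations. The class name `CountingComparator` is hypothetical and is not part of Tajo or of this patch; the actual test code used for these numbers was reverted and is not shown in this thread.

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper (not Tajo code): wraps any comparator and
// counts how many times compare() is invoked.
final class CountingComparator<T> implements Comparator<T> {
    private final Comparator<T> delegate;
    private final AtomicLong calls = new AtomicLong();

    CountingComparator(Comparator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public int compare(T a, T b) {
        calls.incrementAndGet();          // record the invocation
        return delegate.compare(a, b);    // defer to the real ordering
    }

    // Number of compare() calls seen so far.
    long calls() {
        return calls.get();
    }
}
```

Passing such a wrapper to the search in place of the real comparator, then reading `calls()` after the run, would give a figure comparable to the `count` lines above.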
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/eminency/tajo bstidx
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/945.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #945
----
commit d5fdf2f0e658928ac156d83ec4d4dbfb7f1c42e5
Author: Jongyoung Park <em...@gmail.com>
Date: 2016-01-13T07:33:08Z
lightweight improvement
commit a7b2686f1275e05866dc4d5563a8e0a224fb49e1
Author: Jongyoung Park <em...@gmail.com>
Date: 2016-01-13T09:22:25Z
Refine binary search
commit 58a59e832a8bba350814ab92ef421ae95f5be158
Author: Jongyoung Park <em...@gmail.com>
Date: 2016-01-15T07:39:56Z
test code
commit 74c857dcab827b662d565e13284f665723eb798c
Author: Jongyoung Park <em...@gmail.com>
Date: 2016-01-15T12:05:10Z
Revert "test code"
This reverts commit 58a59e832a8bba350814ab92ef421ae95f5be158.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...
Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:
https://github.com/apache/tajo/pull/945#issuecomment-172755772
@jihoonson
Thank you for the comment.
I'm working on it. I will keep you posted.
---
[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/945#discussion_r52862828
--- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/index/bst/BSTIndex.java ---
@@ -794,42 +794,79 @@ private int binarySearch(Tuple[] arr, Tuple key, int startPos, int endPos) {
int offset = -1;
int start = startPos;
int end = endPos;
+ int prevCenter = -1;
//http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6412541
int centerPos = (start + end) >>> 1;
if (arr.length == 0) {
LOG.error("arr.length: 0, loadNum: " + loadNum + ", inited: " + inited.get());
+ return -1;
}
+
+ correctable = false;
+ if (arr.length == 1) {
+ int comp = comparator.compare(arr[0], key);
+
+ if (comp < 0) {
+ return 0;
+ } else if (comp > 0) {
+ return -1;
+ }
+
+ correctable = true;
+ return 0;
+ }
+
while (true) {
- if (comparator.compare(arr[centerPos], key) > 0) {
- if (centerPos == 0) {
- correctable = false;
- break;
- } else if (comparator.compare(arr[centerPos - 1], key) < 0) {
- correctable = false;
- offset = centerPos - 1;
+ if (end - start == 1) {
+ int comp;
+ // prevCenter should be either end or start
+ if (end == prevCenter) {
+ comp = comparator.compare(arr[start], key);
+
+ if (comp == 0) {
+ correctable = true;
+ offset = start;
+ } else if (comp < 0) {
+ offset = start;
+ }
break;
} else {
- end = centerPos;
- centerPos = (start + end) / 2;
- }
- } else if (comparator.compare(arr[centerPos], key) < 0) {
- if (centerPos == arr.length - 1) {
- correctable = false;
- offset = centerPos;
- break;
- } else if (comparator.compare(arr[centerPos + 1], key) > 0) {
- correctable = false;
- offset = centerPos;
+ if (end == arr.length) {
--- End diff --
In this method ```binarySearch()```, ```endPos``` is intended to be an exclusive end position. It looks like the code in this block needs to take that into account.
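
To illustrate what an exclusive end position means for the loop, here is a minimal sketch of a half-open binary search that returns the index of the largest element not greater than the key. It is not the code from this patch: the class `FloorSearch` and the method `floorIndex` are hypothetical names, and the `correctable` exact-match flag of ```BSTIndex``` is not modeled.

```java
import java.util.Comparator;

// Hypothetical illustration (not the patch): search over the half-open
// range [start, end), where arr[end] is never read.
final class FloorSearch {
    // Returns the largest index i in [start, end) with arr[i] <= key,
    // or -1 if every element in the range is greater than key.
    static <T> int floorIndex(T[] arr, T key, int start, int end,
                              Comparator<? super T> cmp) {
        int lo = start; // inclusive lower bound
        int hi = end;   // exclusive upper bound
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;  // overflow-safe midpoint
            if (cmp.compare(arr[mid], key) <= 0) {
                lo = mid + 1;           // arr[mid] <= key: answer is mid or to its right
            } else {
                hi = mid;               // arr[mid] > key: answer is left of mid
            }
        }
        // lo is now the first index whose element is greater than key.
        return lo > start ? lo - 1 : -1;
    }
}
```

Because `hi` is exclusive, the loop performs one comparison per iteration and never needs a special case for `end == arr.length`.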
---
[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:
https://github.com/apache/tajo/pull/945#discussion_r52862691
--- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/index/bst/BSTIndex.java ---
@@ -794,42 +794,79 @@ private int binarySearch(Tuple[] arr, Tuple key, int startPos, int endPos) {
int offset = -1;
int start = startPos;
int end = endPos;
+ int prevCenter = -1;
//http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6412541
int centerPos = (start + end) >>> 1;
if (arr.length == 0) {
LOG.error("arr.length: 0, loadNum: " + loadNum + ", inited: " + inited.get());
+ return -1;
}
+
+ correctable = false;
+ if (arr.length == 1) {
+ int comp = comparator.compare(arr[0], key);
+
+ if (comp < 0) {
+ return 0;
+ } else if (comp > 0) {
+ return -1;
+ }
+
+ correctable = true;
+ return 0;
+ }
+
while (true) {
- if (comparator.compare(arr[centerPos], key) > 0) {
- if (centerPos == 0) {
- correctable = false;
- break;
- } else if (comparator.compare(arr[centerPos - 1], key) < 0) {
- correctable = false;
- offset = centerPos - 1;
+ if (end - start == 1) {
+ int comp;
+ // prevCenter should be either end or start
+ if (end == prevCenter) {
+ comp = comparator.compare(arr[start], key);
+
+ if (comp == 0) {
+ correctable = true;
+ offset = start;
+ } else if (comp < 0) {
+ offset = start;
+ }
break;
} else {
- end = centerPos;
- centerPos = (start + end) / 2;
- }
- } else if (comparator.compare(arr[centerPos], key) < 0) {
- if (centerPos == arr.length - 1) {
- correctable = false;
- offset = centerPos;
- break;
- } else if (comparator.compare(arr[centerPos + 1], key) > 0) {
- correctable = false;
- offset = centerPos;
+ if (end == arr.length) {
+ if (comparator.compare(arr[start], key) == 0) {
+ correctable = true;
+ }
+ offset = start;
+ break;
+ }
+
+ comp = comparator.compare(arr[end], key);
+ if (comp == 0) {
+ correctable = true;
+ offset = end;
+ } else if (comp > 0) {
+ offset = start;
+ }
break;
- } else {
- start = centerPos + 1;
- centerPos = (start + end) / 2;
}
- } else {
+ }
+
+ int compareResult = comparator.compare(arr[centerPos], key);
+
+ if (compareResult == 0) {
correctable = true;
offset = centerPos;
break;
+ } else {
+ prevCenter = centerPos;
+
+ if (compareResult > 0) {
+ end = centerPos;
+ } else {
+ start = centerPos;
+ }
+
+ centerPos = (start + end) / 2;
--- End diff --
It looks like this line should also be changed to ```centerPos = (start + end) >>> 1;```
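
The difference between the two midpoint expressions only shows up when `start + end` exceeds `Integer.MAX_VALUE`, which is the case described in the Sun bug linked in the diff. A small sketch, using a hypothetical class `Midpoint` and indices chosen purely for illustration:

```java
// Hypothetical illustration of the two midpoint expressions.
final class Midpoint {
    // Signed division: a sum that overflows wraps to a negative value first.
    static int midDivide(int start, int end) {
        return (start + end) / 2;
    }

    // Unsigned shift: the wrapped sum is read as an unsigned 32-bit value,
    // so the result is correct for any non-negative start and end.
    static int midShift(int start, int end) {
        return (start + end) >>> 1;
    }

    public static void main(String[] args) {
        int start = 1_500_000_000;
        int end = 2_000_000_000;
        System.out.println(midDivide(start, end)); // prints -397483648
        System.out.println(midShift(start, end));  // prints 1750000000
    }
}
```

For small arrays the two expressions give the same result, so the change mainly keeps the loop consistent with the initial `centerPos` computation.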
---
[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...
Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:
https://github.com/apache/tajo/pull/945#issuecomment-172678842
Overall, this patch looks good to me. Would you check the test failures?
---