Posted to issues@tajo.apache.org by eminency <gi...@git.apache.org> on 2016/01/15 13:15:24 UTC

[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...

GitHub user eminency opened a pull request:

    https://github.com/apache/tajo/pull/945

    TAJO-2059: Binary search in BST reader does compare too frequently

    A simple test based on the unit test was run with 1M tuples and 1M searches.
    Two columns, a long and a double, are used as the sort key.
    The count below is the number of compare() invocations.
    
    ##### Current
    
    find : 44 sec
    count : 42001207
    
    ##### Patch
    
    find : 43 sec
    count : 20141495
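
For readers who want to reproduce this kind of count: the sketch below wraps a comparator and counts calls to compare(). It is not the test code from this pull request. `CountingComparator` is a hypothetical name, and `Arrays.binarySearch` stands in for the BST reader's own search. If "searching 1M times" means one million lookups, the patched count works out to roughly 20 compare() calls per lookup, which is close to log2(1,000,000) ≈ 19.9.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper (not part of Tajo): counts how often compare() is invoked.
// Not thread-safe; intended for single-threaded measurements only.
final class CountingComparator<T> implements Comparator<T> {
  private final Comparator<T> delegate;
  private long count;

  CountingComparator(Comparator<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public int compare(T a, T b) {
    count++;                       // record the invocation, then delegate
    return delegate.compare(a, b);
  }

  long count() {
    return count;
  }

  public static void main(String[] args) {
    // Sorted input; the JDK binary search is only a stand-in for the index lookup.
    Integer[] sorted = new Integer[1_000_000];
    for (int i = 0; i < sorted.length; i++) {
      sorted[i] = i;
    }
    CountingComparator<Integer> cmp =
        new CountingComparator<>(Comparator.naturalOrder());
    for (int key = 0; key < 1000; key++) {
      Arrays.binarySearch(sorted, key, cmp);
    }
    System.out.println("compare() calls: " + cmp.count());
  }
}
```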


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/eminency/tajo bstidx

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/945.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #945
    
----
commit d5fdf2f0e658928ac156d83ec4d4dbfb7f1c42e5
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-13T07:33:08Z

    lightweight improvement

commit a7b2686f1275e05866dc4d5563a8e0a224fb49e1
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-13T09:22:25Z

    Refine binary search

commit 58a59e832a8bba350814ab92ef421ae95f5be158
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-15T07:39:56Z

    test code

commit 74c857dcab827b662d565e13284f665723eb798c
Author: Jongyoung Park <em...@gmail.com>
Date:   2016-01-15T12:05:10Z

    Revert "test code"
    
    This reverts commit 58a59e832a8bba350814ab92ef421ae95f5be158.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...

Posted by eminency <gi...@git.apache.org>.
Github user eminency commented on the pull request:

    https://github.com/apache/tajo/pull/945#issuecomment-172755772
  
    @jihoonson 
    Thank you for the comment.
    I'm working on it. I will keep you posted.



[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/945#discussion_r52862828
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/index/bst/BSTIndex.java ---
    @@ -794,42 +794,79 @@ private int binarySearch(Tuple[] arr, Tuple key, int startPos, int endPos) {
           int offset = -1;
           int start = startPos;
           int end = endPos;
    +      int prevCenter = -1;
     
           //http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6412541
           int centerPos = (start + end) >>> 1;
           if (arr.length == 0) {
             LOG.error("arr.length: 0, loadNum: " + loadNum + ", inited: " + inited.get());
    +        return -1;
           }
    +
    +      correctable = false;
    +      if (arr.length == 1) {
    +        int comp = comparator.compare(arr[0], key);
    +
    +        if (comp < 0) {
    +          return 0;
    +        } else if (comp > 0) {
    +          return -1;
    +        }
    +
    +        correctable = true;
    +        return 0;
    +      }
    +
           while (true) {
    -        if (comparator.compare(arr[centerPos], key) > 0) {
    -          if (centerPos == 0) {
    -            correctable = false;
    -            break;
    -          } else if (comparator.compare(arr[centerPos - 1], key) < 0) {
    -            correctable = false;
    -            offset = centerPos - 1;
    +        if (end - start == 1) {
    +          int comp;
    +          // prevCenter should be either end or start
    +          if (end == prevCenter) {
    +            comp = comparator.compare(arr[start], key);
    +
    +            if (comp == 0) {
    +              correctable = true;
    +              offset = start;
    +            } else if (comp < 0) {
    +              offset = start;
    +            }
                 break;
               } else {
    -            end = centerPos;
    -            centerPos = (start + end) / 2;
    -          }
    -        } else if (comparator.compare(arr[centerPos], key) < 0) {
    -          if (centerPos == arr.length - 1) {
    -            correctable = false;
    -            offset = centerPos;
    -            break;
    -          } else if (comparator.compare(arr[centerPos + 1], key) > 0) {
    -            correctable = false;
    -            offset = centerPos;
    +            if (end == arr.length) {
    --- End diff --
    
    In this method ```binarySearch()```, ```endPos``` is intended as an exclusive end position. It looks like the code in this block needs to take that into account.
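
To illustrate what an exclusive end position implies, here is a minimal sketch of a "floor" binary search over the half-open range [start, end). It is not the code in `BSTIndex.java`: it uses a plain `long[]` instead of `Tuple[]` with a comparator, and `FloorSearchSketch` and `floorIndex` are hypothetical names. Because `end` is exclusive, the loop never reads `arr[end]`, and each iteration performs a single comparison.

```java
// Illustrative sketch only; names and types are assumptions, not Tajo's API.
final class FloorSearchSketch {

  /**
   * Returns the index of the largest element <= key within arr[start, end),
   * or -1 if every element in that range is greater than key.
   * The array must be sorted in ascending order.
   */
  static int floorIndex(long[] arr, long key, int start, int end) {
    int lo = start; // first index still under consideration
    int hi = end;   // one past the last index still under consideration
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // overflow-safe midpoint
      if (arr[mid] <= key) {
        lo = mid + 1; // arr[mid] qualifies; a later element may qualify too
      } else {
        hi = mid;     // arr[mid] is too large; exclude it and everything after
      }
    }
    // lo is now the first index whose element is > key, or end if there is none.
    return lo == start ? -1 : lo - 1;
  }

  public static void main(String[] args) {
    long[] arr = {10, 20, 30, 40};
    System.out.println(floorIndex(arr, 25, 0, arr.length));
    System.out.println(floorIndex(arr, 5, 0, arr.length));
  }
}
```

An exact match can be detected afterwards by checking whether the element at the returned index equals the key, which is the role the `correctable` flag plays in the diff above.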



[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on a diff in the pull request:

    https://github.com/apache/tajo/pull/945#discussion_r52862691
  
    --- Diff: tajo-storage/tajo-storage-hdfs/src/main/java/org/apache/tajo/storage/index/bst/BSTIndex.java ---
    @@ -794,42 +794,79 @@ private int binarySearch(Tuple[] arr, Tuple key, int startPos, int endPos) {
           int offset = -1;
           int start = startPos;
           int end = endPos;
    +      int prevCenter = -1;
     
           //http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6412541
           int centerPos = (start + end) >>> 1;
           if (arr.length == 0) {
             LOG.error("arr.length: 0, loadNum: " + loadNum + ", inited: " + inited.get());
    +        return -1;
           }
    +
    +      correctable = false;
    +      if (arr.length == 1) {
    +        int comp = comparator.compare(arr[0], key);
    +
    +        if (comp < 0) {
    +          return 0;
    +        } else if (comp > 0) {
    +          return -1;
    +        }
    +
    +        correctable = true;
    +        return 0;
    +      }
    +
           while (true) {
    -        if (comparator.compare(arr[centerPos], key) > 0) {
    -          if (centerPos == 0) {
    -            correctable = false;
    -            break;
    -          } else if (comparator.compare(arr[centerPos - 1], key) < 0) {
    -            correctable = false;
    -            offset = centerPos - 1;
    +        if (end - start == 1) {
    +          int comp;
    +          // prevCenter should be either end or start
    +          if (end == prevCenter) {
    +            comp = comparator.compare(arr[start], key);
    +
    +            if (comp == 0) {
    +              correctable = true;
    +              offset = start;
    +            } else if (comp < 0) {
    +              offset = start;
    +            }
                 break;
               } else {
    -            end = centerPos;
    -            centerPos = (start + end) / 2;
    -          }
    -        } else if (comparator.compare(arr[centerPos], key) < 0) {
    -          if (centerPos == arr.length - 1) {
    -            correctable = false;
    -            offset = centerPos;
    -            break;
    -          } else if (comparator.compare(arr[centerPos + 1], key) > 0) {
    -            correctable = false;
    -            offset = centerPos;
    +            if (end == arr.length) {
    +              if (comparator.compare(arr[start], key) == 0) {
    +                correctable = true;
    +              }
    +              offset = start;
    +              break;
    +            }
    +
    +            comp = comparator.compare(arr[end], key);
    +            if (comp == 0) {
    +              correctable = true;
    +              offset = end;
    +            } else if (comp > 0) {
    +              offset = start;
    +            }
                 break;
    -          } else {
    -            start = centerPos + 1;
    -            centerPos = (start + end) / 2;
               }
    -        } else {
    +        }
    +
    +        int compareResult = comparator.compare(arr[centerPos], key);
    +
    +        if (compareResult == 0) {
               correctable = true;
               offset = centerPos;
               break;
    +        } else {
    +          prevCenter = centerPos;
    +
    +          if (compareResult > 0) {
    +            end = centerPos;
    +          } else {
    +            start = centerPos;
    +          }
    +
    +          centerPos = (start + end) / 2;
    --- End diff --
    
    It looks like this line should also be changed to ```centerPos = (start + end) >>> 1;```
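
For context on the suggestion: the two midpoint expressions differ only when `start + end` exceeds `Integer.MAX_VALUE`. The sketch below uses hypothetical names (`MidpointSketch`, `midDivide`, `midShift`) and deliberately large values to show the difference. Index arrays of realistic size would not reach this range, so the change presumably serves mainly to keep the loop consistent with the initial midpoint computation.

```java
// Illustrative sketch only; not code from the pull request.
final class MidpointSketch {

  // Signed division: a wrapped (negative) sum yields a negative midpoint.
  static int midDivide(int start, int end) {
    return (start + end) / 2;
  }

  // Unsigned shift: the wrapped sum is reinterpreted as an unsigned 32-bit
  // value, so the midpoint is correct for any non-negative start and end.
  static int midShift(int start, int end) {
    return (start + end) >>> 1;
  }

  public static void main(String[] args) {
    int start = 1_500_000_000;
    int end = 2_000_000_000;
    System.out.println(midDivide(start, end)); // negative because the sum overflowed
    System.out.println(midShift(start, end));  // 1750000000
  }
}
```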



[GitHub] tajo pull request: TAJO-2059: Binary search in BST reader does com...

Posted by jihoonson <gi...@git.apache.org>.
Github user jihoonson commented on the pull request:

    https://github.com/apache/tajo/pull/945#issuecomment-172678842
  
    Overall, this patch looks good to me. Would you check the test failures?

