You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@accumulo.apache.org by milleruntime <gi...@git.apache.org> on 2017/05/08 19:17:43 UTC

[GitHub] accumulo pull request #257: ACCUMULO-4636 system iterator improvements for 1...

GitHub user milleruntime opened a pull request:

    https://github.com/apache/accumulo/pull/257

    ACCUMULO-4636 system iterator improvements for 1.7

    Back ported some of the improvements from ACCUMULO-3079 and PR #244 to 1.7 (and will also merge with 1.8).  These improvements just include internal changes and leave out the new optimized server iterators introduced in 2.0.  
    
    This PR is mainly a sanity check since there are a decent number of changes but I would also still accept any new feedback. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/milleruntime/accumulo ACCUMULO-4636

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/accumulo/pull/257.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #257
    
----
commit 2ff26780e77ff87ae62bd85f815655d639cf8519
Author: Mike Miller <mm...@apache.org>
Date:   2017-05-05T20:21:20Z

    ACCUMULO-4636 system iterator improvements

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by ctubbsii <gi...@git.apache.org>.
Github user ctubbsii commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    Once again, please don't forget to close the JIRA once a patch is merged and the issue is complete. :)
    For better or worse, we're still doing our primary bug tracking in JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo pull request #257: ACCUMULO-4636 system iterator improvements for 1...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/accumulo/pull/257


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    I threw it in a spreadsheet and did the math.  It was a 3.53% speedup


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    @keith-turner do you have the resources to spin up stuff on ec2 so that we can compare it to previous release's numbers? It would be nice to do a coarse comparison, lest we do another maintenance release and accidentally include a perf regression..


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    I merged the changes with the 1.8 branch and created benchmarks for them as well. Ran those benchmarks on my laptop and got similar results.  Full test results can be seen [here](https://github.com/milleruntime/accumulo-benchmark/commit/1906de3a4c2eca37d3df684286fbd15154333185).
    <pre>
    Test: MyBenchmark.seekAllSystemIters10 showed 68.0% increase in ops/s
    Test: MyBenchmark.seekAllSystemIters1000 showed 64.5% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter10 showed 4.3% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter1000 showed -3.7% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter10 showed 32.8% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter1000 showed 36.4% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths10 showed 98.0% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths1000 showed 120.9% increase in ops/s
    Total average 52.66% increase in ops/s
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    Thanks for running with the test, Keith. I'm ok with this on 1.7. A similar sanity check for 1.8 would similarly make me happy for that branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    @joshelser I used the following script to run some performance test locally using [Uno](https://github.com/astralway/uno).  I didn't bother computing stats, but there is definitely an observable performance improvement (and no degradation).   That's pretty cool to see, because the performance test run earlier were targeted to just the iterator stack.  This is showing improvements in an end to end test.
    
    ```bash
    #! /usr/bin/env bash
    
    eval "$(./bin/uno env)"
    
    accumulo shell -u root -p secret -e 'createtable test_ingest'
    
    accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 0 --cols 1 &
    accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 1000000 --cols 1 &
    accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 2000000 --cols 1 &
    accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 3000000 --cols 1 &
    accumulo org.apache.accumulo.test.TestIngest -i uno -u root -p secret --timestamp 1 --size 50 --random 56 --rows 1000000 --start 4000000 --cols 1 &
    
    wait
    
    accumulo shell -u root -p secret -e 'compact -t test_ingest -w'
    
    for run in {1..10}
    do
      accumulo  org.apache.accumulo.test.VerifyIngest -i uno -u root -p secret --size 50 --timestamp 1 --random 56 --rows 5000000 --start 0 --cols 1
    done
    
    ```
    
    Below are the results for 1.7.3
    ```
       5,000,000 records read |  390,502 records/sec |  395,000,000 bytes read | 30,849,734 bytes/sec | 12.804 secs   
       5,000,000 records read |  411,861 records/sec |  395,000,000 bytes read | 32,537,067 bytes/sec | 12.140 secs   
       5,000,000 records read |  424,268 records/sec |  395,000,000 bytes read | 33,517,182 bytes/sec | 11.785 secs   
       5,000,000 records read |  428,338 records/sec |  395,000,000 bytes read | 33,838,773 bytes/sec | 11.673 secs   
       5,000,000 records read |  427,350 records/sec |  395,000,000 bytes read | 33,760,683 bytes/sec | 11.700 secs   
       5,000,000 records read |  428,228 records/sec |  395,000,000 bytes read | 33,830,078 bytes/sec | 11.676 secs   
       5,000,000 records read |  427,606 records/sec |  395,000,000 bytes read | 33,780,894 bytes/sec | 11.693 secs   
       5,000,000 records read |  427,606 records/sec |  395,000,000 bytes read | 33,780,894 bytes/sec | 11.693 secs   
       5,000,000 records read |  431,816 records/sec |  395,000,000 bytes read | 34,113,481 bytes/sec | 11.579 secs   
       5,000,000 records read |  414,250 records/sec |  395,000,000 bytes read | 32,725,766 bytes/sec | 12.070 secs   
    
    ```
    
    Below are the results for this branch
    ```
       5,000,000 records read |  408,563 records/sec |  395,000,000 bytes read | 32,276,515 bytes/sec | 12.238 secs   
       5,000,000 records read |  426,657 records/sec |  395,000,000 bytes read | 33,705,947 bytes/sec | 11.719 secs   
       5,000,000 records read |  447,027 records/sec |  395,000,000 bytes read | 35,315,154 bytes/sec | 11.185 secs   
       5,000,000 records read |  442,830 records/sec |  395,000,000 bytes read | 34,983,615 bytes/sec | 11.291 secs   
       5,000,000 records read |  446,388 records/sec |  395,000,000 bytes read | 35,264,708 bytes/sec | 11.201 secs   
       5,000,000 records read |  445,990 records/sec |  395,000,000 bytes read | 35,233,253 bytes/sec | 11.211 secs   
       5,000,000 records read |  435,957 records/sec |  395,000,000 bytes read | 34,440,666 bytes/sec | 11.469 secs   
       5,000,000 records read |  442,869 records/sec |  395,000,000 bytes read | 34,986,713 bytes/sec | 11.290 secs   
       5,000,000 records read |  439,521 records/sec |  395,000,000 bytes read | 34,722,222 bytes/sec | 11.376 secs   
       5,000,000 records read |  424,592 records/sec |  395,000,000 bytes read | 33,542,798 bytes/sec | 11.776 secs  
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    Scratch that... the tests weren't great.  Doing some tweaking and then will run again.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    I took a look again at the changes to ColumnQualifierFilter and I think they are still a positive change.  I think the slowdown with seekTestWithCQFilter10 is due to the overhead of the setup since the test only brings back 10x (16) keys.  We do see it pay off in the long run with the test with more keys, seekTestWithCQFilter1000, showing a positive increase.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    I modified the benchmark tests to work with 1.7 and ran against my branch and 1.7.  A few individual tests showed a decrease.  Here were the results:
    <pre>
    Test: MyBenchmark.seekAllSystemIters10 showed -1.9% increase in ops/s
    Test: MyBenchmark.seekAllSystemIters1000 showed 6.0% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter10 showed -4.3% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter1000 showed -2.7% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter10 showed 14.0% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter1000 showed 11.5% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths10 showed 12.1% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths1000 showed 9.1% increase in ops/s
    Total 5.48% increase in ops/s
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    Results from my Linux laptop:
    <pre>
    Test: MyBenchmark.seekAllSystemIters10 showed 61.0% increase in ops/s
    Test: MyBenchmark.seekAllSystemIters1000 showed 66.6% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter10 showed 11.1% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter1000 showed 0.7% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter10 showed 31.2% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter1000 showed 30.1% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths10 showed 102.8% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths1000 showed 125.5% increase in ops/s
    Total average 53.62% increase in ops/s
    </pre>


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] accumulo issue #257: ACCUMULO-4636 system iterator improvements for 1.7

Posted by milleruntime <gi...@git.apache.org>.
Github user milleruntime commented on the issue:

    https://github.com/apache/accumulo/pull/257
  
    Refactored the benchmark suite to build different version.  Here are the new results:
    <pre>
    Test: MyBenchmark.seekAllSystemIters10 showed 89.8% increase in ops/s
    Test: MyBenchmark.seekAllSystemIters1000 showed 93.8% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter10 showed -13.4% increase in ops/s
    Test: MyBenchmark.seekTestWithCQFilter1000 showed 4.2% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter10 showed 35.7% increase in ops/s
    Test: MyBenchmark.seekTestWithCfFilter1000 showed 63.3% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths10 showed 112.1% increase in ops/s
    Test: MyBenchmark.seekTestWithEmptyAuths1000 showed 199.9% increase in ops/s
    Total average 73.17% increase in ops/s
    </pre>
    The one scenario still seems to show some regression, seekTestWithCQFilter, which seeks to data with a Column Qualifer.  I am not sure why this is happening... perhaps the extra checks added in ColumnQualifierFilter cause a slight slowdown when a CQ is present?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---