Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/22 16:44:46 UTC

[GitHub] [arrow] wesm opened a new pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

wesm opened a new pull request #7516:
URL: https://github.com/apache/arrow/pull/7516


   This uses pandas to generate a sorted text table when using `archery benchmark diff`. Example:
   
   https://github.com/apache/arrow/pull/7506#issuecomment-647633470
   
   There are some other incidental changes:
   
   * pandas is required for `archery benchmark diff`. I don't think there's value in reimplementing what pandas can do in a few lines of code (read JSON, create a sorted table, and print it nicely for us).
   * The default number of benchmark repetitions has been changed from 10 to 1 (see ARROW-9155 for context). IMHO more interactive benchmark results are more useful than higher precision. If you need higher precision, you can pass `--repetitions=10` on the command line.
   * `archery benchmark` was building the unit tests unnecessarily. This also occluded a bug, ARROW-9209, which is fixed here.
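
As a rough illustration of the "few lines of code" point above, here is a minimal sketch of building and printing a sorted comparison table with pandas. The record layout and the `diff_table` helper are assumptions made for this example; they are not archery's actual JSON schema or implementation, and the data is made up.

```python
import pandas as pd

# Hypothetical benchmark records; the field names are illustrative only
# and do not reflect archery's real output format.
records = [
    {"benchmark": "A", "baseline": 100.0, "contender": 150.0},
    {"benchmark": "B", "baseline": 200.0, "contender": 190.0},
    {"benchmark": "C", "baseline": 50.0, "contender": 55.0},
]


def diff_table(records):
    """Build a comparison table sorted from largest gain to largest loss."""
    df = pd.DataFrame(records)
    # Relative change of the contender versus the baseline, in percent.
    df["change %"] = (df["contender"] - df["baseline"]) / df["baseline"] * 100.0
    # sort_values keeps the original row index, so the leftmost column of
    # the printed table shows each row's position before sorting.
    return df.sort_values(by="change %", ascending=False)


table = diff_table(records)
print(table.to_string())
```

Sorting on the change column puts the largest speedups at the top and the largest slowdowns at the bottom, which is what makes the table readable at a glance.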


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-648473447


   I’m going to update the bot tomorrow.





[GitHub] [arrow] kszucs edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647677894


   > @kszucs can you assist me with adapting ursabot for these changes?
   
   Sure.
   
   > I think we can use pandas's `DataFrame.to_html` to create a colorized table for GitHub, too https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
   
   I'm afraid this is not going to work because we can't embed any CSS into the comment; this is why we generate the ursabot responses as diffs.
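
For context, a fenced block tagged `diff` in a GitHub comment highlights lines that start with `+` or `-`, which gives colored rows without any CSS. The sketch below shows one way such a response could be rendered. It is not ursabot's actual code; the `render_diff_comment` helper, the 5% threshold, and the row layout are all assumptions for illustration.

```python
FENCE = "`" * 3  # Markdown code fence, built this way to avoid nesting fences


def render_diff_comment(rows, threshold=5.0):
    """Render (name, change_percent) pairs as a fenced diff block.

    Rows at or above +threshold get a '+' prefix, rows at or below
    -threshold get a '-' prefix, and everything else is left unmarked.
    The threshold value is an arbitrary choice for this sketch.
    """
    lines = [FENCE + "diff"]
    for name, change in rows:
        if change >= threshold:
            prefix = "+"
        elif change <= -threshold:
            prefix = "-"
        else:
            prefix = " "
        lines.append(f"{prefix} {name:<24} {change:>8.3f}%")
    lines.append(FENCE)
    return "\n".join(lines)


rows = [
    ("UniqueInt64/5", 184.782),
    ("BuildStringDictionary", 0.211),
    ("UniqueString10bytes/0", -5.227),
]
comment = render_diff_comment(rows)
print(comment)
```

Here a positive change is treated as an improvement because the compared values are throughputs; for time-based metrics the sign convention would need to be flipped.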
   
   > 
   > Changes that would be good to have in `ursabot benchmark`:
   > 
   > * Pass through `--cc` and `--cxx` options
   > * Pass through `--repetitions`
   
   





[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-648162680


   +1. The bot changes can't be done here, so I'm going to go ahead and merge this. That way I can use it without having to switch to this branch before running benchmarks.





[GitHub] [arrow] kszucs edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647798092


   Using pandas is not a problem, but the results cannot be improved much other than sorting the table.





[GitHub] [arrow] pitrou commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647673667


   Just a small question: why are `m` and `b` used for millions and billions, respectively? (I would probably expect `M` and `G`)








[GitHub] [arrow] github-actions[bot] commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647641059


   https://issues.apache.org/jira/browse/ARROW-9201





[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647883524


   I improved the output to show the `state.counters` stuff
   
   ```
                     benchmark         baseline        contender  change %                                    counters
   40            UniqueInt64/5    6.442 GiB/sec   18.346 GiB/sec   184.782  {'iterations': 145, 'null_percent': 100.0}
   0            UniqueInt64/11    6.500 GiB/sec   18.364 GiB/sec   182.522  {'iterations': 145, 'null_percent': 100.0}
   11            UniqueUInt8/5  812.047 MiB/sec    1.755 GiB/sec   121.298  {'iterations': 142, 'null_percent': 100.0}
   7             UniqueUInt8/1  683.943 MiB/sec    1.253 GiB/sec    87.593    {'iterations': 117, 'null_percent': 0.1}
   38            UniqueUInt8/4  762.983 MiB/sec  950.521 MiB/sec    24.580   {'iterations': 133, 'null_percent': 99.0}
   29            UniqueUInt8/2  659.082 MiB/sec  820.410 MiB/sec    24.478    {'iterations': 114, 'null_percent': 1.0}
   5             UniqueInt64/1    2.656 GiB/sec    3.300 GiB/sec    24.223     {'iterations': 60, 'null_percent': 0.1}
   32            UniqueInt64/4    5.627 GiB/sec    6.772 GiB/sec    20.349   {'iterations': 119, 'null_percent': 99.0}
   25           UniqueInt64/10    5.234 GiB/sec    6.294 GiB/sec    20.254   {'iterations': 110, 'null_percent': 99.0}
   39  UniqueString100bytes/11   26.815 GiB/sec   31.122 GiB/sec    16.061   {'iterations': 48, 'null_percent': 100.0}
   23    UniqueString10bytes/5    2.691 GiB/sec    3.113 GiB/sec    15.667   {'iterations': 48, 'null_percent': 100.0}
   34   UniqueString100bytes/5   26.944 GiB/sec   31.015 GiB/sec    15.108   {'iterations': 48, 'null_percent': 100.0}
   6    UniqueString10bytes/11    2.699 GiB/sec    3.096 GiB/sec    14.721   {'iterations': 49, 'null_percent': 100.0}
   21   UniqueString100bytes/7    1.947 GiB/sec    2.217 GiB/sec    13.866      {'iterations': 3, 'null_percent': 0.1}
   28            UniqueInt64/2    2.622 GiB/sec    2.904 GiB/sec    10.770     {'iterations': 59, 'null_percent': 1.0}
   13            UniqueInt64/3    2.157 GiB/sec    2.343 GiB/sec     8.644    {'iterations': 48, 'null_percent': 10.0}
   33   UniqueString100bytes/4   24.286 GiB/sec   26.030 GiB/sec     7.181    {'iterations': 43, 'null_percent': 99.0}
   22            UniqueInt64/7    2.542 GiB/sec    2.707 GiB/sec     6.497     {'iterations': 56, 'null_percent': 0.1}
   20  UniqueString100bytes/10   22.536 GiB/sec   23.985 GiB/sec     6.432    {'iterations': 40, 'null_percent': 99.0}
   35    UniqueString10bytes/1  788.817 MiB/sec  836.008 MiB/sec     5.983     {'iterations': 14, 'null_percent': 0.1}
   17    UniqueString10bytes/7  592.671 MiB/sec  628.054 MiB/sec     5.970     {'iterations': 10, 'null_percent': 0.1}
   3     UniqueString10bytes/4    2.515 GiB/sec    2.658 GiB/sec     5.687    {'iterations': 45, 'null_percent': 99.0}
   19   UniqueString10bytes/10    2.402 GiB/sec    2.529 GiB/sec     5.269    {'iterations': 42, 'null_percent': 99.0}
   9    UniqueString100bytes/1    3.929 GiB/sec    4.077 GiB/sec     3.762      {'iterations': 7, 'null_percent': 0.1}
   30    UniqueString10bytes/8  593.560 MiB/sec  610.253 MiB/sec     2.812     {'iterations': 10, 'null_percent': 1.0}
   12    UniqueString10bytes/2  788.505 MiB/sec  808.396 MiB/sec     2.523     {'iterations': 14, 'null_percent': 1.0}
   37   UniqueString100bytes/8    1.965 GiB/sec    1.998 GiB/sec     1.697      {'iterations': 3, 'null_percent': 1.0}
   1    UniqueString100bytes/2    3.984 GiB/sec    4.025 GiB/sec     1.028      {'iterations': 7, 'null_percent': 1.0}
   36   UniqueString100bytes/3    4.262 GiB/sec    4.293 GiB/sec     0.725     {'iterations': 8, 'null_percent': 10.0}
   8     BuildStringDictionary   85.507 MiB/sec   85.687 MiB/sec     0.211                         {'iterations': 198}
   16   UniqueString100bytes/9    2.121 GiB/sec    2.111 GiB/sec    -0.469     {'iterations': 4, 'null_percent': 10.0}
   4    UniqueString100bytes/6    2.056 GiB/sec    2.043 GiB/sec    -0.626      {'iterations': 4, 'null_percent': 0.0}
   10            UniqueUInt8/3  453.281 MiB/sec  448.407 MiB/sec    -1.075    {'iterations': 79, 'null_percent': 10.0}
   14   UniqueString100bytes/0    4.100 GiB/sec    4.055 GiB/sec    -1.089      {'iterations': 7, 'null_percent': 0.0}
   24            UniqueInt64/8    2.473 GiB/sec    2.443 GiB/sec    -1.202     {'iterations': 55, 'null_percent': 1.0}
   26    UniqueString10bytes/9  615.880 MiB/sec  608.453 MiB/sec    -1.206    {'iterations': 11, 'null_percent': 10.0}
   42    UniqueString10bytes/6  651.430 MiB/sec  640.128 MiB/sec    -1.735     {'iterations': 11, 'null_percent': 0.0}
   27            UniqueUInt8/0    1.775 GiB/sec    1.738 GiB/sec    -2.063    {'iterations': 318, 'null_percent': 0.0}
   31            UniqueInt64/9    2.076 GiB/sec    2.033 GiB/sec    -2.067    {'iterations': 46, 'null_percent': 10.0}
   15          BuildDictionary    1.535 GiB/sec    1.503 GiB/sec    -2.079                         {'iterations': 277}
   41            UniqueInt64/0    3.915 GiB/sec    3.827 GiB/sec    -2.262     {'iterations': 87, 'null_percent': 0.0}
   43    UniqueString10bytes/3  802.729 MiB/sec  784.279 MiB/sec    -2.298    {'iterations': 14, 'null_percent': 10.0}
   18            UniqueInt64/6    3.284 GiB/sec    3.178 GiB/sec    -3.229     {'iterations': 72, 'null_percent': 0.0}
   2     UniqueString10bytes/0  895.983 MiB/sec  849.150 MiB/sec    -5.227     {'iterations': 16, 'null_percent': 0.0}
   ```
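
The throughput columns above are scaled to binary units (MiB/sec, GiB/sec) with three decimals. A minimal sketch of a formatter that produces strings in that style is shown below; `format_rate` is a hypothetical helper written for this example and is not taken from archery.

```python
def format_rate(bytes_per_sec):
    """Format a throughput using binary prefixes and three decimals."""
    units = ["bytes/sec", "KiB/sec", "MiB/sec", "GiB/sec", "TiB/sec"]
    value = float(bytes_per_sec)
    # Divide by 1024 until the value fits the current unit.
    for unit in units[:-1]:
        if abs(value) < 1024.0:
            return f"{value:.3f} {unit}"
        value /= 1024.0
    # Anything larger is reported in the biggest unit available.
    return f"{value:.3f} {units[-1]}"


print(format_rate(1024 ** 3))
print(format_rate(512 * 1024 ** 2))
```

Using the standard binary prefixes keeps the units unambiguous, since each step is a factor of 1024 rather than 1000.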








[GitHub] [arrow] wesm edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647758158


   I’m sort of -1 on using anything but pandas for data munging and data presentation in our tooling. It’s not a very large dependency and has everything we need. FWIW, the current Ursabot output doesn't even sort the results, which is really needed to easily make sense of what got faster or slower at a glance. 





[GitHub] [arrow] fsaintjacques commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647698721


   ursabot uses `tabulate`, which I think is a smaller dependency.





[GitHub] [arrow] kszucs commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647798092


   Using pandas is not a problem, but other than sorting the results we cannot really improve the look and feel. 





[GitHub] [arrow] wesm closed pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7516:
URL: https://github.com/apache/arrow/pull/7516


   





[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7516:
URL: https://github.com/apache/arrow/pull/7516#issuecomment-647641007


   @kszucs can you assist me with adapting ursabot for these changes? I think we can use pandas's `DataFrame.to_html` to create a colorized table for GitHub, too https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html
   
   Changes that would be good to have in `ursabot benchmark`:
   
   * Pass through `--cc` and `--cxx` options
   * Pass through `--repetitions`

