Posted to dev@arrow.apache.org by "Melik-Adamyan, Areg" <ar...@intel.com> on 2019/04/23 21:43:42 UTC

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Because we are using Google Benchmark, which has a specific output format, there is a tool called benchcmp which compares two runs:

$ benchcmp old.txt new.txt
benchmark           old ns/op     new ns/op     delta
BenchmarkConcat     523           68.6          -86.88%

So the comparison part is done and there is no need to create infra for that.

What we need is to change the ctest -L Benchmarks output to stdout to the standard Google Benchmark output:
--------------------------------------------------------------
Benchmark                        Time           CPU Iterations
--------------------------------------------------------------
BM_UserCounter/threads:1      9504 ns       9504 ns      73787
BM_UserCounter/threads:2      4775 ns       9550 ns      72606
BM_UserCounter/threads:4      2508 ns       9951 ns      70332
BM_UserCounter/threads:8      2055 ns       9933 ns      70344
BM_UserCounter/threads:16     1610 ns       9946 ns      70720
BM_UserCounter/threads:32     1192 ns       9948 ns      70496

The script on the build machine will parse this and, along with the machine info, send it to the DB.
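
As a rough sketch of such a parser (assuming the console layout above, with times in ns; this is an illustration, not agreed code):

import re

# Matches console rows like:
# BM_UserCounter/threads:1      9504 ns       9504 ns      73787
ROW = re.compile(r"^(\S+)\s+(\d+(?:\.\d+)?) ns\s+(\d+(?:\.\d+)?) ns\s+(\d+)\s*$")

def parse_console(lines):
    """Yield one record per benchmark row of Google Benchmark console output."""
    for line in lines:
        m = ROW.match(line)
        if m:
            name, real_ns, cpu_ns, iterations = m.groups()
            yield {"name": name,
                   "real_time_ns": float(real_ns),
                   "cpu_time_ns": float(cpu_ns),
                   "iterations": int(iterations)}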

Selecting a subset is done by passing --benchmark_filter=<...>:
$ ./run_benchmarks.x --benchmark_filter=BM_memcpy/32
Run on (1 X 2300 MHz CPU )
2016-06-25 19:34:24
Benchmark              Time           CPU Iterations
----------------------------------------------------
BM_memcpy/32          11 ns         11 ns   79545455
BM_memcpy/32k       2181 ns       2185 ns     324074
BM_memcpy/32          12 ns         12 ns   54687500
BM_memcpy/32k       1834 ns       1837 ns     357143

Or we can create a buildbot mode and produce output in JSON format:
{
  "context": {
    "date": "2019/03/17-18:40:25",
    "num_cpus": 40,
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,
    "build_type": "debug"
  },
  "benchmarks": [
    {
      "name": "BM_SetInsert/1024/1",
      "iterations": 94877,
      "real_time": 29275,
      "cpu_time": 29836,
      "bytes_per_second": 134066,
      "items_per_second": 33516
    }
  ]
}
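
With the JSON output there is nothing fragile to parse; the build-machine script only needs to load the file, attach the machine info, and POST it. A minimal sketch (the endpoint URL and payload layout are placeholders, not an agreed schema):

import json
import platform
import urllib.request

DB_ENDPOINT = "https://benchmarks.example.org/api/runs"  # placeholder endpoint

def post_results(json_path, commit_id):
    """Load a --benchmark_out=<file> JSON report and push it with machine info."""
    with open(json_path) as f:
        report = json.load(f)
    payload = {
        "commit_id": commit_id,
        "machine": {"hostname": platform.node(),
                    "arch": platform.machine(),
                    "os": platform.platform()},
        "context": report["context"],        # emitted by Google Benchmark
        "benchmarks": report["benchmarks"],  # one entry per benchmark
    }
    req = urllib.request.Request(DB_ENDPOINT,
                                 data=json.dumps(payload).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

The JSON file itself comes from running the benchmark binary with --benchmark_format=json or --benchmark_out=<file>, both of which Google Benchmark already supports.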

So we have all the ingredients and do not need to reinvent anything; we just need to agree on the process: what is done, when, where the results go, and in which format.


---------- Forwarded message ---------
From: Francois Saint-Jacques <fs...@gmail.com>
Date: Tue, Apr 16, 2019 at 11:44 AM
Subject: Re: [Discuss] Benchmarking infrastructure
To: <de...@arrow.apache.org>


Hello,

A small status update: I recently implemented archery [1], a tool for Arrow
benchmark comparison [2]. The documentation ([3] and [4]) is in the pull
request. The primary goal is to compare two commits (and/or build directories)
for performance regressions. For now, it supports C++ benchmarks.
This is accessible via the command `archery benchmark diff`. The end result is
one comparison per line, with a regression indicator.
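
To give an idea (this is not the actual implementation, just a sketch), the per-benchmark comparison boils down to something like the following; the 5% regression threshold is an arbitrary placeholder:

# Toy "one comparison per line, with a regression indicator".
# Inputs are {benchmark_name: real_time_ns} mappings from two runs.
def diff(baseline, contender, threshold=0.05):
    for name in sorted(baseline.keys() & contender.keys()):
        old, new = baseline[name], contender[name]
        change = (new - old) / old  # > 0 means slower, for time-based metrics
        flag = "REGRESSION" if change > threshold else "ok"
        print(f"{name:<30} {old:>10.1f} {new:>10.1f} {change:+8.2%} {flag}")

diff({"BM_Concat": 523.0}, {"BM_Concat": 68.6})  # prints roughly: BM_Concat  523.0  68.6  -86.88% ok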

Currently, there is no facility to perform a single "run", e.g. running benchmarks in the current workspace without comparing to a previous version. This was initially implemented in [5] but depended heavily on ctest (with no control over execution). Once [1] is merged, I'll re-implement the single run (ARROW-5071) in terms of archery, since it already executes and parses C++ benchmarks.

The next goal is to be able to push the results into an upstream database, be it the one defined in dev/benchmarking, or codespeed as Areg proposed. The steps required for this:
- ARROW-5071: Run and format benchmark results for upstream consumption
  (ideally under the `archery benchmark run` sub-command)
- ARROW-5175: Make a list of benchmarks to include in regression checks
- ARROW-4716: Collect machine and benchmarks context
- ARROW-TBD: Push benchmark results to upstream database

In parallel, with ARROW-4827, Krisztian and I are working on 2 related buildbot sub-projects enabling some regression detection:
- Triggering on-demand benchmark comparison via comments in PR
   (as proposed by Wes)
- Regression check on master merge (without database support)

François

P.S.
A side effect of this PR is that archery is a modular Python library and can be used for other purposes, e.g. it could centralize the orphaned scripts in dev/ (linting, release, merge), since it offers utilities to handle Arrow sources, git, and cmake, and exposes a usable CLI interface (with documentation).

[1] https://github.com/apache/arrow/pull/4141
[2] https://jira.apache.org/jira/browse/ARROW-4827
[3]
https://github.com/apache/arrow/blob/512ae64bc074a0b620966131f9338d4a1eed2356/docs/source/developers/benchmarks.rst
[4]
https://github.com/apache/arrow/pull/4141/files#diff-7a8805436a6884ddf74fe3eaec697e71R216
[5] https://github.com/apache/arrow/pull/4077

On Fri, Mar 29, 2019 at 3:21 PM Melik-Adamyan, Areg <areg.melik-adamyan@intel.com> wrote:

> > When you say "output is parsed", how is that exactly? We don't have
> > any scripts in the repository to do this yet (I have some comments on this
> > below). We also have to collect machine information and insert that
> > into the database. From my perspective we have quite a bit of
> > engineering work on this topic ("benchmark execution and data collection") to do.
> Yes, I wrote one as a test. Then it can POST the JSON structure to the needed
> endpoint. Everything else will be done in the
>
> > My team and I have some physical hardware (including an Aarch64 Jetson
> > TX2 machine, might be interesting to see what the ARM64 results look
> > like) where we'd like to run benchmarks and upload the results also,
> > so we need to write some documentation about how to add a new machine
> > and set up a cron job of some kind.
> If it can run Linux, then we can set it up.
>
> > I'd like to eventually have a bot that we can ask to run a benchmark
> > comparison versus master. Reporting on all PRs automatically might be
> > quite a bit of work (and load on the machines)
> You should be able to choose the comparison between any two points:
> master-PR, master now - master yesterday, etc.
>
> > I thought the idea (based on our past e-mail discussions) was that we
> > would implement benchmark collectors (as programs in the Arrow git
> > repository) for each benchmarking framework, starting with gbenchmark
> > and expanding to include ASV (for Python) and then others
> I'll open a PR and am happy to put it into Arrow.
>
> > It seems like writing the benchmark collector script that runs the
> > benchmarks, collects machine information, and inserts data into an
> > instance of the database is the next milestone. Until that's done it
> > seems difficult to do much else
> Ok, will update the Jira 5070 and link the 5071.
>
> Thanks.
>

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by Wes McKinney <we...@gmail.com>.
On Thu, Apr 25, 2019 at 1:28 AM Melik-Adamyan, Areg
<ar...@intel.com> wrote:
>
> Hi,
>
> We are talking about the same thing actually, but you do not want to use 3rd party tools.
> For 3 and 4 - you run the first version store in 1.out, then second version store in 2.out and run compare tool. Your tool does two steps automatically, that is fine.
>
> > Various reason why I think the archery route is preferred over a mix of
> > scattered scripts, CI pipeline steps and random go binaries.
> >
> > 1. It is OS agnostic since it's written in python, and depends on cmake + git
> >    installed in PATH.
> [>] So is Google Benchmark, cmake and git, no?
> >
> > 2. Self contained in arrow's repository, no need to manually install external
> >    dependencies (go toolchain, then compile & install benchstat, benchcmp).
> >    Assuming python3 and pip are provided, which we already need for pyarrow.
> [>] Those operations are lighter than 'conda install', but ok, point taken.
> >
> > 3. Written as a library where the command line is a frontend. This makes it
> >    very easy to test and re-use. It also opens the door to clearing
> >    technical debt we've accumulated in `dev/`. This is not relevant for the
> >    benchmark sub-project, but still relevant for arrow developers in general.
> [>] Agree, but out of the scope of the benchmarking.
> >
> > 4. Benchmark framework agnostic. This does not depend on google's
> > benchmark and
> >    go benchmark output format. It does support it, but does not mandate it.
> >    Will be key to support Python (ASV) and other languages.
> [>] I do not understand what do you mean by other languages testing: core performance will come from the core C++ libraries, everything else will be wrappers around. So if I understand correctly by testing languages, we are testing wrappers?
> >

C++ is only one native Arrow implementation. There are 5 others: Java,
JavaScript, Rust, Go, and C#. There are 5 binding-centric languages:
C, Ruby, R, Python, and MATLAB. On a 2 year horizon I would expect to
see some other languages here: Swift and Julia are a couple likely
ones.

Some benchmarks involve downstream languages -- e.g. Python, Ruby, and
R have performance critical integrations whose behavior needs
continuous monitoring. As an example, the performance of conversions
between pandas and Arrow columnar format is very important for
downstream use cases (see e.g. [1]).

[1]: http://arrow.apache.org/blog/2019/02/05/python-string-memory-0.12/

> > 5. Shell scripts tend to grow un-maintenance. I say this as someone who abuse
> >    them. (archery implementation is derived from a local bash script).
> [>] There is no shell script in the first approach, but I totally share your pain.
> >
> > 6. It is not orchestrated by a complex CI pipeline (which effectively is a
> >    non-portable hardly reproducible script). It is self contained, can run
> >    within a CI or on a local machine. This is very convenient for local testing
> >    and debugging. I loathe waiting for the CI, especially when iterating in
> > development.
> [>] What you are really saying, is that Archery *is the CI* that you ship with the source code. It does all the same things. I am not against, but it will create a maintenance burden, and in a couple of years, you'll discover that it is outdated :)
>

It seems that we disagree about the scope of work involved in this
project, which is OK. We aren't asking you to do any extra work
yourself, but having a scalable (in project complexity and process)
and configurable long-term solution to benchmarking is important to
us, and we (myself and my colleagues) are committing ourselves to
building and maintaining it.

Thanks
Wes

> > You can get a sneak peek at of automation working here
> > http://nashville.ursalabs.org:4100/#/builders/16/builds/129,
> > note that this doesn't use dedicated hardware yet.
> [>] Nice, so, when we can start using it, and I guess nobody will object that perf.zaiteki.tech is not competing with Archery. So how can I help you to proceed faster? I can create and host DB from 5071 in the cloud if you want.
>
> -Areg.

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by "Melik-Adamyan, Areg" <ar...@intel.com>.
Hi,

We are talking about the same thing actually, but you do not want to use 3rd party tools. 
For 3 and 4 - you run the first version and store the output in 1.out, then the second version and store it in 2.out, and run the compare tool. Your tool does the two steps automatically, that is fine.

> Various reason why I think the archery route is preferred over a mix of
> scattered scripts, CI pipeline steps and random go binaries.
> 
> 1. It is OS agnostic since it's written in python, and depends on cmake + git
>    installed in PATH.
[>] So is Google Benchmark, cmake and git, no?
> 
> 2. Self contained in arrow's repository, no need to manually install external
>    dependencies (go toolchain, then compile & install benchstat, benchcmp).
>    Assuming python3 and pip are provided, which we already need for pyarrow.
[>] Those operations are lighter than 'conda install', but ok, point taken.
> 
> 3. Written as a library where the command line is a frontend. This makes it
>    very easy to test and re-use. It also opens the door to clearing
>    technical debt we've accumulated in `dev/`. This is not relevant for the
>    benchmark sub-project, but still relevant for arrow developers in general.
[>] Agree, but out of the scope of the benchmarking.
> 
> 4. Benchmark framework agnostic. This does not depend on google's
> benchmark and
>    go benchmark output format. It does support it, but does not mandate it.
>    Will be key to support Python (ASV) and other languages.
[>] I do not understand what you mean by other-language testing: core performance will come from the core C++ libraries; everything else will be wrappers around them. So if I understand correctly, by testing languages we are testing wrappers?
> 
> 5. Shell scripts tend to grow un-maintenance. I say this as someone who abuse
>    them. (archery implementation is derived from a local bash script).
[>] There is no shell script in the first approach, but I totally share your pain.
> 
> 6. It is not orchestrated by a complex CI pipeline (which effectively is a
>    non-portable hardly reproducible script). It is self contained, can run
>    within a CI or on a local machine. This is very convenient for local testing
>    and debugging. I loathe waiting for the CI, especially when iterating in
> development.
[>] What you are really saying is that Archery *is the CI* that you ship with the source code. It does all the same things. I am not against it, but it will create a maintenance burden, and in a couple of years you'll discover that it is outdated :)

> You can get a sneak peek at of automation working here
> http://nashville.ursalabs.org:4100/#/builders/16/builds/129,
> note that this doesn't use dedicated hardware yet.
[>] Nice - so when can we start using it? And I guess nobody will object that perf.zaiteki.tech is not competing with Archery. So how can I help you proceed faster? I can create and host the DB from 5071 in the cloud if you want.

-Areg.

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by Francois Saint-Jacques <fs...@gmail.com>.
Hello,

archery is the "shim" scripts that glue some of the steps (2-4) that you
described. It builds arrow (c++ for now), find the multiple benchmark
binaries, runs them, and collects the outputs. I encourage you to check the
implementation, notably [1] and [2] (and generally [3]).

Think of it as merging steps 2-4 into a single script without the CI's
orchestration (steps in the pipeline), thus making it CI-agnostic and
reproducible locally.
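
As a toy sketch of that glue (the paths and naming conventions below are simplifying assumptions, not archery's real logic): find the benchmark binaries in a cmake build directory, run each one with JSON output, and collect the parsed reports.

import glob
import json
import os
import subprocess

def run_benchmarks(build_dir):
    """Run every *-benchmark binary under a build dir and return parsed reports."""
    results = []
    pattern = os.path.join(build_dir, "release", "*-benchmark")  # assumed layout
    for binary in sorted(glob.glob(pattern)):
        proc = subprocess.run([binary, "--benchmark_format=json"],
                              capture_output=True, check=True, text=True)
        results.append(json.loads(proc.stdout))
    return results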

To give you some context, here are some "user stories" we'd like to achieve
(ARROW-5070):

1. Performance data should be tracked and stored in a database for each commit
   in the master branch. (ARROW-5071)

2. A reviewer should be able to trigger an on-demand regression check in a PR
   (ARROW-5071 and some ursabot stuff). Feedback (regression or not) should
   be given either via a PR status, an automated comment, or a bot-user
   declined review. (ARROW-5071)

3. A developer should be able to compare (diff) builds locally. By build, I
   mean cmake build directory, e.g. it can be a toolchain change, or different
   compiler flags. (ARROW-4827)

4. A developer should be able to compare commits locally. (ARROW-4827)

The current iteration of archery does 3 and 4 (via `archery benchmark diff`),
and is easily modifiable to do 1 and 2 (via `archery benchmark`, minus the
infrastructure setup). What you're proposing only targets 1, maybe 2,
but definitely not 3 and 4.

Various reasons why I think the archery route is preferable to a mix of
scattered scripts, CI pipeline steps, and random Go binaries:

1. It is OS agnostic since it's written in python, and depends on cmake + git
   installed in PATH.

2. Self contained in arrow's repository, no need to manually install external
   dependencies (go toolchain, then compile & install benchstat, benchcmp).
   Assuming python3 and pip are provided, which we already need for pyarrow.

3. Written as a library where the command line is a frontend. This makes it
   very easy to test and re-use. It also opens the door to clearing
   technical debt we've accumulated in `dev/`. This is not relevant for the
   benchmark sub-project, but still relevant for arrow developers in general.

4. Benchmark framework agnostic. This does not depend on google's benchmark and
   go benchmark output format. It does support it, but does not mandate it.
   Will be key to support Python (ASV) and other languages.

5. Shell scripts tend to grow unmaintainable. I say this as someone who abuses
   them. (The archery implementation is derived from a local bash script.)

6. It is not orchestrated by a complex CI pipeline (which effectively is a
   non-portable, hardly reproducible script). It is self-contained and can run
   within a CI or on a local machine. This is very convenient for local testing
   and debugging. I loathe waiting for the CI, especially when iterating in
   development.

You can get a sneak peek of the automation working here:
http://nashville.ursalabs.org:4100/#/builders/16/builds/129
(note that this doesn't use dedicated hardware yet).

François

[1] https://github.com/apache/arrow/blob/2a953f1808566da01bbb90faeabe8131ff55f902/dev/archery/archery/benchmark/google.py
[2] https://github.com/apache/arrow/blob/2a953f1808566da01bbb90faeabe8131ff55f902/dev/archery/archery/benchmark/runner.py
[3] https://github.com/apache/arrow/pull/4141/files

On Wed, Apr 24, 2019 at 9:24 PM Melik-Adamyan, Areg
<ar...@intel.com> wrote:
>
> Wes,
>
> The process as I think should be the following.
> 1. Commit triggers to build in TeamCity. I have set the TeamCity, but we can use whatever CI we would like.
> 2. TeamCity is using the pool of identical machines to run the predefined (or all) performance benchmarks on one the build machines from the pool.
> 3. Each benchmark generates output - by using Google Benchmarks we generate JSON format file.
> 4. The build step in the TeamCity which runs the performance gathers all those files and parses them.
> 5. For each parsed output it creates an entry in the DB with the commit ID as a key and auxiliary information that can be helpful.
> 6. The codespeed sitting on top of that Database visualize data in the dashboard by marking regressions as red and progressions as green compared to either baseline which you define or previous commit, as all the commits are ordered in the time.
> 7. You can create custom queries to compare specific commits or see trends on the timeline.
>
> I am not mandating codespeed or anything else, but we should start with something. We can use something more sophisticated, like Influx.
>
> > In the benchmarking one of the hardest parts (IMHO) is the process/workflow
> > automation. I'm in support of the development of a "meta-benchmarking"
> > framework that offers automation, extensibility, and possibility for
> > customization.
> [>] Meta is good, and I am totally supporting it, but meanwhile we are doing that there is a need for something very simple but usable.
> >
> > One of the reasons that people don't do more benchmarking as part of their
> > development process is that the tooling around it isn't great.
> > Using a command line tool [1] that outputs unconfigurable text to the terminal
> > to compare benchmarks seems inadequate to me.
> [>] I would argue here - it is the minimal config that works with external tooling without creating huge infrastructure around it. We already use Google Benchmark library which provides all the needed output format. And if you do not like CodeSpeed we can use anything else, e.g. Dana (https://github.com/google/dana) from Google.
> >
> > In the cited example
> >
> > $ benchcmp old.txt new.txt
> >
> > Where do old.txt and new.txt come from? I would like to have that detail (build
> > of appropriate component, execution of benchmarks and collection of results)
> > automated.
> [>]In the case of Go it is: $go test -run=^$ -bench=. ./... > old.txt
> Then you switch to the new branch and do the same with >new.txt then you do benchcmp and it does the comparison. 3 bash commands.
>
> >
> > FWIW, 7 and a half years ago [2] I wrote a small project called vbench to assist
> > with benchmark automation, so this has been a long-term interest of mine.
> > Codespeed existed in 2011, here is what I wrote about it in December 2011,
> > and it is definitely odd to find myself typing almost the exact same words years
> > later:
> >
> > "Before starting to write a new project I looked briefly at codespeed... The
> > dealbreaker is that codespeed is just a web application. It doesn't actually (to
> > my knowledge, someone correct me if I'm wrong?) have any kind of a
> > framework for orchestrating the running of benchmarks throughout your code
> > history."
> [>] I totally agree with you. But the good part is that it doesn't need to have orchestration. TeamCitry or any other CI will do those steps for you. And the fact that you can run the benchmarks by hand and CI can just replicate your actions make suitable for most of the cases. And I don't care about codespeed or asv, as you said it is just a stupid web app. The most important part is to create a working pipeline. While we are looking for the best salt-cellar, we can use the plastic one. :)
> >
> > asv [3] is a more modern and evolved version of vbench. But it's Python-
> > specific. I think we need the same kind of thing except being able to automate
> > the execution of any benchmarks for any component in the Arrow project. So
> > we have some work to do.
> [>] Here is the catch - trying to do for any benchmarks will consume time and resources, and still there will be something left behind. It is hard to cover general case, and assume that the particular one, like C++ will be covered.
>
> >
> > - Wes
> >
> > [1]:
> > https://github.com/golang/tools/blob/master/cmd/benchcmp/benchcmp.go
> > [2]: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-
> > analysis-and-monitoring-tool/
> > [3]: https://github.com/airspeed-velocity/asv
> >
> > On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet <bi...@cern.ch> wrote:
> > >
> > > On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <an...@python.org>
> > wrote:
> > >
> > > >
> > > > Hi Areg,
> > > >
> > > > Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> > > > > Because we are using Google Benchmark, which has specific format
> > > > > there
> > > > is a tool called becnhcmp which compares two runs:
> > > > >
> > > > > $ benchcmp old.txt new.txt
> > > > > benchmark           old ns/op     new ns/op     delta
> > > > > BenchmarkConcat     523           68.6          -86.88%
> > > > >
> > > > > So the comparison part is done and there is no need to create
> > > > > infra for
> > > > that.
> > > >
> > >
> > > "surprisingly" Go is already using that benchmark format :) and (on
> > > top of a Go-based benchcmp command) there is also a benchstat command
> > > that, given a set of multiple before/after data points adds some
> > > amount of statistical analysis:
> > >  https://godoc.org/golang.org/x/perf/cmd/benchstat
> > >
> > > using the "benchmark" file format of benchcmp and benchstat would
> > > allow better cross-language interop.
> > >
> > > cheers,
> > > -s

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by "Melik-Adamyan, Areg" <ar...@intel.com>.
Wes,

The process, as I think it should be, is the following.
1. A commit triggers a build in TeamCity. I have set up TeamCity, but we can use whatever CI we would like.
2. TeamCity uses a pool of identical machines to run the predefined (or all) performance benchmarks on one of the build machines from the pool.
3. Each benchmark generates output - by using Google Benchmark we generate a JSON-format file.
4. The TeamCity build step which runs the benchmarks gathers all those files and parses them.
5. For each parsed output it creates an entry in the DB with the commit ID as the key, plus auxiliary information that can be helpful (see the sketch below).
6. Codespeed, sitting on top of that database, visualizes the data in a dashboard, marking regressions red and progressions green compared to either a baseline which you define or the previous commit, as all the commits are ordered in time.
7. You can create custom queries to compare specific commits or see trends on the timeline.

I am not mandating codespeed or anything else, but we should start with something. We can use something more sophisticated, like Influx.
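
For example, step 5 plus a codespeed frontend (step 6) can be as small as POSTing each parsed benchmark to codespeed's result/add endpoint - roughly as sketched below (the field names are based on codespeed's documented result/add API; the host, project, executable, and environment names are made up):

import urllib.parse
import urllib.request

CODESPEED_URL = "http://perf.example.org/result/add/"  # placeholder host

def push_result(commit_id, benchmark_name, value_ns):
    """Store one benchmark value in a codespeed instance, keyed by commit ID."""
    data = urllib.parse.urlencode({
        "commitid": commit_id,
        "branch": "master",
        "project": "Arrow",             # made-up project name
        "executable": "arrow-cpp",      # made-up executable name
        "environment": "bench-machine-1",
        "benchmark": benchmark_name,
        "result_value": value_ns,
    }).encode()
    urllib.request.urlopen(CODESPEED_URL, data=data)  # form-encoded POST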
 
> In the benchmarking one of the hardest parts (IMHO) is the process/workflow
> automation. I'm in support of the development of a "meta-benchmarking"
> framework that offers automation, extensibility, and possibility for
> customization.
[>] Meta is good, and I totally support it, but while we are doing that there is a need for something very simple but usable.
> 
> One of the reasons that people don't do more benchmarking as part of their
> development process is that the tooling around it isn't great.
> Using a command line tool [1] that outputs unconfigurable text to the terminal
> to compare benchmarks seems inadequate to me.
[>] I would argue here - it is the minimal config that works with external tooling without creating huge infrastructure around it. We already use the Google Benchmark library, which provides all the needed output formats. And if you do not like CodeSpeed we can use anything else, e.g. Dana (https://github.com/google/dana) from Google.
> 
> In the cited example
> 
> $ benchcmp old.txt new.txt
> 
> Where do old.txt and new.txt come from? I would like to have that detail (build
> of appropriate component, execution of benchmarks and collection of results)
> automated.
[>] In the case of Go it is: $ go test -run=^$ -bench=. ./... > old.txt
Then you switch to the new branch and do the same, redirecting to new.txt; then you run benchcmp and it does the comparison. Three bash commands.

> 
> FWIW, 7 and a half years ago [2] I wrote a small project called vbench to assist
> with benchmark automation, so this has been a long-term interest of mine.
> Codespeed existed in 2011, here is what I wrote about it in December 2011,
> and it is definitely odd to find myself typing almost the exact same words years
> later:
> 
> "Before starting to write a new project I looked briefly at codespeed... The
> dealbreaker is that codespeed is just a web application. It doesn't actually (to
> my knowledge, someone correct me if I'm wrong?) have any kind of a
> framework for orchestrating the running of benchmarks throughout your code
> history."
[>] I totally agree with you. But the good part is that it doesn't need to have orchestration. TeamCity or any other CI will do those steps for you. And the fact that you can run the benchmarks by hand and the CI can just replicate your actions makes it suitable for most cases. And I don't care about codespeed or asv, as you said it is just a stupid web app. The most important part is to create a working pipeline. While we are looking for the best salt-cellar, we can use the plastic one. :)
> 
> asv [3] is a more modern and evolved version of vbench. But it's Python-
> specific. I think we need the same kind of thing except being able to automate
> the execution of any benchmarks for any component in the Arrow project. So
> we have some work to do.
[>] Here is the catch - trying to do this for any benchmark will consume time and resources, and still there will be something left behind. It is hard to cover the general case, while the particular one, like C++, can be assumed to be covered.

> 
> - Wes
> 
> [1]:
> https://github.com/golang/tools/blob/master/cmd/benchcmp/benchcmp.go
> [2]: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-
> analysis-and-monitoring-tool/
> [3]: https://github.com/airspeed-velocity/asv
> 
> On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet <bi...@cern.ch> wrote:
> >
> > On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <an...@python.org>
> wrote:
> >
> > >
> > > Hi Areg,
> > >
> > > Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> > > > Because we are using Google Benchmark, which has specific format
> > > > there
> > > is a tool called becnhcmp which compares two runs:
> > > >
> > > > $ benchcmp old.txt new.txt
> > > > benchmark           old ns/op     new ns/op     delta
> > > > BenchmarkConcat     523           68.6          -86.88%
> > > >
> > > > So the comparison part is done and there is no need to create
> > > > infra for
> > > that.
> > >
> >
> > "surprisingly" Go is already using that benchmark format :) and (on
> > top of a Go-based benchcmp command) there is also a benchstat command
> > that, given a set of multiple before/after data points adds some
> > amount of statistical analysis:
> >  https://godoc.org/golang.org/x/perf/cmd/benchstat
> >
> > using the "benchmark" file format of benchcmp and benchstat would
> > allow better cross-language interop.
> >
> > cheers,
> > -s

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by Wes McKinney <we...@gmail.com>.
In benchmarking, one of the hardest parts (IMHO) is the
process/workflow automation. I'm in support of the development of a
"meta-benchmarking" framework that offers automation, extensibility,
and the possibility of customization.

One of the reasons that people don't do more benchmarking as part of
their development process is that the tooling around it isn't great.
Using a command line tool [1] that outputs unconfigurable text to the
terminal to compare benchmarks seems inadequate to me.

In the cited example

$ benchcmp old.txt new.txt

Where do old.txt and new.txt come from? I would like to have that
detail (build of appropriate component, execution of benchmarks and
collection of results) automated.

FWIW, 7 and a half years ago [2] I wrote a small project called vbench
to assist with benchmark automation, so this has been a long-term
interest of mine. Codespeed existed in 2011, here is what I wrote
about it in December 2011, and it is definitely odd to find myself
typing almost the exact same words years later:

"Before starting to write a new project I looked briefly at
codespeed... The dealbreaker is that codespeed is just a web
application. It doesn't actually (to my knowledge, someone correct me
if I'm wrong?) have any kind of a framework for orchestrating the
running of benchmarks throughout your code history."

asv [3] is a more modern and evolved version of vbench. But it's
Python-specific. I think we need the same kind of thing except being
able to automate the execution of any benchmarks for any component in
the Arrow project. So we have some work to do.

- Wes

[1]: https://github.com/golang/tools/blob/master/cmd/benchcmp/benchcmp.go
[2]: http://wesmckinney.com/blog/introducing-vbench-new-code-performance-analysis-and-monitoring-tool/
[3]: https://github.com/airspeed-velocity/asv

On Wed, Apr 24, 2019 at 11:18 AM Sebastien Binet <bi...@cern.ch> wrote:
>
> On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <an...@python.org> wrote:
>
> >
> > Hi Areg,
> >
> > Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> > > Because we are using Google Benchmark, which has specific format there
> > is a tool called becnhcmp which compares two runs:
> > >
> > > $ benchcmp old.txt new.txt
> > > benchmark           old ns/op     new ns/op     delta
> > > BenchmarkConcat     523           68.6          -86.88%
> > >
> > > So the comparison part is done and there is no need to create infra for
> > that.
> >
>
> "surprisingly" Go is already using that benchmark format :)
> and (on top of a Go-based benchcmp command) there is also a benchstat
> command that, given a set of multiple before/after data points adds some
> amount of statistical analysis:
>  https://godoc.org/golang.org/x/perf/cmd/benchstat
>
> using the "benchmark" file format of benchcmp and benchstat would allow
> better cross-language interop.
>
> cheers,
> -s

RE: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by "Melik-Adamyan, Areg" <ar...@intel.com>.
Sebastien - yes, Go has very advanced, but simple performance benchmarking tooling. My intention is to reuse as much as we can.

-----Original Message-----
From: Sebastien Binet [mailto:binet@cern.ch] 
Sent: Wednesday, April 24, 2019 11:09 AM
To: dev@arrow.apache.org
Subject: Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <an...@python.org> wrote:

>
> Hi Areg,
>
> Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> > Because we are using Google Benchmark, which has specific format 
> > there
> is a tool called becnhcmp which compares two runs:
> >
> > $ benchcmp old.txt new.txt
> > benchmark           old ns/op     new ns/op     delta
> > BenchmarkConcat     523           68.6          -86.88%
> >
> > So the comparison part is done and there is no need to create infra 
> > for
> that.
>

"surprisingly" Go is already using that benchmark format :) and (on top of a Go-based benchcmp command) there is also a benchstat command that, given a set of multiple before/after data points adds some amount of statistical analysis:
 https://godoc.org/golang.org/x/perf/cmd/benchstat

using the "benchmark" file format of benchcmp and benchstat would allow better cross-language interop.

cheers,
-s

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by Sebastien Binet <bi...@cern.ch>.
On Wed, Apr 24, 2019 at 11:22 AM Antoine Pitrou <an...@python.org> wrote:

>
> Hi Areg,
>
> Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> > Because we are using Google Benchmark, which has specific format there
> is a tool called becnhcmp which compares two runs:
> >
> > $ benchcmp old.txt new.txt
> > benchmark           old ns/op     new ns/op     delta
> > BenchmarkConcat     523           68.6          -86.88%
> >
> > So the comparison part is done and there is no need to create infra for
> that.
>

"surprisingly" Go is already using that benchmark format :)
and (on top of a Go-based benchcmp command) there is also a benchstat
command that, given a set of multiple before/after data points adds some
amount of statistical analysis:
 https://godoc.org/golang.org/x/perf/cmd/benchstat

using the "benchmark" file format of benchcmp and benchstat would allow
better cross-language interop.

cheers,
-s

Re: Benchmarking mailing list thread [was Fwd: [Discuss] Benchmarking infrastructure]

Posted by Antoine Pitrou <an...@python.org>.
Hi Areg,

Le 23/04/2019 à 23:43, Melik-Adamyan, Areg a écrit :
> Because we are using Google Benchmark, which has specific format there is a tool called becnhcmp which compares two runs:
> 
> $ benchcmp old.txt new.txt
> benchmark           old ns/op     new ns/op     delta
> BenchmarkConcat     523           68.6          -86.88%
> 
> So the comparison part is done and there is no need to create infra for that.

The goal here is to have a cross-language benchmarking infrastructure so
that we can track performance, not only of C++ features, but also Python
(and later perhaps Java, etc.).

Additionally, being able to write benchmarks in Python may let us test
more sophisticated scenarios easily, and so ultimately track C++
performance better as well (as the Python bindings call into the C++ libs).

Regards

Antoine.