You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by "Melik-Adamyan, Areg" <ar...@intel.com> on 2019/01/18 06:13:50 UTC

Benchmarking dashboard proposal

Hello,

I want to restart/attach to the discussions for creating Arrow benchmarking dashboard. I want to propose performance benchmark run per commit to track the changes.
The proposal includes building infrastructure for per-commit tracking comprising of the following parts:
- Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build system 
- Agents running in cloud both VM/container (DigitalOcean, or others) and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
- JFrog artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2 
- Codespeed as a frontend https://github.com/tobami/codespeed 

I am volunteering to build such system (if needed more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.

Please, let me know your thoughts!

Thanks,
-Areg.




Re: Benchmarking dashboard proposal

Posted by Brian Hulette <hu...@gmail.com>.
We also have some JS benchmarks [1]. Currently they're only really run on
an ad-hoc basis to manually test major changes but it would be great to
include them in this.

[1] https://github.com/apache/arrow/tree/master/js/perf

On Fri, Jan 18, 2019 at 12:34 AM Uwe L. Korn <ma...@uwekorn.com> wrote:

> Hello,
>
> note that we have(had?) the Python benchmarks continuously running and
> reported at https://pandas.pydata.org/speed/arrow/. Seems like this
> stopped in July 2018.
>
> UWe
>
> On Fri, Jan 18, 2019, at 9:23 AM, Antoine Pitrou wrote:
> >
> > Hi Areg,
> >
> > That sounds like a good idea to me.  Note our benchmarks are currently
> > scattered accross the various implementations.  The two that I know of:
> >
> > - the C++ benchmarks are standalone executables created using the Google
> > Benchmark library, aptly named "*-benchmark" (or "*-benchmark.exe" on
> > Windows)
> > - the Python benchmarks use the ASV utility:
> >
> https://github.com/apache/arrow/blob/master/docs/source/python/benchmarks.rst
> >
> > There may be more in the other implementations.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 18/01/2019 à 07:13, Melik-Adamyan, Areg a écrit :
> > > Hello,
> > >
> > > I want to restart/attach to the discussions for creating Arrow
> benchmarking dashboard. I want to propose performance benchmark run per
> commit to track the changes.
> > > The proposal includes building infrastructure for per-commit tracking
> comprising of the following parts:
> > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> system
> > > - Agents running in cloud both VM/container (DigitalOcean, or others)
> and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > - JFrog artifactory storage and management for OSS projects
> https://jfrog.com/open-source/#artifactory2
> > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > >
> > > I am volunteering to build such system (if needed more Intel folks
> will be involved) so we can start tracking performance on various platforms
> and understand how changes affect it.
> > >
> > > Please, let me know your thoughts!
> > >
> > > Thanks,
> > > -Areg.
> > >
> > >
> > >
>

Re: Benchmarking dashboard proposal

Posted by Tom Augspurger <to...@gmail.com>.
I'll see if I can figure out why the benchmarks at
https://pandas.pydata.org/speed/arrow/ aren't being updated this weekend.

On Fri, Jan 18, 2019 at 2:34 AM Uwe L. Korn <ma...@uwekorn.com> wrote:

> Hello,
>
> note that we have(had?) the Python benchmarks continuously running and
> reported at https://pandas.pydata.org/speed/arrow/. Seems like this
> stopped in July 2018.
>
> UWe
>
> On Fri, Jan 18, 2019, at 9:23 AM, Antoine Pitrou wrote:
> >
> > Hi Areg,
> >
> > That sounds like a good idea to me.  Note our benchmarks are currently
> > scattered accross the various implementations.  The two that I know of:
> >
> > - the C++ benchmarks are standalone executables created using the Google
> > Benchmark library, aptly named "*-benchmark" (or "*-benchmark.exe" on
> > Windows)
> > - the Python benchmarks use the ASV utility:
> >
> https://github.com/apache/arrow/blob/master/docs/source/python/benchmarks.rst
> >
> > There may be more in the other implementations.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 18/01/2019 à 07:13, Melik-Adamyan, Areg a écrit :
> > > Hello,
> > >
> > > I want to restart/attach to the discussions for creating Arrow
> benchmarking dashboard. I want to propose performance benchmark run per
> commit to track the changes.
> > > The proposal includes building infrastructure for per-commit tracking
> comprising of the following parts:
> > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> system
> > > - Agents running in cloud both VM/container (DigitalOcean, or others)
> and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > - JFrog artifactory storage and management for OSS projects
> https://jfrog.com/open-source/#artifactory2
> > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > >
> > > I am volunteering to build such system (if needed more Intel folks
> will be involved) so we can start tracking performance on various platforms
> and understand how changes affect it.
> > >
> > > Please, let me know your thoughts!
> > >
> > > Thanks,
> > > -Areg.
> > >
> > >
> > >
>

Re: Benchmarking dashboard proposal

Posted by "Uwe L. Korn" <ma...@uwekorn.com>.
Hello,

note that we have(had?) the Python benchmarks continuously running and reported at https://pandas.pydata.org/speed/arrow/. Seems like this stopped in July 2018.

UWe

On Fri, Jan 18, 2019, at 9:23 AM, Antoine Pitrou wrote:
> 
> Hi Areg,
> 
> That sounds like a good idea to me.  Note our benchmarks are currently
> scattered accross the various implementations.  The two that I know of:
> 
> - the C++ benchmarks are standalone executables created using the Google
> Benchmark library, aptly named "*-benchmark" (or "*-benchmark.exe" on
> Windows)
> - the Python benchmarks use the ASV utility:
> https://github.com/apache/arrow/blob/master/docs/source/python/benchmarks.rst
> 
> There may be more in the other implementations.
> 
> Regards
> 
> Antoine.
> 
> 
> Le 18/01/2019 à 07:13, Melik-Adamyan, Areg a écrit :
> > Hello,
> > 
> > I want to restart/attach to the discussions for creating Arrow benchmarking dashboard. I want to propose performance benchmark run per commit to track the changes.
> > The proposal includes building infrastructure for per-commit tracking comprising of the following parts:
> > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build system 
> > - Agents running in cloud both VM/container (DigitalOcean, or others) and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > - JFrog artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2 
> > - Codespeed as a frontend https://github.com/tobami/codespeed 
> > 
> > I am volunteering to build such system (if needed more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
> > 
> > Please, let me know your thoughts!
> > 
> > Thanks,
> > -Areg.
> > 
> > 
> > 

Re: Benchmarking dashboard proposal

Posted by Antoine Pitrou <an...@python.org>.
Hi Areg,

That sounds like a good idea to me.  Note our benchmarks are currently
scattered accross the various implementations.  The two that I know of:

- the C++ benchmarks are standalone executables created using the Google
Benchmark library, aptly named "*-benchmark" (or "*-benchmark.exe" on
Windows)
- the Python benchmarks use the ASV utility:
https://github.com/apache/arrow/blob/master/docs/source/python/benchmarks.rst

There may be more in the other implementations.

Regards

Antoine.


Le 18/01/2019 à 07:13, Melik-Adamyan, Areg a écrit :
> Hello,
> 
> I want to restart/attach to the discussions for creating Arrow benchmarking dashboard. I want to propose performance benchmark run per commit to track the changes.
> The proposal includes building infrastructure for per-commit tracking comprising of the following parts:
> - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build system 
> - Agents running in cloud both VM/container (DigitalOcean, or others) and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> - JFrog artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2 
> - Codespeed as a frontend https://github.com/tobami/codespeed 
> 
> I am volunteering to build such system (if needed more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
> 
> Please, let me know your thoughts!
> 
> Thanks,
> -Areg.
> 
> 
> 

Re: Benchmarking dashboard proposal

Posted by Jeff Zhang <zj...@gmail.com>.
+1 It make sense to track the performance of arrow Because I think project
arrow is different from other projects that its goal is efficiently data
exchange between systems/languages.


Melik-Adamyan, Areg <ar...@intel.com> 于2019年1月18日周五 下午2:14写道:

> Hello,
>
> I want to restart/attach to the discussions for creating Arrow
> benchmarking dashboard. I want to propose performance benchmark run per
> commit to track the changes.
> The proposal includes building infrastructure for per-commit tracking
> comprising of the following parts:
> - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> system
> - Agents running in cloud both VM/container (DigitalOcean, or others) and
> bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> - JFrog artifactory storage and management for OSS projects
> https://jfrog.com/open-source/#artifactory2
> - Codespeed as a frontend https://github.com/tobami/codespeed
>
> I am volunteering to build such system (if needed more Intel folks will be
> involved) so we can start tracking performance on various platforms and
> understand how changes affect it.
>
> Please, let me know your thoughts!
>
> Thanks,
> -Areg.
>
>
>
>

-- 
Best Regards

Jeff Zhang

Re: Benchmarking dashboard proposal

Posted by Antoine Pitrou <an...@python.org>.
Le 20/02/2019 à 18:55, Melik-Adamyan, Areg a écrit :
> 2. The unit-tests framework (google/benchmark) allows to effectively report in textual format the needed data on benchmark with preamble containing information about the machine on which the benchmarks are run.

On this topic, gbenchmark actually can output JSON, e.g.:
./build/release/arrow-utf8-util-benchmark --benchmark_out=results.json --benchmark_out_format=json

Here is how the JSON output looks like:
https://gist.github.com/pitrou/e055b454f333adf3c16325613c716309

Using this data it should be easy to massage an ingestion script
that gives it to the database in the expected format.

> - Disallow to enter data to the central repo any single benchmarks run, as they do not mean much in the case of continuous and statistically relevant measurements. [...]
> - Mandate the contributors to have dedicated environment for measurements.

I have no strong opinion on this.  Another possibility is to regard one set
of machines (e.g. Intel- or Ursa Labs-provided benchmarking machines, such
as the DGX machines currently at Wes' office) as the reference for tracking
regressions, and other machines as just informational.

That said, I think you're right that it doesn't sound very useful to allow
arbitrary benchmark result submissions.  However, I think there could still
be a separate test database instance, to allow easy testing of ingestion
or reporting scripts.

Regards

Antoine.


> 3. So with environments set and regular runs you have all the artifacts, though not in a very comprehensible format. So the reason to set a dashboard is to allow to consume data and be able to track performance of various parts on a historical perspective and much more nicely with visualizations. 
> And here are the scope restrictions I have in mind:
> - Disallow to enter data to the central repo any single benchmarks run, as they do not mean much in the case of continuous and statistically relevant measurements. What information you will get if someone reports some single run? You do not know how clean it was done, and more importantly is it possible to reproduce elsewhere. That is why even if it is better, worse or the same you cannot compare with the data already in the DB.
> - Mandate the contributors to have dedicated environment for measurements. Otherwise they can use the TeamCity to run and parse data and publish on their site. Data that enters Arrow performance DB becomes Arrow community owned data. And it becomes community's job to answer why certain things are better or worse.
> -  Because the numbers and flavors for CPU/GPU/accelerators are huge we cannot satisfy all the needs upfront and create DB that covers all the possible variants. I think we should have simple CPU and GPU configs now, even if they will not be perfect. By simple I mean basic brand string. That should be enough. Having all the detailed info in the DB does not make sense, as my experience is telling, you never use them, you use the CPUID/brandname to get the info needed.
> - Scope and reqs will change during the time and going huge now will make things complicated later. So I think it will be beneficial to have something quick up and running, get better understanding of our needs and gaps, and go from there. 
> The needed infra is already up on AWS, so as soon as we resolve DNS and key exchange issues we can launch.
> 
> -Areg.
> 
> -----Original Message-----
> From: Tanya Schlusser [mailto:tanya@tickel.net] 
> Sent: Thursday, February 7, 2019 4:40 PM
> To: dev@arrow.apache.org
> Subject: Re: Benchmarking dashboard proposal
> 
> Late, but there's a PR now with first-draft DDL ( https://github.com/apache/arrow/pull/3586).
> Happy to receive any feedback!
> 
> I tried to think about how people would submit benchmarks, and added a Postgraphile container for http-via-GraphQL.
> If others have strong opinions on the data modeling please speak up because I'm more a database user than a designer.
> 
> I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.
> 
> Best,
> Tanya
> 
> On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:
> 
>> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL 
>> along with a README in a new directory `arrow/dev/benchmarking` unless 
>> directed otherwise.
>>
>> A "C++ Benchmark Collector" script would be super. I expect some 
>> back-and-forth on this to identify naïve assumptions in the data model.
>>
>> Attempting to submit actual benchmarks is how to get a handle on that. 
>> I recognize I'm blocking downstream work. Better to get an initial PR 
>> and some discussion going.
>>
>> Best,
>> Tanya
>>
>> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com> wrote:
>>
>>> hi folks,
>>>
>>> I'm curious where we currently stand on this project. I see the 
>>> discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- 
>>> would the next step be to have a pull request with .sql files 
>>> containing the DDL required to create the schema in PostgreSQL?
>>>
>>> I could volunteer to write the "C++ Benchmark Collector" script that 
>>> will run all the benchmarks on Linux and collect their data to be 
>>> inserted into the database.
>>>
>>> Thanks
>>> Wes
>>>
>>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net>
>>> wrote:
>>>>
>>>> I don't want to be the bottleneck and have posted an initial draft 
>>>> data model in the JIRA issue
>>> https://issues.apache.org/jira/browse/ARROW-4313
>>>>
>>>> It should not be a problem to get content into a form that would be 
>>>> acceptable for either a static site like ASV (via CORS queries to a 
>>>> GraphQL/REST interface) or a codespeed-style site (via a separate 
>>>> schema organized for Django)
>>>>
>>>> I don't think I'm experienced enough to actually write any 
>>>> benchmarks though, so all I can contribute is backend work for this task.
>>>>
>>>> Best,
>>>> Tanya
>>>>
>>>> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
>>> wrote:
>>>>
>>>>> hi folks,
>>>>>
>>>>> I'd like to propose some kind of timeline for getting a first 
>>>>> iteration of a benchmark database developed and live, with 
>>>>> scripts to enable one or more initial agents to start adding new 
>>>>> data on a daily / per-commit basis. I have at least 3 physical 
>>>>> machines where I could immediately set up cron jobs to start 
>>>>> adding new data, and I could attempt to backfill data as far back as possible.
>>>>>
>>>>> Personally, I would like to see this done by the end of February 
>>>>> if not sooner -- if we don't have the volunteers to push the work 
>>>>> to completion by then please let me know as I will rearrange my 
>>>>> priorities to make sure that it happens. Does that sounds reasonable?
>>>>>
>>>>> Please let me know if this plan sounds reasonable:
>>>>>
>>>>> * Set up a hosted PostgreSQL instance, configure backups
>>>>> * Propose and adopt a database schema for storing benchmark 
>>>>> results
>>>>> * For C++, write script (or Dockerfile) to execute all 
>>>>> google-benchmarks, output results to JSON, then adapter script
>>>>> (Python) to ingest into database
>>>>> * For Python, similar script that invokes ASV, then inserts ASV 
>>>>> results into benchmark database
>>>>>
>>>>> This seems to be a pre-requisite for having a front-end to 
>>>>> visualize the results, but the dashboard/front end can hopefully 
>>>>> be implemented in such a way that the details of the benchmark 
>>>>> database are not too tightly coupled
>>>>>
>>>>> (Do we have any other benchmarks in the project that would need 
>>>>> to be inserted initially?)
>>>>>
>>>>> Related work to trigger benchmarks on agents when new commits 
>>>>> land in master can happen concurrently -- one task need not block 
>>>>> the other
>>>>>
>>>>> Thanks
>>>>> Wes
>>>>>
>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>>>>> <we...@gmail.com>
>>> wrote:
>>>>>>
>>>>>> Sorry, copy-paste failure:
>>>>> https://issues.apache.org/jira/browse/ARROW-4313
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>>>>>> <we...@gmail.com>
>>>>> wrote:
>>>>>>>
>>>>>>> I don't think there is one but I just created
>>>>>>>
>>>>>
>>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c52
>>> 91a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <
>>> tanya@tickel.net>
>>>>> wrote:
>>>>>>>>
>>>>>>>> Areg,
>>>>>>>>
>>>>>>>> If you'd like help, I volunteer! No experience benchmarking 
>>>>>>>> but
>>> tons
>>>>>>>> experience databasing—I can mock the backend (database + 
>>>>>>>> http)
>>> as a
>>>>>>>> starting point for discussion if this is the way people 
>>>>>>>> want to
>>> go.
>>>>>>>>
>>>>>>>> Is there a Jira ticket for this that i can jump into?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
>>> wesmckinn@gmail.com>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> hi Areg,
>>>>>>>>>
>>>>>>>>> This sounds great -- we've discussed building a more
>>> full-featured
>>>>>>>>> benchmark automation system in the past but nothing has 
>>>>>>>>> been
>>>>> developed
>>>>>>>>> yet.
>>>>>>>>>
>>>>>>>>> Your proposal about the details sounds OK; the single 
>>>>>>>>> most
>>>>> important
>>>>>>>>> thing to me is that we build and maintain a very general
>>> purpose
>>>>>>>>> database schema for building the historical benchmark 
>>>>>>>>> database
>>>>>>>>>
>>>>>>>>> The benchmark database should keep track of:
>>>>>>>>>
>>>>>>>>> * Timestamp of benchmark run
>>>>>>>>> * Git commit hash of codebase
>>>>>>>>> * Machine unique name (sort of the "user id")
>>>>>>>>> * CPU identification for machine, and clock frequency (in
>>> case of
>>>>>>>>> overclocking)
>>>>>>>>> * CPU cache sizes (L1/L2/L3)
>>>>>>>>> * Whether or not CPU throttling is enabled (if it can be
>>> easily
>>>>> determined)
>>>>>>>>> * RAM size
>>>>>>>>> * GPU identification (if any)
>>>>>>>>> * Benchmark unique name
>>>>>>>>> * Programming language(s) associated with benchmark (e.g. 
>>>>>>>>> a
>>>>> benchmark
>>>>>>>>> may involve both C++ and Python)
>>>>>>>>> * Benchmark time, plus mean and standard deviation if
>>> available,
>>>>> else NULL
>>>>>>>>>
>>>>>>>>> (maybe some other things)
>>>>>>>>>
>>>>>>>>> I would rather not be locked into the internal database
>>> schema of a
>>>>>>>>> particular benchmarking tool. So people in the community 
>>>>>>>>> can
>>> just
>>>>> run
>>>>>>>>> SQL queries against the database and use the data however 
>>>>>>>>> they
>>>>> like.
>>>>>>>>> We'll just have to be careful that people don't DROP 
>>>>>>>>> TABLE or
>>>>> DELETE
>>>>>>>>> (but we should have daily backups so we can recover from 
>>>>>>>>> such
>>>>> cases)
>>>>>>>>>
>>>>>>>>> So while we may make use of TeamCity to schedule the runs 
>>>>>>>>> on
>>> the
>>>>> cloud
>>>>>>>>> and physical hardware, we should also provide a path for 
>>>>>>>>> other
>>>>> people
>>>>>>>>> in the community to add data to the benchmark database on
>>> their
>>>>>>>>> hardware on an ad hoc basis. For example, I have several
>>> machines
>>>>> in
>>>>>>>>> my home on all operating systems (Windows / macOS / 
>>>>>>>>> Linux,
>>> and soon
>>>>>>>>> also ARM64) and I'd like to set up scheduled tasks / cron
>>> jobs to
>>>>>>>>> report in to the database at least on a daily basis.
>>>>>>>>>
>>>>>>>>> Ideally the benchmark database would just be a PostgreSQL
>>> server
>>>>> with
>>>>>>>>> a schema we write down and keep backed up etc. Hosted
>>> PostgreSQL is
>>>>>>>>> inexpensive ($200+ per year depending on size of 
>>>>>>>>> instance;
>>> this
>>>>>>>>> probably doesn't need to be a crazy big machine)
>>>>>>>>>
>>>>>>>>> I suspect there will be a manageable amount of 
>>>>>>>>> development
>>>>> involved to
>>>>>>>>> glue each of the benchmarking frameworks together with 
>>>>>>>>> the
>>>>> benchmark
>>>>>>>>> database. This can also handle querying the operating 
>>>>>>>>> system
>>> for
>>>>> the
>>>>>>>>> system information listed above
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Wes
>>>>>>>>>
>>>>>>>>> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg 
>>>>>>>>> <ar...@intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I want to restart/attach to the discussions for 
>>>>>>>>>> creating
>>> Arrow
>>>>>>>>> benchmarking dashboard. I want to propose performance
>>> benchmark
>>>>> run per
>>>>>>>>> commit to track the changes.
>>>>>>>>>> The proposal includes building infrastructure for 
>>>>>>>>>> per-commit
>>>>> tracking
>>>>>>>>> comprising of the following parts:
>>>>>>>>>> - Hosted JetBrains for OSS 
>>>>>>>>>> https://teamcity.jetbrains.com/
>>> as a
>>>>> build
>>>>>>>>> system
>>>>>>>>>> - Agents running in cloud both VM/container 
>>>>>>>>>> (DigitalOcean,
>>> or
>>>>> others)
>>>>>>>>> and bare-metal (Packet.net/AWS) and on-premise(Nvidia 
>>>>>>>>> boxes?)
>>>>>>>>>> - JFrog artifactory storage and management for OSS 
>>>>>>>>>> projects
>>>>>>>>> https://jfrog.com/open-source/#artifactory2
>>>>>>>>>> - Codespeed as a frontend
>>> https://github.com/tobami/codespeed
>>>>>>>>>>
>>>>>>>>>> I am volunteering to build such system (if needed more 
>>>>>>>>>> Intel
>>>>> folks will
>>>>>>>>> be involved) so we can start tracking performance on 
>>>>>>>>> various
>>>>> platforms and
>>>>>>>>> understand how changes affect it.
>>>>>>>>>>
>>>>>>>>>> Please, let me know your thoughts!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Areg.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>
>>

Re: Benchmarking dashboard proposal

Posted by Tanya Schlusser <ta...@tickel.net>.
>
> Side question: is it expected to be able to connect to the DB directly
> from the outside?  I don't have any clue about the possible security
> implications.


This is do-able by creating different database accounts. Also, Wes's
solution was to back up the database periodically (daily?) to protect
against accidents. The current setup has a root user (full permission),
`arrow_anonymous` user (select + insert only), and `arrow_admin` (select,
insert, update, delete).

On Wed, Feb 20, 2019 at 12:19 PM Antoine Pitrou <an...@python.org> wrote:

>
> Side question: is it expected to be able to connect to the DB directly
> from the outside?  I don't have any clue about the possible security
> implications.
>
> Regards
>
> Antoine.
>
>
>
> Le 20/02/2019 à 18:55, Melik-Adamyan, Areg a écrit :
> > There is a lot of discussion going in the PR ARROW-4313 itself; I would
> like to bring some of the high-level questions here to discuss. First of
> all many thanks to Tanya for the work you are doing.
> > Related to the dashboard intrinsics, I would like to set some scope and
> stick to that so we would not waste any job and get maximum efficiency from
> the work we are doing on the dashboard.
> > One thing that IMHO we are missing is against which requirements the
> work (DDL) is being done and in which scope? For me there are several
> things:
> > 1. We want continuous *validated* performance tracking against checkins
> to catch performance regressions and progressions. Validated means that the
> running environment is isolated enough so the stddev (assuming the
> distribution is normal) is as close to 0 as possible. It means both
> hardware and software should be fixed and not changeable to have only one
> variable to measure.
> > 2. The unit-tests framework (google/benchmark) allows to effectively
> report in textual format the needed data on benchmark with preamble
> containing information about the machine on which the benchmarks are run.
> > 3. So with environments set and regular runs you have all the artifacts,
> though not in a very comprehensible format. So the reason to set a
> dashboard is to allow to consume data and be able to track performance of
> various parts on a historical perspective and much more nicely with
> visualizations.
> > And here are the scope restrictions I have in mind:
> > - Disallow to enter data to the central repo any single benchmarks run,
> as they do not mean much in the case of continuous and statistically
> relevant measurements. What information you will get if someone reports
> some single run? You do not know how clean it was done, and more
> importantly is it possible to reproduce elsewhere. That is why even if it
> is better, worse or the same you cannot compare with the data already in
> the DB.
> > - Mandate the contributors to have dedicated environment for
> measurements. Otherwise they can use the TeamCity to run and parse data and
> publish on their site. Data that enters Arrow performance DB becomes Arrow
> community owned data. And it becomes community's job to answer why certain
> things are better or worse.
> > -  Because the numbers and flavors for CPU/GPU/accelerators are huge we
> cannot satisfy all the needs upfront and create DB that covers all the
> possible variants. I think we should have simple CPU and GPU configs now,
> even if they will not be perfect. By simple I mean basic brand string. That
> should be enough. Having all the detailed info in the DB does not make
> sense, as my experience is telling, you never use them, you use the
> CPUID/brandname to get the info needed.
> > - Scope and reqs will change during the time and going huge now will
> make things complicated later. So I think it will be beneficial to have
> something quick up and running, get better understanding of our needs and
> gaps, and go from there.
> > The needed infra is already up on AWS, so as soon as we resolve DNS and
> key exchange issues we can launch.
> >
> > -Areg.
> >
> > -----Original Message-----
> > From: Tanya Schlusser [mailto:tanya@tickel.net]
> > Sent: Thursday, February 7, 2019 4:40 PM
> > To: dev@arrow.apache.org
> > Subject: Re: Benchmarking dashboard proposal
> >
> > Late, but there's a PR now with first-draft DDL (
> https://github.com/apache/arrow/pull/3586).
> > Happy to receive any feedback!
> >
> > I tried to think about how people would submit benchmarks, and added a
> Postgraphile container for http-via-GraphQL.
> > If others have strong opinions on the data modeling please speak up
> because I'm more a database user than a designer.
> >
> > I can also help with benchmarking work in R/Python given guidance/a
> roadmap/examples from someone else.
> >
> > Best,
> > Tanya
> >
> > On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net>
> wrote:
> >
> >> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL
> >> along with a README in a new directory `arrow/dev/benchmarking` unless
> >> directed otherwise.
> >>
> >> A "C++ Benchmark Collector" script would be super. I expect some
> >> back-and-forth on this to identify naïve assumptions in the data model.
> >>
> >> Attempting to submit actual benchmarks is how to get a handle on that.
> >> I recognize I'm blocking downstream work. Better to get an initial PR
> >> and some discussion going.
> >>
> >> Best,
> >> Tanya
> >>
> >> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com>
> wrote:
> >>
> >>> hi folks,
> >>>
> >>> I'm curious where we currently stand on this project. I see the
> >>> discussion in https://issues.apache.org/jira/browse/ARROW-4313 --
> >>> would the next step be to have a pull request with .sql files
> >>> containing the DDL required to create the schema in PostgreSQL?
> >>>
> >>> I could volunteer to write the "C++ Benchmark Collector" script that
> >>> will run all the benchmarks on Linux and collect their data to be
> >>> inserted into the database.
> >>>
> >>> Thanks
> >>> Wes
> >>>
> >>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net>
> >>> wrote:
> >>>>
> >>>> I don't want to be the bottleneck and have posted an initial draft
> >>>> data model in the JIRA issue
> >>> https://issues.apache.org/jira/browse/ARROW-4313
> >>>>
> >>>> It should not be a problem to get content into a form that would be
> >>>> acceptable for either a static site like ASV (via CORS queries to a
> >>>> GraphQL/REST interface) or a codespeed-style site (via a separate
> >>>> schema organized for Django)
> >>>>
> >>>> I don't think I'm experienced enough to actually write any
> >>>> benchmarks though, so all I can contribute is backend work for this
> task.
> >>>>
> >>>> Best,
> >>>> Tanya
> >>>>
> >>>> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
> >>> wrote:
> >>>>
> >>>>> hi folks,
> >>>>>
> >>>>> I'd like to propose some kind of timeline for getting a first
> >>>>> iteration of a benchmark database developed and live, with
> >>>>> scripts to enable one or more initial agents to start adding new
> >>>>> data on a daily / per-commit basis. I have at least 3 physical
> >>>>> machines where I could immediately set up cron jobs to start
> >>>>> adding new data, and I could attempt to backfill data as far back as
> possible.
> >>>>>
> >>>>> Personally, I would like to see this done by the end of February
> >>>>> if not sooner -- if we don't have the volunteers to push the work
> >>>>> to completion by then please let me know as I will rearrange my
> >>>>> priorities to make sure that it happens. Does that sounds reasonable?
> >>>>>
> >>>>> Please let me know if this plan sounds reasonable:
> >>>>>
> >>>>> * Set up a hosted PostgreSQL instance, configure backups
> >>>>> * Propose and adopt a database schema for storing benchmark
> >>>>> results
> >>>>> * For C++, write script (or Dockerfile) to execute all
> >>>>> google-benchmarks, output results to JSON, then adapter script
> >>>>> (Python) to ingest into database
> >>>>> * For Python, similar script that invokes ASV, then inserts ASV
> >>>>> results into benchmark database
> >>>>>
> >>>>> This seems to be a pre-requisite for having a front-end to
> >>>>> visualize the results, but the dashboard/front end can hopefully
> >>>>> be implemented in such a way that the details of the benchmark
> >>>>> database are not too tightly coupled
> >>>>>
> >>>>> (Do we have any other benchmarks in the project that would need
> >>>>> to be inserted initially?)
> >>>>>
> >>>>> Related work to trigger benchmarks on agents when new commits
> >>>>> land in master can happen concurrently -- one task need not block
> >>>>> the other
> >>>>>
> >>>>> Thanks
> >>>>> Wes
> >>>>>
> >>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney
> >>>>> <we...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>> Sorry, copy-paste failure:
> >>>>> https://issues.apache.org/jira/browse/ARROW-4313
> >>>>>>
> >>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney
> >>>>>> <we...@gmail.com>
> >>>>> wrote:
> >>>>>>>
> >>>>>>> I don't think there is one but I just created
> >>>>>>>
> >>>>>
> >>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c52
> >>> 91a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> >>>>>>>
> >>>>>>> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <
> >>> tanya@tickel.net>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>> Areg,
> >>>>>>>>
> >>>>>>>> If you'd like help, I volunteer! No experience benchmarking
> >>>>>>>> but
> >>> tons
> >>>>>>>> experience databasing—I can mock the backend (database +
> >>>>>>>> http)
> >>> as a
> >>>>>>>> starting point for discussion if this is the way people
> >>>>>>>> want to
> >>> go.
> >>>>>>>>
> >>>>>>>> Is there a Jira ticket for this that i can jump into?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
> >>> wesmckinn@gmail.com>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> hi Areg,
> >>>>>>>>>
> >>>>>>>>> This sounds great -- we've discussed building a more
> >>> full-featured
> >>>>>>>>> benchmark automation system in the past but nothing has
> >>>>>>>>> been
> >>>>> developed
> >>>>>>>>> yet.
> >>>>>>>>>
> >>>>>>>>> Your proposal about the details sounds OK; the single
> >>>>>>>>> most
> >>>>> important
> >>>>>>>>> thing to me is that we build and maintain a very general
> >>> purpose
> >>>>>>>>> database schema for building the historical benchmark
> >>>>>>>>> database
> >>>>>>>>>
> >>>>>>>>> The benchmark database should keep track of:
> >>>>>>>>>
> >>>>>>>>> * Timestamp of benchmark run
> >>>>>>>>> * Git commit hash of codebase
> >>>>>>>>> * Machine unique name (sort of the "user id")
> >>>>>>>>> * CPU identification for machine, and clock frequency (in
> >>> case of
> >>>>>>>>> overclocking)
> >>>>>>>>> * CPU cache sizes (L1/L2/L3)
> >>>>>>>>> * Whether or not CPU throttling is enabled (if it can be
> >>> easily
> >>>>> determined)
> >>>>>>>>> * RAM size
> >>>>>>>>> * GPU identification (if any)
> >>>>>>>>> * Benchmark unique name
> >>>>>>>>> * Programming language(s) associated with benchmark (e.g.
> >>>>>>>>> a
> >>>>> benchmark
> >>>>>>>>> may involve both C++ and Python)
> >>>>>>>>> * Benchmark time, plus mean and standard deviation if
> >>> available,
> >>>>> else NULL
> >>>>>>>>>
> >>>>>>>>> (maybe some other things)
> >>>>>>>>>
> >>>>>>>>> I would rather not be locked into the internal database
> >>> schema of a
> >>>>>>>>> particular benchmarking tool. So people in the community
> >>>>>>>>> can
> >>> just
> >>>>> run
> >>>>>>>>> SQL queries against the database and use the data however
> >>>>>>>>> they
> >>>>> like.
> >>>>>>>>> We'll just have to be careful that people don't DROP
> >>>>>>>>> TABLE or
> >>>>> DELETE
> >>>>>>>>> (but we should have daily backups so we can recover from
> >>>>>>>>> such
> >>>>> cases)
> >>>>>>>>>
> >>>>>>>>> So while we may make use of TeamCity to schedule the runs
> >>>>>>>>> on
> >>> the
> >>>>> cloud
> >>>>>>>>> and physical hardware, we should also provide a path for
> >>>>>>>>> other
> >>>>> people
> >>>>>>>>> in the community to add data to the benchmark database on
> >>> their
> >>>>>>>>> hardware on an ad hoc basis. For example, I have several
> >>> machines
> >>>>> in
> >>>>>>>>> my home on all operating systems (Windows / macOS /
> >>>>>>>>> Linux,
> >>> and soon
> >>>>>>>>> also ARM64) and I'd like to set up scheduled tasks / cron
> >>> jobs to
> >>>>>>>>> report in to the database at least on a daily basis.
> >>>>>>>>>
> >>>>>>>>> Ideally the benchmark database would just be a PostgreSQL
> >>> server
> >>>>> with
> >>>>>>>>> a schema we write down and keep backed up etc. Hosted
> >>> PostgreSQL is
> >>>>>>>>> inexpensive ($200+ per year depending on size of
> >>>>>>>>> instance;
> >>> this
> >>>>>>>>> probably doesn't need to be a crazy big machine)
> >>>>>>>>>
> >>>>>>>>> I suspect there will be a manageable amount of
> >>>>>>>>> development
> >>>>> involved to
> >>>>>>>>> glue each of the benchmarking frameworks together with
> >>>>>>>>> the
> >>>>> benchmark
> >>>>>>>>> database. This can also handle querying the operating
> >>>>>>>>> system
> >>> for
> >>>>> the
> >>>>>>>>> system information listed above
> >>>>>>>>>
> >>>>>>>>> Thanks
> >>>>>>>>> Wes
> >>>>>>>>>
> >>>>>>>>> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> >>>>>>>>> <ar...@intel.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hello,
> >>>>>>>>>>
> >>>>>>>>>> I want to restart/attach to the discussions for
> >>>>>>>>>> creating
> >>> Arrow
> >>>>>>>>> benchmarking dashboard. I want to propose performance
> >>> benchmark
> >>>>> run per
> >>>>>>>>> commit to track the changes.
> >>>>>>>>>> The proposal includes building infrastructure for
> >>>>>>>>>> per-commit
> >>>>> tracking
> >>>>>>>>> comprising of the following parts:
> >>>>>>>>>> - Hosted JetBrains for OSS
> >>>>>>>>>> https://teamcity.jetbrains.com/
> >>> as a
> >>>>> build
> >>>>>>>>> system
> >>>>>>>>>> - Agents running in cloud both VM/container
> >>>>>>>>>> (DigitalOcean,
> >>> or
> >>>>> others)
> >>>>>>>>> and bare-metal (Packet.net/AWS) and on-premise(Nvidia
> >>>>>>>>> boxes?)
> >>>>>>>>>> - JFrog artifactory storage and management for OSS
> >>>>>>>>>> projects
> >>>>>>>>> https://jfrog.com/open-source/#artifactory2
> >>>>>>>>>> - Codespeed as a frontend
> >>> https://github.com/tobami/codespeed
> >>>>>>>>>>
> >>>>>>>>>> I am volunteering to build such system (if needed more
> >>>>>>>>>> Intel
> >>>>> folks will
> >>>>>>>>> be involved) so we can start tracking performance on
> >>>>>>>>> various
> >>>>> platforms and
> >>>>>>>>> understand how changes affect it.
> >>>>>>>>>>
> >>>>>>>>>> Please, let me know your thoughts!
> >>>>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> -Areg.
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>
> >>>
> >>
>

Re: Benchmarking dashboard proposal

Posted by Antoine Pitrou <an...@python.org>.
Side question: is it expected to be able to connect to the DB directly
from the outside?  I don't have any clue about the possible security
implications.

Regards

Antoine.



Le 20/02/2019 à 18:55, Melik-Adamyan, Areg a écrit :
> There is a lot of discussion going in the PR ARROW-4313 itself; I would like to bring some of the high-level questions here to discuss. First of all many thanks to Tanya for the work you are doing. 
> Related to the dashboard intrinsics, I would like to set some scope and stick to that so we would not waste any job and get maximum efficiency from the work we are doing on the dashboard.
> One thing that IMHO we are missing is against which requirements the work (DDL) is being done and in which scope? For me there are several things:
> 1. We want continuous *validated* performance tracking against checkins to catch performance regressions and progressions. Validated means that the running environment is isolated enough so the stddev (assuming the distribution is normal) is as close to 0 as possible. It means both hardware and software should be fixed and not changeable to have only one variable to measure.
> 2. The unit-tests framework (google/benchmark) allows to effectively report in textual format the needed data on benchmark with preamble containing information about the machine on which the benchmarks are run.
> 3. So with environments set and regular runs you have all the artifacts, though not in a very comprehensible format. So the reason to set a dashboard is to allow to consume data and be able to track performance of various parts on a historical perspective and much more nicely with visualizations. 
> And here are the scope restrictions I have in mind:
> - Disallow to enter data to the central repo any single benchmarks run, as they do not mean much in the case of continuous and statistically relevant measurements. What information you will get if someone reports some single run? You do not know how clean it was done, and more importantly is it possible to reproduce elsewhere. That is why even if it is better, worse or the same you cannot compare with the data already in the DB.
> - Mandate the contributors to have dedicated environment for measurements. Otherwise they can use the TeamCity to run and parse data and publish on their site. Data that enters Arrow performance DB becomes Arrow community owned data. And it becomes community's job to answer why certain things are better or worse.
> -  Because the numbers and flavors for CPU/GPU/accelerators are huge we cannot satisfy all the needs upfront and create DB that covers all the possible variants. I think we should have simple CPU and GPU configs now, even if they will not be perfect. By simple I mean basic brand string. That should be enough. Having all the detailed info in the DB does not make sense, as my experience is telling, you never use them, you use the CPUID/brandname to get the info needed.
> - Scope and reqs will change during the time and going huge now will make things complicated later. So I think it will be beneficial to have something quick up and running, get better understanding of our needs and gaps, and go from there. 
> The needed infra is already up on AWS, so as soon as we resolve DNS and key exchange issues we can launch.
> 
> -Areg.
> 
> -----Original Message-----
> From: Tanya Schlusser [mailto:tanya@tickel.net] 
> Sent: Thursday, February 7, 2019 4:40 PM
> To: dev@arrow.apache.org
> Subject: Re: Benchmarking dashboard proposal
> 
> Late, but there's a PR now with first-draft DDL ( https://github.com/apache/arrow/pull/3586).
> Happy to receive any feedback!
> 
> I tried to think about how people would submit benchmarks, and added a Postgraphile container for http-via-GraphQL.
> If others have strong opinions on the data modeling please speak up because I'm more a database user than a designer.
> 
> I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.
> 
> Best,
> Tanya
> 
> On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:
> 
>> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL 
>> along with a README in a new directory `arrow/dev/benchmarking` unless 
>> directed otherwise.
>>
>> A "C++ Benchmark Collector" script would be super. I expect some 
>> back-and-forth on this to identify naïve assumptions in the data model.
>>
>> Attempting to submit actual benchmarks is how to get a handle on that. 
>> I recognize I'm blocking downstream work. Better to get an initial PR 
>> and some discussion going.
>>
>> Best,
>> Tanya
>>
>> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com> wrote:
>>
>>> hi folks,
>>>
>>> I'm curious where we currently stand on this project. I see the 
>>> discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- 
>>> would the next step be to have a pull request with .sql files 
>>> containing the DDL required to create the schema in PostgreSQL?
>>>
>>> I could volunteer to write the "C++ Benchmark Collector" script that 
>>> will run all the benchmarks on Linux and collect their data to be 
>>> inserted into the database.
>>>
>>> Thanks
>>> Wes
>>>
>>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net>
>>> wrote:
>>>>
>>>> I don't want to be the bottleneck and have posted an initial draft 
>>>> data model in the JIRA issue
>>> https://issues.apache.org/jira/browse/ARROW-4313
>>>>
>>>> It should not be a problem to get content into a form that would be 
>>>> acceptable for either a static site like ASV (via CORS queries to a 
>>>> GraphQL/REST interface) or a codespeed-style site (via a separate 
>>>> schema organized for Django)
>>>>
>>>> I don't think I'm experienced enough to actually write any 
>>>> benchmarks though, so all I can contribute is backend work for this task.
>>>>
>>>> Best,
>>>> Tanya
>>>>
>>>> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
>>> wrote:
>>>>
>>>>> hi folks,
>>>>>
>>>>> I'd like to propose some kind of timeline for getting a first 
>>>>> iteration of a benchmark database developed and live, with 
>>>>> scripts to enable one or more initial agents to start adding new 
>>>>> data on a daily / per-commit basis. I have at least 3 physical 
>>>>> machines where I could immediately set up cron jobs to start 
>>>>> adding new data, and I could attempt to backfill data as far back as possible.
>>>>>
>>>>> Personally, I would like to see this done by the end of February 
>>>>> if not sooner -- if we don't have the volunteers to push the work 
>>>>> to completion by then please let me know as I will rearrange my 
>>>>> priorities to make sure that it happens. Does that sounds reasonable?
>>>>>
>>>>> Please let me know if this plan sounds reasonable:
>>>>>
>>>>> * Set up a hosted PostgreSQL instance, configure backups
>>>>> * Propose and adopt a database schema for storing benchmark 
>>>>> results
>>>>> * For C++, write script (or Dockerfile) to execute all 
>>>>> google-benchmarks, output results to JSON, then adapter script
>>>>> (Python) to ingest into database
>>>>> * For Python, similar script that invokes ASV, then inserts ASV 
>>>>> results into benchmark database
>>>>>
>>>>> This seems to be a pre-requisite for having a front-end to 
>>>>> visualize the results, but the dashboard/front end can hopefully 
>>>>> be implemented in such a way that the details of the benchmark 
>>>>> database are not too tightly coupled
>>>>>
>>>>> (Do we have any other benchmarks in the project that would need 
>>>>> to be inserted initially?)
>>>>>
>>>>> Related work to trigger benchmarks on agents when new commits 
>>>>> land in master can happen concurrently -- one task need not block 
>>>>> the other
>>>>>
>>>>> Thanks
>>>>> Wes
>>>>>
>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>>>>> <we...@gmail.com>
>>> wrote:
>>>>>>
>>>>>> Sorry, copy-paste failure:
>>>>> https://issues.apache.org/jira/browse/ARROW-4313
>>>>>>
>>>>>> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>>>>>> <we...@gmail.com>
>>>>> wrote:
>>>>>>>
>>>>>>> I don't think there is one but I just created
>>>>>>>
>>>>>
>>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c52
>>> 91a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>>>>>>>
>>>>>>> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <
>>> tanya@tickel.net>
>>>>> wrote:
>>>>>>>>
>>>>>>>> Areg,
>>>>>>>>
>>>>>>>> If you'd like help, I volunteer! No experience benchmarking 
>>>>>>>> but
>>> tons
>>>>>>>> experience databasing—I can mock the backend (database + 
>>>>>>>> http)
>>> as a
>>>>>>>> starting point for discussion if this is the way people 
>>>>>>>> want to
>>> go.
>>>>>>>>
>>>>>>>> Is there a Jira ticket for this that i can jump into?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
>>> wesmckinn@gmail.com>
>>>>> wrote:
>>>>>>>>
>>>>>>>>> hi Areg,
>>>>>>>>>
>>>>>>>>> This sounds great -- we've discussed building a more
>>> full-featured
>>>>>>>>> benchmark automation system in the past but nothing has 
>>>>>>>>> been
>>>>> developed
>>>>>>>>> yet.
>>>>>>>>>
>>>>>>>>> Your proposal about the details sounds OK; the single 
>>>>>>>>> most
>>>>> important
>>>>>>>>> thing to me is that we build and maintain a very general
>>> purpose
>>>>>>>>> database schema for building the historical benchmark 
>>>>>>>>> database
>>>>>>>>>
>>>>>>>>> The benchmark database should keep track of:
>>>>>>>>>
>>>>>>>>> * Timestamp of benchmark run
>>>>>>>>> * Git commit hash of codebase
>>>>>>>>> * Machine unique name (sort of the "user id")
>>>>>>>>> * CPU identification for machine, and clock frequency (in
>>> case of
>>>>>>>>> overclocking)
>>>>>>>>> * CPU cache sizes (L1/L2/L3)
>>>>>>>>> * Whether or not CPU throttling is enabled (if it can be
>>> easily
>>>>> determined)
>>>>>>>>> * RAM size
>>>>>>>>> * GPU identification (if any)
>>>>>>>>> * Benchmark unique name
>>>>>>>>> * Programming language(s) associated with benchmark (e.g. 
>>>>>>>>> a
>>>>> benchmark
>>>>>>>>> may involve both C++ and Python)
>>>>>>>>> * Benchmark time, plus mean and standard deviation if
>>> available,
>>>>> else NULL
>>>>>>>>>
>>>>>>>>> (maybe some other things)
>>>>>>>>>
>>>>>>>>> I would rather not be locked into the internal database
>>> schema of a
>>>>>>>>> particular benchmarking tool. So people in the community 
>>>>>>>>> can
>>> just
>>>>> run
>>>>>>>>> SQL queries against the database and use the data however 
>>>>>>>>> they
>>>>> like.
>>>>>>>>> We'll just have to be careful that people don't DROP 
>>>>>>>>> TABLE or
>>>>> DELETE
>>>>>>>>> (but we should have daily backups so we can recover from 
>>>>>>>>> such
>>>>> cases)
>>>>>>>>>
>>>>>>>>> So while we may make use of TeamCity to schedule the runs 
>>>>>>>>> on
>>> the
>>>>> cloud
>>>>>>>>> and physical hardware, we should also provide a path for 
>>>>>>>>> other
>>>>> people
>>>>>>>>> in the community to add data to the benchmark database on
>>> their
>>>>>>>>> hardware on an ad hoc basis. For example, I have several
>>> machines
>>>>> in
>>>>>>>>> my home on all operating systems (Windows / macOS / 
>>>>>>>>> Linux,
>>> and soon
>>>>>>>>> also ARM64) and I'd like to set up scheduled tasks / cron
>>> jobs to
>>>>>>>>> report in to the database at least on a daily basis.
>>>>>>>>>
>>>>>>>>> Ideally the benchmark database would just be a PostgreSQL
>>> server
>>>>> with
>>>>>>>>> a schema we write down and keep backed up etc. Hosted
>>> PostgreSQL is
>>>>>>>>> inexpensive ($200+ per year depending on size of 
>>>>>>>>> instance;
>>> this
>>>>>>>>> probably doesn't need to be a crazy big machine)
>>>>>>>>>
>>>>>>>>> I suspect there will be a manageable amount of 
>>>>>>>>> development
>>>>> involved to
>>>>>>>>> glue each of the benchmarking frameworks together with 
>>>>>>>>> the
>>>>> benchmark
>>>>>>>>> database. This can also handle querying the operating 
>>>>>>>>> system
>>> for
>>>>> the
>>>>>>>>> system information listed above
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Wes
>>>>>>>>>
>>>>>>>>> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg 
>>>>>>>>> <ar...@intel.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>>
>>>>>>>>>> I want to restart/attach to the discussions for 
>>>>>>>>>> creating
>>> Arrow
>>>>>>>>> benchmarking dashboard. I want to propose performance
>>> benchmark
>>>>> run per
>>>>>>>>> commit to track the changes.
>>>>>>>>>> The proposal includes building infrastructure for 
>>>>>>>>>> per-commit
>>>>> tracking
>>>>>>>>> comprising of the following parts:
>>>>>>>>>> - Hosted JetBrains for OSS 
>>>>>>>>>> https://teamcity.jetbrains.com/
>>> as a
>>>>> build
>>>>>>>>> system
>>>>>>>>>> - Agents running in cloud both VM/container 
>>>>>>>>>> (DigitalOcean,
>>> or
>>>>> others)
>>>>>>>>> and bare-metal (Packet.net/AWS) and on-premise(Nvidia 
>>>>>>>>> boxes?)
>>>>>>>>>> - JFrog artifactory storage and management for OSS 
>>>>>>>>>> projects
>>>>>>>>> https://jfrog.com/open-source/#artifactory2
>>>>>>>>>> - Codespeed as a frontend
>>> https://github.com/tobami/codespeed
>>>>>>>>>>
>>>>>>>>>> I am volunteering to build such system (if needed more 
>>>>>>>>>> Intel
>>>>> folks will
>>>>>>>>> be involved) so we can start tracking performance on 
>>>>>>>>> various
>>>>> platforms and
>>>>>>>>> understand how changes affect it.
>>>>>>>>>>
>>>>>>>>>> Please, let me know your thoughts!
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> -Areg.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>
>>

RE: Benchmarking dashboard proposal

Posted by "Melik-Adamyan, Areg" <ar...@intel.com>.
There is a lot of discussion going in the PR ARROW-4313 itself; I would like to bring some of the high-level questions here to discuss. First of all many thanks to Tanya for the work you are doing. 
Related to the dashboard intrinsics, I would like to set some scope and stick to that so we would not waste any job and get maximum efficiency from the work we are doing on the dashboard.
One thing that IMHO we are missing is against which requirements the work (DDL) is being done and in which scope? For me there are several things:
1. We want continuous *validated* performance tracking against checkins to catch performance regressions and progressions. Validated means that the running environment is isolated enough so the stddev (assuming the distribution is normal) is as close to 0 as possible. It means both hardware and software should be fixed and not changeable to have only one variable to measure.
2. The unit-tests framework (google/benchmark) allows to effectively report in textual format the needed data on benchmark with preamble containing information about the machine on which the benchmarks are run.
3. So with environments set and regular runs you have all the artifacts, though not in a very comprehensible format. So the reason to set a dashboard is to allow to consume data and be able to track performance of various parts on a historical perspective and much more nicely with visualizations. 
And here are the scope restrictions I have in mind:
- Disallow to enter data to the central repo any single benchmarks run, as they do not mean much in the case of continuous and statistically relevant measurements. What information you will get if someone reports some single run? You do not know how clean it was done, and more importantly is it possible to reproduce elsewhere. That is why even if it is better, worse or the same you cannot compare with the data already in the DB.
- Mandate the contributors to have dedicated environment for measurements. Otherwise they can use the TeamCity to run and parse data and publish on their site. Data that enters Arrow performance DB becomes Arrow community owned data. And it becomes community's job to answer why certain things are better or worse.
-  Because the numbers and flavors for CPU/GPU/accelerators are huge we cannot satisfy all the needs upfront and create DB that covers all the possible variants. I think we should have simple CPU and GPU configs now, even if they will not be perfect. By simple I mean basic brand string. That should be enough. Having all the detailed info in the DB does not make sense, as my experience is telling, you never use them, you use the CPUID/brandname to get the info needed.
- Scope and reqs will change during the time and going huge now will make things complicated later. So I think it will be beneficial to have something quick up and running, get better understanding of our needs and gaps, and go from there. 
The needed infra is already up on AWS, so as soon as we resolve DNS and key exchange issues we can launch.

-Areg.

-----Original Message-----
From: Tanya Schlusser [mailto:tanya@tickel.net] 
Sent: Thursday, February 7, 2019 4:40 PM
To: dev@arrow.apache.org
Subject: Re: Benchmarking dashboard proposal

Late, but there's a PR now with first-draft DDL ( https://github.com/apache/arrow/pull/3586).
Happy to receive any feedback!

I tried to think about how people would submit benchmarks, and added a Postgraphile container for http-via-GraphQL.
If others have strong opinions on the data modeling please speak up because I'm more a database user than a designer.

I can also help with benchmarking work in R/Python given guidance/a roadmap/examples from someone else.

Best,
Tanya

On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:

> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL 
> along with a README in a new directory `arrow/dev/benchmarking` unless 
> directed otherwise.
>
> A "C++ Benchmark Collector" script would be super. I expect some 
> back-and-forth on this to identify naïve assumptions in the data model.
>
> Attempting to submit actual benchmarks is how to get a handle on that. 
> I recognize I'm blocking downstream work. Better to get an initial PR 
> and some discussion going.
>
> Best,
> Tanya
>
> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com> wrote:
>
>> hi folks,
>>
>> I'm curious where we currently stand on this project. I see the 
>> discussion in https://issues.apache.org/jira/browse/ARROW-4313 -- 
>> would the next step be to have a pull request with .sql files 
>> containing the DDL required to create the schema in PostgreSQL?
>>
>> I could volunteer to write the "C++ Benchmark Collector" script that 
>> will run all the benchmarks on Linux and collect their data to be 
>> inserted into the database.
>>
>> Thanks
>> Wes
>>
>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net>
>> wrote:
>> >
>> > I don't want to be the bottleneck and have posted an initial draft 
>> > data model in the JIRA issue
>> https://issues.apache.org/jira/browse/ARROW-4313
>> >
>> > It should not be a problem to get content into a form that would be 
>> > acceptable for either a static site like ASV (via CORS queries to a 
>> > GraphQL/REST interface) or a codespeed-style site (via a separate 
>> > schema organized for Django)
>> >
>> > I don't think I'm experienced enough to actually write any 
>> > benchmarks though, so all I can contribute is backend work for this task.
>> >
>> > Best,
>> > Tanya
>> >
>> > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
>> wrote:
>> >
>> > > hi folks,
>> > >
>> > > I'd like to propose some kind of timeline for getting a first 
>> > > iteration of a benchmark database developed and live, with 
>> > > scripts to enable one or more initial agents to start adding new 
>> > > data on a daily / per-commit basis. I have at least 3 physical 
>> > > machines where I could immediately set up cron jobs to start 
>> > > adding new data, and I could attempt to backfill data as far back as possible.
>> > >
>> > > Personally, I would like to see this done by the end of February 
>> > > if not sooner -- if we don't have the volunteers to push the work 
>> > > to completion by then please let me know as I will rearrange my 
>> > > priorities to make sure that it happens. Does that sounds reasonable?
>> > >
>> > > Please let me know if this plan sounds reasonable:
>> > >
>> > > * Set up a hosted PostgreSQL instance, configure backups
>> > > * Propose and adopt a database schema for storing benchmark 
>> > > results
>> > > * For C++, write script (or Dockerfile) to execute all 
>> > > google-benchmarks, output results to JSON, then adapter script
>> > > (Python) to ingest into database
>> > > * For Python, similar script that invokes ASV, then inserts ASV 
>> > > results into benchmark database
>> > >
>> > > This seems to be a pre-requisite for having a front-end to 
>> > > visualize the results, but the dashboard/front end can hopefully 
>> > > be implemented in such a way that the details of the benchmark 
>> > > database are not too tightly coupled
>> > >
>> > > (Do we have any other benchmarks in the project that would need 
>> > > to be inserted initially?)
>> > >
>> > > Related work to trigger benchmarks on agents when new commits 
>> > > land in master can happen concurrently -- one task need not block 
>> > > the other
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>> > > <we...@gmail.com>
>> wrote:
>> > > >
>> > > > Sorry, copy-paste failure:
>> > > https://issues.apache.org/jira/browse/ARROW-4313
>> > > >
>> > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney 
>> > > > <we...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > I don't think there is one but I just created
>> > > > >
>> > >
>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c52
>> 91a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>> > > > >
>> > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <
>> tanya@tickel.net>
>> > > wrote:
>> > > > > >
>> > > > > > Areg,
>> > > > > >
>> > > > > > If you'd like help, I volunteer! No experience benchmarking 
>> > > > > > but
>> tons
>> > > > > > experience databasing—I can mock the backend (database + 
>> > > > > > http)
>> as a
>> > > > > > starting point for discussion if this is the way people 
>> > > > > > want to
>> go.
>> > > > > >
>> > > > > > Is there a Jira ticket for this that i can jump into?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
>> wesmckinn@gmail.com>
>> > > wrote:
>> > > > > >
>> > > > > > > hi Areg,
>> > > > > > >
>> > > > > > > This sounds great -- we've discussed building a more
>> full-featured
>> > > > > > > benchmark automation system in the past but nothing has 
>> > > > > > > been
>> > > developed
>> > > > > > > yet.
>> > > > > > >
>> > > > > > > Your proposal about the details sounds OK; the single 
>> > > > > > > most
>> > > important
>> > > > > > > thing to me is that we build and maintain a very general
>> purpose
>> > > > > > > database schema for building the historical benchmark 
>> > > > > > > database
>> > > > > > >
>> > > > > > > The benchmark database should keep track of:
>> > > > > > >
>> > > > > > > * Timestamp of benchmark run
>> > > > > > > * Git commit hash of codebase
>> > > > > > > * Machine unique name (sort of the "user id")
>> > > > > > > * CPU identification for machine, and clock frequency (in
>> case of
>> > > > > > > overclocking)
>> > > > > > > * CPU cache sizes (L1/L2/L3)
>> > > > > > > * Whether or not CPU throttling is enabled (if it can be
>> easily
>> > > determined)
>> > > > > > > * RAM size
>> > > > > > > * GPU identification (if any)
>> > > > > > > * Benchmark unique name
>> > > > > > > * Programming language(s) associated with benchmark (e.g. 
>> > > > > > > a
>> > > benchmark
>> > > > > > > may involve both C++ and Python)
>> > > > > > > * Benchmark time, plus mean and standard deviation if
>> available,
>> > > else NULL
>> > > > > > >
>> > > > > > > (maybe some other things)
>> > > > > > >
>> > > > > > > I would rather not be locked into the internal database
>> schema of a
>> > > > > > > particular benchmarking tool. So people in the community 
>> > > > > > > can
>> just
>> > > run
>> > > > > > > SQL queries against the database and use the data however 
>> > > > > > > they
>> > > like.
>> > > > > > > We'll just have to be careful that people don't DROP 
>> > > > > > > TABLE or
>> > > DELETE
>> > > > > > > (but we should have daily backups so we can recover from 
>> > > > > > > such
>> > > cases)
>> > > > > > >
>> > > > > > > So while we may make use of TeamCity to schedule the runs 
>> > > > > > > on
>> the
>> > > cloud
>> > > > > > > and physical hardware, we should also provide a path for 
>> > > > > > > other
>> > > people
>> > > > > > > in the community to add data to the benchmark database on
>> their
>> > > > > > > hardware on an ad hoc basis. For example, I have several
>> machines
>> > > in
>> > > > > > > my home on all operating systems (Windows / macOS / 
>> > > > > > > Linux,
>> and soon
>> > > > > > > also ARM64) and I'd like to set up scheduled tasks / cron
>> jobs to
>> > > > > > > report in to the database at least on a daily basis.
>> > > > > > >
>> > > > > > > Ideally the benchmark database would just be a PostgreSQL
>> server
>> > > with
>> > > > > > > a schema we write down and keep backed up etc. Hosted
>> PostgreSQL is
>> > > > > > > inexpensive ($200+ per year depending on size of 
>> > > > > > > instance;
>> this
>> > > > > > > probably doesn't need to be a crazy big machine)
>> > > > > > >
>> > > > > > > I suspect there will be a manageable amount of 
>> > > > > > > development
>> > > involved to
>> > > > > > > glue each of the benchmarking frameworks together with 
>> > > > > > > the
>> > > benchmark
>> > > > > > > database. This can also handle querying the operating 
>> > > > > > > system
>> for
>> > > the
>> > > > > > > system information listed above
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Wes
>> > > > > > >
>> > > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg 
>> > > > > > > <ar...@intel.com> wrote:
>> > > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I want to restart/attach to the discussions for 
>> > > > > > > > creating
>> Arrow
>> > > > > > > benchmarking dashboard. I want to propose performance
>> benchmark
>> > > run per
>> > > > > > > commit to track the changes.
>> > > > > > > > The proposal includes building infrastructure for 
>> > > > > > > > per-commit
>> > > tracking
>> > > > > > > comprising of the following parts:
>> > > > > > > > - Hosted JetBrains for OSS 
>> > > > > > > > https://teamcity.jetbrains.com/
>> as a
>> > > build
>> > > > > > > system
>> > > > > > > > - Agents running in cloud both VM/container 
>> > > > > > > > (DigitalOcean,
>> or
>> > > others)
>> > > > > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia 
>> > > > > > > boxes?)
>> > > > > > > > - JFrog artifactory storage and management for OSS 
>> > > > > > > > projects
>> > > > > > > https://jfrog.com/open-source/#artifactory2
>> > > > > > > > - Codespeed as a frontend
>> https://github.com/tobami/codespeed
>> > > > > > > >
>> > > > > > > > I am volunteering to build such system (if needed more 
>> > > > > > > > Intel
>> > > folks will
>> > > > > > > be involved) so we can start tracking performance on 
>> > > > > > > various
>> > > platforms and
>> > > > > > > understand how changes affect it.
>> > > > > > > >
>> > > > > > > > Please, let me know your thoughts!
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > -Areg.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > >
>>
>

Re: Benchmarking dashboard proposal

Posted by Tanya Schlusser <ta...@tickel.net>.
Late, but there's a PR now with first-draft DDL (
https://github.com/apache/arrow/pull/3586).
Happy to receive any feedback!

I tried to think about how people would submit benchmarks, and added a
Postgraphile container for http-via-GraphQL.
If others have strong opinions on the data modeling please speak up because
I'm more a database user than a designer.

I can also help with benchmarking work in R/Python given guidance/a
roadmap/examples from someone else.

Best,
Tanya

On Mon, Feb 4, 2019 at 12:37 PM Tanya Schlusser <ta...@tickel.net> wrote:

> I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL along
> with a README in a new directory `arrow/dev/benchmarking` unless directed
> otherwise.
>
> A "C++ Benchmark Collector" script would be super. I expect some
> back-and-forth on this to identify naïve assumptions in the data model.
>
> Attempting to submit actual benchmarks is how to get a handle on that. I
> recognize I'm blocking downstream work. Better to get an initial PR and
> some discussion going.
>
> Best,
> Tanya
>
> On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com> wrote:
>
>> hi folks,
>>
>> I'm curious where we currently stand on this project. I see the
>> discussion in https://issues.apache.org/jira/browse/ARROW-4313 --
>> would the next step be to have a pull request with .sql files
>> containing the DDL required to create the schema in PostgreSQL?
>>
>> I could volunteer to write the "C++ Benchmark Collector" script that
>> will run all the benchmarks on Linux and collect their data to be
>> inserted into the database.
>>
>> Thanks
>> Wes
>>
>> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net>
>> wrote:
>> >
>> > I don't want to be the bottleneck and have posted an initial draft data
>> > model in the JIRA issue
>> https://issues.apache.org/jira/browse/ARROW-4313
>> >
>> > It should not be a problem to get content into a form that would be
>> > acceptable for either a static site like ASV (via CORS queries to a
>> > GraphQL/REST interface) or a codespeed-style site (via a separate schema
>> > organized for Django)
>> >
>> > I don't think I'm experienced enough to actually write any benchmarks
>> > though, so all I can contribute is backend work for this task.
>> >
>> > Best,
>> > Tanya
>> >
>> > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
>> wrote:
>> >
>> > > hi folks,
>> > >
>> > > I'd like to propose some kind of timeline for getting a first
>> > > iteration of a benchmark database developed and live, with scripts to
>> > > enable one or more initial agents to start adding new data on a daily
>> > > / per-commit basis. I have at least 3 physical machines where I could
>> > > immediately set up cron jobs to start adding new data, and I could
>> > > attempt to backfill data as far back as possible.
>> > >
>> > > Personally, I would like to see this done by the end of February if
>> > > not sooner -- if we don't have the volunteers to push the work to
>> > > completion by then please let me know as I will rearrange my
>> > > priorities to make sure that it happens. Does that sounds reasonable?
>> > >
>> > > Please let me know if this plan sounds reasonable:
>> > >
>> > > * Set up a hosted PostgreSQL instance, configure backups
>> > > * Propose and adopt a database schema for storing benchmark results
>> > > * For C++, write script (or Dockerfile) to execute all
>> > > google-benchmarks, output results to JSON, then adapter script
>> > > (Python) to ingest into database
>> > > * For Python, similar script that invokes ASV, then inserts ASV
>> > > results into benchmark database
>> > >
>> > > This seems to be a pre-requisite for having a front-end to visualize
>> > > the results, but the dashboard/front end can hopefully be implemented
>> > > in such a way that the details of the benchmark database are not too
>> > > tightly coupled
>> > >
>> > > (Do we have any other benchmarks in the project that would need to be
>> > > inserted initially?)
>> > >
>> > > Related work to trigger benchmarks on agents when new commits land in
>> > > master can happen concurrently -- one task need not block the other
>> > >
>> > > Thanks
>> > > Wes
>> > >
>> > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
>> wrote:
>> > > >
>> > > > Sorry, copy-paste failure:
>> > > https://issues.apache.org/jira/browse/ARROW-4313
>> > > >
>> > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
>> > > wrote:
>> > > > >
>> > > > > I don't think there is one but I just created
>> > > > >
>> > >
>> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>> > > > >
>> > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <
>> tanya@tickel.net>
>> > > wrote:
>> > > > > >
>> > > > > > Areg,
>> > > > > >
>> > > > > > If you'd like help, I volunteer! No experience benchmarking but
>> tons
>> > > > > > experience databasing—I can mock the backend (database + http)
>> as a
>> > > > > > starting point for discussion if this is the way people want to
>> go.
>> > > > > >
>> > > > > > Is there a Jira ticket for this that i can jump into?
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
>> wesmckinn@gmail.com>
>> > > wrote:
>> > > > > >
>> > > > > > > hi Areg,
>> > > > > > >
>> > > > > > > This sounds great -- we've discussed building a more
>> full-featured
>> > > > > > > benchmark automation system in the past but nothing has been
>> > > developed
>> > > > > > > yet.
>> > > > > > >
>> > > > > > > Your proposal about the details sounds OK; the single most
>> > > important
>> > > > > > > thing to me is that we build and maintain a very general
>> purpose
>> > > > > > > database schema for building the historical benchmark database
>> > > > > > >
>> > > > > > > The benchmark database should keep track of:
>> > > > > > >
>> > > > > > > * Timestamp of benchmark run
>> > > > > > > * Git commit hash of codebase
>> > > > > > > * Machine unique name (sort of the "user id")
>> > > > > > > * CPU identification for machine, and clock frequency (in
>> case of
>> > > > > > > overclocking)
>> > > > > > > * CPU cache sizes (L1/L2/L3)
>> > > > > > > * Whether or not CPU throttling is enabled (if it can be
>> easily
>> > > determined)
>> > > > > > > * RAM size
>> > > > > > > * GPU identification (if any)
>> > > > > > > * Benchmark unique name
>> > > > > > > * Programming language(s) associated with benchmark (e.g. a
>> > > benchmark
>> > > > > > > may involve both C++ and Python)
>> > > > > > > * Benchmark time, plus mean and standard deviation if
>> available,
>> > > else NULL
>> > > > > > >
>> > > > > > > (maybe some other things)
>> > > > > > >
>> > > > > > > I would rather not be locked into the internal database
>> schema of a
>> > > > > > > particular benchmarking tool. So people in the community can
>> just
>> > > run
>> > > > > > > SQL queries against the database and use the data however they
>> > > like.
>> > > > > > > We'll just have to be careful that people don't DROP TABLE or
>> > > DELETE
>> > > > > > > (but we should have daily backups so we can recover from such
>> > > cases)
>> > > > > > >
>> > > > > > > So while we may make use of TeamCity to schedule the runs on
>> the
>> > > cloud
>> > > > > > > and physical hardware, we should also provide a path for other
>> > > people
>> > > > > > > in the community to add data to the benchmark database on
>> their
>> > > > > > > hardware on an ad hoc basis. For example, I have several
>> machines
>> > > in
>> > > > > > > my home on all operating systems (Windows / macOS / Linux,
>> and soon
>> > > > > > > also ARM64) and I'd like to set up scheduled tasks / cron
>> jobs to
>> > > > > > > report in to the database at least on a daily basis.
>> > > > > > >
>> > > > > > > Ideally the benchmark database would just be a PostgreSQL
>> server
>> > > with
>> > > > > > > a schema we write down and keep backed up etc. Hosted
>> PostgreSQL is
>> > > > > > > inexpensive ($200+ per year depending on size of instance;
>> this
>> > > > > > > probably doesn't need to be a crazy big machine)
>> > > > > > >
>> > > > > > > I suspect there will be a manageable amount of development
>> > > involved to
>> > > > > > > glue each of the benchmarking frameworks together with the
>> > > benchmark
>> > > > > > > database. This can also handle querying the operating system
>> for
>> > > the
>> > > > > > > system information listed above
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Wes
>> > > > > > >
>> > > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
>> > > > > > > <ar...@intel.com> wrote:
>> > > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I want to restart/attach to the discussions for creating
>> Arrow
>> > > > > > > benchmarking dashboard. I want to propose performance
>> benchmark
>> > > run per
>> > > > > > > commit to track the changes.
>> > > > > > > > The proposal includes building infrastructure for per-commit
>> > > tracking
>> > > > > > > comprising of the following parts:
>> > > > > > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/
>> as a
>> > > build
>> > > > > > > system
>> > > > > > > > - Agents running in cloud both VM/container (DigitalOcean,
>> or
>> > > others)
>> > > > > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
>> > > > > > > > - JFrog artifactory storage and management for OSS projects
>> > > > > > > https://jfrog.com/open-source/#artifactory2
>> > > > > > > > - Codespeed as a frontend
>> https://github.com/tobami/codespeed
>> > > > > > > >
>> > > > > > > > I am volunteering to build such system (if needed more Intel
>> > > folks will
>> > > > > > > be involved) so we can start tracking performance on various
>> > > platforms and
>> > > > > > > understand how changes affect it.
>> > > > > > > >
>> > > > > > > > Please, let me know your thoughts!
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > -Areg.
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > >
>>
>

Re: Benchmarking dashboard proposal

Posted by Tanya Schlusser <ta...@tickel.net>.
I hope to make a PR with the DDL by tomorrow or Wednesday night—DDL along
with a README in a new directory `arrow/dev/benchmarking` unless directed
otherwise.

A "C++ Benchmark Collector" script would be super. I expect some
back-and-forth on this to identify naïve assumptions in the data model.

Attempting to submit actual benchmarks is how to get a handle on that. I
recognize I'm blocking downstream work. Better to get an initial PR and
some discussion going.

Best,
Tanya

On Mon, Feb 4, 2019 at 10:10 AM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> I'm curious where we currently stand on this project. I see the
> discussion in https://issues.apache.org/jira/browse/ARROW-4313 --
> would the next step be to have a pull request with .sql files
> containing the DDL required to create the schema in PostgreSQL?
>
> I could volunteer to write the "C++ Benchmark Collector" script that
> will run all the benchmarks on Linux and collect their data to be
> inserted into the database.
>
> Thanks
> Wes
>
> On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote:
> >
> > I don't want to be the bottleneck and have posted an initial draft data
> > model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
> >
> > It should not be a problem to get content into a form that would be
> > acceptable for either a static site like ASV (via CORS queries to a
> > GraphQL/REST interface) or a codespeed-style site (via a separate schema
> > organized for Django)
> >
> > I don't think I'm experienced enough to actually write any benchmarks
> > though, so all I can contribute is backend work for this task.
> >
> > Best,
> > Tanya
> >
> > On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com>
> wrote:
> >
> > > hi folks,
> > >
> > > I'd like to propose some kind of timeline for getting a first
> > > iteration of a benchmark database developed and live, with scripts to
> > > enable one or more initial agents to start adding new data on a daily
> > > / per-commit basis. I have at least 3 physical machines where I could
> > > immediately set up cron jobs to start adding new data, and I could
> > > attempt to backfill data as far back as possible.
> > >
> > > Personally, I would like to see this done by the end of February if
> > > not sooner -- if we don't have the volunteers to push the work to
> > > completion by then please let me know as I will rearrange my
> > > priorities to make sure that it happens. Does that sounds reasonable?
> > >
> > > Please let me know if this plan sounds reasonable:
> > >
> > > * Set up a hosted PostgreSQL instance, configure backups
> > > * Propose and adopt a database schema for storing benchmark results
> > > * For C++, write script (or Dockerfile) to execute all
> > > google-benchmarks, output results to JSON, then adapter script
> > > (Python) to ingest into database
> > > * For Python, similar script that invokes ASV, then inserts ASV
> > > results into benchmark database
> > >
> > > This seems to be a pre-requisite for having a front-end to visualize
> > > the results, but the dashboard/front end can hopefully be implemented
> > > in such a way that the details of the benchmark database are not too
> > > tightly coupled
> > >
> > > (Do we have any other benchmarks in the project that would need to be
> > > inserted initially?)
> > >
> > > Related work to trigger benchmarks on agents when new commits land in
> > > master can happen concurrently -- one task need not block the other
> > >
> > > Thanks
> > > Wes
> > >
> > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > > Sorry, copy-paste failure:
> > > https://issues.apache.org/jira/browse/ARROW-4313
> > > >
> > > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
> > > wrote:
> > > > >
> > > > > I don't think there is one but I just created
> > > > >
> > >
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> > > > >
> > > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <tanya@tickel.net
> >
> > > wrote:
> > > > > >
> > > > > > Areg,
> > > > > >
> > > > > > If you'd like help, I volunteer! No experience benchmarking but
> tons
> > > > > > experience databasing—I can mock the backend (database + http)
> as a
> > > > > > starting point for discussion if this is the way people want to
> go.
> > > > > >
> > > > > > Is there a Jira ticket for this that i can jump into?
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <
> wesmckinn@gmail.com>
> > > wrote:
> > > > > >
> > > > > > > hi Areg,
> > > > > > >
> > > > > > > This sounds great -- we've discussed building a more
> full-featured
> > > > > > > benchmark automation system in the past but nothing has been
> > > developed
> > > > > > > yet.
> > > > > > >
> > > > > > > Your proposal about the details sounds OK; the single most
> > > important
> > > > > > > thing to me is that we build and maintain a very general
> purpose
> > > > > > > database schema for building the historical benchmark database
> > > > > > >
> > > > > > > The benchmark database should keep track of:
> > > > > > >
> > > > > > > * Timestamp of benchmark run
> > > > > > > * Git commit hash of codebase
> > > > > > > * Machine unique name (sort of the "user id")
> > > > > > > * CPU identification for machine, and clock frequency (in case
> of
> > > > > > > overclocking)
> > > > > > > * CPU cache sizes (L1/L2/L3)
> > > > > > > * Whether or not CPU throttling is enabled (if it can be easily
> > > determined)
> > > > > > > * RAM size
> > > > > > > * GPU identification (if any)
> > > > > > > * Benchmark unique name
> > > > > > > * Programming language(s) associated with benchmark (e.g. a
> > > benchmark
> > > > > > > may involve both C++ and Python)
> > > > > > > * Benchmark time, plus mean and standard deviation if
> available,
> > > else NULL
> > > > > > >
> > > > > > > (maybe some other things)
> > > > > > >
> > > > > > > I would rather not be locked into the internal database schema
> of a
> > > > > > > particular benchmarking tool. So people in the community can
> just
> > > run
> > > > > > > SQL queries against the database and use the data however they
> > > like.
> > > > > > > We'll just have to be careful that people don't DROP TABLE or
> > > DELETE
> > > > > > > (but we should have daily backups so we can recover from such
> > > cases)
> > > > > > >
> > > > > > > So while we may make use of TeamCity to schedule the runs on
> the
> > > cloud
> > > > > > > and physical hardware, we should also provide a path for other
> > > people
> > > > > > > in the community to add data to the benchmark database on their
> > > > > > > hardware on an ad hoc basis. For example, I have several
> machines
> > > in
> > > > > > > my home on all operating systems (Windows / macOS / Linux, and
> soon
> > > > > > > also ARM64) and I'd like to set up scheduled tasks / cron jobs
> to
> > > > > > > report in to the database at least on a daily basis.
> > > > > > >
> > > > > > > Ideally the benchmark database would just be a PostgreSQL
> server
> > > with
> > > > > > > a schema we write down and keep backed up etc. Hosted
> PostgreSQL is
> > > > > > > inexpensive ($200+ per year depending on size of instance; this
> > > > > > > probably doesn't need to be a crazy big machine)
> > > > > > >
> > > > > > > I suspect there will be a manageable amount of development
> > > involved to
> > > > > > > glue each of the benchmarking frameworks together with the
> > > benchmark
> > > > > > > database. This can also handle querying the operating system
> for
> > > the
> > > > > > > system information listed above
> > > > > > >
> > > > > > > Thanks
> > > > > > > Wes
> > > > > > >
> > > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > > > > > <ar...@intel.com> wrote:
> > > > > > > >
> > > > > > > > Hello,
> > > > > > > >
> > > > > > > > I want to restart/attach to the discussions for creating
> Arrow
> > > > > > > benchmarking dashboard. I want to propose performance benchmark
> > > run per
> > > > > > > commit to track the changes.
> > > > > > > > The proposal includes building infrastructure for per-commit
> > > tracking
> > > > > > > comprising of the following parts:
> > > > > > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/
> as a
> > > build
> > > > > > > system
> > > > > > > > - Agents running in cloud both VM/container (DigitalOcean, or
> > > others)
> > > > > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > > > > > > - JFrog artifactory storage and management for OSS projects
> > > > > > > https://jfrog.com/open-source/#artifactory2
> > > > > > > > - Codespeed as a frontend
> https://github.com/tobami/codespeed
> > > > > > > >
> > > > > > > > I am volunteering to build such system (if needed more Intel
> > > folks will
> > > > > > > be involved) so we can start tracking performance on various
> > > platforms and
> > > > > > > understand how changes affect it.
> > > > > > > >
> > > > > > > > Please, let me know your thoughts!
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > -Areg.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > >
> > >
>

Re: Benchmarking dashboard proposal

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

I'm curious where we currently stand on this project. I see the
discussion in https://issues.apache.org/jira/browse/ARROW-4313 --
would the next step be to have a pull request with .sql files
containing the DDL required to create the schema in PostgreSQL?

I could volunteer to write the "C++ Benchmark Collector" script that
will run all the benchmarks on Linux and collect their data to be
inserted into the database.

Thanks
Wes

On Sun, Jan 27, 2019 at 12:20 AM Tanya Schlusser <ta...@tickel.net> wrote:
>
> I don't want to be the bottleneck and have posted an initial draft data
> model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313
>
> It should not be a problem to get content into a form that would be
> acceptable for either a static site like ASV (via CORS queries to a
> GraphQL/REST interface) or a codespeed-style site (via a separate schema
> organized for Django)
>
> I don't think I'm experienced enough to actually write any benchmarks
> though, so all I can contribute is backend work for this task.
>
> Best,
> Tanya
>
> On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com> wrote:
>
> > hi folks,
> >
> > I'd like to propose some kind of timeline for getting a first
> > iteration of a benchmark database developed and live, with scripts to
> > enable one or more initial agents to start adding new data on a daily
> > / per-commit basis. I have at least 3 physical machines where I could
> > immediately set up cron jobs to start adding new data, and I could
> > attempt to backfill data as far back as possible.
> >
> > Personally, I would like to see this done by the end of February if
> > not sooner -- if we don't have the volunteers to push the work to
> > completion by then please let me know as I will rearrange my
> > priorities to make sure that it happens. Does that sounds reasonable?
> >
> > Please let me know if this plan sounds reasonable:
> >
> > * Set up a hosted PostgreSQL instance, configure backups
> > * Propose and adopt a database schema for storing benchmark results
> > * For C++, write script (or Dockerfile) to execute all
> > google-benchmarks, output results to JSON, then adapter script
> > (Python) to ingest into database
> > * For Python, similar script that invokes ASV, then inserts ASV
> > results into benchmark database
> >
> > This seems to be a pre-requisite for having a front-end to visualize
> > the results, but the dashboard/front end can hopefully be implemented
> > in such a way that the details of the benchmark database are not too
> > tightly coupled
> >
> > (Do we have any other benchmarks in the project that would need to be
> > inserted initially?)
> >
> > Related work to trigger benchmarks on agents when new commits land in
> > master can happen concurrently -- one task need not block the other
> >
> > Thanks
> > Wes
> >
> > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Sorry, copy-paste failure:
> > https://issues.apache.org/jira/browse/ARROW-4313
> > >
> > > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > > >
> > > > I don't think there is one but I just created
> > > >
> > https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> > > >
> > > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net>
> > wrote:
> > > > >
> > > > > Areg,
> > > > >
> > > > > If you'd like help, I volunteer! No experience benchmarking but tons
> > > > > experience databasing—I can mock the backend (database + http) as a
> > > > > starting point for discussion if this is the way people want to go.
> > > > >
> > > > > Is there a Jira ticket for this that i can jump into?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> > wrote:
> > > > >
> > > > > > hi Areg,
> > > > > >
> > > > > > This sounds great -- we've discussed building a more full-featured
> > > > > > benchmark automation system in the past but nothing has been
> > developed
> > > > > > yet.
> > > > > >
> > > > > > Your proposal about the details sounds OK; the single most
> > important
> > > > > > thing to me is that we build and maintain a very general purpose
> > > > > > database schema for building the historical benchmark database
> > > > > >
> > > > > > The benchmark database should keep track of:
> > > > > >
> > > > > > * Timestamp of benchmark run
> > > > > > * Git commit hash of codebase
> > > > > > * Machine unique name (sort of the "user id")
> > > > > > * CPU identification for machine, and clock frequency (in case of
> > > > > > overclocking)
> > > > > > * CPU cache sizes (L1/L2/L3)
> > > > > > * Whether or not CPU throttling is enabled (if it can be easily
> > determined)
> > > > > > * RAM size
> > > > > > * GPU identification (if any)
> > > > > > * Benchmark unique name
> > > > > > * Programming language(s) associated with benchmark (e.g. a
> > benchmark
> > > > > > may involve both C++ and Python)
> > > > > > * Benchmark time, plus mean and standard deviation if available,
> > else NULL
> > > > > >
> > > > > > (maybe some other things)
> > > > > >
> > > > > > I would rather not be locked into the internal database schema of a
> > > > > > particular benchmarking tool. So people in the community can just
> > run
> > > > > > SQL queries against the database and use the data however they
> > like.
> > > > > > We'll just have to be careful that people don't DROP TABLE or
> > DELETE
> > > > > > (but we should have daily backups so we can recover from such
> > cases)
> > > > > >
> > > > > > So while we may make use of TeamCity to schedule the runs on the
> > cloud
> > > > > > and physical hardware, we should also provide a path for other
> > people
> > > > > > in the community to add data to the benchmark database on their
> > > > > > hardware on an ad hoc basis. For example, I have several machines
> > in
> > > > > > my home on all operating systems (Windows / macOS / Linux, and soon
> > > > > > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > > > > > report in to the database at least on a daily basis.
> > > > > >
> > > > > > Ideally the benchmark database would just be a PostgreSQL server
> > with
> > > > > > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > > > > > inexpensive ($200+ per year depending on size of instance; this
> > > > > > probably doesn't need to be a crazy big machine)
> > > > > >
> > > > > > I suspect there will be a manageable amount of development
> > involved to
> > > > > > glue each of the benchmarking frameworks together with the
> > benchmark
> > > > > > database. This can also handle querying the operating system for
> > the
> > > > > > system information listed above
> > > > > >
> > > > > > Thanks
> > > > > > Wes
> > > > > >
> > > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > > > > <ar...@intel.com> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I want to restart/attach to the discussions for creating Arrow
> > > > > > benchmarking dashboard. I want to propose performance benchmark
> > run per
> > > > > > commit to track the changes.
> > > > > > > The proposal includes building infrastructure for per-commit
> > tracking
> > > > > > comprising of the following parts:
> > > > > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a
> > build
> > > > > > system
> > > > > > > - Agents running in cloud both VM/container (DigitalOcean, or
> > others)
> > > > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > > > > > - JFrog artifactory storage and management for OSS projects
> > > > > > https://jfrog.com/open-source/#artifactory2
> > > > > > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > > > > > >
> > > > > > > I am volunteering to build such system (if needed more Intel
> > folks will
> > > > > > be involved) so we can start tracking performance on various
> > platforms and
> > > > > > understand how changes affect it.
> > > > > > >
> > > > > > > Please, let me know your thoughts!
> > > > > > >
> > > > > > > Thanks,
> > > > > > > -Areg.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> >

Re: Benchmarking dashboard proposal

Posted by Tanya Schlusser <ta...@tickel.net>.
I don't want to be the bottleneck and have posted an initial draft data
model in the JIRA issue https://issues.apache.org/jira/browse/ARROW-4313

It should not be a problem to get content into a form that would be
acceptable for either a static site like ASV (via CORS queries to a
GraphQL/REST interface) or a codespeed-style site (via a separate schema
organized for Django)

I don't think I'm experienced enough to actually write any benchmarks
though, so all I can contribute is backend work for this task.

Best,
Tanya

On Sat, Jan 26, 2019 at 7:37 PM Wes McKinney <we...@gmail.com> wrote:

> hi folks,
>
> I'd like to propose some kind of timeline for getting a first
> iteration of a benchmark database developed and live, with scripts to
> enable one or more initial agents to start adding new data on a daily
> / per-commit basis. I have at least 3 physical machines where I could
> immediately set up cron jobs to start adding new data, and I could
> attempt to backfill data as far back as possible.
>
> Personally, I would like to see this done by the end of February if
> not sooner -- if we don't have the volunteers to push the work to
> completion by then please let me know as I will rearrange my
> priorities to make sure that it happens. Does that sounds reasonable?
>
> Please let me know if this plan sounds reasonable:
>
> * Set up a hosted PostgreSQL instance, configure backups
> * Propose and adopt a database schema for storing benchmark results
> * For C++, write script (or Dockerfile) to execute all
> google-benchmarks, output results to JSON, then adapter script
> (Python) to ingest into database
> * For Python, similar script that invokes ASV, then inserts ASV
> results into benchmark database
>
> This seems to be a pre-requisite for having a front-end to visualize
> the results, but the dashboard/front end can hopefully be implemented
> in such a way that the details of the benchmark database are not too
> tightly coupled
>
> (Do we have any other benchmarks in the project that would need to be
> inserted initially?)
>
> Related work to trigger benchmarks on agents when new commits land in
> master can happen concurrently -- one task need not block the other
>
> Thanks
> Wes
>
> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > Sorry, copy-paste failure:
> https://issues.apache.org/jira/browse/ARROW-4313
> >
> > On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com>
> wrote:
> > >
> > > I don't think there is one but I just created
> > >
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> > >
> > > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net>
> wrote:
> > > >
> > > > Areg,
> > > >
> > > > If you'd like help, I volunteer! No experience benchmarking but tons
> > > > experience databasing—I can mock the backend (database + http) as a
> > > > starting point for discussion if this is the way people want to go.
> > > >
> > > > Is there a Jira ticket for this that i can jump into?
> > > >
> > > >
> > > >
> > > >
> > > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > > > hi Areg,
> > > > >
> > > > > This sounds great -- we've discussed building a more full-featured
> > > > > benchmark automation system in the past but nothing has been
> developed
> > > > > yet.
> > > > >
> > > > > Your proposal about the details sounds OK; the single most
> important
> > > > > thing to me is that we build and maintain a very general purpose
> > > > > database schema for building the historical benchmark database
> > > > >
> > > > > The benchmark database should keep track of:
> > > > >
> > > > > * Timestamp of benchmark run
> > > > > * Git commit hash of codebase
> > > > > * Machine unique name (sort of the "user id")
> > > > > * CPU identification for machine, and clock frequency (in case of
> > > > > overclocking)
> > > > > * CPU cache sizes (L1/L2/L3)
> > > > > * Whether or not CPU throttling is enabled (if it can be easily
> determined)
> > > > > * RAM size
> > > > > * GPU identification (if any)
> > > > > * Benchmark unique name
> > > > > * Programming language(s) associated with benchmark (e.g. a
> benchmark
> > > > > may involve both C++ and Python)
> > > > > * Benchmark time, plus mean and standard deviation if available,
> else NULL
> > > > >
> > > > > (maybe some other things)
> > > > >
> > > > > I would rather not be locked into the internal database schema of a
> > > > > particular benchmarking tool. So people in the community can just
> run
> > > > > SQL queries against the database and use the data however they
> like.
> > > > > We'll just have to be careful that people don't DROP TABLE or
> DELETE
> > > > > (but we should have daily backups so we can recover from such
> cases)
> > > > >
> > > > > So while we may make use of TeamCity to schedule the runs on the
> cloud
> > > > > and physical hardware, we should also provide a path for other
> people
> > > > > in the community to add data to the benchmark database on their
> > > > > hardware on an ad hoc basis. For example, I have several machines
> in
> > > > > my home on all operating systems (Windows / macOS / Linux, and soon
> > > > > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > > > > report in to the database at least on a daily basis.
> > > > >
> > > > > Ideally the benchmark database would just be a PostgreSQL server
> with
> > > > > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > > > > inexpensive ($200+ per year depending on size of instance; this
> > > > > probably doesn't need to be a crazy big machine)
> > > > >
> > > > > I suspect there will be a manageable amount of development
> involved to
> > > > > glue each of the benchmarking frameworks together with the
> benchmark
> > > > > database. This can also handle querying the operating system for
> the
> > > > > system information listed above
> > > > >
> > > > > Thanks
> > > > > Wes
> > > > >
> > > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > > > <ar...@intel.com> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I want to restart/attach to the discussions for creating Arrow
> > > > > benchmarking dashboard. I want to propose performance benchmark
> run per
> > > > > commit to track the changes.
> > > > > > The proposal includes building infrastructure for per-commit
> tracking
> > > > > comprising of the following parts:
> > > > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a
> build
> > > > > system
> > > > > > - Agents running in cloud both VM/container (DigitalOcean, or
> others)
> > > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > > > > - JFrog artifactory storage and management for OSS projects
> > > > > https://jfrog.com/open-source/#artifactory2
> > > > > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > > > > >
> > > > > > I am volunteering to build such system (if needed more Intel
> folks will
> > > > > be involved) so we can start tracking performance on various
> platforms and
> > > > > understand how changes affect it.
> > > > > >
> > > > > > Please, let me know your thoughts!
> > > > > >
> > > > > > Thanks,
> > > > > > -Areg.
> > > > > >
> > > > > >
> > > > > >
> > > > >
>

Re: Benchmarking dashboard proposal

Posted by Wes McKinney <we...@gmail.com>.
hi folks,

I'd like to propose some kind of timeline for getting a first
iteration of a benchmark database developed and live, with scripts to
enable one or more initial agents to start adding new data on a daily
/ per-commit basis. I have at least 3 physical machines where I could
immediately set up cron jobs to start adding new data, and I could
attempt to backfill data as far back as possible.

Personally, I would like to see this done by the end of February if
not sooner -- if we don't have the volunteers to push the work to
completion by then please let me know as I will rearrange my
priorities to make sure that it happens. Does that sounds reasonable?

Please let me know if this plan sounds reasonable:

* Set up a hosted PostgreSQL instance, configure backups
* Propose and adopt a database schema for storing benchmark results
* For C++, write script (or Dockerfile) to execute all
google-benchmarks, output results to JSON, then adapter script
(Python) to ingest into database
* For Python, similar script that invokes ASV, then inserts ASV
results into benchmark database

This seems to be a pre-requisite for having a front-end to visualize
the results, but the dashboard/front end can hopefully be implemented
in such a way that the details of the benchmark database are not too
tightly coupled

(Do we have any other benchmarks in the project that would need to be
inserted initially?)

Related work to trigger benchmarks on agents when new commits land in
master can happen concurrently -- one task need not block the other

Thanks
Wes

On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com> wrote:
>
> Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313
>
> On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > I don't think there is one but I just created
> > https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
> >
> > On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
> > >
> > > Areg,
> > >
> > > If you'd like help, I volunteer! No experience benchmarking but tons
> > > experience databasing—I can mock the backend (database + http) as a
> > > starting point for discussion if this is the way people want to go.
> > >
> > > Is there a Jira ticket for this that i can jump into?
> > >
> > >
> > >
> > >
> > > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > > hi Areg,
> > > >
> > > > This sounds great -- we've discussed building a more full-featured
> > > > benchmark automation system in the past but nothing has been developed
> > > > yet.
> > > >
> > > > Your proposal about the details sounds OK; the single most important
> > > > thing to me is that we build and maintain a very general purpose
> > > > database schema for building the historical benchmark database
> > > >
> > > > The benchmark database should keep track of:
> > > >
> > > > * Timestamp of benchmark run
> > > > * Git commit hash of codebase
> > > > * Machine unique name (sort of the "user id")
> > > > * CPU identification for machine, and clock frequency (in case of
> > > > overclocking)
> > > > * CPU cache sizes (L1/L2/L3)
> > > > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > > > * RAM size
> > > > * GPU identification (if any)
> > > > * Benchmark unique name
> > > > * Programming language(s) associated with benchmark (e.g. a benchmark
> > > > may involve both C++ and Python)
> > > > * Benchmark time, plus mean and standard deviation if available, else NULL
> > > >
> > > > (maybe some other things)
> > > >
> > > > I would rather not be locked into the internal database schema of a
> > > > particular benchmarking tool. So people in the community can just run
> > > > SQL queries against the database and use the data however they like.
> > > > We'll just have to be careful that people don't DROP TABLE or DELETE
> > > > (but we should have daily backups so we can recover from such cases)
> > > >
> > > > So while we may make use of TeamCity to schedule the runs on the cloud
> > > > and physical hardware, we should also provide a path for other people
> > > > in the community to add data to the benchmark database on their
> > > > hardware on an ad hoc basis. For example, I have several machines in
> > > > my home on all operating systems (Windows / macOS / Linux, and soon
> > > > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > > > report in to the database at least on a daily basis.
> > > >
> > > > Ideally the benchmark database would just be a PostgreSQL server with
> > > > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > > > inexpensive ($200+ per year depending on size of instance; this
> > > > probably doesn't need to be a crazy big machine)
> > > >
> > > > I suspect there will be a manageable amount of development involved to
> > > > glue each of the benchmarking frameworks together with the benchmark
> > > > database. This can also handle querying the operating system for the
> > > > system information listed above
> > > >
> > > > Thanks
> > > > Wes
> > > >
> > > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > > <ar...@intel.com> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I want to restart/attach to the discussions for creating Arrow
> > > > benchmarking dashboard. I want to propose performance benchmark run per
> > > > commit to track the changes.
> > > > > The proposal includes building infrastructure for per-commit tracking
> > > > comprising of the following parts:
> > > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> > > > system
> > > > > - Agents running in cloud both VM/container (DigitalOcean, or others)
> > > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > > > - JFrog artifactory storage and management for OSS projects
> > > > https://jfrog.com/open-source/#artifactory2
> > > > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > > > >
> > > > > I am volunteering to build such system (if needed more Intel folks will
> > > > be involved) so we can start tracking performance on various platforms and
> > > > understand how changes affect it.
> > > > >
> > > > > Please, let me know your thoughts!
> > > > >
> > > > > Thanks,
> > > > > -Areg.
> > > > >
> > > > >
> > > > >
> > > >

Re: Benchmarking dashboard proposal

Posted by Wes McKinney <we...@gmail.com>.
Sorry, copy-paste failure: https://issues.apache.org/jira/browse/ARROW-4313

On Mon, Jan 21, 2019 at 11:14 AM Wes McKinney <we...@gmail.com> wrote:
>
> I don't think there is one but I just created
> https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E
>
> On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
> >
> > Areg,
> >
> > If you'd like help, I volunteer! No experience benchmarking but tons
> > experience databasing—I can mock the backend (database + http) as a
> > starting point for discussion if this is the way people want to go.
> >
> > Is there a Jira ticket for this that i can jump into?
> >
> >
> >
> >
> > On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
> >
> > > hi Areg,
> > >
> > > This sounds great -- we've discussed building a more full-featured
> > > benchmark automation system in the past but nothing has been developed
> > > yet.
> > >
> > > Your proposal about the details sounds OK; the single most important
> > > thing to me is that we build and maintain a very general purpose
> > > database schema for building the historical benchmark database
> > >
> > > The benchmark database should keep track of:
> > >
> > > * Timestamp of benchmark run
> > > * Git commit hash of codebase
> > > * Machine unique name (sort of the "user id")
> > > * CPU identification for machine, and clock frequency (in case of
> > > overclocking)
> > > * CPU cache sizes (L1/L2/L3)
> > > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > > * RAM size
> > > * GPU identification (if any)
> > > * Benchmark unique name
> > > * Programming language(s) associated with benchmark (e.g. a benchmark
> > > may involve both C++ and Python)
> > > * Benchmark time, plus mean and standard deviation if available, else NULL
> > >
> > > (maybe some other things)
> > >
> > > I would rather not be locked into the internal database schema of a
> > > particular benchmarking tool. So people in the community can just run
> > > SQL queries against the database and use the data however they like.
> > > We'll just have to be careful that people don't DROP TABLE or DELETE
> > > (but we should have daily backups so we can recover from such cases)
> > >
> > > So while we may make use of TeamCity to schedule the runs on the cloud
> > > and physical hardware, we should also provide a path for other people
> > > in the community to add data to the benchmark database on their
> > > hardware on an ad hoc basis. For example, I have several machines in
> > > my home on all operating systems (Windows / macOS / Linux, and soon
> > > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > > report in to the database at least on a daily basis.
> > >
> > > Ideally the benchmark database would just be a PostgreSQL server with
> > > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > > inexpensive ($200+ per year depending on size of instance; this
> > > probably doesn't need to be a crazy big machine)
> > >
> > > I suspect there will be a manageable amount of development involved to
> > > glue each of the benchmarking frameworks together with the benchmark
> > > database. This can also handle querying the operating system for the
> > > system information listed above
> > >
> > > Thanks
> > > Wes
> > >
> > > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > > <ar...@intel.com> wrote:
> > > >
> > > > Hello,
> > > >
> > > > I want to restart/attach to the discussions for creating Arrow
> > > benchmarking dashboard. I want to propose performance benchmark run per
> > > commit to track the changes.
> > > > The proposal includes building infrastructure for per-commit tracking
> > > comprising of the following parts:
> > > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> > > system
> > > > - Agents running in cloud both VM/container (DigitalOcean, or others)
> > > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > > - JFrog artifactory storage and management for OSS projects
> > > https://jfrog.com/open-source/#artifactory2
> > > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > > >
> > > > I am volunteering to build such system (if needed more Intel folks will
> > > be involved) so we can start tracking performance on various platforms and
> > > understand how changes affect it.
> > > >
> > > > Please, let me know your thoughts!
> > > >
> > > > Thanks,
> > > > -Areg.
> > > >
> > > >
> > > >
> > >

Re: Benchmarking dashboard proposal

Posted by Wes McKinney <we...@gmail.com>.
I don't think there is one but I just created
https://lists.apache.org/thread.html/278e573445c83bbd8ee66474b9356c5291a16f6b6eca11dbbe4b473a@%3Cdev.arrow.apache.org%3E

On Mon, Jan 21, 2019 at 10:35 AM Tanya Schlusser <ta...@tickel.net> wrote:
>
> Areg,
>
> If you'd like help, I volunteer! No experience benchmarking but tons
> experience databasing—I can mock the backend (database + http) as a
> starting point for discussion if this is the way people want to go.
>
> Is there a Jira ticket for this that i can jump into?
>
>
>
>
> On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:
>
> > hi Areg,
> >
> > This sounds great -- we've discussed building a more full-featured
> > benchmark automation system in the past but nothing has been developed
> > yet.
> >
> > Your proposal about the details sounds OK; the single most important
> > thing to me is that we build and maintain a very general purpose
> > database schema for building the historical benchmark database
> >
> > The benchmark database should keep track of:
> >
> > * Timestamp of benchmark run
> > * Git commit hash of codebase
> > * Machine unique name (sort of the "user id")
> > * CPU identification for machine, and clock frequency (in case of
> > overclocking)
> > * CPU cache sizes (L1/L2/L3)
> > * Whether or not CPU throttling is enabled (if it can be easily determined)
> > * RAM size
> > * GPU identification (if any)
> > * Benchmark unique name
> > * Programming language(s) associated with benchmark (e.g. a benchmark
> > may involve both C++ and Python)
> > * Benchmark time, plus mean and standard deviation if available, else NULL
> >
> > (maybe some other things)
> >
> > I would rather not be locked into the internal database schema of a
> > particular benchmarking tool. So people in the community can just run
> > SQL queries against the database and use the data however they like.
> > We'll just have to be careful that people don't DROP TABLE or DELETE
> > (but we should have daily backups so we can recover from such cases)
> >
> > So while we may make use of TeamCity to schedule the runs on the cloud
> > and physical hardware, we should also provide a path for other people
> > in the community to add data to the benchmark database on their
> > hardware on an ad hoc basis. For example, I have several machines in
> > my home on all operating systems (Windows / macOS / Linux, and soon
> > also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> > report in to the database at least on a daily basis.
> >
> > Ideally the benchmark database would just be a PostgreSQL server with
> > a schema we write down and keep backed up etc. Hosted PostgreSQL is
> > inexpensive ($200+ per year depending on size of instance; this
> > probably doesn't need to be a crazy big machine)
> >
> > I suspect there will be a manageable amount of development involved to
> > glue each of the benchmarking frameworks together with the benchmark
> > database. This can also handle querying the operating system for the
> > system information listed above
> >
> > Thanks
> > Wes
> >
> > On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> > <ar...@intel.com> wrote:
> > >
> > > Hello,
> > >
> > > I want to restart/attach to the discussions for creating Arrow
> > benchmarking dashboard. I want to propose performance benchmark run per
> > commit to track the changes.
> > > The proposal includes building infrastructure for per-commit tracking
> > comprising of the following parts:
> > > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> > system
> > > - Agents running in cloud both VM/container (DigitalOcean, or others)
> > and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > > - JFrog artifactory storage and management for OSS projects
> > https://jfrog.com/open-source/#artifactory2
> > > - Codespeed as a frontend https://github.com/tobami/codespeed
> > >
> > > I am volunteering to build such system (if needed more Intel folks will
> > be involved) so we can start tracking performance on various platforms and
> > understand how changes affect it.
> > >
> > > Please, let me know your thoughts!
> > >
> > > Thanks,
> > > -Areg.
> > >
> > >
> > >
> >

Re: Benchmarking dashboard proposal

Posted by Tanya Schlusser <ta...@tickel.net>.
Areg,

If you'd like help, I volunteer! No experience benchmarking but tons
experience databasing—I can mock the backend (database + http) as a
starting point for discussion if this is the way people want to go.

Is there a Jira ticket for this that i can jump into?




On Sun, Jan 20, 2019 at 3:24 PM Wes McKinney <we...@gmail.com> wrote:

> hi Areg,
>
> This sounds great -- we've discussed building a more full-featured
> benchmark automation system in the past but nothing has been developed
> yet.
>
> Your proposal about the details sounds OK; the single most important
> thing to me is that we build and maintain a very general purpose
> database schema for building the historical benchmark database
>
> The benchmark database should keep track of:
>
> * Timestamp of benchmark run
> * Git commit hash of codebase
> * Machine unique name (sort of the "user id")
> * CPU identification for machine, and clock frequency (in case of
> overclocking)
> * CPU cache sizes (L1/L2/L3)
> * Whether or not CPU throttling is enabled (if it can be easily determined)
> * RAM size
> * GPU identification (if any)
> * Benchmark unique name
> * Programming language(s) associated with benchmark (e.g. a benchmark
> may involve both C++ and Python)
> * Benchmark time, plus mean and standard deviation if available, else NULL
>
> (maybe some other things)
>
> I would rather not be locked into the internal database schema of a
> particular benchmarking tool. So people in the community can just run
> SQL queries against the database and use the data however they like.
> We'll just have to be careful that people don't DROP TABLE or DELETE
> (but we should have daily backups so we can recover from such cases)
>
> So while we may make use of TeamCity to schedule the runs on the cloud
> and physical hardware, we should also provide a path for other people
> in the community to add data to the benchmark database on their
> hardware on an ad hoc basis. For example, I have several machines in
> my home on all operating systems (Windows / macOS / Linux, and soon
> also ARM64) and I'd like to set up scheduled tasks / cron jobs to
> report in to the database at least on a daily basis.
>
> Ideally the benchmark database would just be a PostgreSQL server with
> a schema we write down and keep backed up etc. Hosted PostgreSQL is
> inexpensive ($200+ per year depending on size of instance; this
> probably doesn't need to be a crazy big machine)
>
> I suspect there will be a manageable amount of development involved to
> glue each of the benchmarking frameworks together with the benchmark
> database. This can also handle querying the operating system for the
> system information listed above
>
> Thanks
> Wes
>
> On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
> <ar...@intel.com> wrote:
> >
> > Hello,
> >
> > I want to restart/attach to the discussions for creating Arrow
> benchmarking dashboard. I want to propose performance benchmark run per
> commit to track the changes.
> > The proposal includes building infrastructure for per-commit tracking
> comprising of the following parts:
> > - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build
> system
> > - Agents running in cloud both VM/container (DigitalOcean, or others)
> and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> > - JFrog artifactory storage and management for OSS projects
> https://jfrog.com/open-source/#artifactory2
> > - Codespeed as a frontend https://github.com/tobami/codespeed
> >
> > I am volunteering to build such system (if needed more Intel folks will
> be involved) so we can start tracking performance on various platforms and
> understand how changes affect it.
> >
> > Please, let me know your thoughts!
> >
> > Thanks,
> > -Areg.
> >
> >
> >
>

Re: Benchmarking dashboard proposal

Posted by Wes McKinney <we...@gmail.com>.
hi Areg,

This sounds great -- we've discussed building a more full-featured
benchmark automation system in the past but nothing has been developed
yet.

Your proposal about the details sounds OK; the single most important
thing to me is that we build and maintain a very general purpose
database schema for building the historical benchmark database

The benchmark database should keep track of:

* Timestamp of benchmark run
* Git commit hash of codebase
* Machine unique name (sort of the "user id")
* CPU identification for machine, and clock frequency (in case of overclocking)
* CPU cache sizes (L1/L2/L3)
* Whether or not CPU throttling is enabled (if it can be easily determined)
* RAM size
* GPU identification (if any)
* Benchmark unique name
* Programming language(s) associated with benchmark (e.g. a benchmark
may involve both C++ and Python)
* Benchmark time, plus mean and standard deviation if available, else NULL

(maybe some other things)

I would rather not be locked into the internal database schema of a
particular benchmarking tool. So people in the community can just run
SQL queries against the database and use the data however they like.
We'll just have to be careful that people don't DROP TABLE or DELETE
(but we should have daily backups so we can recover from such cases)

So while we may make use of TeamCity to schedule the runs on the cloud
and physical hardware, we should also provide a path for other people
in the community to add data to the benchmark database on their
hardware on an ad hoc basis. For example, I have several machines in
my home on all operating systems (Windows / macOS / Linux, and soon
also ARM64) and I'd like to set up scheduled tasks / cron jobs to
report in to the database at least on a daily basis.

Ideally the benchmark database would just be a PostgreSQL server with
a schema we write down and keep backed up etc. Hosted PostgreSQL is
inexpensive ($200+ per year depending on size of instance; this
probably doesn't need to be a crazy big machine)

I suspect there will be a manageable amount of development involved to
glue each of the benchmarking frameworks together with the benchmark
database. This can also handle querying the operating system for the
system information listed above

Thanks
Wes

On Fri, Jan 18, 2019 at 12:14 AM Melik-Adamyan, Areg
<ar...@intel.com> wrote:
>
> Hello,
>
> I want to restart/attach to the discussions for creating Arrow benchmarking dashboard. I want to propose performance benchmark run per commit to track the changes.
> The proposal includes building infrastructure for per-commit tracking comprising of the following parts:
> - Hosted JetBrains for OSS https://teamcity.jetbrains.com/ as a build system
> - Agents running in cloud both VM/container (DigitalOcean, or others) and bare-metal (Packet.net/AWS) and on-premise(Nvidia boxes?)
> - JFrog artifactory storage and management for OSS projects https://jfrog.com/open-source/#artifactory2
> - Codespeed as a frontend https://github.com/tobami/codespeed
>
> I am volunteering to build such system (if needed more Intel folks will be involved) so we can start tracking performance on various platforms and understand how changes affect it.
>
> Please, let me know your thoughts!
>
> Thanks,
> -Areg.
>
>
>