Posted to dev@arrow.apache.org by Antoine Pitrou <an...@python.org> on 2018/04/11 13:40:17 UTC

Continuous benchmarking setup

Hello

With the following changes, it seems we might reach the point where
we're able to run the Python-based benchmark suite across multiple
commits (at least the ones not predating those changes):
https://github.com/apache/arrow/pull/1775
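
In case it helps picture it, here is a minimal sketch of such a run,
assuming an asv.conf.json already sits in the Python tree (the
directory and commit range below are illustrative):

    # hypothetical driver around the asv command line
    import subprocess

    ARROW_PYTHON_DIR = "arrow/python"  # assumed location of asv.conf.json

    def run_range(commit_range="HEAD~20..HEAD", steps=10):
        # "asv run" accepts a git revision range; --steps benchmarks that
        # many evenly spaced commits from the range instead of all of them
        subprocess.check_call(
            ["asv", "run", "--steps", str(steps), commit_range],
            cwd=ARROW_PYTHON_DIR)

    if __name__ == "__main__":
        run_range()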

To make this truly useful, we would need a dedicated host.  Ideally a
(Linux) OS running on bare metal, with SMT/HyperThreading disabled.
If running virtualized, the VM should have dedicated physical CPU cores.

That machine would run the benchmarks on a regular basis (perhaps once
per night) and publish the results in static HTML form somewhere.
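
Concretely, the nightly job could stay very small. A sketch with
hypothetical paths ("NEW" asks asv to benchmark only commits it has
not benchmarked yet):

    # hypothetical nightly job, e.g. launched from cron
    import subprocess

    BENCH_DIR = "/opt/arrow-benchmarks/arrow/python"  # assumed checkout
    HTML_DIR = "/var/www/benchmarks"                  # assumed web root

    # benchmark any new commits, then regenerate the static HTML report
    # into the directory the web server exposes
    subprocess.check_call(["asv", "run", "NEW"], cwd=BENCH_DIR)
    subprocess.check_call(["asv", "publish", "--html-dir", HTML_DIR],
                          cwd=BENCH_DIR)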

(note: access to NVIDIA hardware might be a nice-to-have in the future,
but right now there are no CUDA benchmarks in the Python suite)

What should be the procedure here?

Regards

Antoine.

Re: Continuous benchmarking setup

Posted by Tom Augspurger <to...@gmail.com>.
Currently, there are 3 snowflakes :)

- Benchmark setup: https://github.com/TomAugspurger/asv-runner
  + Some setup to bootstrap a clean install with airflow, conda, asv,
supervisor, etc. All the infrastructure around running the benchmarks.
  + Each project adds itself to the list of benchmarks, as in
https://github.com/TomAugspurger/asv-runner/pull/3. Then things are
re-deployed. Deployment requires ansible and an SSH key for the benchmark
machine.
- Benchmark publishing: After running all the benchmarks, the results are
collected and pushed to https://github.com/tomaugspurger/asv-collection
- Benchmark hosting: A cron job on the server hosting pandas docs pulls
https://github.com/tomaugspurger/asv-collection and serves the results
from the `/speed` directory (see the sketch below).
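
Very roughly, the publish and host steps come down to something like
the following sketch (paths are illustrative, not the actual
deployment; a real job would also skip the commit when nothing changed):

    # hypothetical publish step, run on the benchmark machine after asv
    import subprocess

    COLLECTION = "/srv/asv-collection"  # assumed clone of asv-collection

    def push_results():
        # each project's asv html output has already been copied into
        # the collection repo (omitted); commit and push what changed
        subprocess.check_call(["git", "add", "-A"], cwd=COLLECTION)
        subprocess.check_call(
            ["git", "commit", "-m", "Update benchmark results"],
            cwd=COLLECTION)
        subprocess.check_call(["git", "push"], cwd=COLLECTION)

    def pull_results():
        # counterpart on the docs server, run from cron: refresh the
        # checkout that is served under /speed
        subprocess.check_call(["git", "pull"], cwd="/srv/pandas-docs/speed")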

There are many things that could be improved on here, but I personally
won't have time in the near term. Happy to assist though.

On Mon, Apr 23, 2018 at 10:15 AM, Wes McKinney <we...@gmail.com> wrote:

> hi Tom -- is the publishing workflow for this documented someplace, or
> available in a GitHub repo? We want to make sure we don't accumulate
> any "snowflakes" in the development process.
>
> thanks!
> Wes
>
> On Fri, Apr 13, 2018 at 8:36 AM, Tom Augspurger
> <to...@gmail.com> wrote:
> > They are run daily and published to http://pandas.pydata.org/speed/
> >
> >
> > ________________________________
> > From: Antoine Pitrou <an...@python.org>
> > Sent: Friday, April 13, 2018 4:28:11 AM
> > To: dev@arrow.apache.org
> > Subject: Re: Continuous benchmarking setup
> >
> >
> > Nice! Are the benchmark results published somewhere?
> >
> >
> >
> > On 13/04/2018 at 02:50, Tom Augspurger wrote:
> >> https://github.com/TomAugspurger/asv-runner/ is the setup for the
> >> projects currently running. Adding arrow to
> >> https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml
> >> might work. I'll have to redeploy with the update.
> >>
> >> ________________________________
> >> From: Wes McKinney <we...@gmail.com>
> >> Sent: Thursday, April 12, 2018 7:24:20 PM
> >> To: dev@arrow.apache.org
> >> Subject: Re: Continuous benchmarking setup
> >>
> >> hi Antoine,
> >>
> >> I have a bare metal machine at home (affectionately known as the
> >> "pandabox") that's available via SSH that we've been using for
> >> continuous benchmarking for other projects. Arrow is welcome to use
> >> it. I can give you access to the machine if you would like. Hopefully,
> >> we can suitably document the process of setting up a continuous benchmarking
> >> machine so that if we need to migrate to a new machine, it is not too
> >> much of a hardship to do so.
> >>
> >> Thanks
> >> Wes
> >>
> >> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org>
> wrote:
> >>>
> >>> Hello
> >>>
> >>> With the following changes, it seems we might reach the point where
> >>> we're able to run the Python-based benchmark suite across multiple
> >>> commits (at least the ones not predating those changes):
> >>> https://github.com/apache/arrow/pull/1775
> >>>
> >>> To make this truly useful, we would need a dedicated host.  Ideally a
> >>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> >>> If running virtualized, the VM should have dedicated physical CPU
> cores.
> >>>
> >>> That machine would run the benchmarks on a regular basis (perhaps once
> >>> per night) and publish the results in static HTML form somewhere.
> >>>
> >>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
> >>> but right now there are no CUDA benchmarks in the Python suite)
> >>>
> >>> What should be the procedure here?
> >>>
> >>> Regards
> >>>
> >>> Antoine.
> >>
>

Re: Continuous benchmarking setup

Posted by Wes McKinney <we...@gmail.com>.
hi Tom -- is the publishing workflow for this documented someplace, or
available in a GitHub repo? We want to make sure we don't accumulate
any "snowflakes" in the development process.

thanks!
Wes

On Fri, Apr 13, 2018 at 8:36 AM, Tom Augspurger
<to...@gmail.com> wrote:
> They are run daily and published to http://pandas.pydata.org/speed/
>
>
> ________________________________
> From: Antoine Pitrou <an...@python.org>
> Sent: Friday, April 13, 2018 4:28:11 AM
> To: dev@arrow.apache.org
> Subject: Re: Continuous benchmarking setup
>
>
> Nice! Are the benchmark results published somewhere?
>
>
>
> On 13/04/2018 at 02:50, Tom Augspurger wrote:
>> https://github.com/TomAugspurger/asv-runner/ is the setup for the projects currently running. Adding arrow to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml might work. I'll have to redeploy with the update.
>>
>> ________________________________
>> From: Wes McKinney <we...@gmail.com>
>> Sent: Thursday, April 12, 2018 7:24:20 PM
>> To: dev@arrow.apache.org
>> Subject: Re: Continuous benchmarking setup
>>
>> hi Antoine,
>>
>> I have a bare metal machine at home (affectionately known as the
>> "pandabox") that's available via SSH that we've been using for
>> continuous benchmarking for other projects. Arrow is welcome to use
>> it. I can give you access to the machine if you would like. Hopefully,
>> we can suitably document the process of setting up a continuous benchmarking
>> machine so that if we need to migrate to a new machine, it is not too
>> much of a hardship to do so.
>>
>> Thanks
>> Wes
>>
>> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org> wrote:
>>>
>>> Hello
>>>
>>> With the following changes, it seems we might reach the point where
>>> we're able to run the Python-based benchmark suite across multiple
>>> commits (at least the ones not predating those changes):
>>> https://github.com/apache/arrow/pull/1775
>>>
>>> To make this truly useful, we would need a dedicated host.  Ideally a
>>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>>> If running virtualized, the VM should have dedicated physical CPU cores.
>>>
>>> That machine would run the benchmarks on a regular basis (perhaps once
>>> per night) and publish the results in static HTML form somewhere.
>>>
>>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>>> but right now there are no CUDA benchmarks in the Python suite)
>>>
>>> What should be the procedure here?
>>>
>>> Regards
>>>
>>> Antoine.
>>

Re: Continuous benchmarking setup

Posted by Tom Augspurger <to...@gmail.com>.
They are run daily and published to http://pandas.pydata.org/speed/


________________________________
From: Antoine Pitrou <an...@python.org>
Sent: Friday, April 13, 2018 4:28:11 AM
To: dev@arrow.apache.org
Subject: Re: Continuous benchmarking setup


Nice! Are the benchmark results published somewhere?



On 13/04/2018 at 02:50, Tom Augspurger wrote:
> https://github.com/TomAugspurger/asv-runner/ is the setup for the projects currently running. Adding arrow to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml might work. I'll have to redeploy with the update.
>
> ________________________________
> From: Wes McKinney <we...@gmail.com>
> Sent: Thursday, April 12, 2018 7:24:20 PM
> To: dev@arrow.apache.org
> Subject: Re: Continuous benchmarking setup
>
> hi Antoine,
>
> I have a bare metal machine at home (affectionately known as the
> "pandabox") that's available via SSH that we've been using for
> continuous benchmarking for other projects. Arrow is welcome to use
> it. I can give you access to the machine if you would like. Hopefully,
> we can suitably document the process of setting up a continuous benchmarking
> machine so that if we need to migrate to a new machine, it is not too
> much of a hardship to do so.
>
> Thanks
> Wes
>
> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org> wrote:
>>
>> Hello
>>
>> With the following changes, it seems we might reach the point where
>> we're able to run the Python-based benchmark suite across multiple
>> commits (at least the ones not predating those changes):
>> https://github.com/apache/arrow/pull/1775
>>
>> To make this truly useful, we would need a dedicated host.  Ideally a
>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>> If running virtualized, the VM should have dedicated physical CPU cores.
>>
>> That machine would run the benchmarks on a regular basis (perhaps once
>> per night) and publish the results in static HTML form somewhere.
>>
>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>> but right now there are no CUDA benchmarks in the Python suite)
>>
>> What should be the procedure here?
>>
>> Regards
>>
>> Antoine.
>

Re: Continuous benchmarking setup

Posted by Antoine Pitrou <an...@python.org>.
Nice! Are the benchmark results published somewhere?



On 13/04/2018 at 02:50, Tom Augspurger wrote:
> https://github.com/TomAugspurger/asv-runner/ is the setup for the projects currently running. Adding arrow to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml might work. I'll have to redeploy with the update.
> 
> ________________________________
> From: Wes McKinney <we...@gmail.com>
> Sent: Thursday, April 12, 2018 7:24:20 PM
> To: dev@arrow.apache.org
> Subject: Re: Continuous benchmarking setup
> 
> hi Antoine,
> 
> I have a bare metal machine at home (affectionately known as the
> "pandabox") that's available via SSH that we've been using for
> continuous benchmarking for other projects. Arrow is welcome to use
> it. I can give you access to the machine if you would like. Hopefully,
> we can suitably document the process of setting up a continuous benchmarking
> machine so that if we need to migrate to a new machine, it is not too
> much of a hardship to do so.
> 
> Thanks
> Wes
> 
> On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org> wrote:
>>
>> Hello
>>
>> With the following changes, it seems we might reach the point where
>> we're able to run the Python-based benchmark suite across multiple
>> commits (at least the ones not predating those changes):
>> https://github.com/apache/arrow/pull/1775
>>
>> To make this truly useful, we would need a dedicated host.  Ideally a
>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>> If running virtualized, the VM should have dedicated physical CPU cores.
>>
>> That machine would run the benchmarks on a regular basis (perhaps once
>> per night) and publish the results in static HTML form somewhere.
>>
>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>> but right now there are no CUDA benchmarks in the Python suite)
>>
>> What should be the procedure here?
>>
>> Regards
>>
>> Antoine.
> 

Re: Continuous benchmarking setup

Posted by Tom Augspurger <to...@gmail.com>.
https://github.com/TomAugspurger/asv-runner/ is the setup for the projects currently running. Adding arrow to https://github.com/TomAugspurger/asv-runner/blob/master/tests/full.yml might work. I'll have to redeploy with the update.
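
Guessing at the shape of that file, the new entry might look roughly
like this (the keys below are illustrative only; the real schema is
whatever full.yml already uses):

    # hypothetical addition to tests/full.yml
    - name: arrow
      repo: https://github.com/apache/arrow
      benchmarks: python  # directory containing asv.conf.json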

________________________________
From: Wes McKinney <we...@gmail.com>
Sent: Thursday, April 12, 2018 7:24:20 PM
To: dev@arrow.apache.org
Subject: Re: Continuous benchmarking setup

hi Antoine,

I have a bare metal machine at home (affectionately known as the
"pandabox") that's available via SSH that we've been using for
continuous benchmarking for other projects. Arrow is welcome to use
it. I can give you access to the machine if you would like. Hopefully,
we can suitably document the process of setting up a continuous benchmarking
machine so that if we need to migrate to a new machine, it is not too
much of a hardship to do so.

Thanks
Wes

On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org> wrote:
>
> Hello
>
> With the following changes, it seems we might reach the point where
> we're able to run the Python-based benchmark suite across multiple
> commits (at least the ones not predating those changes):
> https://github.com/apache/arrow/pull/1775
>
> To make this truly useful, we would need a dedicated host.  Ideally a
> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> If running virtualized, the VM should have dedicated physical CPU cores.
>
> That machine would run the benchmarks on a regular basis (perhaps once
> per night) and publish the results in static HTML form somewhere.
>
> (note: access to NVIDIA hardware might be a nice-to-have in the future,
> but right now there are no CUDA benchmarks in the Python suite)
>
> What should be the procedure here?
>
> Regards
>
> Antoine.

Re: Continuous benchmarking setup

Posted by Wes McKinney <we...@gmail.com>.
hi Antoine,

I have a bare metal machine at home (affectionately known as the
"pandabox") that's available via SSH that we've been using for
continuous benchmarking for other projects. Arrow is welcome to use
it. I can give you access to the machine if you would like. Hopefully,
we can suitably document the process of setting up a continuous benchmarking
machine so that if we need to migrate to a new machine, it is not too
much of a hardship to do so.

Thanks
Wes

On Wed, Apr 11, 2018 at 9:40 AM, Antoine Pitrou <an...@python.org> wrote:
>
> Hello
>
> With the following changes, it seems we might reach the point where
> we're able to run the Python-based benchmark suite across multiple
> commits (at least the ones not predating those changes):
> https://github.com/apache/arrow/pull/1775
>
> To make this truly useful, we would need a dedicated host.  Ideally a
> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> If running virtualized, the VM should have dedicated physical CPU cores.
>
> That machine would run the benchmarks on a regular basis (perhaps once
> per night) and publish the results in static HTML form somewhere.
>
> (note: access to NVIDIA hardware might be a nice-to-have in the future,
> but right now there are no CUDA benchmarks in the Python suite)
>
> What should be the procedure here?
>
> Regards
>
> Antoine.

Re: Continuous benchmarking setup

Posted by Wes McKinney <we...@gmail.com>.
I know the tool we are using for Python benchmarks is Python-specific
-- it would be interesting to see if there's a way to ingest benchmark
output (as JSON or some other format) from other programming
languages.
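
One loose possibility, assuming the non-Python suite can emit a flat
name-to-seconds mapping as JSON (the file name and shape below are
made up for illustration; this is not asv's on-disk schema):

    # hypothetical bridge: read timings emitted by a non-Python suite
    import json

    with open("js-bench-results.json") as f:  # illustrative file name
        timings = json.load(f)  # e.g. {"vector-read": 0.0123, ...}

    # print timings in milliseconds; a real bridge would translate them
    # into whatever format the publishing side expects
    for name, seconds in sorted(timings.items()):
        print("%s: %.3f ms" % (name, seconds * 1e3))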

On Mon, May 14, 2018 at 8:56 AM, Brian Hulette <br...@ccri.com> wrote:
> Is anyone aware of a way we could set up similar continuous benchmarks for
> JS? We wrote some benchmarks earlier this year but currently have no
> automated way of running them.
>
> Brian
>
>
>
> On 05/11/2018 08:21 PM, Wes McKinney wrote:
>>
>> Thanks Tom and Antoine!
>>
>> Since these benchmarks are literally running on a machine in my closet
>> at home, there may be some downtime in the future. At some point we
>> should document a process of setting up a new machine from scratch to
>> be the nightly bare metal benchmark slave.
>>
>> - Wes
>>
>> On Fri, May 11, 2018 at 9:08 AM, Antoine Pitrou <so...@pitrou.net>
>> wrote:
>>>
>>> Hi again,
>>>
>>> Tom has configured the benchmarking machine to run and publish Arrow's
>>> ASV-based benchmarks.  The latest results can now be seen at:
>>> https://pandas.pydata.org/speed/arrow/
>>>
>>> I expect these are regenerated on a regular (daily?) basis.
>>>
>>> Thanks Tom :-)
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> On Wed, 11 Apr 2018 15:40:17 +0200
>>> Antoine Pitrou <an...@python.org> wrote:
>>>>
>>>> Hello
>>>>
>>>> With the following changes, it seems we might reach the point where
>>>> we're able to run the Python-based benchmark suite across multiple
>>>> commits (at least the ones not predating those changes):
>>>> https://github.com/apache/arrow/pull/1775
>>>>
>>>> To make this truly useful, we would need a dedicated host.  Ideally a
>>>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>>>> If running virtualized, the VM should have dedicated physical CPU cores.
>>>>
>>>> That machine would run the benchmarks on a regular basis (perhaps once
>>>> per night) and publish the results in static HTML form somewhere.
>>>>
>>>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>>>> but right now there are no CUDA benchmarks in the Python suite)
>>>>
>>>> What should be the procedure here?
>>>>
>>>> Regards
>>>>
>>>> Antoine.
>>>>
>

Re: Continuous benchmarking setup

Posted by Brian Hulette <br...@ccri.com>.
Is anyone aware of a way we could set up similar continuous benchmarks 
for JS? We wrote some benchmarks earlier this year but currently have no 
automated way of running them.

Brian


On 05/11/2018 08:21 PM, Wes McKinney wrote:
> Thanks Tom and Antoine!
>
> Since these benchmarks are literally running on a machine in my closet
> at home, there may be some downtime in the future. At some point we
> should document a process of setting up a new machine from scratch to
> be the nightly bare metal benchmark slave.
>
> - Wes
>
> On Fri, May 11, 2018 at 9:08 AM, Antoine Pitrou <so...@pitrou.net> wrote:
>> Hi again,
>>
>> Tom has configured the benchmarking machine to run and publish Arrow's
>> ASV-based benchmarks.  The latest results can now be seen at:
>> https://pandas.pydata.org/speed/arrow/
>>
>> I expect these are regenerated on a regular (daily?) basis.
>>
>> Thanks Tom :-)
>>
>> Regards
>>
>> Antoine.
>>
>>
>> On Wed, 11 Apr 2018 15:40:17 +0200
>> Antoine Pitrou <an...@python.org> wrote:
>>> Hello
>>>
>>> With the following changes, it seems we might reach the point where
>>> we're able to run the Python-based benchmark suite across multiple
>>> commits (at least the ones not predating those changes):
>>> https://github.com/apache/arrow/pull/1775
>>>
>>> To make this truly useful, we would need a dedicated host.  Ideally a
>>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>>> If running virtualized, the VM should have dedicated physical CPU cores.
>>>
>>> That machine would run the benchmarks on a regular basis (perhaps once
>>> per night) and publish the results in static HTML form somewhere.
>>>
>>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>>> but right now there are no CUDA benchmarks in the Python suite)
>>>
>>> What should be the procedure here?
>>>
>>> Regards
>>>
>>> Antoine.
>>>


Re: Continuous benchmarking setup

Posted by Wes McKinney <we...@gmail.com>.
Thanks Tom and Antoine!

Since these benchmarks are literally running on a machine in my closet
at home, there may be some downtime in the future. At some point we
should document a process of setting up a new machine from scratch to
be the nightly bare metal benchmark slave.

- Wes

On Fri, May 11, 2018 at 9:08 AM, Antoine Pitrou <so...@pitrou.net> wrote:
>
> Hi again,
>
> Tom has configured the benchmarking machine to run and publish Arrow's
> ASV-based benchmarks.  The latest results can now be seen at:
> https://pandas.pydata.org/speed/arrow/
>
> I expect these are regenerated on a regular (daily?) basis.
>
> Thanks Tom :-)
>
> Regards
>
> Antoine.
>
>
> On Wed, 11 Apr 2018 15:40:17 +0200
> Antoine Pitrou <an...@python.org> wrote:
>> Hello
>>
>> With the following changes, it seems we might reach the point where
>> we're able to run the Python-based benchmark suite across multiple
>> commits (at least the ones not predating those changes):
>> https://github.com/apache/arrow/pull/1775
>>
>> To make this truly useful, we would need a dedicated host.  Ideally a
>> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
>> If running virtualized, the VM should have dedicated physical CPU cores.
>>
>> That machine would run the benchmarks on a regular basis (perhaps once
>> per night) and publish the results in static HTML form somewhere.
>>
>> (note: access to NVIDIA hardware might be a nice-to-have in the future,
>> but right now there are no CUDA benchmarks in the Python suite)
>>
>> What should be the procedure here?
>>
>> Regards
>>
>> Antoine.
>>
>

Re: Continuous benchmarking setup

Posted by Antoine Pitrou <so...@pitrou.net>.
Hi again,

Tom has configured the benchmarking machine to run and publish Arrow's
ASV-based benchmarks.  The latest results can now be seen at:
https://pandas.pydata.org/speed/arrow/

I expect these are regenerated on a regular (daily?) basis.

Thanks Tom :-)

Regards

Antoine.


On Wed, 11 Apr 2018 15:40:17 +0200
Antoine Pitrou <an...@python.org> wrote:
> Hello
> 
> With the following changes, it seems we might reach the point where
> we're able to run the Python-based benchmark suite across multiple
> commits (at least the ones not predating those changes):
> https://github.com/apache/arrow/pull/1775
> 
> To make this truly useful, we would need a dedicated host.  Ideally a
> (Linux) OS running on bare metal, with SMT/HyperThreading disabled.
> If running virtualized, the VM should have dedicated physical CPU cores.
> 
> That machine would run the benchmarks on a regular basis (perhaps once
> per night) and publish the results in static HTML form somewhere.
> 
> (note: access to NVIDIA hardware might be a nice-to-have in the future,
> but right now there are no CUDA benchmarks in the Python suite)
> 
> What should be the procedure here?
> 
> Regards
> 
> Antoine.
>