Posted to jira@arrow.apache.org by "Diana Clarke (Jira)" <ji...@apache.org> on 2021/02/09 20:40:00 UTC
[jira] [Created] (ARROW-11573) [Developer][Archery] Google benchmark now reports run type

Diana Clarke created ARROW-11573:
------------------------------------

             Summary: [Developer][Archery] Google benchmark now reports run type
                 Key: ARROW-11573
                 URL: https://issues.apache.org/jira/browse/ARROW-11573
             Project: Apache Arrow
          Issue Type: Bug
          Components: Archery, Developer Tools
            Reporter: Diana Clarke
            Assignee: Diana Clarke


Google Benchmark now reports the run type as a `run_type` attribute in its JSON output [1], so the following Archery code and comment can be updated.

{code}
    Observations are found when running with `--benchmark_repetitions`. Sadly,
    the format mixes values and aggregates, e.g.

    RegressionSumKernel/32768/0                 1 us          1 us  25.8077GB/s
    RegressionSumKernel/32768/0                 1 us          1 us  25.7066GB/s
    RegressionSumKernel/32768/0                 1 us          1 us  25.1481GB/s
    RegressionSumKernel/32768/0                 1 us          1 us  25.846GB/s
    RegressionSumKernel/32768/0                 1 us          1 us  25.6453GB/s
    RegressionSumKernel/32768/0_mean            1 us          1 us  25.6307GB/s
    RegressionSumKernel/32768/0_median          1 us          1 us  25.7066GB/s
    RegressionSumKernel/32768/0_stddev          0 us          0 us  288.046MB/s

    As of benchmark v1.4.1 (2019-04-24), the only way to differentiate an
    actual run from the aggregates is to match on the benchmark name. The
    aggregates will be appended with `_$agg_name`.

    This class encapsulates the logic to separate runs from aggregates. This is
    hopefully avoided in benchmark's master version with a separate json
    attribute.
{code}

{code}
    @property
    def is_agg(self):
        """ Indicate if the observation is a run or an aggregate. """
        suffixes = ["_mean", "_median", "_stddev"]
        return any(map(lambda x: self._name.endswith(x), suffixes))
{code}


Here's example output (note the `run_type` value of the aggregate vs. that of the actual observation):

{code}
 {'aggregate_name': 'mean',
  'cpu_time': 9818703.124999983,
  'items_per_second': 26700744.55186333,
  'iterations': 3,
  'name': 'TakeStringRandomIndicesWithNulls/262144/0_mean',
  'null_percent': 0.0,
  'real_time': 10138621.349445505,
  'repetitions': 0,
  'run_name': 'TakeStringRandomIndicesWithNulls/262144/0',
  'run_type': 'aggregate',
  'size': 262144.0,
  'threads': 1,
  'time_unit': 'ns'},
 {'cpu_time': 9718937.499999996,
  'items_per_second': 26972495.707478322,
  'iterations': 64,
  'name': 'TakeStringRandomIndicesWithNulls/262144/0',
  'null_percent': 0.0,
  'real_time': 10297947.859726265,
  'repetition_index': 2,
  'repetitions': 0,
  'run_name': 'TakeStringRandomIndicesWithNulls/262144/0',
  'run_type': 'iteration',
  'size': 262144.0,
  'threads': 1,
  'time_unit': 'ns'},
{code}
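
As a rough illustration of the proposed update, `is_agg` could read `run_type` directly instead of matching on name suffixes. This is only a minimal sketch, not the actual Archery change: the `Observation` class below is a simplified, hypothetical stand-in that keeps just the `name` and `run_type` fields, and the fallback to suffix matching for output that lacks `run_type` is an assumption about what might be wanted for older Google Benchmark releases.

```python
# Hypothetical, simplified stand-in for Archery's observation class.
# Only `name` and `run_type` are kept; all other JSON fields are ignored.
AGG_SUFFIXES = ("_mean", "_median", "_stddev")


class Observation:
    def __init__(self, name, run_type=None, **_ignored):
        self._name = name
        self._run_type = run_type

    @property
    def is_agg(self):
        """Indicate if the observation is a run or an aggregate."""
        if self._run_type is not None:
            # Newer Google Benchmark output: trust the explicit attribute.
            return self._run_type == "aggregate"
        # Assumed fallback for output without `run_type`: match on the name.
        return self._name.endswith(AGG_SUFFIXES)


# Entries modeled on the example output above, plus one legacy-style entry.
agg = Observation(name="TakeStringRandomIndicesWithNulls/262144/0_mean",
                  run_type="aggregate")
run = Observation(name="TakeStringRandomIndicesWithNulls/262144/0",
                  run_type="iteration")
legacy = Observation(name="RegressionSumKernel/32768/0_stddev")
```

With this sketch, `agg.is_agg` and `legacy.is_agg` evaluate to `True` and `run.is_agg` to `False`. If support for older output is not needed, the fallback branch and the suffix list could be dropped entirely.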

[1] https://github.com/google/benchmark/commit/8688c5c4cfa1527ceca2136b2a738d9712a01890




