Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/18 01:11:07 UTC

[GitHub] [arrow] wesm opened a new pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

wesm opened a new pull request #7475:
URL: https://github.com/apache/arrow/pull/7475


   Since I changed Filter on RecordBatch to transform the filter to indices and use Take, I wanted a benchmark that compares the before/after performance and lets us monitor it over time. These benchmarks could use some refactoring, but this is at least a starting point.
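
The approach described above (turn the boolean filter into row indices, then gather every column with Take) can be sketched in plain Python. This is only an illustration and not Arrow's C++ implementation: the names `mask_to_indices`, `take`, and `filter_batch` are hypothetical, a record batch is modeled as a dict of equal-length lists, and a null (`None`) filter slot is assumed to drop the row.

```python
from typing import Dict, List, Optional


def mask_to_indices(mask: List[Optional[bool]]) -> List[int]:
    """Return the positions where the mask is True.

    Assumption for this sketch: False and None (null) both drop the row.
    """
    return [i for i, keep in enumerate(mask) if keep is True]


def take(column: list, indices: List[int]) -> list:
    """Gather the values of one column at the given positions."""
    return [column[i] for i in indices]


def filter_batch(batch: Dict[str, list], mask: List[Optional[bool]]) -> Dict[str, list]:
    """Filter every column of a batch using one shared index list."""
    # The mask is scanned once; the resulting indices are reused per column.
    indices = mask_to_indices(mask)
    return {name: take(column, indices) for name, column in batch.items()}


batch = {"id": [1, 2, 3, 4], "name": ["a", "b", "c", "d"]}
mask = [True, False, None, True]
filtered = filter_batch(batch, mask)
print(filtered)
```

In this sketch the mask is read once no matter how many columns the batch has, and the per-column work scales with the number of selected rows.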


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] kszucs commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646025902


   Buildbot parses a specific stdio format from the archery command, and the output was a bit different for this invocation. My guess is that passing a specific commit changes the output format.





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646179658


   +1, awaiting CI





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646264314


   Good stuff. 





[GitHub] [arrow] ursabot commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-645968685


   [AMD64 Ubuntu 18.04 C++ Benchmark (#113245)](https://ci.ursalabs.org/#builders/73/builds/85) builder failed with an exception.
   
   Revision: 999865b042c3131920b52b40a2387535168f3a08
   
   Archery: `'archery benchmark ...'` step's traceback:
   ```pycon
   Traceback (most recent call last):
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
       current.result = callback(current.result, *args, **kw)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
       _inlineCallbacks(r, g, status)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
       result = result.throwExceptionIntoGenerator(g)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
       return g.throw(self.type, self.value, self.tb)
   --- <exception caught here> ---
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
       self.results = yield self.run()
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
       result = g.send(result)
     File "/home/ursabot/ursabot/ursabot/steps.py", line 67, in run
       await log.addContent(content)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
       return self.lbf.append(text)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
       text = self.newline_re.sub('\n', text)
   builtins.TypeError: expected string or bytes-like object
   ```
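
The last frame of the traceback is a regular-expression substitution that receives something other than text. Below is a minimal reproduction of that error message. It assumes the log content was `None` (the traceback does not say what the value was) and uses a stand-in pattern, not buildbot's actual `newline_re`.

```python
import re

# Stand-in newline-normalizing pattern; buildbot's real newline_re may differ.
newline_re = re.compile(r"\r\n|\r|\n")


def normalize_newlines(text):
    """Replace every newline variant with a plain line feed."""
    return newline_re.sub("\n", text)


# Normal case: text input is normalized.
print(repr(normalize_newlines("a\r\nb\rc")))

# Failure case: a non-string value (here None, as an assumption) raises TypeError.
try:
    normalize_newlines(None)
except TypeError as exc:
    error = str(exc)
    print("TypeError:", error)
```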





[GitHub] [arrow] kszucs edited a comment on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646025902


   Buildbot parses a specific stdio format from the archery command, and the output was a bit different for this invocation. My guess is that passing a specific commit changes the output format.
   
   I'm triggering a benchmark without the contender commit to see whether it is a buildbot parser issue or an archery output formatting error.





[GitHub] [arrow] github-actions[bot] commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-645710548


   https://issues.apache.org/jira/browse/ARROW-8500





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646022860


   @fsaintjacques @kszucs any idea what went wrong with buildbot?





[GitHub] [arrow] kszucs commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646042154


   Another guess is `FilterRecordBatch` doesn't exist in the contender.





[GitHub] [arrow] ursabot commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-645714003


   [AMD64 Ubuntu 18.04 C++ Benchmark (#113134)](https://ci.ursalabs.org/#builders/73/builds/84) builder failed with an exception.
   
   Revision: 999865b042c3131920b52b40a2387535168f3a08
   
   Archery: `'archery benchmark ...'` step's traceback:
   ```pycon
   Traceback (most recent call last):
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
       current.result = callback(current.result, *args, **kw)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
       _inlineCallbacks(r, g, status)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
       result = result.throwExceptionIntoGenerator(g)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
       return g.throw(self.type, self.value, self.tb)
   --- <exception caught here> ---
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
       self.results = yield self.run()
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
       result = g.send(result)
     File "/home/ursabot/ursabot/ursabot/steps.py", line 67, in run
       await log.addContent(content)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
       return self.lbf.append(text)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
       text = self.newline_re.sub('\n', text)
   builtins.TypeError: expected string or bytes-like object
   ```





[GitHub] [arrow] kszucs commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646026066


   @ursabot benchmark --benchmark-filter=FilterRecordBatch





[GitHub] [arrow] kszucs edited a comment on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs edited a comment on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646025902


   Buildbot parses a specific stdio format from the archery command, and the output was a bit different for this invocation. My guess is that passing a specific commit changes the output format.
   
   I'm triggering a benchmark without the contender commit to see whether it is a buildbot parser issue or an archery output formatting error.
   
   The output contains only a single result set in the logs, whereas the passing benchmarks contain two, so archery doesn't produce the result diff as JSON.
   





[GitHub] [arrow] ursabot commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646062913


   [AMD64 Ubuntu 18.04 C++ Benchmark (#113289)](https://ci.ursalabs.org/#builders/73/builds/87) builder succeeded.
   
   Revision: 999865b042c3131920b52b40a2387535168f3a08
   
   ```diff
     ======================================  ===============  ===============  =========
     benchmark                               baseline         contender        change
     ======================================  ===============  ===============  =========
   - FilterStringFilterNoNulls/262144/0      3.205 GiB/sec    563.351 MiB/sec  -82.832%
   - FilterInt64FilterWithNulls/262144/4     1.448 GiB/sec    643.680 MiB/sec  -56.595%
   - FilterFSLInt64FilterWithNulls/262144/6  1.061 GiB/sec    334.867 MiB/sec  -69.188%
     FilterFSLInt64FilterNoNulls/262144/2    1.355 GiB/sec    6.347 GiB/sec    368.336%
   - FilterFSLInt64FilterNoNulls/262144/0    1.404 GiB/sec    720.926 MiB/sec  -49.872%
     FilterFSLInt64FilterWithNulls/262144/1  186.786 MiB/sec  516.247 MiB/sec  176.385%
     FilterFSLInt64FilterWithNulls/262144/7  171.996 MiB/sec  468.500 MiB/sec  172.390%
     FilterStringFilterWithNulls/262144/2    2.408 GiB/sec    9.139 GiB/sec    279.573%
     FilterInt64FilterWithNulls/262144/5     544.180 MiB/sec  5.138 GiB/sec    866.755%
     FilterStringFilterNoNulls/262144/9      90.139 MiB/sec   392.643 MiB/sec  335.595%
     FilterInt64FilterNoNulls/262144/9       570.820 MiB/sec  3.250 GiB/sec    482.971%
     FilterStringFilterNoNulls/262144/8      416.738 MiB/sec  10.990 GiB/sec   2600.350%
   - FilterInt64FilterWithNulls/262144/0     1.463 GiB/sec    622.819 MiB/sec  -58.424%
     FilterFSLInt64FilterWithNulls/262144/2  1.061 GiB/sec    4.517 GiB/sec    325.695%
   - FilterStringFilterWithNulls/262144/3    524.535 MiB/sec  438.494 MiB/sec  -16.403%
     FilterInt64FilterNoNulls/262144/3       597.101 MiB/sec  4.326 GiB/sec    641.848%
     FilterInt64FilterWithNulls/262144/7     518.449 MiB/sec  620.439 MiB/sec  19.672%
     FilterStringFilterNoNulls/262144/1      553.473 MiB/sec  716.671 MiB/sec  29.486%
   - FilterInt64FilterNoNulls/262144/4       2.166 GiB/sec    680.128 MiB/sec  -69.332%
     FilterFSLInt64FilterWithNulls/262144/5  179.177 MiB/sec  4.391 GiB/sec    2409.209%
     FilterInt64FilterWithNulls/262144/9     496.572 MiB/sec  547.030 MiB/sec  10.161%
     FilterStringFilterWithNulls/262144/8    284.351 MiB/sec  8.655 GiB/sec    3016.828%
     FilterInt64FilterNoNulls/262144/1       647.779 MiB/sec  1.024 GiB/sec    61.870%
   - FilterFSLInt64FilterWithNulls/262144/0  1.091 GiB/sec    398.361 MiB/sec  -64.327%
     FilterInt64FilterNoNulls/262144/7       565.141 MiB/sec  657.051 MiB/sec  16.263%
     FilterFSLInt64FilterNoNulls/262144/9    169.973 MiB/sec  269.496 MiB/sec  58.552%
     FilterStringFilterNoNulls/262144/2      3.155 GiB/sec    11.443 GiB/sec   262.664%
     FilterStringFilterWithNulls/262144/5    518.426 MiB/sec  8.833 GiB/sec    1644.691%
     FilterStringFilterNoNulls/262144/7      486.759 MiB/sec  681.910 MiB/sec  40.092%
     FilterInt64FilterNoNulls/262144/2       2.160 GiB/sec    7.943 GiB/sec    267.766%
   - FilterStringFilterWithNulls/262144/4    2.359 GiB/sec    649.099 MiB/sec  -73.125%
   - FilterStringFilterWithNulls/262144/6    2.135 GiB/sec    434.104 MiB/sec  -80.147%
   - FilterStringFilterWithNulls/262144/0    2.435 GiB/sec    444.067 MiB/sec  -82.190%
     FilterInt64FilterWithNulls/262144/1     594.768 MiB/sec  648.937 MiB/sec  9.108%
     FilterInt64FilterNoNulls/262144/5       594.885 MiB/sec  7.189 GiB/sec    1137.460%
   - FilterInt64FilterWithNulls/262144/6     1.438 GiB/sec    584.712 MiB/sec  -60.292%
   - FilterStringFilterNoNulls/262144/4      3.134 GiB/sec    711.198 MiB/sec  -77.837%
     FilterStringFilterWithNulls/262144/9    85.327 MiB/sec   398.211 MiB/sec  366.691%
     FilterFSLInt64FilterNoNulls/262144/1    184.492 MiB/sec  565.075 MiB/sec  206.287%
   - FilterStringFilterNoNulls/262144/6      2.876 GiB/sec    488.107 MiB/sec  -83.424%
     FilterFSLInt64FilterWithNulls/262144/8  1.087 GiB/sec    4.335 GiB/sec    298.769%
     FilterInt64FilterNoNulls/262144/0       2.192 GiB/sec    7.987 GiB/sec    264.420%
     FilterFSLInt64FilterNoNulls/262144/8    1.427 GiB/sec    5.784 GiB/sec    305.352%
     FilterFSLInt64FilterNoNulls/262144/7    175.996 MiB/sec  467.499 MiB/sec  165.630%
   - FilterFSLInt64FilterWithNulls/262144/4  1.061 GiB/sec    478.103 MiB/sec  -55.992%
     FilterStringFilterNoNulls/262144/5      526.042 MiB/sec  11.046 GiB/sec   2050.145%
     FilterFSLInt64FilterNoNulls/262144/3    176.402 MiB/sec  560.631 MiB/sec  217.815%
     FilterStringFilterWithNulls/262144/1    545.748 MiB/sec  648.966 MiB/sec  18.913%
   - FilterFSLInt64FilterNoNulls/262144/4    1.359 GiB/sec    523.037 MiB/sec  -62.410%
     FilterInt64FilterWithNulls/262144/3     546.504 MiB/sec  612.891 MiB/sec  12.148%
     FilterFSLInt64FilterNoNulls/262144/5    176.620 MiB/sec  5.881 GiB/sec    3309.737%
     FilterFSLInt64FilterWithNulls/262144/9  178.978 MiB/sec  290.600 MiB/sec  62.367%
     FilterStringFilterWithNulls/262144/7    482.739 MiB/sec  647.174 MiB/sec  34.063%
     FilterInt64FilterWithNulls/262144/8     1.453 GiB/sec    5.236 GiB/sec    260.446%
   - FilterFSLInt64FilterNoNulls/262144/6    1.355 GiB/sec    403.734 MiB/sec  -70.899%
     FilterInt64FilterWithNulls/262144/2     1.449 GiB/sec    5.155 GiB/sec    255.704%
     FilterInt64FilterNoNulls/262144/8       2.214 GiB/sec    7.144 GiB/sec    222.645%
     FilterInt64FilterNoNulls/262144/6       2.164 GiB/sec    3.805 GiB/sec    75.808%
     FilterStringFilterNoNulls/262144/3      529.096 MiB/sec  550.969 MiB/sec  4.134%
     FilterFSLInt64FilterWithNulls/262144/3  178.992 MiB/sec  351.963 MiB/sec  96.636%
     ======================================  ===============  ===============  =========
   ```
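
The `change` column can be checked by hand. The sketch below assumes it is computed as (contender - baseline) / baseline with 1 GiB = 1024 MiB. The helper names are hypothetical and this is not archery's own code. Because the rates in the table are rounded, recomputed values may differ from the table in the last digits.

```python
# Conversion factors to a common unit (MiB/sec).
UNIT_TO_MIB = {"MiB/sec": 1.0, "GiB/sec": 1024.0}


def to_mib_per_sec(rate: str) -> float:
    """Parse a string such as '3.205 GiB/sec' into MiB/sec."""
    value, unit = rate.split()
    return float(value) * UNIT_TO_MIB[unit]


def percent_change(baseline: str, contender: str) -> float:
    """Relative change of contender versus baseline, in percent."""
    base = to_mib_per_sec(baseline)
    cont = to_mib_per_sec(contender)
    return (cont - base) / base * 100.0


# Two rows copied from the table above.
rows = [
    ("FilterStringFilterNoNulls/262144/0", "3.205 GiB/sec", "563.351 MiB/sec"),
    ("FilterInt64FilterWithNulls/262144/7", "518.449 MiB/sec", "620.439 MiB/sec"),
]
for name, baseline, contender in rows:
    print(f"{name}: {percent_change(baseline, contender):.3f}%")
```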





[GitHub] [arrow] ursabot commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646036977


   [AMD64 Ubuntu 18.04 C++ Benchmark (#113280)](https://ci.ursalabs.org/#builders/73/builds/86) builder failed with an exception.
   
   Revision: 999865b042c3131920b52b40a2387535168f3a08
   
   Archery: `'archery benchmark ...'` step's traceback:
   ```pycon
   Traceback (most recent call last):
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
       current.result = callback(current.result, *args, **kw)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1475, in gotResult
       _inlineCallbacks(r, g, status)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
       result = result.throwExceptionIntoGenerator(g)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
       return g.throw(self.type, self.value, self.tb)
   --- <exception caught here> ---
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/buildstep.py", line 566, in startStep
       self.results = yield self.run()
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
       result = g.send(result)
     File "/home/ursabot/ursabot/ursabot/steps.py", line 67, in run
       await log.addContent(content)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/process/log.py", line 130, in addContent
       return self.lbf.append(text)
     File "/home/ursabot/.conda/envs/ursabot/lib/python3.7/site-packages/buildbot/util/lineboundaries.py", line 62, in append
       text = self.newline_re.sub('\n', text)
   builtins.TypeError: expected string or bytes-like object
   ```





[GitHub] [arrow] wesm closed pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7475:
URL: https://github.com/apache/arrow/pull/7475


   





[GitHub] [arrow] fsaintjacques commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
fsaintjacques commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646243118


   I confirmed this locally with the taxi dataset: runtime for a low-selectivity filter (total_amount > $200, 120k / 1.5b rows) goes from 9s to 3s. Nice improvement.





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-645709018


   @ursabot benchmark --benchmark-filter=FilterRecordBatch 22f374102





[GitHub] [arrow] kszucs removed a comment on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs removed a comment on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646026066


   @ursabot benchmark --benchmark-filter=FilterRecordBatch





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646137349


   I created https://github.com/wesm/arrow/tree/ARROW-8500-comparison for running apples-to-apples benchmark comparisons
   
   For filtering record batches, the new selection vector approach is 5-40x faster. The improvement grows sharply for low-selectivity filters.
   
   * performance now: https://gist.github.com/wesm/b76e6928b7b18815cb5b33d730f8563d
   * performance before: https://gist.github.com/wesm/2f4b5339e968f55e9b08c2de5d8393cd





[GitHub] [arrow] kszucs commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
kszucs commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646041235


   @ursabot benchmark --benchmark-filter=Filter 22f3741





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646078371


   Note: those Filter benchmarks are garbage because they don't include the RandomArrayGenerator::Boolean bugfix





[GitHub] [arrow] ursabot removed a comment on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
ursabot removed a comment on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646036977







[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646076369


   @kszucs oh right, that would do it 
   





[GitHub] [arrow] wesm commented on pull request #7475: ARROW-8500: [C++] Add benchmark for using Filter on RecordBatch

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7475:
URL: https://github.com/apache/arrow/pull/7475#issuecomment-646022727


   @ursabot benchmark --benchmark-filter=FilterRecordBatch 22f3741

