You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/09 01:50:19 UTC

[GitHub] [arrow-datafusion] dianaclarke opened a new pull request #1791: DataFusion + Conbench Integration

dianaclarke opened a new pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791


   Here's a minimal DataFusion + Conbench proof of concept.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke edited a comment on pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke edited a comment on pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#issuecomment-1034170039


   > Thank you @dianaclarke -- I will try and review this carefully over the next day or two. CC @andygrove and @Dandandan
   
   No rush & no pressure to merge this (or the sister `arrow-rs` pull request). I don't actually work on anything Arrow related anymore – just didn't want to leave you hanging. Cheers!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke commented on a change in pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on a change in pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#discussion_r802300002



##########
File path: conbench/_criterion.py
##########
@@ -0,0 +1,99 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import collections
+import csv
+import os
+import pathlib
+import subprocess
+
+import conbench.runner
+from conbench.machine_info import github_info
+
+
+def _result_in_seconds(row):
+    # sample_measured_value - The value of the measurement for this sample.
+    # Note that this is the measured value for the whole sample, not the
+    # time-per-iteration To calculate the time-per-iteration, use
+    # sample_measured_value/iteration_count
+    # -- https://bheisler.github.io/criterion.rs/book/user_guide/csv_output.html
+    count = int(row["iteration_count"])
+    sample = float(row["sample_measured_value"])
+    return sample / count / 10**9
+
+
+def _parse_benchmark_group(row):
+    parts = row["group"].split(",")

Review comment:
       TODO: strip extra spaces, or abort trying to bucket and just go with the names as-is?
   
   ```
       "tags": {
           "name": " u64_wide,  built-in functions",
           "suite": "window partition and order by"
       },
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp merged pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
houqp merged pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke commented on a change in pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on a change in pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#discussion_r802240777



##########
File path: conbench/.flake8
##########
@@ -0,0 +1,19 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       From a Python perspective, it's probably a mistake to name this directory `conbench`. Thoughts on what you would like `arrow-datafusion/conbench/` to be named?
   
   - `arrow-datafusion/_conbench/`?
   - `arrow-datafusion/conbench-benchmarks/`?
   - `arrow-datafusion/conbench-integration`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke commented on a change in pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on a change in pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#discussion_r802240777



##########
File path: conbench/.flake8
##########
@@ -0,0 +1,19 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review comment:
       From a Python perspective, it's probably a mistake to name this directory `conbench`. Thoughts on what you would like `arrow-datafusion/conbench/` to be named?
   
   - `arrow-datafusion/_conbench/`?
   - `arrow-datafusion/conbench-benchmarks/`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke commented on pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#issuecomment-1034170039


   > Thank you @dianaclarke -- I will try and review this carefully over the next day or two. CC @andygrove and @Dandandan
   
   No rush & no pressure to merge this (or the sister `arrow-rs` pull request). I don't actually work on anything Arrow related anymore – just didn't want to leave you hanging. Cheers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#issuecomment-1034166511


   Thank you @dianaclarke  -- I will try and review this carefully over the next day or two. CC @andygrove  and @Dandandan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] houqp commented on pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
houqp commented on pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#issuecomment-1046180527


   Thank you @dianaclarke , this is going to be a big productivity boost for us on all performance related PRs :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] dianaclarke commented on a change in pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
dianaclarke commented on a change in pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#discussion_r802300002



##########
File path: conbench/_criterion.py
##########
@@ -0,0 +1,99 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import collections
+import csv
+import os
+import pathlib
+import subprocess
+
+import conbench.runner
+from conbench.machine_info import github_info
+
+
+def _result_in_seconds(row):
+    # sample_measured_value - The value of the measurement for this sample.
+    # Note that this is the measured value for the whole sample, not the
+    # time-per-iteration To calculate the time-per-iteration, use
+    # sample_measured_value/iteration_count
+    # -- https://bheisler.github.io/criterion.rs/book/user_guide/csv_output.html
+    count = int(row["iteration_count"])
+    sample = float(row["sample_measured_value"])
+    return sample / count / 10**9
+
+
+def _parse_benchmark_group(row):
+    parts = row["group"].split(",")

Review comment:
       TODO: strip extra spaces, or abort trying to bucket and just go with the names as-is?
   
   ```
       "tags": {
           "name": " u64_wide,  built-in functions",
           "suite": "window partition and order by"
       },
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on pull request #1791: DataFusion + Conbench Integration

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #1791:
URL: https://github.com/apache/arrow-datafusion/pull/1791#issuecomment-1046077286


   If there are no objections, I plan to merge this PR next week so that I can proceed to the next step of integrating this with CI. The changes in this PR are self-contained so I think it is low risk to merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org