You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "assignUser (via GitHub)" <gi...@apache.org> on 2024/03/14 04:32:36 UTC

Re: [PR] chore(dev/benchmarks): Reorganize benchmarks such that they can build/run against previous versions [arrow-nanoarrow]

assignUser commented on code in PR #398:
URL: https://github.com/apache/arrow-nanoarrow/pull/398#discussion_r1524181279


##########
dev/benchmarks/CMakeLists.txt:
##########
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+project(nanoarrow_benchmarks)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 99)

Review Comment:
   You probably want to add [C(XX)_STANDARD_REQUIRED](https://cmake.org/cmake/help/latest/prop_tgt/C_STANDARD_REQUIRED.html) otherwise this can decay to a previous standard.



##########
dev/benchmarks/CMakeLists.txt:
##########
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+project(nanoarrow_benchmarks)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 99)
+endif()
+
+if(NOT DEFINED CMAKE_CXX_STANDARD)
+  set(CMAKE_CXX_STANDARD 11)
+endif()
+
+option(NANOARROW_BENCHMARK_VERSION "nanoarrow version to benchmark" OFF)
+option(NANOARROW_BENCHMARK_SOURCE_DIR "path to a nanoarrow source checkout to benchmark"

Review Comment:
   This is a bit funky as options can only be boolean but you are also using it as a string. If you set it as a cache variable it would behave similar to an option (a passed -DNANORARROW... would overwrite it) but be more idiomatic. I do wish string options/selections would be a thing (arrow has a custom function for them).



##########
dev/benchmarks/benchmark-run-all.sh:
##########


Review Comment:
   Just overall very nice, short and sweet :icecream: 



##########
dev/benchmarks/CMakeLists.txt:
##########
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+project(nanoarrow_benchmarks)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 99)
+endif()
+
+if(NOT DEFINED CMAKE_CXX_STANDARD)
+  set(CMAKE_CXX_STANDARD 11)
+endif()
+
+option(NANOARROW_BENCHMARK_VERSION "nanoarrow version to benchmark" OFF)
+option(NANOARROW_BENCHMARK_SOURCE_DIR "path to a nanoarrow source checkout to benchmark"
+       OFF)
+
+# Avoids a warning about timestamps on downloaded files (prefer new policy
+# if available))
+if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.23")
+  cmake_policy(SET CMP0135 NEW)
+endif()
+
+# Use google/benchmark
+set(BENCHMARK_ENABLE_TESTING OFF)
+fetchcontent_declare(benchmark
+                     URL https://github.com/google/benchmark/archive/refs/tags/v1.8.3.zip
+                     URL_HASH SHA256=abfc22e33e3594d0edf8eaddaf4d84a2ffc491ad74b6a7edc6e7a608f690e691
+)
+fetchcontent_makeavailable(benchmark)
+
+if(NANOARROW_BENCHMARK_SOURCE_DIR)
+  fetchcontent_declare(nanoarrow SOURCE_DIR "${NANOARROW_BENCHMARK_SOURCE_DIR}")
+  fetchcontent_makeavailable(nanoarrow)
+elseif(NANOARROW_BENCHMARK_VERSION)
+  fetchcontent_declare(nanoarrow
+                       URL https://github.com/apache/arrow-nanoarrow/archive/refs/tags/apache-arrow-nanoarrow-${NANOARROW_BENCHMARK_VERSION}.zip

Review Comment:
   fyi: You can actually use `URL` for a local dir too!
   >A URL may be an ordinary path in the local file system (in which case it must be the only URL provided) or any downloadable URL supported by the [file(DOWNLOAD)](https://cmake.org/cmake/help/latest/command/file.html#download) command. A local filesystem path may refer to either an existing directory or to an archive file, whereas a URL is expected to point to a file which can be treated as an archive.



##########
dev/benchmarks/benchmark-report.qmd:
##########
@@ -0,0 +1,227 @@
+---
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+title: "Benchmark Report"
+format: gfm
+---
+
+```{r setup, include=FALSE}
+library(tidyverse)
+library(xml2)
+library(bench)
+
+# For linking benchmark sources with https://github.com/{repo}/blob/{ref}/file#Lxx
+# This heuristic is not perfect but will generally result in correct links
+# for PRs that update benchmarks and the general running of benchmarks on the
+# main branch.
+github_ref <- gert::git_branch()
+if (github_ref == "main") {
+  github_repo <- "apache/arrow-nanoarrow"
+} else {
+  remote <- gert::git_remote_info()

Review Comment:
   This actually errors for me but that is due to the way `gh pr checkout` sets up the branch (no remote apparently?). Probably not something that will happen in PRs?



##########
.github/workflows/benchmarks.yaml:
##########
@@ -25,9 +25,11 @@ on:
     branches:
       - main
     paths:
-      - 'CMakeLists.txt'
       - '.github/workflows/benchmarks.yaml'
-      - 'src/**/*_benchmark.cc'
+      - 'dev/benchmarks/**'
+
+permissions:
+  contents: read

Review Comment:
   :clap: 



##########
dev/benchmarks/benchmark-run-all.sh:
##########
@@ -0,0 +1,67 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   ```suggestion
   #!/usr/bin/env bash
   # Licensed to the Apache Software Foundation (ASF) under one
   ```
   I use zsh and that complains about -o pipefail



##########
dev/benchmarks/.gitignore:
##########


Review Comment:
   sometimes git is so weird xD



##########
dev/benchmarks/CMakeLists.txt:
##########
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+message(STATUS "Building using CMake version: ${CMAKE_VERSION}")
+cmake_minimum_required(VERSION 3.14)
+include(FetchContent)
+
+project(nanoarrow_benchmarks)
+
+if(NOT DEFINED CMAKE_C_STANDARD)
+  set(CMAKE_C_STANDARD 99)
+endif()
+
+if(NOT DEFINED CMAKE_CXX_STANDARD)
+  set(CMAKE_CXX_STANDARD 11)
+endif()
+
+option(NANOARROW_BENCHMARK_VERSION "nanoarrow version to benchmark" OFF)
+option(NANOARROW_BENCHMARK_SOURCE_DIR "path to a nanoarrow source checkout to benchmark"
+       OFF)
+
+# Avoids a warning about timestamps on downloaded files (prefer new policy
+# if available))
+if(${CMAKE_VERSION} VERSION_GREATER_EQUAL "3.23")
+  cmake_policy(SET CMP0135 NEW)
+endif()
+
+# Use google/benchmark
+set(BENCHMARK_ENABLE_TESTING OFF)
+fetchcontent_declare(benchmark
+                     URL https://github.com/google/benchmark/archive/refs/tags/v1.8.3.zip
+                     URL_HASH SHA256=abfc22e33e3594d0edf8eaddaf4d84a2ffc491ad74b6a7edc6e7a608f690e691
+)
+fetchcontent_makeavailable(benchmark)
+
+if(NANOARROW_BENCHMARK_SOURCE_DIR)
+  fetchcontent_declare(nanoarrow SOURCE_DIR "${NANOARROW_BENCHMARK_SOURCE_DIR}")
+  fetchcontent_makeavailable(nanoarrow)
+elseif(NANOARROW_BENCHMARK_VERSION)
+  fetchcontent_declare(nanoarrow
+                       URL https://github.com/apache/arrow-nanoarrow/archive/refs/tags/apache-arrow-nanoarrow-${NANOARROW_BENCHMARK_VERSION}.zip
+  )
+  fetchcontent_makeavailable(nanoarrow)
+endif()
+
+# Check that at least one option was specified. This also lets the top-level
+# CMakeLists.txt use add_subdirectory() for slightly easier benchmark development.
+if(NOT TARGET nanoarrow)
+  message(FATAL_ERROR "nanoarrow target not found (missing -DNANOARROW_BENCHMARK_XXX option?)"
+  )
+endif()
+
+# Add + link tests
+add_executable(schema_benchmark c/schema_benchmark.cc)
+add_executable(array_benchmark c/array_benchmark.cc)
+
+target_link_libraries(schema_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
+target_link_libraries(array_benchmark PRIVATE nanoarrow benchmark::benchmark_main)
+
+# This lets all benchmarks run via ctest -VV when this is the top-level project
+include(CTest)
+enable_testing()

Review Comment:
   This is technically redundant as including CTest also enables it but I have had mixed results with that so I also usually call it explicitly :)



##########
dev/benchmarks/README.md:
##########
@@ -0,0 +1,51 @@
+<!---
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Benchmarking nanoarrow
+
+This subdirectory contains benchmarks and tools to run them. This is currently
+only implemented for the C library but may expand to include the R and Python
+bindings. The structure is as follows:
+
+- Benchmarks are documented inline using [Doxygen](https://www.doxygen.nl/).
+- Configurations are CMake build presets, and CMake handles pulling a previous
+  or local nanoarrow using `FetchContent`. Benchmarks are run using `ctest`.
+- There is a bare-bones report written as a [Quarto](https://quarto.org)
+  document that renders to markdown.
+
+You can run benchmarks for a single configuration (e.g., `local`) with:
+
+```shell
+mkdir build && cd build
+cmake .. --preset local
+cmake --build .
+ctest
+```
+
+The provided `benchmark-run-all.sh` creates (or reuses, if they are already
+present) build directories in the form `build/<preset>` for each preset
+and runs `ctest`.
+
+You can build a full report by running:
+
+```shell
+./benchmark-run-all.sh
+cd apidoc && doxygen && cd ..
+quarto render benchmark-report.qmd

Review Comment:
   Is quarto known outside of the R/posit sphere? Otherwise a quick link on how to get it might be good?
   https://quarto.org/docs/get-started/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org