Posted to github@arrow.apache.org by "comphead (via GitHub)" <gi...@apache.org> on 2023/04/28 15:48:27 UTC

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #6134: Improve `compare.py` output to use `min` times and better column titles

comphead commented on code in PR #6134:
URL: https://github.com/apache/arrow-datafusion/pull/6134#discussion_r1180569668


##########
benchmarks/compare.py:
##########
@@ -64,14 +61,9 @@ def load_from(cls, data: Dict[str, Any]) -> QueryRun:
     def execution_time(self) -> float:
         assert len(self.iterations) >= 1
 
-        # If we don't have enough samples, median() is probably
-        # going to be a worse measure than just an average.
-        if len(self.iterations) < MEAN_THRESHOLD:
-            method = statistics.mean
-        else:
-            method = statistics.median
-
-        return method(iteration.elapsed for iteration in self.iterations)
+        # Use minimum execution time to account for variations / other
+        # things the system was doing
+        return min(iteration.elapsed for iteration in self.iterations)
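
The rationale in the new comment can be demonstrated concretely: benchmark noise is essentially one-sided, since background work only ever adds time, so the minimum observation stays closest to the true cost while the mean and median drift upward with the noise. A minimal standalone sketch (not from the PR; the timings and the noise model are invented for illustration):

```python
import random
import statistics

TRUE_COST = 100.0  # hypothetical "real" elapsed time, in ms

random.seed(0)
# One-sided noise: "other things the system was doing" can only slow us down.
samples = [TRUE_COST + random.expovariate(1 / 20.0) for _ in range(10)]

print(f"min:    {min(samples):.1f} ms")  # closest to TRUE_COST
print(f"mean:   {statistics.mean(samples):.1f} ms")
print(f"median: {statistics.median(samples):.1f} ms")
```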

Review Comment:
   lgtm, but imho, can `min` be deceptive compared to `avg/mean/median`? 🤔
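
The reviewer's concern can also be made concrete: `min` is deceptive when one iteration is anomalously fast, e.g. because of a warm cache or a short-circuited run, in which case `median` stays representative. A hypothetical counterexample (the numbers are invented):

```python
import statistics

# One anomalously fast iteration (e.g. everything served from a warm cache).
samples = [102.0, 105.0, 98.0, 101.0, 12.0]  # ms; 12.0 is the outlier

print(f"min:    {min(samples):.1f} ms")                # 12.0 -- the outlier wins
print(f"median: {statistics.median(samples):.1f} ms")  # 101.0 -- robust to it
```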
   


