Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/07 19:05:07 UTC

[GitHub] [arrow] thisisnic opened a new pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

thisisnic opened a new pull request #10269:
URL: https://github.com/apache/arrow/pull/10269


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-836683724


   > Shall I create a new ticket to add code which checks column names are unique when combining objects?
   
   Yes, please create a Jira for that, thanks! And if you happen to solve it in this PR, then you can resolve that Jira with a comment indicating that it was solved in this PR.





[GitHub] [arrow] ElenaHenderson commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ElenaHenderson commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842553784


   @ursabot please benchmark name=file-read lang=R





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r648045038



##########
File path: r/R/util.R
##########
@@ -139,3 +139,42 @@ attr(is_writable_table, "fail") <- function(call, env){
   )
 }
 
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(Scalar$create(object$chunks[[1]])$as_array(n))
+  }
+  return(Scalar$create(object)$as_array(n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  arr_lens <- map_int(arrays, NROW)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    
+    # Recycling not supported for tibbles and data.frames
+    if(all(map_lgl(arrays, ~inherits(.x, "data.frame")))){
+      abort(c(
+          "All input tibbles or data.frames must have the same number of rows",
+          x = paste("Number of rows in inputs:",oxford_paste(map_int(arrays, ~nrow(.x))))

Review comment:
       I wasn't before, but I am now that you mention it.  I have updated the error message to print only the longest and shortest item lengths, which I think is still sufficient to be useful.  Let me know if it looks OK!
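
As an illustration of the kind of message described above, here is a minimal sketch that reports only the longest and shortest inputs. It is not the PR's code: `check_same_nrow()` is a hypothetical helper, the message wording is assumed, and base R `stop()` stands in for `abort()` so that the snippet runs without any packages.

```r
# Sketch only: check_same_nrow() is a hypothetical helper, not the PR's code.
# Base R stop() is used instead of abort() so no packages are required.
check_same_nrow <- function(dfs) {
  n_rows <- vapply(dfs, nrow, integer(1))
  if (length(unique(n_rows)) > 1) {
    stop(
      "All input tibbles or data.frames must have the same number of rows\n",
      "Number of rows in longest and shortest inputs: ",
      max(n_rows), " and ", min(n_rows),
      call. = FALSE
    )
  }
  invisible(dfs)
}

# One 10-row and one 1-row data.frame trigger the error; capture its message
msg <- tryCatch(
  check_same_nrow(list(data.frame(a = 1:10), data.frame(b = 5))),
  error = function(e) conditionMessage(e)
)
```

Reporting just the two extremes keeps the message short even when many inputs are passed, at the cost of not listing every length.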







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842505335


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...174a0cbbdaa64cac9c91851d28842bab/)
   [Finished :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...798e36b0e8104da4ba1fe940d6ff4764/)
   [Finished :arrow_down:0.0% :arrow_up:1.45%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...111f823ff77943d79c69f4eb43311fb5/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...b5760c307cf54cf2ac78891f4b203d0f/)
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r628474112



##########
File path: r/R/table.R
##########
@@ -175,6 +175,16 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(dots, length)
+  if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    dots <- purrr::modify2(

Review comment:
       In `arrow-package.R`, add `modify2` to the line beginning with `#' @importFrom purrr`. Then do `devtools::document()` to update the `NAMESPACE` file. Then remove the `purrr::` from this line and the other line in `record-batch.R` where it's used.







[GitHub] [arrow] jonkeane commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-841238315


   A couple of comments/additions, though I think you're generally right.
   
   The R benchmarks tend to be stable: https://conbench.ursa.dev/compare/runs/8b6fef07829948998502a7677dec6e03...0cbd9dcbe2594e06ab95cf0e088cf25b/ is a run on the master branch, and its changes fall between -3% and 1%. That -3% is an outlier; the next largest decrease is -0.8%. So we can have decent confidence that we're not observing noise alone here. We're actively working to improve benchmark stability, but I wanted to put this out there as one of the assumptions I'm using.
   
   Some file-read benchmarks are >5% slower. Interestingly, it is all (and only) the fanniemae dataset that is slower (both reading from parquet and from feather), and *only* when it is being converted to a data.frame, not when it is left as a table. This seems a little suspect to me, since the only places where I see you've meaningfully changed the code are `RecordBatch$create`, `Table$create`, and `MakeArrayFromScalar`. Do any of those get called when reading parquet or feather files?
   
   Note: I don't see csv reads run here. IIRC those were proactively disabled due to memory issues, but I will confirm that (I thought this machine should have been able to handle them, and there is https://issues.apache.org/jira/browse/ARROW-12519 to track it).
   
   There are also a number of other benchmarks in the 1-5% slower range (the other file-read benchmarks, as well as the df to R conversions and a handful of the writing benchmarks). The df to R conversions seem more in line with the code that was changed, and those are in the 3-6% range (though most are closer to 3%, with one outlier at 6%).
   
   The next 28/138 or ~20% of the benchmarks are 0-1% slower, and then 19/138 or ~14% of the benchmarks are 0-1% faster. These are probably all just noise.
   
   
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r630293930



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       Thanks! I created ARROW-12737 for follow up on `concat_arrays()` and `combine_chunks()`







[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r635433846



##########
File path: r/tests/testthat/test-RecordBatch.R
##########
@@ -415,14 +414,42 @@ test_that("record_batch() handles null type (ARROW-7064)", {
   expect_equivalent(batch$schema,  schema(a = int32(), n = null()))
 })
 
-test_that("record_batch() scalar recycling", {
-  skip("Not implemented (ARROW-11705)")
+test_that("record_batch() scalar recycling with vectors", {
   expect_data_frame(
     record_batch(a = 1:10, b = 5),
     tibble::tibble(a = 1:10, b = 5)
   )
 })
 
+test_that("record_batch() scalar recycling with Scalars, Arrays, and ChunkedArrays", {
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Scalar$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Array$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = ChunkedArray$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+})
+
+test_that("record_batch() no recycling with tibbles", {

Review comment:
       @nealrichardson I've now updated this so that the code uses either `length` or `nrow` depending on whether the argument is something that inherits from a `data.frame` or not.  
   
   I got stuck trying to recycle tibbles as I couldn't think how to do it in a reasonable way with the packages we import.
   
   Do you think there is one?  If not, I'm tempted to say it's out of the scope of this ticket anyway.
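
A minimal base-R sketch of the size check described above, for illustration only. The helper names `arg_size` and `recycle_args` are hypothetical and are not the PR's code; the sketch assumes data frames are measured by `nrow()` and never recycled, while every other length-1 argument is repeated to the longest size.

```r
# Hypothetical helper (not from the PR): how many rows an argument contributes.
# Data frames report their row count; everything else reports its length.
arg_size <- function(x) {
  if (inherits(x, "data.frame")) nrow(x) else length(x)
}

# Recycle length-1, non-data-frame arguments to the largest size found.
recycle_args <- function(...) {
  args <- list(...)
  sizes <- vapply(args, arg_size, integer(1))
  target <- max(sizes)
  lapply(args, function(x) {
    if (!inherits(x, "data.frame") && length(x) == 1L) rep(x, target) else x
  })
}

out <- recycle_args(a = 1:10, b = 5, d = data.frame(x = 1:10))
lengths(out[c("a", "b")])  # both vectors now have the same length
```

In this sketch a one-row data frame is passed through unchanged, which mirrors the "no recycling with tibbles" test title rather than attempting to recycle it.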







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-834741650


   Ultimately I think we should try to do scalar value recycling in C++ code, but I think this is a great temporary solution in the meantime.
   
   What happens if you pass data frames instead of vectors and one of them has length one (i.e. only one row)? Maybe add a test to check the behavior in that case.
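
The reason the data frame case deserves a test can be shown in base R alone: `length()` on a data frame counts columns, not rows, so a length-based check can misfire in both directions. This is an illustrative sketch with made-up data, not code from the PR.

```r
# One row, three columns: length() sees 3, although there is a single row.
one_row <- data.frame(x = 1, y = 2, z = 3)
length(one_row)  # counts columns
nrow(one_row)    # counts rows

# Ten rows, one column: length() sees 1, so a check like `arr_lens == 1`
# would treat this data frame as a scalar to be recycled.
one_col <- data.frame(x = 1:10)
length(one_col)
nrow(one_col)
```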





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r628474112



##########
File path: r/R/table.R
##########
@@ -175,6 +175,16 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(dots, length)
+  if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    dots <- purrr::modify2(

Review comment:
       In `arrow-package.R`, add `modify2` to the line beginning with `#' @importFrom purrr`. Then do `devtools::document()` to update the `NAMESPACE` file. Then remove the `purrr::` from this line.







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842556138


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...fdf46169d96349f3b8acd5079a8b11fb/)
   [Finished :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...22d02b05c14b45f08226e60f3864a40e/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...b4fb3f01bc814530b238483d14f12ec0/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...4a0fd651b5af4152b2c4e02776e8d9ad/)
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629628488



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       We should find a way to do this without calling `as.vector()` because that will convert Arrow objects to R vectors which could cause the data type to be lost during the conversion.
   
   But the problem is that without `as.vector()`, length 1 `ChunkedArray` objects will error when passed to `Scalar$create()`. I think the cleanest way to solve that is to improve the `Array$create` function in `array.R` to handle `ChunkedArray` objects and convert them to `Array`. (This is getting into yak-shaving territory, but I think it's important.)







[GitHub] [arrow] ianmcook edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-839127582


   @jonkeane it looks like it's only evaluating the last commit in this PR; the baseline is not the commit _before_ this PR, it's the second-to-last commit _in_ this PR. Is that a known problem? Is there some way to explicitly specify which baseline commit to use?





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-841248102


   > There are some file-read benchmarks that are >5% slower, interestingly it is all (and only) the fanniemae dataset that is slower (both reading from parquet and from feather) and _only_ when it is being converted to a data.frame, not when it is being left as a table. This seems a little suspect to me since the only places that I'm seeing you've meaningfully changed the code is `RecordBatch$create`, `Table$create`, and `MakeArrayFromScalar`. Do any of those get called when reading parquet or feather files?
   
   They do not, which does make it strange; I completely overlooked the fact that those functions shouldn't be relevant here.





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629628488



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       We should find a way to do this without calling `as.vector()` because that will convert Arrow objects to R vectors which could cause the data type to be lost during the conversion.
   
   But the problem is that without `as.vector()`, length 1 `ChunkedArray` objects will error when passed to `Array$create()` inside `Scalar$create()`. I think the cleanest way to solve that is to improve the `Array$create` function in `array.R` to handle a `ChunkedArray` object and convert it to `Array`. (This is getting into yak shaving territory here but I think it's important.)
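
As a base-R analogy only (this does not use Arrow, and the behaviour of Arrow objects is not shown here), the kind of type loss that `as.vector()` can cause is visible with ordinary classed vectors: the call strips the class information, and anything rebuilt from the result no longer knows the original type.

```r
# A Date is a number of days plus a class attribute.
d <- as.Date("2021-05-07")
class(d)

# as.vector() drops the attribute, leaving a bare numeric value.
v <- as.vector(d)
class(v)

# A factor is converted to a plain character vector in the same way.
f <- factor("a", levels = c("a", "b"))
class(as.vector(f))
```

The same concern applies by analogy to a round trip through an R vector before calling `Scalar$create()`: the value survives, but the original type annotation may not.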







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Finished :arrow_down:5.8% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] thisisnic edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-841058739


   > @thisisnic ooh we finally have some benchmark results to look at!: [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   > The `dataframe-to-table` results are the pertinent ones here I think.
   > @jonkeane can you help us interpret these results?
   
   I'm going to take a stab at interpreting them just to see how I do.  Overall, the changes I've made have made things 5% slower.  I'm not sure whether this is important, as I don't have a good idea of a cut-off or of whether any of this is just noise.  Looking more closely at the results, the biggest differences are in file-read, where the update is at least 8% slower.
   
   Does this feel like a significant slowdown? I'd say so.  I think it supports the idea that this really should be implemented at the C++ level rather than the R level.  Is this OK to be merged in? I think "perhaps", as long as we open a JIRA for this to be implemented at the C++ level (which I have done here: https://issues.apache.org/jira/browse/ARROW-12789).
   
   Let me know your thoughts!





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838711332


   @nealrichardson you want to take a look before I merge? Thanks





[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r647568522



##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  is_df <- map_lgl(arrays, ~inherits(.x, "data.frame"))
+  arr_lens <- lengths(arrays)
+  arr_lens[is_df] <- map_int(arrays[is_df], nrow)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays[arr_lens == 1 & !is_df] <- lapply(arrays[arr_lens == 1 & !is_df], repeat_value_as_array, max_array_len)

Review comment:
       I've now removed this extra code that tried to handle mixed-length tibbles/data.frames, as it seemed like a lot of effort for what doesn't appear to be a particularly common use case.  I've added an error message for when the input tibbles have different lengths.
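   
   For illustration only, here is a rough base R sketch of that kind of check. `check_df_lengths()` and its error text are hypothetical names invented here, not the code in this PR, and plain data.frames stand in for tibbles.
   
   ```r
   # Hypothetical helper (not the PR's code): error if the data frames
   # passed together do not all have the same number of rows.
   check_df_lengths <- function(inputs) {
     is_df <- vapply(inputs, is.data.frame, logical(1))
     n_rows <- vapply(inputs[is_df], nrow, integer(1))
     if (length(unique(n_rows)) > 1) {
       stop("All input tibbles or data.frames must have the same number of rows",
            call. = FALSE)
     }
     invisible(inputs)
   }
   
   # Same number of rows: passes silently and returns its input.
   ok <- check_df_lengths(list(data.frame(a = 1:3), data.frame(b = 4:6)))
   
   # Different number of rows: the error message is captured here.
   err <- tryCatch(
     check_df_lengths(list(data.frame(a = 1:3), data.frame(b = 1:2))),
     error = function(e) conditionMessage(e)
   )
   ```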







[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629572528



##########
File path: r/R/table.R
##########
@@ -175,6 +175,17 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(dots, length)
+  if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(unlist(arr_lens))

Review comment:
       Oops, must have escaped as I was moving things around!







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629634210



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       As for how to convert a `ChunkedArray` to an `Array`: you should add a new R6 method named `combine_chunks()` to the `ChunkedArray` class that works like the PyArrow [`ChunkedArray.combine_chunks`](https://arrow.apache.org/docs/python/generated/pyarrow.ChunkedArray.html#pyarrow.ChunkedArray.combine_chunks) method (but without the MemoryPool stuff).
   
   That in turn will require adding a `concat_arrays()` function (like PyArrow has) which could be exported.
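   
   To make the idea concrete, here is a minimal base R sketch in which a plain list of vectors stands in for a `ChunkedArray`. The names `concat_arrays_sketch()` and `combine_chunks_sketch()` are made up for this illustration; the real versions would operate on Arrow `Array` objects and call into C++.
   
   ```r
   # Stand-in for a ChunkedArray: a plain list of R vectors ("chunks").
   chunks <- list(1:3, integer(0), 4:5)
   
   # Hypothetical analogue of concat_arrays(): join any number of chunks.
   concat_arrays_sketch <- function(...) {
     do.call(c, list(...))
   }
   
   # Hypothetical analogue of combine_chunks(): flatten all chunks into one.
   combine_chunks_sketch <- function(chunks) {
     do.call(concat_arrays_sketch, chunks)
   }
   
   combined <- combine_chunks_sketch(chunks)
   ```
   
   Empty chunks simply drop out of the result, so the combined length equals the sum of the chunk lengths.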







[GitHub] [arrow] nealrichardson commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r632571798



##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)

Review comment:
       FWIW there is a base R function `lengths()` that does this (though I don't recall what version it was added in)
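   
   A quick sketch of the equivalence, using plain R vectors and base `vapply()` as a stand-in for `purrr::map_int()`. As far as I know, `lengths()` has been in base R since 3.2.0, but please check that against the minimum R version the package supports.
   
   ```r
   arrays <- list(a = 1:10, b = 5)
   
   # Base R: one call, returns a named integer vector.
   lens_base <- lengths(arrays)
   
   # Equivalent element-wise form (vapply standing in for map_int).
   lens_mapped <- vapply(arrays, length, integer(1))
   ```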

##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if (.y) repeat_value_as_array(.x, max_array_len) else .x
+    )

Review comment:
       Call me 👴 but I personally find 
   
   ```
   arrays[arr_lens == 1] <- lapply(arrays[arr_lens == 1], repeat_value_as_array, max_array_len)
   ```
   
   easier to read than `modify2(...)`.
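   
   Here is a self-contained version of that idiom, for anyone who wants to try it outside the package. `repeat_value()` is a hypothetical base R stand-in for `repeat_value_as_array()` that uses `rep()` rather than building an Arrow `Array`.
   
   ```r
   # Hypothetical stand-in for repeat_value_as_array(): rep() instead of Arrow.
   repeat_value <- function(object, n) rep(object, n)
   
   arrays <- list(a = 1:4, b = 5, c = "x")
   arr_lens <- lengths(arrays)
   max_array_len <- max(arr_lens)
   
   # Replace only the length-1 elements; names and order are preserved.
   arrays[arr_lens == 1] <- lapply(arrays[arr_lens == 1], repeat_value, max_array_len)
   ```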

##########
File path: r/R/scalar.R
##########
@@ -33,7 +33,7 @@ Scalar <- R6Class("Scalar",
   public = list(
     ToString = function() Scalar__ToString(self),
     as_vector = function() Scalar__as_vector(self),
-    as_array = function() MakeArrayFromScalar(self),
+    as_array = function() MakeArrayFromScalar(self, 1L),

Review comment:
       If you pass this through, maybe you don't need `repeat_value_as_array`
   
   ```suggestion
       as_array = function(length = 1L) MakeArrayFromScalar(self, as.integer(length)),
   ```

##########
File path: r/tests/testthat/test-RecordBatch.R
##########
@@ -415,14 +414,42 @@ test_that("record_batch() handles null type (ARROW-7064)", {
   expect_equivalent(batch$schema,  schema(a = int32(), n = null()))
 })
 
-test_that("record_batch() scalar recycling", {
-  skip("Not implemented (ARROW-11705)")
+test_that("record_batch() scalar recycling with vectors", {
   expect_data_frame(
     record_batch(a = 1:10, b = 5),
     tibble::tibble(a = 1:10, b = 5)
   )
 })
 
+test_that("record_batch() scalar recycling with Scalars, Arrays, and ChunkedArrays", {
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Scalar$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Array$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = ChunkedArray$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+})
+
+test_that("record_batch() no recycling with tibbles", {

Review comment:
       Why does this error, and should it?

##########
File path: r/R/util.R
##########
@@ -110,3 +110,21 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (length(object) != 1) {

Review comment:
       If you're checking length again here (unnecessary IMO since you're only calling this in the case where you've validated length == 1 already), you could simplify your `modify2()` wrapper and just `map` over all `arrays`, and in here only do the recycling if length == 1. 
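   
   Roughly what I have in mind, sketched in base R. `recycle_if_scalar()` is a hypothetical name, and `rep()` and `lapply()` stand in for the Arrow and purrr calls.
   
   ```r
   # Hypothetical helper: recycle only when the input has length 1,
   # otherwise return it untouched.
   recycle_if_scalar <- function(object, n) {
     if (length(object) == 1) rep(object, n) else object
   }
   
   arrays <- list(a = 1:3, b = TRUE)
   n <- max(lengths(arrays))
   
   # Map over everything; the length check lives inside the helper.
   recycled <- lapply(arrays, recycle_if_scalar, n)
   ```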

##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  

Review comment:
       This code block seems to be repeated in both the Table and RecordBatch code, so it might be better factored out as a helper function.
   
   I also wonder whether the logic would actually be simpler in C++ because you could do it at a later point where you know exactly that you have a vector of Arrays and don't have to worry about whether it is an R vector, a Scalar, an Array, etc. See the `check_consistent_array_size` function for example in `r/src`--you could drop in around there and instead of erroring if you don't have consistent lengths, handle the recycling case. (Also ok to (1) ignore this suggestion or (2) defer to a followup)
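   
   As a rough illustration of "recycle instead of erroring", here is a base R sketch of a single shared helper. This is not the C++ `check_consistent_array_size` logic; `recycle_or_fail()` and its error message are hypothetical, and plain vectors stand in for Arrays.
   
   ```r
   # Hypothetical shared helper: recycle length-1 columns to the common
   # length, and error only for lengths that cannot be reconciled.
   recycle_or_fail <- function(columns) {
     if (length(columns) == 0) {
       return(columns)
     }
     lens <- lengths(columns)
     target <- max(lens)
     if (any(lens != target & lens != 1)) {
       stop("All columns must have length 1 or ", target, call. = FALSE)
     }
     columns[lens == 1] <- lapply(columns[lens == 1], rep, target)
     columns
   }
   
   out <- recycle_or_fail(list(a = 1:3, b = 5))
   
   # Lengths 3 and 2 cannot be reconciled, so this one errors.
   err <- tryCatch(
     recycle_or_fail(list(a = 1:3, b = 1:2)),
     error = function(e) conditionMessage(e)
   )
   ```
   
   Both `Table$create()` and `RecordBatch$create()` could then call the one helper rather than repeating the block.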







[GitHub] [arrow] ursabot commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629634210



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       As for how to convert a `ChunkedArray` to an `Array`: you should add a new R6 method named `combine_chunks()` to the `ChunkedArray` class that works like the PyArrow [`ChunkedArray.combine_chunks`](https://arrow.apache.org/docs/python/generated/pyarrow.ChunkedArray.html#pyarrow.ChunkedArray.combine_chunks) method (but without the MemoryPool stuff).







[GitHub] [arrow] nealrichardson commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r635572831



##########
File path: r/R/table.R
##########
@@ -175,12 +175,17 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))

Review comment:
       This if block needs to be handled--as it currently stands, if you have a grouped_df you won't get scalar recycling. (In general I'd like to see this code refactored so that there's only one `Table__from_dots` call.)

##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  is_df <- map_lgl(arrays, ~inherits(.x, "data.frame"))
+  arr_lens <- lengths(arrays)
+  arr_lens[is_df] <- map_int(arrays[is_df], nrow)

Review comment:
       ```suggestion
     arr_lens <- map_int(arrays, NROW)
   ```
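   
   To show why `NROW` covers both cases, here is a small base R check (with `vapply()` standing in for `map_int()`): `lengths()` counts the columns of a data frame, whereas `NROW()` counts its rows and falls back to `length()` for plain vectors.
   
   ```r
   inputs <- list(vec = 1:10, scalar = 5, df = data.frame(x = 1:4, y = 4:1))
   
   # lengths() treats the data frame as a list of 2 columns.
   by_length <- lengths(inputs)
   
   # NROW() gives rows for the data frame and length for the vectors.
   by_nrow <- vapply(inputs, NROW, integer(1))
   ```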

##########
File path: r/R/table.R
##########
@@ -175,12 +175,17 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  dots <- recycle_scalars(dots)

Review comment:
       This probably belongs inside the "else" block below

##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  is_df <- map_lgl(arrays, ~inherits(.x, "data.frame"))
+  arr_lens <- lengths(arrays)
+  arr_lens[is_df] <- map_int(arrays[is_df], nrow)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays[arr_lens == 1 & !is_df] <- lapply(arrays[arr_lens == 1 & !is_df], repeat_value_as_array, max_array_len)

Review comment:
       Why not data frames? they turn into struct arrays:
   
   ```
   > Scalar$create(tibble::tibble(a=1))$as_array()
   StructArray
   <struct<a: int32>>
   -- is_valid: all not null
   -- child 0 type: int32
     [
       1
     ]
   ```

##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))

Review comment:
       Since this is wired up as the as_array method on Scalar, we should use it
   
   ```suggestion
       return(Scalar$create(object$chunks[[1]])$as_array(n))
     }
     return(Scalar$create(object)$as_array(n))
   ```
   
   Also, how might the ChunkedArray case go wrong here?
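   
   One possible failure mode (an assumption on my part about what could go wrong, sketched with a plain list standing in for `$chunks`): a ChunkedArray whose total length is 1 is not guaranteed to hold that value in its first chunk.
   
   ```r
   # Stand-in for a ChunkedArray of total length 1 whose first chunk is empty.
   chunks <- list(integer(0), 5L)
   
   total_len <- sum(lengths(chunks))  # passes a "length == 1" check
   first_chunk <- chunks[[1]]         # but holds no value at all
   
   # Flattening all chunks first recovers the single value.
   flattened <- unlist(chunks)
   ```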







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ElenaHenderson commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ElenaHenderson commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842505116


   @ursabot please benchmark name=dataframe-to-table lang=R





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629115855



##########
File path: r/R/table.R
##########
@@ -175,6 +175,16 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(dots, length)
+  if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    dots <- purrr::modify2(

Review comment:
       Done







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   





[GitHub] [arrow] ianmcook edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-839127582


   @jonkeane it looks like it's only evaluating the last commit in this PR; the baseline is not the commit _before_ this PR, it's the second to last commit _in_ this PR. Is that a known problem?





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842556138


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...fdf46169d96349f3b8acd5079a8b11fb/)
   [Finished :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...22d02b05c14b45f08226e60f3864a40e/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...b4fb3f01bc814530b238483d14f12ec0/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...4a0fd651b5af4152b2c4e02776e8d9ad/)
   





[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629628488



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       We should find a way to do this without calling `as.vector()` because that will convert Arrow objects to R vectors which could cause the data type to be lost during the conversion.
   
   But the problem is that without `as.vector()`, length 1 `ChunkedArray` objects will error when passed to `Array$create()` inside `Scalar$create()`. I think the cleanest way to solve that is to improve the `Array$create` function in `array.R` to handle `ChunkedArray` objects and convert them to `Array` objects. (This is getting into yak shaving territory here but I think it's important.)
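
   A minimal sketch of the type-loss concern, assuming the `arrow` R package is attached; the variable names are illustrative and not taken from the PR. An `int8` array that goes through `as.vector()` becomes a plain R integer vector, so rebuilding an Arrow array from it infers the type again (R integers map to `int32` by default) instead of keeping `int8`:

   ```r
   library(arrow)

   # A length-1 Arrow array with an explicitly narrow type.
   small <- Array$create(1L, type = int8())
   small$type$ToString()

   # Round-tripping through an R vector drops the original Arrow type:
   # as.vector() yields an ordinary R integer, and Array$create() then
   # infers the type from that R vector.
   round_trip <- Array$create(as.vector(small))
   round_trip$type$ToString()
   ```

   Comparing the two printed type names shows why recycling via `as.vector()` is lossy for inputs that are already Arrow objects.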







[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r633850677



##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  

Review comment:
       I opened up a ticket to do this in C++, so I figured probably no point duplicating that effort? Though if this seems like a special case that's better off implemented in the R package's C++ layer rather than the source C++, I can look into it.  See discussion on this ticket, @nealrichardson : https://issues.apache.org/jira/browse/ARROW-12789







[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838715745


   @thisisnic please also build the docs so `util.Rd` is included here





[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r635854937



##########
File path: r/R/table.R
##########
@@ -175,12 +175,17 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))

Review comment:
       I was thinking it's OK that there's no scalar recycling with a grouped_df as scalar recycling only happens when there's >1 input passed in, and grouping only happens when there's 1 input passed in.
   
   Using tibble/dplyr, I get `FALSE` if I call `tibble::tibble(slice(iris,1), group_by(iris, Species), .name_repair = "unique") %>% is_grouped_df()`, so I think this is what we want?
   
   I've refactored the code so things are more in keeping with the early return style & only 1 Table__from_dots call now.













[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r634139678



##########
File path: r/tests/testthat/test-RecordBatch.R
##########
@@ -415,14 +414,42 @@ test_that("record_batch() handles null type (ARROW-7064)", {
   expect_equivalent(batch$schema,  schema(a = int32(), n = null()))
 })
 
-test_that("record_batch() scalar recycling", {
-  skip("Not implemented (ARROW-11705)")
+test_that("record_batch() scalar recycling with vectors", {
   expect_data_frame(
     record_batch(a = 1:10, b = 5),
     tibble::tibble(a = 1:10, b = 5)
   )
 })
 
+test_that("record_batch() scalar recycling with Scalars, Arrays, and ChunkedArrays", {
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Scalar$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Array$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = ChunkedArray$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+})
+
+test_that("record_batch() no recycling with tibbles", {

Review comment:
       I had a think about this, and `tibble::tibble` does support tibble recycling, so if we're trying to allow similar behaviour, I think it makes sense to implement it here.  
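
   For reference, the recycling rule under discussion (length-1 inputs are repeated to match the longest input, anything else is an error) can be sketched in base R. This is only an illustration under stated assumptions: `recycle_scalars()` is a hypothetical helper, not a function in arrow or in this PR, and it works on plain R vectors rather than Arrow objects or tibbles.

   ```r
   # Hypothetical helper: recycle length-1 inputs to the longest input's length.
   recycle_scalars <- function(cols) {
     if (length(cols) == 0L) {
       return(cols)
     }
     lens <- lengths(cols)
     target <- max(lens)

     # Only length-1 inputs may be recycled; other mismatches are errors.
     bad <- lens != 1L & lens != target
     if (any(bad)) {
       stop("All inputs must be length 1 or length ", target, call. = FALSE)
     }

     cols[lens == 1L] <- lapply(cols[lens == 1L], rep, times = target)
     cols
   }

   out <- recycle_scalars(list(a = 1:10, b = 5))
   lengths(out)
   ```

   With these inputs both `a` and `b` end up with length 10, matching the shape the `record_batch(a = 1:10, b = 5)` test expects.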







[GitHub] [arrow] ursabot commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842505335


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...174a0cbbdaa64cac9c91851d28842bab/)
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...798e36b0e8104da4ba1fe940d6ff4764/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...111f823ff77943d79c69f4eb43311fb5/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...b5760c307cf54cf2ac78891f4b203d0f/)
   





[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838864118


   In theory, this would be a good opportunity to use Conbench to test whether the multiple `length()` calls added here might have a meaningful effect on the performance of `Table`/`RecordBatch` creation. In practice, I'm not sure whether Conbench would help us here. @jonkeane do you know?





[GitHub] [arrow] jonkeane commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838983401









[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-839127582


   @jonkeane it looks like it's only evaluating the last commit; the baseline is not the commit before this PR, it's the second to last commit in this PR. Is that a known problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] github-actions[bot] commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-834700741


   https://issues.apache.org/jira/browse/ARROW-11705





[GitHub] [arrow] nealrichardson commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r638271386



##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  is_df <- map_lgl(arrays, ~inherits(.x, "data.frame"))
+  arr_lens <- lengths(arrays)
+  arr_lens[is_df] <- map_int(arrays[is_df], nrow)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays[arr_lens == 1 & !is_df] <- lapply(arrays[arr_lens == 1 & !is_df], repeat_value_as_array, max_array_len)

Review comment:
       @romainfrancois can you take a look please?







[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842505335









[GitHub] [arrow] ianmcook edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-834741650


   Ultimately I think we should try to do scalar value recycling in C++ code, but I think this is a great temporary solution in the meantime.
   
   What happens if you pass data frames instead of vectors and one of them has length one (i.e. only one row)? Maybe add a test to check the behavior in that case.
   
   And what happens if you pass Arrow arrays instead of R vectors and one of them has length 1? E.g.
   ```r
   Table$create(a = Array$create(1:10), b = Array$create(5))
   ```
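   
   For reference, a minimal sketch of the recycling rule that the tests compare against, using base R's `data.frame()` only. It does not call arrow, so it says nothing about what `Table$create()` does with Arrow arrays; the names `recycled` and `mismatch` are illustrative assumptions.
   ```r
   # Base R reference behaviour only; this does not call arrow.
   recycled <- data.frame(a = 1:10, b = 5)
   nrow(recycled)        # 10
   unique(recycled$b)    # 5

   # Lengths that cannot be recycled evenly are rejected with an error.
   mismatch <- tryCatch(
     data.frame(a = 1:10, b = 1:3),
     error = function(e) "error"
   )
   mismatch              # "error"
   ```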





[GitHub] [arrow] ElenaHenderson removed a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ElenaHenderson removed a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842553784


   @ursabot please benchmark name=file-read lang=R





[GitHub] [arrow] ianmcook edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838715745


   @thisisnic please also build the docs so the new `.Rd` file is included here





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842556138


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...fdf46169d96349f3b8acd5079a8b11fb/)
   [Finished :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...22d02b05c14b45f08226e60f3864a40e/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...b4fb3f01bc814530b238483d14f12ec0/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...4a0fd651b5af4152b2c4e02776e8d9ad/)
   





[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r633857420



##########
File path: r/tests/testthat/test-RecordBatch.R
##########
@@ -415,14 +414,42 @@ test_that("record_batch() handles null type (ARROW-7064)", {
   expect_equivalent(batch$schema,  schema(a = int32(), n = null()))
 })
 
-test_that("record_batch() scalar recycling", {
-  skip("Not implemented (ARROW-11705)")
+test_that("record_batch() scalar recycling with vectors", {
   expect_data_frame(
     record_batch(a = 1:10, b = 5),
     tibble::tibble(a = 1:10, b = 5)
   )
 })
 
+test_that("record_batch() scalar recycling with Scalars, Arrays, and ChunkedArrays", {
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Scalar$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = Array$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+  expect_data_frame(
+    record_batch(a = Array$create(1:10), b = ChunkedArray$create(5)),
+    tibble::tibble(a = 1:10, b = 5)
+  )
+  
+})
+
+test_that("record_batch() no recycling with tibbles", {

Review comment:
       This happens because calling `length()` on a `tibble` returns the number of columns, which is 2 for the tibbles in these examples.
   
   I don't know whether it should. Should we support recycling for tibbles? I guess we could; it would be consistent with the other behaviour implemented here. It is definitely a problem as it stands, though: if one of the tibbles has only 1 column, it is treated as a length-1 value and weird things happen.
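   
   To make the difference concrete, here is a minimal base R sketch. It uses a plain `data.frame` rather than a tibble (an assumption made so that it runs without extra packages; tibbles are data frames and report `length()` the same way), and the object names are purely illustrative:
   
   ```r
   # Two columns, ten rows; the scalar 5 is recycled by data.frame()
   df <- data.frame(a = 1:10, b = 5)
   
   length(df)  # counts columns
   nrow(df)    # counts rows
   NROW(df)    # counts rows, and also works on plain vectors
   NROW(1:10)
   
   # A one-column data frame looks like a length-1 value to length()
   one_col <- data.frame(a = 1:10)
   length(one_col)
   NROW(one_col)
   ```
   
   Measuring inputs with `NROW()` instead of `length()` is one way to avoid mistaking a one-column data frame for a length-1 value.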




[GitHub] [arrow] nealrichardson commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r647851002



##########
File path: r/R/util.R
##########
@@ -139,3 +139,42 @@ attr(is_writable_table, "fail") <- function(call, env){
   )
 }
 
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(Scalar$create(object$chunks[[1]])$as_array(n))
+  }
+  return(Scalar$create(object)$as_array(n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  arr_lens <- map_int(arrays, NROW)
+  

Review comment:
       NBD, but you evaluate `arr_lens == 1` 4 times below, so you could store the result up here; I think the code then becomes a little more readable (in a prose-like way).
   
   ```suggestion
     is_scalar <- arr_lens == 1
     
   ```
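   
   For context, a rough base R sketch of how the rest of the function might read with `is_scalar` stored once. It recycles plain R vectors with `rep()` instead of building Arrow arrays, and `recycle_scalars_sketch` is a hypothetical name, not the function in this PR:
   
   ```r
   # Hypothetical stand-in for recycle_scalars(), operating on plain R vectors
   recycle_scalars_sketch <- function(values) {
     lens <- vapply(values, NROW, integer(1))
     is_scalar <- lens == 1
   
     # Only recycle when there is a mix of length-1 and longer inputs
     if (length(values) > 1 && any(is_scalar) && !all(is_scalar)) {
       max_len <- max(lens)
       values[is_scalar] <- lapply(values[is_scalar], rep, times = max_len)
     }
     values
   }
   
   recycled <- recycle_scalars_sketch(list(a = 1:10, b = 5))
   lengths(recycled)
   ```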

##########
File path: r/R/util.R
##########
@@ -139,3 +139,42 @@ attr(is_writable_table, "fail") <- function(call, env){
   )
 }
 
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(Scalar$create(object$chunks[[1]])$as_array(n))
+  }
+  return(Scalar$create(object)$as_array(n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  arr_lens <- map_int(arrays, NROW)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    
+    # Recycling not supported for tibbles and data.frames
+    if(all(map_lgl(arrays, ~inherits(.x, "data.frame")))){
+      abort(c(
+          "All input tibbles or data.frames must have the same number of rows",
+          x = paste("Number of rows in inputs:",oxford_paste(map_int(arrays, ~nrow(.x))))

Review comment:
       right?
   
   ```suggestion
             x = paste("Number of rows in inputs:", oxford_paste(arr_lens))
   ```
   
   Also, are we worried that `arr_lens` could be large (and thus this error message would be huge)?
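   
   If the size of the message does turn out to be a concern, one option would be to show only the first few lengths. This is a sketch only: `summarise_lengths` and `max_show` are hypothetical and not part of the package, and it uses plain `paste()` rather than `oxford_paste()`:
   
   ```r
   # Hypothetical helper: list at most `max_show` lengths, then summarise the rest
   summarise_lengths <- function(lens, max_show = 5) {
     shown <- head(lens, max_show)
     out <- paste(shown, collapse = ", ")
     if (length(lens) > max_show) {
       out <- paste0(out, ", ... (", length(lens) - max_show, " more)")
     }
     out
   }
   
   summarise_lengths(c(10, 10, 1))
   summarise_lengths(1:8, max_show = 5)
   ```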

##########
File path: r/R/util.R
##########
@@ -139,3 +139,42 @@ attr(is_writable_table, "fail") <- function(call, env){
   )
 }
 
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(Scalar$create(object$chunks[[1]])$as_array(n))
+  }
+  return(Scalar$create(object)$as_array(n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  arr_lens <- map_int(arrays, NROW)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    
+    # Recycling not supported for tibbles and data.frames
+    if(all(map_lgl(arrays, ~inherits(.x, "data.frame")))){

Review comment:
       You added whitespace throughout the package but not here :)
   
   ```suggestion
       if (all(map_lgl(arrays, ~inherits(.x, "data.frame")))) {
   ```

##########
File path: r/R/util.R
##########
@@ -139,3 +139,42 @@ attr(is_writable_table, "fail") <- function(call, env){
   )
 }
 
+#' Take an object of length 1 and repeat it.

Review comment:
       Minor suggestion: I'd put this function definition after `recycle_scalars`. It's only interesting in the context of that function.




[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   


[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   


[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629354356



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(unlist(arr_lens))

Review comment:
       ```suggestion
     arr_lens <- map_int(arrays, length)
     if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
       max_array_len <- max(arr_lens)
   ```
   (and the same in `table.R`)




[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-836325287


   @ianmcook I think this is something for a separate ticket/PR, but when I was testing things you mentioned above, I found that it is possible to create `Table` and `RecordBatch` objects with duplicated column names, which then results in errors when I try to analyse them, e.g. 
   
   ```r
   Table$create(iris, iris) %>% filter(Species == "versicolor")
   ```
   ` Error in schm$GetFieldByName(name)$ToString() : attempt to apply non-function `
   
   Shall I create a new ticket to add code which checks column names are unique when combining objects?
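   
   As a rough idea of what such a check could look like, here is a base R sketch that works on a character vector of column names; `check_unique_names` is a hypothetical helper, not an existing arrow function:
   
   ```r
   # Hypothetical helper: error if any column name appears more than once
   check_unique_names <- function(col_names) {
     dupes <- unique(col_names[duplicated(col_names)])
     if (length(dupes) > 0) {
       stop(
         "Column names must be unique; duplicated: ",
         paste(dupes, collapse = ", "),
         call. = FALSE
       )
     }
     invisible(col_names)
   }
   
   # Passes silently when all names are distinct
   check_unique_names(c("a", "b"))
   ```
   
   Calling it with `c(names(iris), names(iris))` would raise an error, which is the situation the `Table$create(iris, iris)` example above runs into.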


[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629647107



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       Looking at this a bit more, I think it might require some challenging C++ coding to implement `combine_chunks()` like I described. Perhaps it should be dealt with in a separate Jira & separate PR.
   
   In the meantime, you could avoid calling `as.vector()` by handling `ChunkedArray` explicitly here, taking the first chunk before passing it to `Scalar$create()`:
   
   ```r
   if (inherits(.x, "ChunkedArray")) {
     .x <- .x$chunk(0)
   }
   ```




[GitHub] [arrow] ElenaHenderson commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ElenaHenderson commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842505116


   @ursabot please benchmark name=dataframe-to-table lang=R


[GitHub] [arrow] ursabot commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842556138


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...fdf46169d96349f3b8acd5079a8b11fb/)
   [Scheduled] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...22d02b05c14b45f08226e60f3864a40e/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...b4fb3f01bc814530b238483d14f12ec0/)
   [Scheduled] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...4a0fd651b5af4152b2c4e02776e8d9ad/)
   


[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r636043288



##########
File path: r/R/util.R
##########
@@ -110,3 +110,36 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object of length 1 to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (inherits(object, "ChunkedArray")) {
+    return(MakeArrayFromScalar(Scalar$create(object$chunks[[1]]), n))
+  }
+  return(MakeArrayFromScalar(Scalar$create(object), n))
+}
+
+#' Recycle scalar values in a list of arrays
+#' 
+#' @param arrays List of arrays
+#' @return List of arrays with any vector/Scalar/Array/ChunkedArray values of length 1 recycled 
+#' @keywords internal
+recycle_scalars <- function(arrays){
+  # Get lengths of items in arrays
+  is_df <- map_lgl(arrays, ~inherits(.x, "data.frame"))
+  arr_lens <- lengths(arrays)
+  arr_lens[is_df] <- map_int(arrays[is_df], nrow)
+  
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays[arr_lens == 1 & !is_df] <- lapply(arrays[arr_lens == 1 & !is_df], repeat_value_as_array, max_array_len)

Review comment:
       `RecordBatch__from_arrays` fails when given a `StructArray` as input, because `count_fields` (from `recordbatch.cpp`) checks whether the object inherits from `data.frame`. I've tried to implement something in `18be171`, but it's wrong, and I'm struggling to figure out how to do it correctly.
   
   




[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-840595984


   > Benchmark runs are scheduled for baseline = [4e0f0cf](https://github.com/apache/arrow/commit/4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe) and contender = [dbe7491](https://github.com/apache/arrow/commit/dbe74918019e172f8bdd3a2085f1ec7481fa79f4). Results will be available as each benchmark for each run completes.
   > Conbench compare runs links:
   > [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   > [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   > [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   > [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   > :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   
   @jonkeane It looks like Conbench has used different benchmarks for each of those PRs - do you know why that's happened?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   


[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629647107



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       Looking at this a bit more, I think it might require some challenging C++ coding to implement `combine_chunks()` like I described. Perhaps it should be dealt with in a separate Jira & separate PR.




[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   


[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r632754954



##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)

Review comment:
       Apparently 3.2.0.  I think in our test-r-versions job, we only go back to 3.3.0, and I can't think of anywhere else that we go back to 3.2.0 or before, so will update.




[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   


[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r630048707



##########
File path: r/R/record-batch.R
##########
@@ -161,6 +161,17 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if(.y) MakeArrayFromScalar(Scalar$create(as.vector(.x)), max_array_len) else .x

Review comment:
       Done!




[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-836322644


   > What happens if you pass data frames instead of vectors and one of them has length one (i.e. only one row)? Maybe add a test to check the behavior in that case.
   
   Will do. 
   
   > And what happens if you pass Arrow arrays instead of R vectors and one of them has length 1? E.g.
   > 
   > ```r
   > Table$create(a = Array$create(1:10), b = Array$create(5))
   > ```
   An error, will make changes to handle this too and add appropriate tests.
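   
   For reference, the recycling rule being discussed can be sketched in base R. This is only an illustration under stated assumptions: `recycle_length_one()` is a hypothetical helper that works on plain R vectors with `rep_len()`. It is not the PR's implementation, which operates on Arrow objects, but it applies the same condition as the diff (more than one input, at least one of length 1, and not all of length 1).
   
   ```r
   # Hypothetical helper (not part of arrow): recycle length-1 inputs
   # to the length of the longest input, mirroring the condition in the diff.
   recycle_length_one <- function(...) {
     cols <- list(...)
     lens <- vapply(cols, length, integer(1))
     # Only recycle when there is a mix of length-1 and longer inputs
     if (length(cols) > 1 && any(lens == 1) && !all(lens == 1)) {
       max_len <- max(lens)
       cols <- Map(
         function(col, is_scalar) if (is_scalar) rep_len(col, max_len) else col,
         cols,
         lens == 1
       )
     }
     cols
   }
   
   out <- recycle_length_one(a = 1:10, b = 5)
   lengths(out)
   ```
   
   With these inputs, `b` is repeated to match the ten values of `a`; when every input has length 1, the inputs are returned unchanged.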
   


[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-841058739


   > @thisisnic ooh we finally have some benchmark results to look at!: [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   > The `dataframe-to-table` results are the pertinent ones here I think.
   > @jonkeane can you help us interpret these results?
   
   I'm going to take a stab at interpreting them just to see how I do.  Overall, the changes I've made have made things 5% slower.  I'm not sure whether this is important, as I don't have a good idea of a cut-off or of whether any of this is just noise.  Looking more closely at the results, the biggest differences are in file-read, where the update is at least 8% slower.
   
   Does this feel like a significant slowdown? I'd say so.  I think it provides support to the idea that this really should be implemented at the C++ level rather than the R level.  Is this OK to be merged in? I think "perhaps", as long as we open a JIRA for this to be implemented at the C++ level.
   
   Let me know your thoughts!


[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842556138


   Benchmark runs are scheduled for baseline = 325eb073e0fb6971f3dd027299d37850377b39ea and contender = 8363dff11a8319235b1d5bcf4350a0793c5b7903. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/e07388d4ae5a48e79db8d977e1fbead9...fdf46169d96349f3b8acd5079a8b11fb/)
   [Finished :arrow_down:0.0% :arrow_up:0.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/585c76ccb6744629ab19a94e804b66a6...22d02b05c14b45f08226e60f3864a40e/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/9eb43ed540a3418f8ee0a75921b77b14...b4fb3f01bc814530b238483d14f12ec0/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/abe3e3b060824852b59d9934b5ed4af6...4a0fd651b5af4152b2c4e02776e8d9ad/)
   


[GitHub] [arrow] ianmcook commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r629569897



##########
File path: r/R/table.R
##########
@@ -175,6 +175,17 @@ Table$create <- function(..., schema = NULL) {
     return(dplyr::group_by(out, !!!dplyr::groups(dots[[1]])))
   }
   
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map(dots, length)
+  if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
+    max_array_len <- max(unlist(arr_lens))

Review comment:
       ```suggestion
     arr_lens <- map_int(dots, length)
     if (length(dots) > 1 && any(arr_lens == 1) && !all(arr_lens==1)){
       max_array_len <- max(arr_lens)
   ```
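   
   To illustrate why an integer vector is more convenient than a list here, the following sketch uses base R analogues, assuming `lapply()` stands in for `purrr::map()` and `vapply(..., integer(1))` for `purrr::map_int()`. The `dots` value is a made-up example, not taken from the PR.
   
   ```r
   # Made-up inputs for illustration only
   dots <- list(a = 1:10, b = 5)
   
   # List of lengths, analogous to purrr::map(dots, length)
   lens_list <- lapply(dots, length)
   
   # Integer vector of lengths, analogous to purrr::map_int(dots, length)
   lens_int <- vapply(dots, length, integer(1))
   
   # The list has to be flattened before max() can be applied to it
   max(unlist(lens_list))
   
   # The integer vector can be passed to max() directly
   max(lens_int)
   ```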




[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r633851616



##########
File path: r/R/util.R
##########
@@ -110,3 +110,21 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+#' Take an object of length 1 and repeat it.
+#' 
+#' @param object Object to be repeated - vector, `Scalar`, `Array`, or `ChunkedArray`
+#' @param n Number of repetitions
+#' 
+#' @return `Array` of length `n`
+#' 
+#' @keywords internal
+repeat_value_as_array <- function(object, n) {
+  if (length(object) != 1) {

Review comment:
       I think I was trying to make this function a bit more generic in case it's useful elsewhere, but you make a good point; I'll remove the check.




[GitHub] [arrow] ianmcook removed a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook removed a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-839127582


   @jonkeane it looks like it's only evaluating the last commit in this PR; the baseline is not the commit _before_ this PR, it's the second to last commit _in_ this PR. Is that a known problem? Is there some way to explicitly specify which baseline commit to use?


[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838481121


   This looks pretty good to me! Just a few final things:
   - Could you please ensure there are spaces added: `if(` → `if (` and `){`→ `) {`
   - Could you search all the tests for "ARROW-11705" and see if the two skipped tests work now that you've implemented this fix?


[GitHub] [arrow] ElenaHenderson commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ElenaHenderson commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-842555929


   @ursabot please benchmark name=file-read lang=R


[GitHub] [arrow] thisisnic commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838628591


   > This looks pretty good to me! Just a few final things:
   > 
   >     * Could you please ensure there are spaces added: `if(` → `if (` and `){`→ `) {`
   > 
   >     * Could you search all the tests for "[ARROW-11705](https://issues.apache.org/jira/browse/ARROW-11705)" and see if the two skipped tests work now that you've implemented this fix?
   
   Done both now, and fixed spacing in a load of other places as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-large-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-840693418


   I think Conbench needs some tweaking before it can help us here.
   
   I'll go ahead and resolve the conflicts, wait for checks to pass, and merge this if there aren't any objections.





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   





[GitHub] [arrow] ianmcook commented on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-840728106


   @thisisnic ooh, we finally have some benchmark results to look at: [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   The `dataframe-to-table` results are the pertinent ones here, I think.
   @jonkeane can you help us interpret these results?
   





[GitHub] [arrow] ursabot edited a comment on pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#issuecomment-838984969


   Benchmark runs are scheduled for baseline = 4e0f0cf79cf836a29e4bfd4a7b2d692f8b50bffe and contender = dbe74918019e172f8bdd3a2085f1ec7481fa79f4. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-large-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/614f005c2822403591fddd25921d2358...67131004c3734086a26def2729031134/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2 (mimalloc)](https://conbench.ursa.dev/compare/runs/c61a9f2f9b504ad49587818f3ae5f283...0e8346acd8a040f9bdbfe92576c027f5/)
   [Scheduled] [ursa-i9-9960x (mimalloc)](https://conbench.ursa.dev/compare/runs/139601ccb33e49c88e039fcfa1a6d460...1a688615902f4a0a997f48e908738a25/)
   [Failed :arrow_down:0.0% :arrow_up:0.0% Warning: Contender and baseline run contexts do not match] [ursa-thinkcentre-m75q (mimalloc)](https://conbench.ursa.dev/compare/runs/ab3cde1618b84feea09029227d19e223...59d9724c63334b16ba4b5d1f20e20875/)
   :warning: ursa-i9-9960x agent is disconnected or machine is offline.
   :warning: ec2-t3-xlarge-us-east-2 agent is disconnected or machine is offline.
   





[GitHub] [arrow] nealrichardson commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r632849602



##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)

Review comment:
       Per DESCRIPTION we require R >= 3.3 (because we depend on packages that require R >= 3.3)







[GitHub] [arrow] thisisnic commented on a change in pull request #10269: ARROW-11705: [R] Support scalar value recycling in RecordBatch/Table$create()

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10269:
URL: https://github.com/apache/arrow/pull/10269#discussion_r632750639



##########
File path: r/R/record-batch.R
##########
@@ -161,7 +161,18 @@ RecordBatch$create <- function(..., schema = NULL) {
     out <- RecordBatch__from_arrays(schema, arrays)
     return(dplyr::group_by(out, !!!dplyr::groups(arrays[[1]])))
   }
-  
+
+  # If any arrays are length 1, recycle them  
+  arr_lens <- map_int(arrays, length)
+  if (length(arrays) > 1 && any(arr_lens == 1) && !all(arr_lens==1)) {
+    max_array_len <- max(arr_lens)
+    arrays <- modify2(
+      arrays,
+      arr_lens == 1,
+      ~if (.y) repeat_value_as_array(.x, max_array_len) else .x
+    )

Review comment:
       I don't disagree, will update
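
For readers following the hunk above, the recycling rule it applies can be sketched in base R. This is only an illustration under stated assumptions: `recycle_scalars()` is a hypothetical helper name, it works on plain R vectors rather than Arrow arrays, and `rep()` stands in for the `repeat_value_as_array()` call shown in the diff. It uses `lengths()` instead of `map_int(arrays, length)`.

```r
# Hypothetical helper (not part of the arrow package): recycle length-1
# columns to the length of the longest column, mirroring the condition
# used in the diff above.
recycle_scalars <- function(cols) {
  lens <- lengths(cols)
  # Recycle only when length-1 columns are mixed with longer ones
  if (length(cols) > 1 && any(lens == 1) && !all(lens == 1)) {
    max_len <- max(lens)
    # rep() is a stand-in for repeat_value_as_array() on Arrow arrays
    cols[lens == 1] <- lapply(cols[lens == 1], rep, times = max_len)
  }
  cols
}

cols <- recycle_scalars(list(a = 1:3, b = "x", c = c(TRUE, FALSE, NA)))
lengths(cols)
```

When every column already has length 1, the list is returned unchanged, which matches the `!all(arr_lens == 1)` guard in the hunk.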






