Posted to issues@arrow.apache.org by "austin3dickey (via GitHub)" <gi...@apache.org> on 2023/06/05 18:22:59 UTC

[GitHub] [arrow] austin3dickey opened a new issue, #35917: R `tpch` benchmarks are failing (queries 02, 10, and 13)

austin3dickey opened a new issue, #35917:
URL: https://github.com/apache/arrow/issues/35917

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   For a while[^1] now, the TPCH-02, TPCH-10, and TPCH-13 queries have been failing validation during R benchmarking. Here is [a link](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2797#01888349-b91d-4cca-9757-04766a04f4c6) to a recent build, and here is [the benchmarking code](https://github.com/voltrondata-labs/arrowbench/blob/main/R/bm-tpc-h.R) that is run. Each of these queries fails with the same error for all the cases that we test: `format=native` and `format=parquet`, and also `scale_factor=1` and `scale_factor=10`.
   
   TPCH-02:
   ```
   Error in eval(bm$after_each, envir = ctx) : The answer does not match
   Calls: run_bm -> run_iteration -> eval -> eval
   In addition: Warning message:
   In eval(bm$after_each, envir = ctx) : 
   old vs new
                                                                                                           s_comment
   - old[1, ]   l, ironic instructions cajole                                                                       
   + new[1, ]   uriously regular requests hag                                                                       
   - old[2, ]   es. furiously silent deposits among the deposits haggle furiously a                                 
   + new[2, ]   efully express instructions. regular requests against the slyly fin                                 
   - old[3, ]   ar, regular requests nag blithely special accounts. final deposits impress carefully. ironic,       
   + new[3, ]   etect about the furiously final accounts. slyly ironic pinto beans sleep inside the furiously       
   - old[4, ]   s sleep according to the quick requests. carefully                                                  
   + new[4, ]   ackages boost blithely. blithely regular deposits c             [... truncated]
   Execution halted
   ```
   
   TPCH-10:
   ```
   Error in eval(bm$after_each, envir = ctx) : The answer does not match
   Calls: run_bm -> run_iteration -> eval -> eval
   In addition: Warning message:
   In eval(bm$after_each, envir = ctx) : 
   old vs new
                                                                                                                      c_comment
   - old[1, ]  ep. blithely regular foxes promise slyly furiously ironic depend                                                
   + new[1, ]  sits. slyly regular requests sleep alongside of the regular inst                                                
   - old[2, ]  endencies sleep. slyly express deposits nag carefully around the even tithes. slyly regular                     
   + new[2, ]  ggle carefully enticing requests. final deposits use bold, bold pinto beans. ironic, idle re                    
   - old[3, ]  tes. final instructions nag quickly according to                                                                
   + new[3, ]   need to boost against the slyly regular account                                                                
   - old[4, ]  ost carefully. slyly regular packages cajole about the blithely final ideas. permanently daring depos [... truncated]
   Execution halted
   ```
   
   TPCH-13:
   ```
   Error in eval(bm$after_each, envir = ctx) : The answer does not match
   Calls: run_bm -> run_iteration -> eval -> eval
   In addition: Warning message:
   In eval(bm$after_each, envir = ctx) : 
   old vs new
               c_count
     old[1, ]        0
   - old[2, ]       10
   + new[2, ]        9
   - old[3, ]        9
   + new[3, ]       10
     old[4, ]       11
     old[5, ]        8
     old[6, ]       12
   
   old vs new
               c_count
     old[7, ]       13
     old[8, ]       19
     old[9, ]        7
   - old[10, ]      18
   + new[10, ]      17
   - old[11, ]      20
   + new[11, ]      18
   - old[12, ]      14
   + new[12, ]      20
     old[13, ]      15
   - old[14, ]      17
   + new[14, ]      14
     old[15, ]      16
     old[16, ]      21
     old[17, ]      22
   
   old vs new
               c_count
     old[33, ]      33
     old[34, ]      34
     old[35, ]      35
   - old[36, ]      36
   + new[36, ]       1
   - old[37, ]       1
   + new[37, ]      36
   - old[38, ]      37
   + new[38, ]      38
   - old[39, ]      38
   + new[39, ]      37
   - old[40, ]      41
   + new[40, ]      40
   - old[41, ]      40
   + new[41, ]      41
     old[42, ]      39
   
   `old$c_count[1:6]`: 0 10  9 11 8 12
   `new$c_count[1:6]`: 0  9 10 11 8 12
   
   `old$c_count[7:17]`: 13 19 7 18 20 14 15 17 16 21  [... truncated]
   Execution halted
   ```
   
   [^1]: I will look into the first time this failed, and report back.
   
   ### Component(s)
   
   Benchmarking, R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] thisisnic commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.

thisisnic commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1577274881

   What is being compared here in terms of `old` and `new`?  Looks like the specific output is completely different, which isn't a good sign!



[GitHub] [arrow] austin3dickey commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1577284271

   Update: these benchmarks do not fail on the `ursa-i9-9960x` or `test-mac-arm` hardwares, only the `arm64-m6g-linux-compute`, `arm64-t4g-linux-compute`, and `ec2-m5-4xlarge-us-east-2` hardwares.
   
   Also I think the first time we saw this failing was [this build](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-arm64-m6g-linux-compute/builds/2739#01884951-5a06-4a67-a569-5bc7a7b3bb8d) on 2023-05-23, which was a benchmarking run kicked off due to the merge of this PR: https://github.com/apache/arrow/pull/35342. (The other two problematic hardwares started failing on this commit's benchmarking runs as well.)



[GitHub] [arrow] austin3dickey commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1581038033

   The last successful build logs have:
   ```
   trying URL 'https://cloud.r-project.org/src/contrib/duckdb_0.7.1-1.tar.gz'
   ```
   The first failing build logs have:
   ```
   trying URL 'https://cloud.r-project.org/src/contrib/duckdb_0.8.0.tar.gz'
   ```
   
   Seems like a big red flag to me!
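The comparison above (finding which dependency version changed between the last good build and the first bad one) can be sketched in a few lines. This is only an illustration: the helper names `package_versions` and `changed_packages` are hypothetical, and the two log lines are the ones quoted in this comment, not full build logs.

```python
import re

# Matches lines such as:
#   trying URL 'https://cloud.r-project.org/src/contrib/duckdb_0.8.0.tar.gz'
# CRAN source tarballs are named <package>_<version>.tar.gz.
TARBALL = re.compile(r"/src/contrib/([A-Za-z0-9.]+)_([0-9][0-9.\-]*)\.tar\.gz")


def package_versions(log_text):
    """Return {package: version} for every CRAN source tarball named in a log."""
    return {m.group(1): m.group(2) for m in TARBALL.finditer(log_text)}


def changed_packages(good_log, bad_log):
    """Return {package: (good_version, bad_version)} where the versions differ."""
    good, bad = package_versions(good_log), package_versions(bad_log)
    return {
        pkg: (good[pkg], bad[pkg])
        for pkg in sorted(good.keys() & bad.keys())
        if good[pkg] != bad[pkg]
    }


# Log lines copied from the comment above.
good_log = "trying URL 'https://cloud.r-project.org/src/contrib/duckdb_0.7.1-1.tar.gz'"
bad_log = "trying URL 'https://cloud.r-project.org/src/contrib/duckdb_0.8.0.tar.gz'"

print(changed_packages(good_log, bad_log))  # {'duckdb': ('0.7.1-1', '0.8.0')}
```

Run over complete logs, the same approach would list every package whose version moved between the two builds, not just duckdb.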



[GitHub] [arrow] alistaire47 commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "alistaire47 (via GitHub)" <gi...@apache.org>.

alistaire47 commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1601094348

   This was resolved by https://github.com/voltrondata-labs/arrowbench/pull/136 so somebody with permissions can close this issue now.



[GitHub] [arrow] austin3dickey closed issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey closed issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)
URL: https://github.com/apache/arrow/issues/35917



[GitHub] [arrow] austin3dickey commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1577288295

   >What is being compared here in terms of old and new? 
   
   I believe `old` is the actual result from the benchmark run, and `new` is the expected query answer.
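One quick way to tell whether `old` and `new` differ in content or only in row order is to compare them both as sequences and as multisets. The sketch below is illustrative only: `classify_mismatch` is a hypothetical helper, not part of arrowbench, and it looks at just the first six `c_count` values shown in the TPCH-13 log, since the rest of the output is truncated.

```python
from collections import Counter


def classify_mismatch(actual, expected):
    """Classify how two result columns differ: 'match', 'order only', or 'content'."""
    if actual == expected:
        return "match"
    # Same values with the same multiplicities, just in a different order.
    if Counter(actual) == Counter(expected):
        return "order only"
    return "content"


# First six c_count values copied from the TPCH-13 log excerpt in this thread.
old_c_count = [0, 10, 9, 11, 8, 12]
new_c_count = [0, 9, 10, 11, 8, 12]

print(classify_mismatch(old_c_count, new_c_count))  # order only
```

For this short excerpt the values are the same and only their order differs, which would be consistent with a sorting difference rather than a wrong answer. It says nothing about the truncated rows.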



[GitHub] [arrow] jgehrcke commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "jgehrcke (via GitHub)" <gi...@apache.org>.

jgehrcke commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1582279592

   @austin3dickey great work narrowing down the duckdb pkg difference.
   
   > Pinning isn't common in R DESCRIPTIONs
   
   I propose https://github.com/voltrondata-labs/arrowbench/pull/135 to try out pinning to `0.7.1-1`. I think this is a cheap way to generate important insights to guide further decision-making here.



[GitHub] [arrow] thisisnic commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.

thisisnic commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1577301099

   Very odd, not sure what to suggest.  I wouldn't expect C# code to affect R results as they're completely separate codebases, so it might be something else around then.  Especially strange as it only fails on some hardware.



[GitHub] [arrow] austin3dickey commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1581040472

   Another red flag: DuckDB 0.8.0 changes the default sort order from `NULLS FIRST` to `NULLS LAST`. https://duckdb.org/2023/05/17/announcing-duckdb-080.html#breaking-sql-changes



[GitHub] [arrow] alistaire47 commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "alistaire47 (via GitHub)" <gi...@apache.org>.

alistaire47 commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1577354640

   > Update: these benchmarks do not fail on the `ursa-i9-9960x` or `test-mac-arm` hardwares, only the `arm64-m6g-linux-compute`, `arm64-t4g-linux-compute`, and `ec2-m5-4xlarge-us-east-2` hardwares.
   
   Might be a datagen thing—the cache doesn't get recreated often on the dedicated machines, but on the cloud machines it has to be regenerated each time. [It looks like we're still using duckdb to generate data](https://github.com/voltrondata-labs/arrowbench/blob/ebfab3d5535a1a7ba8680be4d0394296b9b1a5ee/R/ensure-tpch-source.R); a starting point might be comparing that against [datalogistik](https://github.com/conbench/datalogistik)?



[GitHub] [arrow] alistaire47 commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "alistaire47 (via GitHub)" <gi...@apache.org>.

alistaire47 commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1581070740

   As far as I can tell, we're installing duckdb from CRAN as a dep and it's not pinned, so yes, we're subject to breaking changes:
   
   https://github.com/voltrondata-labs/arrowbench/blob/ebfab3d5535a1a7ba8680be4d0394296b9b1a5ee/DESCRIPTION#L19
   
   It's worth running against 0.7.1-1 and 0.8.0 and seeing if we can reproduce the issue.
   
   Pinning isn't common in R `DESCRIPTION`s (current best-practice is using [renv](https://rstudio.github.io/renv/articles/renv.html) to make a lock file), but it seems like the quick solution here if this is actually the problem. One caveat: if pinning results in building duckdb from source instead of getting a binary, it could add ~half an hour to runtimes, though maybe we're already doing that in most cases?
   
   If we decide to update instead, we'll likely need to clear the dataset caches from the bare-metal machines. Elena has done this for us before, but I'm not totally sure how. Maybe we should make a little triggerable buildkite job to do this as necessary?



[GitHub] [arrow] austin3dickey commented on issue #35917: [Benchmarking] [R] R `tpch` benchmarks are failing (queries 02, 10, and 13)

Posted by "austin3dickey (via GitHub)" <gi...@apache.org>.

austin3dickey commented on issue #35917:
URL: https://github.com/apache/arrow/issues/35917#issuecomment-1581109931

   Another small potential fix is to explicitly specify NULLS FIRST when we use duckdb, since that seems to be what changed in the 0.8.0 version. As you said, local testing is probably the best way to ensure this will work.
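The effect of spelling out the null placement can be shown with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for duckdb, on the assumption that the `NULLS FIRST` / `NULLS LAST` clause behaves the same way in both. The table `t` and helper `ordered` are made up for illustration, and the bundled SQLite must be version 3.30.0 or newer for the clause to parse.

```python
import sqlite3

# In-memory table with one NULL among the values (stand-in for a duckdb table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(2,), (None,), (1,)])


def ordered(clause):
    """Return column x sorted with the given ORDER BY modifiers."""
    return [row[0] for row in conn.execute(f"SELECT x FROM t ORDER BY x {clause}")]


# With the placement written out, the result no longer depends on the
# engine's default, so a change of default cannot reorder the rows.
nulls_first = ordered("ASC NULLS FIRST")
nulls_last = ordered("ASC NULLS LAST")

print(nulls_first)  # [None, 1, 2]
print(nulls_last)   # [1, 2, None]
```

Whether this is enough to fix the failing queries would still need the local testing mentioned above, since the benchmark queries may rely on default ordering in more than one place.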

