You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/19 14:57:09 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

alamb opened a new pull request, #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393

# Which issue does this PR close?

Closes https://github.com/apache/arrow-datafusion/issues/6392

# Rationale for this change

# What changes are included in this PR?

# Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
-->

# Are there any user-facing changes?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#issuecomment-1561968854

   Thanks everyone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1204672358


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end

Review Comment:
   In deb9a4e20 and 139b59422



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)
+        .map(|test_file| {
+            tokio::task::spawn(async move {
+                println!("Running {:?}", test_file.relative_path);
+                if options.complete_mode {
+                    run_complete_file(test_file).await?;
+                } else if options.postgres_runner {
+                    run_test_file_with_postgres(test_file).await?;
+                } else {
+                    run_test_file(test_file).await?;
+                }
+                Ok(()) as Result<()>
+            })
+        })
+        // run up to num_cpus streams in parallel
+        .buffer_unordered(num_cpus::get())
+        .collect()
+        .await;
+
+    // Collect and examine errors
+    let errors: Vec<_> = results
+        .into_iter()
+        .filter_map(|result| {
+            match result {
+                // Tokio panic error
+                Err(e) => Some(DataFusionError::External(Box::new(e))),
+                Ok(thread_result) => match thread_result {
+                    // Test run error
+                    Err(e) => Some(e),
+                    // success
+                    Ok(_) => None,
+                },
+            }
+        })
+        .collect();

Review Comment:
   > Another to achieve the same effect is to replace futures::stream::iter(Some/None) by returning tokio_stream::empty()/tokio_stream::once() instead of Some/None in the branches of match.
   
   I actually tried this:
   
   ```rust
           .flat_map(|result| {
               // filter to keep only the errors, so we can report on them
               match result {
                   // Tokio panic error
                   Err(e) => tokio_stream::once(DataFusionError::External(Box::new(e))),
                   Ok(thread_result) => match thread_result {
                       // Test run error
                       Err(e) => tokio_stream::once(e),
                       // success
                       Ok(_) => tokio_stream::empty()
                   },
               }
           })
   ```
   
   And sadly the compiler complained that the match arms had different types:
   
   ```
   error[E0308]: `match` arms have incompatible types
      --> datafusion/core/tests/sqllogictests/src/main.rs:100:30
       |
   96  |                   Ok(thread_result) => match thread_result {
       |  ______________________________________-
   97  | |                     // Test run error
   98  | |                     Err(e) => tokio_stream::once(e),
       | |                               --------------------- this is found to be of type `tokio_stream::Once<DataFusionError>`
   99  | |                     // success
   100 | |                     Ok(_) => tokio_stream::empty()
       | |                              ^^^^^^^^^^^^^^^^^^^^^ expected `Once<DataFusionError>`, found `Empty<_>`
   101 | |                 },
       | |_________________- `match` arms have incompatible types
       |
       = note: expected struct `tokio_stream::Once<DataFusionError>`
                  found struct `tokio_stream::Empty<_>`
   
   ```



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)
+        .map(|test_file| {
+            tokio::task::spawn(async move {
+                println!("Running {:?}", test_file.relative_path);
+                if options.complete_mode {
+                    run_complete_file(test_file).await?;
+                } else if options.postgres_runner {
+                    run_test_file_with_postgres(test_file).await?;
+                } else {
+                    run_test_file(test_file).await?;
+                }
+                Ok(()) as Result<()>
+            })
+        })
+        // run up to num_cpus streams in parallel
+        .buffer_unordered(num_cpus::get())
+        .collect()
+        .await;
+
+    // Collect and examine errors
+    let errors: Vec<_> = results
+        .into_iter()
+        .filter_map(|result| {
+            match result {
+                // Tokio panic error
+                Err(e) => Some(DataFusionError::External(Box::new(e))),
+                Ok(thread_result) => match thread_result {
+                    // Test run error
+                    Err(e) => Some(e),
+                    // success
+                    Ok(_) => None,
+                },
+            }
+        })
+        .collect();

Review Comment:
   TIL -- very cool @melgenek  -- thank you



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 

Review Comment:
   in 19f145c77 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1200522484


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end

Review Comment:
   Each `slt` file runs with its own `SessionContext` so I think they should be isolated from one another. 
   
   I suppose if they all shared a temporary directory or something else that could be changed, that would be a problem. 
   
   Perhaps I can add some comments in various places explaining why it is important to keep the tests from having externally visible side effects so they can be parallelized



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] melgenek commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "melgenek (via GitHub)" <gi...@apache.org>.

melgenek commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1204807855


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)
+        .map(|test_file| {
+            tokio::task::spawn(async move {
+                println!("Running {:?}", test_file.relative_path);
+                if options.complete_mode {
+                    run_complete_file(test_file).await?;
+                } else if options.postgres_runner {
+                    run_test_file_with_postgres(test_file).await?;
+                } else {
+                    run_test_file(test_file).await?;
+                }
+                Ok(()) as Result<()>
+            })
+        })
+        // run up to num_cpus streams in parallel
+        .buffer_unordered(num_cpus::get())
+        .collect()
+        .await;
+
+    // Collect and examine errors
+    let errors: Vec<_> = results
+        .into_iter()
+        .filter_map(|result| {
+            match result {
+                // Tokio panic error
+                Err(e) => Some(DataFusionError::External(Box::new(e))),
+                Ok(thread_result) => match thread_result {
+                    // Test run error
+                    Err(e) => Some(e),
+                    // success
+                    Ok(_) => None,
+                },
+            }
+        })
+        .collect();

Review Comment:
   Oops, sorry for a wrong suggestion. I actually verified the first version, but not the second one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "comphead (via GitHub)" <gi...@apache.org>.

comphead commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1199396805


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end

Review Comment:
   just wondering, if there can be a race condition if multiple tests work with table t1, and the table dropped by test who has finished first. Other test could fail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#issuecomment-1559479454

   FYI @melgenek  and @xudong963  -- I am not sure if you are interested in this PR, but if you had time to review I would be most appreciative


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb merged pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb merged PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] melgenek commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "melgenek (via GitHub)" <gi...@apache.org>.

melgenek commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1202961565


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)

Review Comment:
   ```suggestion
       // Run all tests in parallel, reporting failures at the end
       let results: Vec<_> = futures::stream::iter(read_test_files(&options))
   ```
   There's no need to collect in between.



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 

Review Comment:
   The `main` function above for Windows creates a single-threaded executor. I made it this way by mistake.
   It should be `tokio::runtime::Builder::new_multi_thread()`.



##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)
+        .map(|test_file| {
+            tokio::task::spawn(async move {
+                println!("Running {:?}", test_file.relative_path);
+                if options.complete_mode {
+                    run_complete_file(test_file).await?;
+                } else if options.postgres_runner {
+                    run_test_file_with_postgres(test_file).await?;
+                } else {
+                    run_test_file(test_file).await?;
+                }
+                Ok(()) as Result<()>
+            })
+        })
+        // run up to num_cpus streams in parallel
+        .buffer_unordered(num_cpus::get())
+        .collect()
+        .await;
+
+    // Collect and examine errors
+    let errors: Vec<_> = results
+        .into_iter()
+        .filter_map(|result| {
+            match result {
+                // Tokio panic error
+                Err(e) => Some(DataFusionError::External(Box::new(e))),
+                Ok(thread_result) => match thread_result {
+                    // Test run error
+                    Err(e) => Some(e),
+                    // success
+                    Ok(_) => None,
+                },
+            }
+        })
+        .collect();

Review Comment:
   ```suggestion
           .buffer_unordered(num_cpus::get())
           .flat_map(|result| {
               futures::stream::iter(match result {
                   // Tokio panic error
                   Err(e) => Some(DataFusionError::External(Box::new(e))),
                   Ok(thread_result) => match thread_result {
                       // Test run error
                       Err(e) => Some(e),
                       // success
                       Ok(_) => None,
                   },
               })
           })
           .collect()
           .await;
   ```
   It is possible to avoid an intermediate `collect+into_iter` to transform results. `flat_map + futures::stream::iter` would flatten the stream with 0 or 1 elements.
   Another to achieve the same effect is to replace `futures::stream::iter(Some/None)` by returning `tokio_stream::empty()`/`tokio_stream::once()` instead of Some/None in the branches of `match`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] comphead commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "comphead (via GitHub)" <gi...@apache.org>.

comphead commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1199396805


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end

Review Comment:
   just wondering, if there can be a race condition if multiple tests work with table t1, and it is dropped by test who has finished first. Other test could fail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #6393: Reduce output when `sqllogictest` runs successfully, and run tests in parallel

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on code in PR #6393:
URL: https://github.com/apache/arrow-datafusion/pull/6393#discussion_r1205839673


##########
datafusion/core/tests/sqllogictests/src/main.rs:
##########
@@ -56,65 +57,116 @@ pub fn main() {
 
 #[tokio::main]
 #[cfg(not(target_family = "windows"))]
-pub async fn main() -> Result<(), Box<dyn Error>> {
+pub async fn main() -> Result<()> {
     run_tests().await
 }
 
-async fn run_tests() -> Result<(), Box<dyn Error>> {
+async fn run_tests() -> Result<()> {
     // Enable logging (e.g. set RUST_LOG=debug to see debug logs)
     env_logger::init();
 
     let options = Options::new();
 
-    for (path, relative_path) in read_test_files(&options) {
-        if options.complete_mode {
-            run_complete_file(&path, relative_path).await?;
-        } else if options.postgres_runner {
-            run_test_file_with_postgres(&path, relative_path).await?;
-        } else {
-            run_test_file(&path, relative_path).await?;
+    let test_files: Vec<_> = read_test_files(&options).collect();
+
+    // Run all tests in parallel, reporting failures at the end
+    let results: Vec<_> = futures::stream::iter(test_files)
+        .map(|test_file| {
+            tokio::task::spawn(async move {
+                println!("Running {:?}", test_file.relative_path);
+                if options.complete_mode {
+                    run_complete_file(test_file).await?;
+                } else if options.postgres_runner {
+                    run_test_file_with_postgres(test_file).await?;
+                } else {
+                    run_test_file(test_file).await?;
+                }
+                Ok(()) as Result<()>
+            })
+        })
+        // run up to num_cpus streams in parallel
+        .buffer_unordered(num_cpus::get())
+        .collect()
+        .await;
+
+    // Collect and examine errors
+    let errors: Vec<_> = results
+        .into_iter()
+        .filter_map(|result| {
+            match result {
+                // Tokio panic error
+                Err(e) => Some(DataFusionError::External(Box::new(e))),
+                Ok(thread_result) => match thread_result {
+                    // Test run error
+                    Err(e) => Some(e),
+                    // success
+                    Ok(_) => None,
+                },
+            }
+        })
+        .collect();

Review Comment:
   No problem -- I learned something new and I think the result is better code all around. Thank you for taking the time to review and suggest



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org