You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/04 22:40:08 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet [WIP]

Dandandan opened a new pull request #9099:
URL: https://github.com/apache/arrow/pull/9099


   Inspired by PR https://github.com/apache/arrow/pull/9086 from @jorgecarleitao 
   
   Seems better to use  `task::spawn_blocking` to set upper limit on nr. of threads (512 by default) using tokio.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for writing parquet

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754386570


   Two CI jobs are hanging indefinitely, I canceled them.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] Dandandan commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754789551


   Locally seems to work now thanks @andygrove . Seems CI jobs are in queue for quite some time


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio thread pool for loading parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754913960


   ![Screen Shot 2021-01-05 at 4 33 56 PM](https://user-images.githubusercontent.com/490673/103701330-d0eda280-4f73-11eb-8475-3698f6fea4e5.png)
   And now they do show as green. WTF guthub?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet [WIP]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754266445


   https://issues.apache.org/jira/browse/ARROW-11129


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754707400


   Note that this is hanging in the CI, which I think is unrelated with the CI itself.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#discussion_r552012528



##########
File path: rust/datafusion/src/physical_plan/parquet.rs
##########
@@ -256,16 +257,21 @@ impl ExecutionPlan for ParquetExec {
         let projection = self.projection.clone();
         let batch_size = self.batch_size;
 
-        thread::spawn(move || {
-            if let Err(e) = read_files(
+        task::spawn_blocking(move || {
+            read_files(
                 &filenames,
                 projection.clone(),
                 batch_size,
                 response_tx.clone(),
-            ) {
-                println!("Parquet reader thread terminated due to error: {:?}", e);
-            }
-        });
+            )
+        })
+        .await

Review comment:
       I think this will lead to deadlock because the channel's buffer will fill up and the reader won't be able to push new batches. The caller can't read any batches from the channel until this task completes?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio for loading parquet

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#discussion_r552010480



##########
File path: rust/datafusion/src/physical_plan/parquet.rs
##########
@@ -256,16 +257,21 @@ impl ExecutionPlan for ParquetExec {
         let projection = self.projection.clone();
         let batch_size = self.batch_size;
 
-        thread::spawn(move || {
-            if let Err(e) = read_files(
+        task::spawn_blocking(move || {
+            read_files(
                 &filenames,
                 projection.clone(),
                 batch_size,
                 response_tx.clone(),
-            ) {
-                println!("Parquet reader thread terminated due to error: {:?}", e);
-            }
-        });
+            )
+        })
+        .await

Review comment:
       I don't think we want to `await` here for the task to complete?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb edited a comment on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio thread pool for loading parquet

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754913960


   ![Screen Shot 2021-01-05 at 4 33 56 PM](https://user-images.githubusercontent.com/490673/103701330-d0eda280-4f73-11eb-8475-3698f6fea4e5.png)
   And now they do show as green. WTF github?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio thread pool for loading parquet

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9099:
URL: https://github.com/apache/arrow/pull/9099


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9099: ARROW-11129: [Rust][DataFusion] Use tokio thread pool for loading parquet

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9099:
URL: https://github.com/apache/arrow/pull/9099#issuecomment-754912699


   I swear Github showed that all the CI checks were green when I merged this PR.... but now they are now showing as still running...


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org