Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/07 19:35:07 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue #835: Ballista: TPC-H q3 @ SF=1000 never completes

andygrove opened a new issue #835:
URL: https://github.com/apache/arrow-datafusion/issues/835


   **Describe the bug**
   I've started testing with large-scale-factor data sets, and TPC-H query 3 hangs, possibly during execution of query stage 3.
   
   Here are the sizes of the shuffle output directories for each query stage at the point where the query appears to have stopped executing.
   
   ```bash
   $ du -h -d 1 /mnt/bigdata/temp/RpXfVVN/
   616M	/mnt/bigdata/temp/RpXfVVN/1
   890M	/mnt/bigdata/temp/RpXfVVN/3
   21G	/mnt/bigdata/temp/RpXfVVN/2
   92G	/mnt/bigdata/temp/RpXfVVN/4
   113G	/mnt/bigdata/temp/RpXfVVN/
   ```
   
   Query stages 1, 2, and 4 have 48 shuffle files for each output partition, as expected. Query stage 3 has only 3 shuffle output files per output partition, which doesn't seem right.
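
   To check per-partition file counts like these on another run, a small standalone Rust program can walk one stage's shuffle directory. This is a sketch, not Ballista tooling: it assumes the layout is `<work_dir>/<job_id>/<stage_id>/<output_partition>/<file>`, which may not match how the executor actually names its shuffle output, and `count_shuffle_files` is a hypothetical helper.

   ```rust
   use std::collections::BTreeMap;
   use std::fs;
   use std::io;
   use std::path::Path;

   /// Count the regular files in each output-partition subdirectory of one
   /// query-stage shuffle directory.
   /// ASSUMPTION: layout is `<work_dir>/<job_id>/<stage_id>/<output_partition>/<file>`.
   fn count_shuffle_files(stage_dir: &Path) -> io::Result<BTreeMap<String, usize>> {
       let mut counts = BTreeMap::new();
       for entry in fs::read_dir(stage_dir)? {
           let entry = entry?;
           if !entry.file_type()?.is_dir() {
               continue; // ignore stray files at the stage level
           }
           let mut n = 0;
           for file in fs::read_dir(entry.path())? {
               if file?.file_type()?.is_file() {
                   n += 1;
               }
           }
           counts.insert(entry.file_name().to_string_lossy().into_owned(), n);
       }
       Ok(counts)
   }

   fn main() -> io::Result<()> {
       // Example: inspect stage 3 of the job from this issue, if it exists locally.
       let stage_dir = Path::new("/mnt/bigdata/temp/RpXfVVN/3");
       if stage_dir.exists() {
           for (partition, n) in count_shuffle_files(stage_dir)? {
               println!("output partition {partition}: {n} shuffle files");
           }
       }
       Ok(())
   }
   ```

   A stage whose tasks have all finished should report the same count for every output partition; a much lower count points at tasks that never completed.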
   
   The last output I see in the scheduler process is:
   
   ```
   INFO  ballista_scheduler] Sending new task to 3965aec5-ca89-4853-90ee-91f56e23a979: RpXfVVN/3/12
   ```
   
   Here is the output from one query stage 3 partition that did complete (partitions 2, 14, and 22 completed).
   
   ```
   === [RpXfVVN/3/14] Physical plan with metrics ===
   ShuffleWriterExec: Some(Hash([Column { name: "o_orderkey", index: 2 }], 24)), metrics=[outputRows=6072170, writeTime=1085450524, inputRows=6072170]
     CoalesceBatchesExec: target_batch_size=4096, metrics=[]
       HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: "c_custkey", index: 0 }, Column { name: "o_custkey", index: 1 })], metrics=[outputBatches=7073, inputRows=30396981, inputBatches=7073, outputRows=6072170, joinTime=1458]
         CoalesceBatchesExec: target_batch_size=4096, metrics=[]
           ShuffleReaderExec: partition_locations(24)=...
   ```
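
   The `Hash([Column { name: "o_orderkey", index: 2 }], 24)` setting on `ShuffleWriterExec` means each row is routed to one of 24 output partitions by hashing `o_orderkey`, so each task conceptually splits its rows into per-output-partition buckets before writing them. The sketch below illustrates that routing with the standard library's `DefaultHasher`. DataFusion uses its own hash function, so the partition numbers here will not match Ballista's, and `partition_for` / `hash_partition` are hypothetical names.

   ```rust
   use std::collections::hash_map::DefaultHasher;
   use std::hash::{Hash, Hasher};

   /// Map a key to one of `num_partitions` buckets (illustrative hash only).
   fn partition_for<K: Hash>(key: &K, num_partitions: usize) -> usize {
       assert!(num_partitions > 0, "need at least one output partition");
       let mut hasher = DefaultHasher::new();
       key.hash(&mut hasher);
       (hasher.finish() % num_partitions as u64) as usize
   }

   /// Split rows (here just order keys) into per-output-partition buckets,
   /// the way a hash-partitioned shuffle writer conceptually does before
   /// writing one file per non-empty bucket.
   fn hash_partition(order_keys: &[i64], num_partitions: usize) -> Vec<Vec<i64>> {
       let mut buckets = vec![Vec::new(); num_partitions];
       for &k in order_keys {
           buckets[partition_for(&k, num_partitions)].push(k);
       }
       buckets
   }

   fn main() {
       let keys: Vec<i64> = (1..=1000).collect();
       let buckets = hash_partition(&keys, 24);
       for (i, bucket) in buckets.iter().enumerate() {
           println!("output partition {i}: {} rows", bucket.len());
       }
   }
   ```

   Because only the key determines the bucket, matching keys from every input partition end up in the same output partition, which is what a partitioned hash join in a later stage relies on.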
   
   **To Reproduce**
   
   Generate the data set using the `tpctools` crate.
   
   ```bash
   cargo install tpctools
   tpctools generate --benchmark tpch --scale 1000 --partitions 48 --generator-path /mnt/bigdata/tpch-dbgen --output /mnt/bigdata/tpch-sf1000/
   ```
   
   Run a scheduler:
   
   ```bash
   RUST_LOG=info ./target/release/ballista-scheduler
   ```
   
   Run an executor:
   
   ```bash
   RUST_LOG=info ./target/release/ballista-executor -c 24 --work-dir /mnt/bigdata/temp
   ```
   
   Run the benchmark:
   
   ```bash
   ../target/release/tpch benchmark ballista --path /mnt/bigdata/tpch-sf1000/ --format tbl --iterations 1 --query 3 --debug --host localhost --port 50050 --shuffle-partitions 24
   ```
   
   **Expected behavior**
   The query should complete.
   
   **Additional context**
   Running on a 24-core Threadripper with 64 GB of RAM.
   
   Before the hang, things were looking good: the cores were kept relatively busy, and overall system memory use was only 12 GB and stayed fairly flat throughout.
   
   ![ballista-tpch-sf1000](https://user-images.githubusercontent.com/934084/128611949-99e58c5b-8125-4fdf-b7f0-24f8e4260e3c.png)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-datafusion] andygrove closed issue #835: Ballista: TPC-H q3 @ SF=1000 never completes

andygrove closed issue #835:
URL: https://github.com/apache/arrow-datafusion/issues/835


   



[GitHub] [arrow-datafusion] andygrove commented on issue #835: Ballista: TPC-H q3 @ SF=1000 never completes

andygrove commented on issue #835:
URL: https://github.com/apache/arrow-datafusion/issues/835#issuecomment-894703910


   This feels like a deadlock somewhere. I wonder if the shuffle reader is unable to read partitions because the executor has run out of threads to handle incoming Flight requests. I will add some debug logging and explore that next.
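
   If that hypothesis were right, it would be a classic thread-starvation deadlock: tasks that read shuffle partitions occupy every worker thread while the requests that would serve those partitions wait for a thread on the same executor. The sketch below reproduces the pattern with only the Rust standard library; it is not Ballista or Arrow Flight code, and the pool, "reader", and "serve" jobs are hypothetical stand-ins. A timeout makes the starvation observable instead of hanging forever.

   ```rust
   use std::sync::mpsc::{self, Receiver};
   use std::sync::{Arc, Mutex};
   use std::thread;
   use std::time::Duration;

   type Job = Box<dyn FnOnce() + Send + 'static>;

   /// Start `workers` threads that pull jobs from one shared FIFO queue.
   fn start_workers(workers: usize, queue: Receiver<Job>) {
       let queue = Arc::new(Mutex::new(queue));
       for _ in 0..workers {
           let queue = Arc::clone(&queue);
           thread::spawn(move || loop {
               // The guard is a temporary of this `let`, so the lock is
               // released before the job runs.
               let job = match queue.lock().unwrap().recv() {
                   Ok(job) => job,
                   Err(_) => break, // all senders dropped and queue drained
               };
               job();
           });
       }
   }

   /// Simulate `readers` reader tasks on a pool of `workers` threads. Each
   /// reader enqueues a "serve" job on the SAME pool and waits for its reply
   /// until `timeout`. Returns how many readers received their data.
   fn run_readers(workers: usize, readers: usize, timeout: Duration) -> usize {
       assert!(workers > 0, "pool needs at least one worker");
       let (pool_tx, pool_rx) = mpsc::channel::<Job>();
       let (done_tx, done_rx) = mpsc::channel::<bool>();
       // Enqueue every reader before any worker starts, so readers are dequeued first.
       for _ in 0..readers {
           let pool = pool_tx.clone();
           let done = done_tx.clone();
           pool_tx
               .send(Box::new(move || {
                   let (reply_tx, reply_rx) = mpsc::channel::<()>();
                   let serve: Job = Box::new(move || {
                       let _ = reply_tx.send(()); // reader may have given up already
                   });
                   pool.send(serve).unwrap();
                   let _ = done.send(reply_rx.recv_timeout(timeout).is_ok());
               }))
               .unwrap();
       }
       drop(done_tx);
       start_workers(workers, pool_rx);
       drop(pool_tx);
       done_rx.iter().filter(|&ok| ok).count()
   }

   fn main() {
       let t = Duration::from_millis(200);
       // Every worker is busy waiting as a reader, so no one is left to serve.
       println!("4 workers, 4 readers: {} served", run_readers(4, 4, t));
       // Spare workers can answer the serve requests.
       println!("8 workers, 4 readers: {} served", run_readers(8, 4, t));
   }
   ```

   Keeping request handling on a separate pool from task execution is one common way to rule this failure mode out.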



[GitHub] [arrow-datafusion] andygrove commented on issue #835: Ballista: TPC-H q3 @ SF=1000 never completes

andygrove commented on issue #835:
URL: https://github.com/apache/arrow-datafusion/issues/835#issuecomment-894812548


   I ran this again and it succeeded, so maybe I was not being patient enough.

