Posted to github@arrow.apache.org by "BubbaJoe (via GitHub)" <gi...@apache.org> on 2023/03/07 06:12:53 UTC

[GitHub] [arrow-ballista] BubbaJoe opened a new issue, #702: Distributed Execution

BubbaJoe opened a new issue, #702:
URL: https://github.com/apache/arrow-ballista/issues/702

   Hello,
   
   I am very new to Rust, so please bear with me.
   
   I would like to run a query on a large amount of data (50 GB of Parquet files) across multiple executors, but I am wondering how Ballista handles this. Can it execute heavy loads like this when the node running it will only have 16 GB of memory?
   
   1. How can I determine the memory required for an execution plan?
   
   2. Does Ballista execute a single query on multiple executors? If not, how can I optimize this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-ballista] BubbaJoe closed issue #702: Distributed Execution

Posted by "BubbaJoe (via GitHub)" <gi...@apache.org>.
BubbaJoe closed issue #702: Distributed Execution
URL: https://github.com/apache/arrow-ballista/issues/702




[GitHub] [arrow-ballista] avantgardnerio commented on issue #702: Distributed Execution

Posted by "avantgardnerio (via GitHub)" <gi...@apache.org>.
avantgardnerio commented on issue #702:
URL: https://github.com/apache/arrow-ballista/issues/702#issuecomment-1464039701

   1. I'm not sure how you would determine the appropriate amount of memory without just trying it out. Ballista does not load all 50 GB into memory at the same time; it breaks the data up into smaller RecordBatches for processing.
   2. Ballista will run your query on as many executors as it can successfully parallelize (likely as many as you give it, depending on the query).
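
   The two points above can be illustrated with a rough, standard-library-only sketch. This is not Ballista's API or its actual implementation: `Partition`, `sum_partition`, and `distributed_sum` are hypothetical names, threads stand in for executors, and a range of integers stands in for Parquet data. It only shows the general idea that each worker streams its partition one batch at a time, so peak memory per worker is on the order of one batch rather than the whole dataset.

```rust
use std::thread;

/// Hypothetical stand-in for one input partition (for example, one Parquet file).
/// It yields fixed-size batches lazily, so only one batch exists at a time.
struct Partition {
    next: u64,
    end: u64,
    batch_size: usize,
}

impl Iterator for Partition {
    type Item = Vec<u64>;

    fn next(&mut self) -> Option<Vec<u64>> {
        if self.next >= self.end {
            return None;
        }
        // Guard against a zero batch size, which would never make progress.
        let step = self.batch_size.max(1) as u64;
        let stop = (self.next + step).min(self.end);
        let batch: Vec<u64> = (self.next..stop).collect();
        self.next = stop;
        Some(batch)
    }
}

/// Aggregate one partition batch by batch; each batch is dropped after use.
fn sum_partition(p: Partition) -> u64 {
    p.map(|batch| batch.iter().sum::<u64>()).sum()
}

/// Split rows [0, total_rows) into one partition per "executor" (a thread here),
/// process the partitions in parallel, then combine the partial results.
fn distributed_sum(total_rows: u64, executors: u64, batch_size: usize) -> u64 {
    assert!(executors > 0, "need at least one executor");
    let per = (total_rows + executors - 1) / executors; // ceiling division
    let handles: Vec<_> = (0..executors)
        .map(|i| {
            let start = (i * per).min(total_rows);
            let end = ((i + 1) * per).min(total_rows);
            thread::spawn(move || {
                sum_partition(Partition { next: start, end, batch_size })
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

   For instance, `distributed_sum(1000, 4, 64)` splits 1000 rows across four workers that each process 64 rows at a time. Operators that must hold state (large joins, sorts, or high-cardinality aggregations) can still need far more memory than a single batch, which is why trying the real query is the practical way to size executors.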

