Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/09/05 18:45:30 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #7429: Atomicity and Locking in Data Fusion [Question]

alamb commented on issue #7429:
URL: https://github.com/apache/arrow-datafusion/issues/7429#issuecomment-1707145144

   > My question is how to refresh the data while it's being actively queried - perhaps by other threads?
   > Will running the DDL above safely lock the table until it's refreshed? Will it fail incoming SQL queries?
   
   I expect the following behavior. Suppose one thread runs:
   
   ```sql
   CREATE TABLE mem_table AS SELECT * FROM s3_parquet_table;
   -- run a query against mem_table
   ```
   
   Then, in another thread, you run something like:
   ```sql
   CREATE OR REPLACE TABLE mem_table AS SELECT * FROM s3_parquet_table;
   ```
   
   The first query against `mem_table` will use the original data. When the second statement completes, it will entirely replace `mem_table` with a new table provider instance, and all queries planned after that point will use the new data. Any queries that are already running will continue to use the old data.
   
   A query will not see partial results or partial updates.
   
   However, you will have two copies of your data in memory until the first query (and any other query still using the old data) completes.
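
As a rough mental model of the behavior described above, here is a toy Python sketch. It is not DataFusion code: the `Catalog` class and its `create_or_replace` and `plan` methods are hypothetical names invented for this illustration. It assumes the catalog maps a table name to an immutable snapshot, that a query captures that snapshot when it is planned, and that a replace builds the new snapshot in full before swapping the reference.

```python
import threading


class Catalog:
    """Hypothetical toy registry: maps table names to immutable snapshots."""

    def __init__(self):
        self._tables = {}
        self._lock = threading.Lock()

    def create_or_replace(self, name, rows):
        # Build the complete new snapshot first, then publish it in one step,
        # so readers never observe a half-written table.
        snapshot = tuple(rows)
        with self._lock:
            self._tables[name] = snapshot

    def plan(self, name):
        # "Planning" captures a reference to whatever snapshot is current now.
        with self._lock:
            snapshot = self._tables[name]
        # The returned query keeps using that snapshot, even after a replace.
        return lambda: list(snapshot)


catalog = Catalog()
catalog.create_or_replace("mem_table", [1, 2, 3])

running_query = catalog.plan("mem_table")          # planned before the refresh
catalog.create_or_replace("mem_table", [10, 20])   # refresh from another thread
new_query = catalog.plan("mem_table")              # planned after the refresh

old_result = running_query()   # still reads the original rows
new_result = new_query()       # reads the replaced rows
```

In this sketch both snapshots stay alive while `running_query` is still referenced, which mirrors the point about temporarily holding two copies of the data in memory.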

