Posted to issues@drill.apache.org by "Tobias (JIRA)" <ji...@apache.org> on 2017/03/16 08:22:41 UTC
[jira] [Created] (DRILL-5358) Error if Parquet file changes during query
Tobias created DRILL-5358:
-----------------------------
Summary: Error if Parquet file changes during query
Key: DRILL-5358
URL: https://issues.apache.org/jira/browse/DRILL-5358
Project: Apache Drill
Issue Type: Bug
Components: Metadata, Storage - Parquet
Affects Versions: 1.9.0
Reporter: Tobias
We have a scenario where we generate our own Parquet files every X seconds.
These files are laid out in a directory structure based on date, and only the file for today gets updated.
The process is as follows:
1. Generate the Parquet file in a temp directory.
2. When generation has finished, mv the file into a Drill workspace (data/2017/03/10/data.parquet, ..).
3. Restart the process.
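The publish step above can be sketched roughly as follows. This is only an illustration of the generate-then-move pattern, not our actual generator: the `publish` function, its parameters, and the paths are hypothetical, and the sketch assumes the temp directory is on the same filesystem as the workspace so that the rename is a single atomic step.

```python
import os
import tempfile


def publish(data: bytes, dest_path: str, tmp_dir: str) -> None:
    """Write data to a temp file, then move it over dest_path (hypothetical helper)."""
    parent = os.path.dirname(dest_path)
    if parent:
        # Create the date-based directory (e.g. data/2017/03/10) if it is missing
        os.makedirs(parent, exist_ok=True)

    # Step 1: generate the file in the temp directory
    fd, tmp_path = tempfile.mkstemp(suffix=".parquet", dir=tmp_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Step 2: move the finished file into the workspace, replacing
        # any previous version of today's file
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Do not leave a half-written temp file behind on failure
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Note that even an atomic rename only guarantees that readers never see a half-written file. It does not help a reader that remembered the length of the previous file.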
We have noticed that if the file is moved in after a query has started running,
the query throws an error saying that the Parquet magic number is incorrect.
This is due to the file length being cached and reused, so what seems to happen is:
1. Drill plans the query.
2. The file gets changed under Drill's feet.
3. Drill executes the query and tries to read at an incorrect offset in the changed file.
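The sequence above can be reproduced in miniature. The sketch below is not Drill code and only illustrates the suspected mechanism: `fake_parquet`, `tail_magic_ok`, and `demo` are hypothetical helpers, and the file layout is a simplified stand-in that keeps only the one real property that matters here, namely that a Parquet file ends with the 4-byte magic `PAR1`.

```python
import os
import struct
import tempfile

MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def fake_parquet(body: bytes) -> bytes:
    # Simplified stand-in for a Parquet file:
    # magic, body, a 4-byte little-endian length field, magic
    return MAGIC + body + struct.pack("<I", len(body)) + MAGIC


def tail_magic_ok(path: str, assumed_length: int) -> bool:
    # Look for the trailing magic where it should be if the file
    # really were assumed_length bytes long
    with open(path, "rb") as f:
        f.seek(assumed_length - len(MAGIC))
        return f.read(len(MAGIC)) == MAGIC


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.parquet")
        with open(path, "wb") as f:
            f.write(fake_parquet(b"a" * 100))

        # 1. "Planning": the file length is looked up once and remembered
        cached_length = os.path.getsize(path)

        # 2. A new, longer file is moved over the old one
        new_path = path + ".tmp"
        with open(new_path, "wb") as f:
            f.write(fake_parquet(b"b" * 250))
        os.replace(new_path, path)

        # 3. "Execution": the remembered length now points into the body
        stale = tail_magic_ok(path, cached_length)
        fresh = tail_magic_ok(path, os.path.getsize(path))
        return stale, fresh
```

With the stale length the check lands in the middle of the new file's data and fails, while the same check with the current length succeeds, which matches the "magic number is incorrect" error we see.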
Is there any way to fix this or avoid this scenario?
Another side effect of constantly generating a new file is that the metadata cache gets discarded for the whole workspace, even though only one file has changed.
Is there a way to avoid that?