Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/02/22 02:18:21 UTC

[GitHub] mikedias commented on issue #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query

URL: https://github.com/apache/spark/pull/22952#issuecomment-466247024
 
 
   I think what will happen is that the new file will never be processed until the stream restarts, because the obsolete files are not removed from the `seenFiles` map. Only when the stream restarts is `seenFiles` rebuilt from the `metadataLog` information, and at that point it won't contain the obsolete files.
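A toy Python model of this scenario may help. It is not Spark's `FileStreamSource` code: `ToyFileSource`, `poll` and `clean_up` are hypothetical names. For simplicity it assumes that the set rebuilt on restart excludes a cleaned-up file, modeled here by removing that file from the persisted log while leaving it in the in-memory seen set, which is the behavior described above.

```python
class ToyFileSource:
    """Hypothetical model of a file source that dedups files by name."""

    def __init__(self, metadata_log):
        # The log is shared state that survives "restarts" (new instances).
        self.metadata_log = metadata_log
        # On (re)start the in-memory seen set is rebuilt from the log.
        self.seen_files = set(metadata_log)

    def poll(self, listing):
        """Return files in `listing` whose names were not seen before."""
        new_files = [f for f in listing if f not in self.seen_files]
        self.seen_files.update(new_files)
        self.metadata_log.extend(new_files)
        return new_files

    def clean_up(self, path):
        # Simplification: the completed file is dropped from the log,
        # but it is NOT removed from the in-memory seen set.
        self.metadata_log.remove(path)


log = []
src = ToyFileSource(log)
print(src.poll(["a.csv"]))   # ['a.csv']
src.clean_up("a.csv")        # a.csv is archived; a new a.csv arrives later
print(src.poll(["a.csv"]))   # [] : skipped, name still in seen_files
src = ToyFileSource(log)     # restart: seen set rebuilt from the log
print(src.poll(["a.csv"]))   # ['a.csv']
```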
   
   And the timestamp does not play a role here. The current code checks only the filename to decide whether a file is new (#23782 proposes an option to also consider the timestamp).
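The difference between deduplicating by filename alone and by filename plus timestamp can be sketched as below. This is a hypothetical helper, not the code in Spark or in #23782; the `use_timestamp` flag and the `(path, mtime)` tuples are illustrative assumptions only.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical listing entry: (path, modification time in ms).
FileEntry = Tuple[str, int]


def new_files(listing: Iterable[FileEntry], seen: Set, use_timestamp: bool) -> List[FileEntry]:
    """Return entries not seen before, keyed by path or by (path, mtime)."""
    result = []
    for path, mtime in listing:
        key = (path, mtime) if use_timestamp else path
        if key not in seen:
            seen.add(key)
            result.append((path, mtime))
    return result


by_name = set()
print(new_files([("a.csv", 1000)], by_name, use_timestamp=False))  # [('a.csv', 1000)]
print(new_files([("a.csv", 2000)], by_name, use_timestamp=False))  # [] : same name
by_both = set()
print(new_files([("a.csv", 1000)], by_both, use_timestamp=True))   # [('a.csv', 1000)]
print(new_files([("a.csv", 2000)], by_both, use_timestamp=True))   # [('a.csv', 2000)]
```

With a name-only key, a replacement file that reuses an old name is indistinguishable from the original; adding the timestamp to the key lets it through, at the cost of reprocessing files whose timestamp changes.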
