Posted to reviews@spark.apache.org by HeartSaVioR <gi...@git.apache.org> on 2018/12/02 03:58:55 UTC

[GitHub] spark issue #22952: [SPARK-20568][SS] Provide option to clean up completed f...

Github user HeartSaVioR commented on the issue:

    https://github.com/apache/spark/pull/22952
  
    @zsxwing 
    Yeah, ideally we would enforce that `archivePath` can never match the source path (glob). So my approach is to find the base directory of the source path, i.e. the deepest directory with no glob in itself or any of its ancestors, and require that `archive path + base directory of source path` is not a sub-directory of that base directory.
    
    For example, if the source path is `/a/b/c/*/ef?/*/g/h/*/i`, its base directory is `/a/b/c`, so `archive path + base directory of source path` must not be a sub-directory of `/a/b/c`.
    (My code currently has a bug in finding this directory, so I need to fix it.)
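    A minimal Scala sketch of the check described above follows. It makes three assumptions. First, it works on plain absolute `/`-separated path strings rather than Hadoop `Path`s. Second, it treats a path component as a glob if that component contains any of `*`, `?`, `[` or `{`. Third, the names `ArchivePathCheck`, `baseDir`, `isUnder` and `isValidArchivePath` are hypothetical. This is not the code in this PR.

```scala
object ArchivePathCheck {
  // Assumption: a component counts as a glob if it contains any of these characters.
  private val GlobChars: Set[Char] = Set('*', '?', '[', '{')

  // Split an absolute, '/'-separated path into its non-empty components.
  private def components(path: String): Seq[String] =
    path.split("/").toSeq.filter(_.nonEmpty)

  // Deepest directory of the source path with no glob in itself or any ancestor.
  def baseDir(sourcePath: String): String =
    components(sourcePath)
      .takeWhile(c => !c.exists(GlobChars.contains))
      .mkString("/", "/", "")

  // True if `child` equals `parent` or lies anywhere underneath it.
  // The comparison is component-wise, so "/a/b/cd" is NOT under "/a/b/c".
  def isUnder(child: String, parent: String): Boolean =
    components(child).startsWith(components(parent))

  // Conservative rule: every archived file lands under `archivePath + baseDir`,
  // so reject the archive path whenever that directory falls under `baseDir`.
  def isValidArchivePath(sourcePath: String, archivePath: String): Boolean = {
    val base = baseDir(sourcePath)
    val archivedBase = (components(archivePath) ++ components(base)).mkString("/", "/", "")
    !isUnder(archivedBase, base)
  }
}
```

    For the example above, `baseDir` yields `/a/b/c`. An archive path such as `/archived` passes. By contrast, `/a/b/c/x` is rejected because `/a/b/c/x/a/b/c` lies under `/a/b/c`, and `/` is rejected because `/` + `/a/b/c` is `/a/b/c` itself.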
    
    This is not an elegant approach, and it has false positives: it ends up rejecting some archive paths that would not actually overlap with the source path (too restrictive). However, it guarantees that the two paths never overlap, so there is no need to re-check each file when renaming it.
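    The sketch below illustrates such a false positive. It uses `java.nio`'s glob `PathMatcher` as a stand-in for Hadoop's glob; the two treat the `*` and `?` in this example the same way. As a simplification, it also assumes that a file is picked up as input when the file or any of its ancestor directories matches the source glob. `FalsePositiveDemo` and its methods are hypothetical names, and the sketch assumes a Unix-style default filesystem.

```scala
import java.nio.file.{FileSystems, Path, Paths}

object FalsePositiveDemo {
  // Assumption: java.nio glob syntax stands in for Hadoop's glob for `*` and `?`.
  val sourceGlob = "/a/b/c/*/ef?/*/g/h/*/i"
  private val matcher = FileSystems.getDefault.getPathMatcher("glob:" + sourceGlob)

  // Simplifying assumption: a file is picked up if it, or any ancestor directory, matches the glob.
  def pickedUpBySource(file: String): Boolean = {
    var p: Path = Paths.get(file)
    var found = false
    while (p != null && !found) {
      found = matcher.matches(p)
      p = p.getParent
    }
    found
  }

  // The conservative rule: reject if `archivePath + baseDir` lies under `baseDir` (component-wise).
  def rejectedByPrefixRule(archivePath: String, baseDir: String): Boolean =
    Paths.get(archivePath + baseDir).startsWith(Paths.get(baseDir))

  // Where a source file would be moved: its full path re-rooted under the archive path.
  def archivedLocation(archivePath: String, file: String): String = archivePath + file
}
```

    Take the archive path `/a/b/c/x`. The prefix rule rejects it. Yet a source file such as `/a/b/c/x/ef1/y/g/h/z/i/part-0000` would be archived to `/a/b/c/x/a/b/c/x/ef1/...`. The fifth component of that archived path is `a`, which can never match `ef?`, so the archived file is never picked up again. The rule trades this precision for a check that is done once, up front.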
    
    I guess the approach is reasonable because, in practice, end users would rather not think through complicated overlap cases and would simply keep the two paths isolated.
    
    What do you think about this approach?
    
    cc. @gaborgsomogyi Could you also help validate my approach?

