Posted to reviews@spark.apache.org by XuTingjun <gi...@git.apache.org> on 2015/02/12 03:33:10 UTC

[GitHub] spark pull request: [Core][Improvement] Delete no longer used fil...

GitHub user XuTingjun opened a pull request:

    https://github.com/apache/spark/pull/4548

    [Core][Improvement] Delete no longer used file

    Every time an executor fetches a jar from the HTTP server, a lock file and a cache file are created on the node's local disk. After the fetch, these two files are useless.
    And when the jar package is big, the cache file is also big, which wastes disk space.
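
    For reference, a minimal Scala sketch of the fetch-and-cache pattern being described, assuming the `${url.hashCode}${timestamp}_cache` / `_lock` naming quoted later in this thread; this is an illustration of the mechanism, not Spark's actual `Utils.fetchFile` code:

        import java.io.{File, RandomAccessFile}
        import java.net.URL
        import java.nio.file.Files

        def fetchCachedFile(url: String, timestamp: Long, targetDir: File): File = {
          val cachedFile = new File(targetDir, s"${url.hashCode}${timestamp}_cache")
          val lockFile   = new File(targetDir, s"${url.hashCode}${timestamp}_lock")
          // Hold an exclusive lock on the lock file while downloading, so
          // concurrent executors on the same node fetch the jar only once.
          val raf  = new RandomAccessFile(lockFile, "rw")
          val lock = raf.getChannel.lock()
          try {
            if (!cachedFile.exists()) {
              val in = new URL(url).openStream()
              try Files.copy(in, cachedFile.toPath) finally in.close()
            }
          } finally {
            lock.release()
            raf.close()
          }
          // Neither cachedFile nor lockFile is deleted here; these are the
          // leftover files this pull request is about.
          cachedFile
        }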


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/XuTingjun/spark patch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4548.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4548
    
----
commit 7bea1fe852d56af6bb97de3116832d4813c1db3a
Author: Xu Tingjun <xu...@huawei.com>
Date:   2015-02-11T08:18:45Z

    delelte no longer used file

----



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74063138
  
    That looks like the intent, from the comment. These files should ultimately be deleted when the executor stops. Do you think there is a problem in light of this?



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74063812
  
    Executors are per-app, so this is roughly the same thing?



[GitHub] spark pull request: [Core][Improvement] Delete no longer used fil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74007717
  
    Can one of the admins verify this patch?



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74062453
  
    Ah right of course. So, the executor is keying the cache on (hash of) URL and 'version', where version is the driver's timestamp. That would be the same for executors across the same app, and that's the purpose of this cache. Right?
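
    To make the keying concrete, an illustration with made-up values (hostname, port, and timestamp are hypothetical):

        val url = "http://driver-host:50000/jars/app.jar" // served by the driver's HTTP server
        val timestamp = 1423642725000L                    // when the driver added the jar
        val cachedFileName = s"${url.hashCode}${timestamp}_cache"
        // Both inputs are fixed for the lifetime of the app, so every executor
        // of the app on a given node derives the same name and shares the file.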



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74064000
  
    I think we should also take dynamic executor allocation into account, right?



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun closed the pull request at:

    https://github.com/apache/spark/pull/4548



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74040124
  
     val cachedFileName = s"${url.hashCode}${timestamp}_cache"
    
    The cache file is named with url.hashCode and the timestamp. No cache file of any other jar will have the same name, so it will never be picked up by a future caller.



[GitHub] spark pull request: [Core][Improvement] Delete no longer used fil...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74034843
  
    Isn't the point that the files should stick around for future callers? The file is not recopied and the lock is not recreated if it already exists. (You would need a JIRA for this anyway, but first let's clear up this question.)
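
    For example, with the `fetchCachedFile` sketch under the pull request description above (two executors of the same app on one node; paths and values are illustrative):

        val targetDir = new java.io.File("/tmp/spark-fetch-demo")
        val ts = 1423642725000L // driver-side timestamp for this jar
        // Executor 1 acquires the lock first and downloads the jar once.
        fetchCachedFile("http://driver-host:50000/jars/app.jar", ts, targetDir)
        // Executor 2 arrives later, sees cachedFile.exists(), and skips the
        // download entirely; that cache hit is why the files stick around.
        fetchCachedFile("http://driver-host:50000/jars/app.jar", ts, targetDir)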



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74063531
  
    I think the cache file should be deleted when the app finishes, not when the executor stops.



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74062990
  
    Do you mean that the executors on the same node will use the cached file? I think that's right.



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74047686
  
    The idea is that this uniquely determines the file and even a version of that file. That by itself is sound. Timestamp is not always "the current time". Look at the invocation in `Executor.scala`. I'm not as sure about the invocation in `SparkContext.scala` since it also does a fetch locally, with the current time, and that is always a 'cache miss', but I think that one is by design? But for the executor it looks correct at first glance since it uses timestamp as a sort of version key, where the timestamp is the time this particular file was added by the driver.
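
    A simplified sketch of that version-key behavior, loosely modeled on the dependency-update loop in `Executor.scala` (structure and names are approximate, and it reuses the `fetchCachedFile` sketch from the pull request description):

        import scala.collection.mutable

        // url -> timestamp of the jar version this executor already fetched
        val currentJars = mutable.Map.empty[String, Long]

        def updateJars(newJars: Map[String, Long], targetDir: java.io.File): Unit =
          for ((url, ts) <- newJars if currentJars.getOrElse(url, -1L) < ts) {
            // A newer driver-side timestamp is effectively a new version: the
            // derived cache file name changes, so this is a deliberate cache
            // miss and the jar is downloaded again rather than read stale.
            fetchCachedFile(url, ts, targetDir)
            currentJars(url) = ts
          }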



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by XuTingjun <gi...@git.apache.org>.
Github user XuTingjun commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74061806
  
    In SparkContext.scala, useCache is false, so it won't use the cached file.



[GitHub] spark pull request: [SPARK-5764] Delete the cache and lock file af...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4548#issuecomment-74064917
  
    Yeah, good point. Actually, ignore my comment. The executors stick this file in `SparkFiles.getRootDirectory` and that is not necessarily deleted by the executor. I mean, it's not necessarily even shared. 
    
    My point was that they should not be immediately deleted, at least. They do serve a purpose in some cases.



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org