Posted to reviews@spark.apache.org by fireflyc <gi...@git.apache.org> on 2014/07/18 17:22:36 UTC

[GitHub] spark pull request: Fixed the number of worker thread

GitHub user fireflyc opened a pull request:

    https://github.com/apache/spark/pull/1485

    Fixed the number of worker thread

    A large number of input blocks causes too many worker threads to be
    created, which will load all of the data at once. The number of worker
    threads should therefore be bounded.
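
    In outline, the change swaps an unbounded pool for a bounded one. A minimal
    Scala sketch of the contrast (illustrative only, not the PR's exact diff):

        import java.util.concurrent.Executors

        // cached: may create one new thread per queued block, without bound
        val unbounded = Executors.newCachedThreadPool()

        // fixed: caps concurrency at a chosen size; extra work waits in the queue
        val bounded = Executors.newFixedThreadPool(
          Runtime.getRuntime.availableProcessors())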

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fireflyc/spark fixed-executor-thread

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1485.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1485
    
----
commit 1facd581b3e1e37cc896a7db8d3bb8e9ab088686
Author: fireflyc <fi...@126.com>
Date:   2014-07-18T15:19:46Z

    Fixed the number of worker thread
    
    A large number of input blocks causes too many worker threads to be
    created, which will load all of the data at once. The number of worker
    threads should therefore be bounded.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49493529
  
    There will always be tasks to run, and when the system is idle the pool's
    threads do not consume CPU. When a task does need to run, no new thread
    has to be created.
    
    A `fixed` pool is a great fit here.


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49501533
  
    My program is Spark Streaming on Hadoop YARN; it processes a user click
    stream. From reading the code, is the number of worker threads tied to the
    number of input blocks?


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49443851
  
    Can one of the admins verify this patch?


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-50545882
  
    Hey there - as Aaron said, the executors should never have more than N tasks active if there are N cores. I think there might be a bug causing this. So I'd recommend we close this issue and open a JIRA to figure out what is going on.


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49504813
  
    @fireflyc again just on my tangent -- the drawback is you leave N threads allocated taking up non-trivial stack memory and so on. In most of the cases I see the overhead of starting new threads on demand isn't significant. If what you describe is happening, then fixed is certainly an improvement over cached.
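
    For scale: on typical 64-bit JVMs the default thread stack size (`-Xss`) is
    about 1 MiB, so a fixed pool of 1000 idle threads reserves on the order of
    1 GiB of stack address space before doing any work.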


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49444796
  
    Slightly bigger point: both the 'fixed' and 'cached' executors from `Executors` have some drawbacks:
    
    - 'fixed' always keeps the given number of threads active even if they're not doing anything
    - 'cached' may create an unlimited number of threads
    
    It's perfectly possible to create a `ThreadPoolExecutor` with a core size
    of 0 and a fixed maximum size. I wonder if that isn't the best choice here,
    and in the other usages I see throughout Spark, since a similar issue comes
    up in about 10 places.
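
    A minimal Scala sketch of such a pool (the helper name and rejection policy
    are illustrative assumptions, not Spark code). One subtlety: with a core
    size of 0, a `SynchronousQueue` is needed, because an unbounded queue would
    keep the pool from ever growing past a single thread; submissions beyond
    the cap then need a rejection policy:

        import java.util.concurrent.{SynchronousQueue, ThreadPoolExecutor, TimeUnit}

        // No idle core threads, a hard cap on pool size; threads are created on
        // demand and reclaimed after 60s idle. Overflow runs in the caller's thread.
        def boundedOnDemandPool(maxThreads: Int): ThreadPoolExecutor = {
          val pool = new ThreadPoolExecutor(
            0,                      // corePoolSize: keep nothing alive when idle
            maxThreads,             // maximumPoolSize: the fixed upper bound
            60L, TimeUnit.SECONDS,  // idle threads are torn down after 60s
            new SynchronousQueue[Runnable]())
          pool.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy)
          pool
        }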


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1485


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49526386
  
    @fireflyc Spark should not be scheduling more than N concurrent tasks on an Executor. It appears that the tasks may be returning "success" but then not actually returning their threads to the thread pool.
    
    This is itself a bug -- could you run "jstack" on your Executor process to see where the threads are stuck?
    
    Perhaps new tasks are just starting before the old threads finish cleaning up, and thus this solution is the right one, but I'd like to find out exactly why.
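
    For reference, a thread dump can be captured with (the PID placeholder is
    illustrative):

        $ jstack <executor-pid> > executor-threads.txt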


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49494194
  
    The tasks launched on an Executor are controlled by the DAGScheduler, and should not exceed the number of cores that executor is advertising. In what situation have you seen this happening?
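
    For example, an executor advertising 8 cores with Spark's default of one
    CPU per task (`spark.task.cpus`) should never run more than 8 tasks
    concurrently.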


---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49495043
  
    My application has 1000+ worker threads.
    ![0e75b115d7a1b2dba97284cf6443b6f0](https://cloud.githubusercontent.com/assets/183107/3633383/d939413c-0edf-11e4-91d0-5ab99df71b59.jpeg)



---

[GitHub] spark pull request: Fixed the number of worker thread

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/spark/pull/1485#issuecomment-49499453
  
    Does your patch fix this problem, or do your Executors just hang once enough cores are in use? This behavior should not be happening, even with an unlimited-capacity cached thread pool.


---