Posted to reviews@spark.apache.org by fireflyc <gi...@git.apache.org> on 2014/07/18 17:22:36 UTC
[GitHub] spark pull request: Fixed the number of worker thread
GitHub user fireflyc opened a pull request:
https://github.com/apache/spark/pull/1485
Fixed the number of worker thread
A large number of input blocks can cause too many worker threads to be created,
which will load all the data at once. The number of worker threads should therefore be bounded.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fireflyc/spark fixed-executor-thread
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/1485.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1485
----
commit 1facd581b3e1e37cc896a7db8d3bb8e9ab088686
Author: fireflyc <fi...@126.com>
Date: 2014-07-18T15:19:46Z
Fixed the number of worker thread
A large number of input blocks can cause too many worker threads to be created,
which will load all the data at once. The number of worker threads should therefore be bounded.
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49493529
There will always be a task to run; when the system is idle, the pooled threads do not consume CPU. And when a task does need to run, no new thread has to be created.
`fixed` is a good fit here.
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49501533
My program is Spark Streaming running on Hadoop YARN; it processes a user click stream.
From reading the code, it looks like the number of worker threads grows with the number of input blocks?
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49443851
Can one of the admins verify this patch?
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-50545882
Hey there - as Aaron said, the executors should never have more than N tasks active if there are N cores. I think there might be a bug causing this. So I'd recommend we close this issue and open a JIRA to figure out what is going on.
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49504813
@fireflyc again just on my tangent -- the drawback is you leave N threads allocated taking up non-trivial stack memory and so on. In most of the cases I see the overhead of starting new threads on demand isn't significant. If what you describe is happening, then fixed is certainly an improvement over cached.
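The tradeoff srowen describes can be seen directly with plain `java.util.concurrent` (the class name and task counts below are illustrative, not Spark code): when several tasks block concurrently, a fixed pool caps thread creation, while a cached pool spawns one thread per blocked task with no upper bound.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolGrowth {
    // Submit `tasks` jobs that all block until released, then report the
    // largest number of threads the pool actually created.
    static int peakThreads(ThreadPoolExecutor pool, int tasks) throws Exception {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException e) { }
            });
        }
        release.countDown();   // unblock all tasks
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor fixed  = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        // fixed(4): extra tasks queue; cached: every blocked task gets a thread.
        System.out.println("fixed(4) peak: " + peakThreads(fixed, 8));
        System.out.println("cached peak: " + peakThreads(cached, 8));
    }
}
```

With 1000+ simultaneously blocked tasks, as in the screenshot above, the cached pool would create 1000+ threads the same way.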
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49444796
Slightly bigger point: both the 'fixed' and 'cached' executors from `Executors` have some drawbacks:
- 'fixed' always keeps the given number of threads active even if they're not doing anything
- 'cached' may create an unlimited number of threads
It's perfectly possible to create a `ThreadPoolExecutor` with core size 0 and a fixed maximum size. I wonder if that isn't the best choice here, and actually, in other usages I see throughout Spark? Because a similar issue comes up in about 10 places.
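One caveat worth noting: with an unbounded work queue, a `ThreadPoolExecutor` whose core size is literally 0 will only ever spawn a single thread (extra tasks just sit in the queue), so a common way to get the behavior srowen describes -- a hard upper bound on threads plus reclamation of idle ones -- is core == max with `allowCoreThreadTimeOut(true)`. A minimal sketch; the helper name and numbers are illustrative:

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedReclaimingPool {
    // Hypothetical helper: a pool that never exceeds maxThreads (like "fixed")
    // but whose idle threads die after keepAliveMs (like "cached").
    static ThreadPoolExecutor create(int maxThreads, long keepAliveMs) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads, maxThreads,             // core == max: extra tasks queue
            keepAliveMs, TimeUnit.MILLISECONDS,
            new LinkedBlockingQueue<>());       // unbounded queue, bounded threads
        pool.allowCoreThreadTimeOut(true);      // idle "core" threads are reclaimed too
        return pool;
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolExecutor pool = create(4, 100);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> { });            // cheap no-op tasks
        }
        // The pool never grows past its bound, however many tasks are submitted.
        System.out.println("bounded: " + (pool.getLargestPoolSize() <= 4));
        Thread.sleep(2000);                     // wait well past the keep-alive
        // ...and once idle, the threads (and their stacks) are gone.
        System.out.println("reclaimed: " + (pool.getPoolSize() == 0));
        pool.shutdown();
    }
}
```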
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/1485
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49526386
@fireflyc Spark should not be scheduling more than N concurrent tasks on an Executor. It appears that the tasks may be returning "success" but then don't actually return the thread to the thread pool.
This is itself a bug -- could you run "jstack" on your Executor process to see where the threads are stuck?
Perhaps new tasks are just starting before the old threads finish cleaning up, and thus this solution is the right one, but I'd like to find out exactly why.
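The jstack check aarondav suggests boils down to counting stuck worker threads in the executor's JVM, roughly `jstack <pid> | grep -c 'Executor task launch worker'`. The same count can also be taken in-process; a sketch of the idea, where the parked thread merely simulates a stuck worker and the thread-name prefix mirrors Spark's executor worker naming:

```java
import java.util.concurrent.CountDownLatch;

public class ThreadDump {
    public static void main(String[] args) throws Exception {
        // Start a thread that stays parked, like a stuck worker would.
        CountDownLatch started = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            started.countDown();
            try { Thread.sleep(60_000); } catch (InterruptedException e) { }
        }, "Executor task launch worker-0");
        worker.setDaemon(true);
        worker.start();
        started.await();   // make sure the worker is live before counting

        // Count live threads whose names look like executor workers --
        // the same number the jstack-plus-grep pipeline would report.
        long workers = Thread.getAllStackTraces().keySet().stream()
            .filter(t -> t.getName().startsWith("Executor task launch worker"))
            .count();
        System.out.println(workers);
    }
}
```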
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49494194
The tasks launched on an Executor are controlled by the DAGScheduler, and should not exceed the number of cores that executor is advertising. In what situation have you seen this happening?
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by fireflyc <gi...@git.apache.org>.
Github user fireflyc commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49495043
My application has 1000+ worker threads.
![0e75b115d7a1b2dba97284cf6443b6f0](https://cloud.githubusercontent.com/assets/183107/3633383/d939413c-0edf-11e4-91d0-5ab99df71b59.jpeg)
---
[GitHub] spark pull request: Fixed the number of worker thread
Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:
https://github.com/apache/spark/pull/1485#issuecomment-49499453
Does your patch fix this problem, or do your Executors just hang once you reach enough cores? This behavior should not be happening, even with an unlimited-capacity cached thread pool.
---