Posted to user@spark.apache.org by liuwei <st...@126.com> on 2014/07/30 10:43:42 UTC

spark.scheduler.pool seems not to work in Spark Streaming

In my Spark Streaming program, I set the scheduler pool as follows:

val myFairSchedulerFile = "xxx.xml"
val myStreamingPool = "xxx"

System.setProperty("spark.scheduler.allocation.file", myFairSchedulerFile)
val conf = new SparkConf()
val ssc = new StreamingContext(conf, batchInterval)
ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool)
….
ssc.start()
ssc.awaitTermination()

I submitted my Spark Streaming job to my Spark cluster and found that the stage's pool name is "default"; it seems ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool) does not work.
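
For reference, xxx.xml above is a fair scheduler allocation file in Spark's documented format. A minimal sketch might look like the following (the pool name must match myStreamingPool; the schedulingMode, weight, and minShare values here are just placeholders):

<?xml version="1.0"?>
<allocations>
  <pool name="xxx">
    <!-- schedulingMode controls ordering of jobs inside this pool;
         weight and minShare set this pool's share relative to other pools -->
    <schedulingMode>FAIR</schedulingMode>
    <weight>1</weight>
    <minShare>2</minShare>
  </pool>
</allocations>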

Re: spark.scheduler.pool seems not to work in Spark Streaming

Posted by Tathagata Das <ta...@gmail.com>.
I filed a JIRA for this task for future reference.

https://issues.apache.org/jira/browse/SPARK-2780

On Thu, Jul 31, 2014 at 5:37 PM, Tathagata Das
<ta...@gmail.com> wrote:
> Whoa! That worked! I was half afraid it wouldn't, since I hadn't tried it myself.
>
> TD
>
> On Wed, Jul 30, 2014 at 8:32 PM, liuwei <st...@126.com> wrote:
>> Hi, Tathagata Das:
>>
>>       I followed your advice and solved this problem. Thank you very much!
>>
>>
>> On Jul 31, 2014, at 3:07 AM, Tathagata Das <ta...@gmail.com> wrote:
>>
>>> This is because setLocalProperty makes all Spark jobs submitted from
>>> the current thread belong to the set pool. However, in Spark
>>> Streaming, all the jobs are actually launched in the background from a
>>> different thread, so this setting does not take effect. There is a
>>> workaround, though: if you are doing any kind of output operation on
>>> DStreams, like DStream.foreachRDD(), you can set the property inside
>>> that operation:
>>>
>>> dstream.foreachRDD(rdd =>
>>>   rdd.sparkContext.setLocalProperty(...)
>>> )
>>>
>>>
>>>
>>> On Wed, Jul 30, 2014 at 1:43 AM, liuwei <st...@126.com> wrote:
>>>> In my Spark Streaming program, I set the scheduler pool as follows:
>>>>
>>>> val myFairSchedulerFile = "xxx.xml"
>>>> val myStreamingPool = "xxx"
>>>>
>>>> System.setProperty("spark.scheduler.allocation.file", myFairSchedulerFile)
>>>> val conf = new SparkConf()
>>>> val ssc = new StreamingContext(conf, batchInterval)
>>>> ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool)
>>>> ….
>>>> ssc.start()
>>>> ssc.awaitTermination()
>>>>
>>>> I submitted my Spark Streaming job to my Spark cluster and found that the stage's pool name is "default"; it seems ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool) does not work.
>>
>>

Re: spark.scheduler.pool seems not to work in Spark Streaming

Posted by Tathagata Das <ta...@gmail.com>.
Whoa! That worked! I was half afraid it wouldn't, since I hadn't tried it myself.

TD

On Wed, Jul 30, 2014 at 8:32 PM, liuwei <st...@126.com> wrote:
> Hi, Tathagata Das:
>
>       I followed your advice and solved this problem. Thank you very much!
>
>
> On Jul 31, 2014, at 3:07 AM, Tathagata Das <ta...@gmail.com> wrote:
>
>> This is because setLocalProperty makes all Spark jobs submitted from
>> the current thread belong to the set pool. However, in Spark
>> Streaming, all the jobs are actually launched in the background from a
>> different thread, so this setting does not take effect. There is a
>> workaround, though: if you are doing any kind of output operation on
>> DStreams, like DStream.foreachRDD(), you can set the property inside
>> that operation:
>>
>> dstream.foreachRDD(rdd =>
>>   rdd.sparkContext.setLocalProperty(...)
>> )
>>
>>
>>
>> On Wed, Jul 30, 2014 at 1:43 AM, liuwei <st...@126.com> wrote:
>>> In my Spark Streaming program, I set the scheduler pool as follows:
>>>
>>> val myFairSchedulerFile = "xxx.xml"
>>> val myStreamingPool = "xxx"
>>>
>>> System.setProperty("spark.scheduler.allocation.file", myFairSchedulerFile)
>>> val conf = new SparkConf()
>>> val ssc = new StreamingContext(conf, batchInterval)
>>> ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool)
>>> ….
>>> ssc.start()
>>> ssc.awaitTermination()
>>>
>>> I submitted my Spark Streaming job to my Spark cluster and found that the stage's pool name is "default"; it seems ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool) does not work.
>
>

Re: spark.scheduler.pool seems not to work in Spark Streaming

Posted by liuwei <st...@126.com>.
Hi, Tathagata Das:

      I followed your advice and solved this problem. Thank you very much!


On Jul 31, 2014, at 3:07 AM, Tathagata Das <ta...@gmail.com> wrote:

> This is because setLocalProperty makes all Spark jobs submitted from
> the current thread belong to the set pool. However, in Spark
> Streaming, all the jobs are actually launched in the background from a
> different thread, so this setting does not take effect. There is a
> workaround, though: if you are doing any kind of output operation on
> DStreams, like DStream.foreachRDD(), you can set the property inside
> that operation:
> 
> dstream.foreachRDD(rdd =>
>   rdd.sparkContext.setLocalProperty(...)
> )
> 
> 
> 
> On Wed, Jul 30, 2014 at 1:43 AM, liuwei <st...@126.com> wrote:
>> In my Spark Streaming program, I set the scheduler pool as follows:
>>
>> val myFairSchedulerFile = "xxx.xml"
>> val myStreamingPool = "xxx"
>>
>> System.setProperty("spark.scheduler.allocation.file", myFairSchedulerFile)
>> val conf = new SparkConf()
>> val ssc = new StreamingContext(conf, batchInterval)
>> ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool)
>> ….
>> ssc.start()
>> ssc.awaitTermination()
>>
>> I submitted my Spark Streaming job to my Spark cluster and found that the stage's pool name is "default"; it seems ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool) does not work.



Re: spark.scheduler.pool seems not to work in Spark Streaming

Posted by Tathagata Das <ta...@gmail.com>.
This is because setLocalProperty makes all Spark jobs submitted from
the current thread belong to the set pool. However, in Spark
Streaming, all the jobs are actually launched in the background from a
different thread, so this setting does not take effect. There is a
workaround, though: if you are doing any kind of output operation on
DStreams, like DStream.foreachRDD(), you can set the property inside
that operation:

dstream.foreachRDD(rdd =>
   rdd.sparkContext.setLocalProperty(...)
)
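
To make this concrete, here is a minimal sketch of the workaround (the socket source, batch interval, pool name, and file path are only placeholders; the allocation file itself is still configured up front as before):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingPoolSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("StreamingPoolSketch")
      .set("spark.scheduler.mode", "FAIR")                        // enable FAIR scheduling
      .set("spark.scheduler.allocation.file", "/path/to/xxx.xml") // placeholder path to the pool definitions
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999)           // placeholder input stream

    lines.foreachRDD { rdd =>
      // This closure runs in the thread that actually submits the streaming jobs,
      // so the thread-local pool property applies to the job triggered below.
      rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "xxx")
      println(rdd.count())                                        // this job runs in pool "xxx"
    }

    ssc.start()
    ssc.awaitTermination()
  }
}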



On Wed, Jul 30, 2014 at 1:43 AM, liuwei <st...@126.com> wrote:
> In my Spark Streaming program, I set the scheduler pool as follows:
>
> val myFairSchedulerFile = "xxx.xml"
> val myStreamingPool = "xxx"
>
> System.setProperty("spark.scheduler.allocation.file", myFairSchedulerFile)
> val conf = new SparkConf()
> val ssc = new StreamingContext(conf, batchInterval)
> ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool)
> ….
> ssc.start()
> ssc.awaitTermination()
>
> I submitted my Spark Streaming job to my Spark cluster and found that the stage's pool name is "default"; it seems ssc.sparkContext.setLocalProperty("spark.scheduler.pool", myStreamingPool) does not work.