Posted to dev@spark.apache.org by Gil Vernik <GI...@il.ibm.com> on 2015/09/17 09:07:50 UTC

how to send additional configuration to the RDD after it was lazily created

Hi,

I have the following case, which I am not sure how to resolve.

My code uses HadoopRDD and creates various RDDs on top of it
(MapPartitionsRDD, and so on).
After all the RDDs have been lazily created, my code "knows" some new
information, and I want the "compute" method of the HadoopRDD to be aware
of it (at the point when "compute" is called).
Is there a way to 'send' additional information to the
compute method of the HadoopRDD after the RDD has been lazily created?
I tried playing with the configuration: calling set("test","111") in
my code and modifying the compute method of HadoopRDD to call get("test").
But it does not work, since SparkContext holds only a clone of the
configuration, and it can't be modified at run time.

Any thoughts on how I can do this?

Thanks
Gil.
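
For illustration, the behaviour described above can be reproduced without Spark. The sketch below is plain Scala; `Conf` and `Context` are hypothetical stand-ins (not Spark classes) for a configuration object and a context that keeps its own copy of that object at construction time. Under that assumption, a later set() on the caller's object is never seen by the context.

```scala
import scala.collection.mutable

object ConfCloneSketch {
  // Hypothetical minimal configuration holder; this is NOT Spark's SparkConf.
  final class Conf(initial: Map[String, String] = Map.empty) {
    private val settings = mutable.Map.empty[String, String] ++= initial

    def set(key: String, value: String): Conf = { settings(key) = value; this }
    def get(key: String): Option[String] = settings.get(key)

    // Returns an independent copy; later changes to either side are not shared.
    def copy(): Conf = new Conf(settings.toMap)
  }

  // Hypothetical context that copies the configuration when it is constructed,
  // mirroring the "SparkContext holds only a clone" behaviour described above.
  final class Context(userConf: Conf) {
    private val conf = userConf.copy()
    def lookup(key: String): Option[String] = conf.get(key)
  }
}
```

Setting a key on the caller's `Conf` after the `Context` exists changes only the caller's object, which may be why the get("test") call in compute never sees the value.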

Reply: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Huangguowei <hu...@huawei.com>.
There is no error in the normal case.

But if I ask the Worker through its akkaUrl for the executors' status, it throws an exception.


From: Sean Owen [mailto:sowen@cloudera.com]
Sent: September 17, 2015 15:54
To: Huangguowei; Dev
Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable


Did this cause an error for you?

On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:

In Worker.scala line 480:

    case RequestWorkerState =>
      sender ! WorkerStateResponse(host, port, workerId, executors.values.toList,
        finishedExecutors.values.toList, drivers.values.toList,
        finishedDrivers.values.toList, activeMasterUrl, cores, memory,
        coresUsed, memoryUsed, activeMasterWebUiUrl)

The type of executors is:
val executors = new HashMap[String, ExecutorRunner]

But ExecutorRunner is not serializable, so sending RequestWorkerState to the Worker with ask causes a java.io.NotSerializableException.



Re: Reply: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Reynold Xin <rx...@databricks.com>.
Sounds good.


On Fri, Sep 18, 2015 at 8:50 AM, Shixiong Zhu <zs...@gmail.com> wrote:

> I'm wondering if we should create a tag trait (e.g., LocalMessage) for
> messages like this and add the comment in the trait. Looks better than
> adding inline comments for all these messages.
>
> Best Regards,
> Shixiong Zhu
>
> 2015-09-18 15:10 GMT+08:00 Reynold Xin <rx...@databricks.com>:
>
>> Maybe we should add some inline comment explaining why it is ok for that
>> message to be not serializable.
>>
>>
>> On Thu, Sep 17, 2015 at 4:08 AM, Huangguowei <hu...@huawei.com>
>> wrote:
>>
>>> Thanks for your reply. I just want to do some monitors, never mind!
>>>
>>>
>>>
>>> *From:* Shixiong Zhu [mailto:zsxwing@gmail.com]
>>> *Sent:* September 17, 2015 17:23
>>> *To:* Huangguowei; dev@spark.apache.org
>>> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>>>
>>>
>>>
>>> RequestWorkerState is an internal message between Worker
>>> and WorkerWebUI. Since they are in the same process, that's fine. Actually,
>>> these are not public APIs. Could you elaborate your use case?
>>>
>>>
>>> Best Regards,
>>>
>>> Shixiong Zhu
>>>
>>>
>>>
>>> 2015-09-17 16:36 GMT+08:00 Huangguowei <hu...@huawei.com>:
>>>
>>>
>>>
>>> Is it possible to get Executors status when running an application?
>>>
>>>
>>>
>>> *From:* Sean Owen [mailto:sowen@cloudera.com]
>>> *Sent:* September 17, 2015 15:54
>>> *To:* Huangguowei; Dev
>>> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>>>
>>>
>>>
>>> Did this cause an error for you?
>>>
>>>
>>>
>>> On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com>
>>> wrote:
>>>
>>>
>>>
>>> In Worker.scala line 480:
>>>
>>>
>>>
>>>     case RequestWorkerState =>
>>>
>>>       sender ! WorkerStateResponse(host, port, workerId,
>>> executors.values.toList,
>>>
>>>         finishedExecutors.values.toList, drivers.values.toList,
>>>
>>>         finishedDrivers.values.toList, activeMasterUrl, cores, memory,
>>>
>>>         coresUsed, memoryUsed, activeMasterWebUiUrl)
>>>
>>>
>>>
>>> The executors’s type is:
>>>
>>> val executors = new HashMap[String, ExecutorRunner]
>>>
>>>
>>>
>>> but ExecutorRunner cannot be Serialized, so if ask RequestWorkerState
>>> will cause java.io.NotSerializableException.

Re: Reply: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Shixiong Zhu <zs...@gmail.com>.
I'm wondering if we should create a tag trait (e.g., LocalMessage) for
messages like this and add the comment in the trait. Looks better than
adding inline comments for all these messages.

Best Regards,
Shixiong Zhu
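
As an illustration of the tag-trait idea only, here is a minimal sketch in plain Scala. Everything in it is hypothetical: `LocalMessage` is just the name suggested above, and `RequestState`, `Heartbeat`, `Route` and `checkSend` are invented for the example and are not part of Spark. The trait carries the explanatory comment once, and a dispatcher could optionally use it to reject local-only messages that are about to leave the process.

```scala
object LocalMessageSketch {
  // Hypothetical marker trait. Messages that extend it are exchanged only
  // between components inside one JVM, so they need not be serializable.
  trait LocalMessage

  // Hypothetical local-only request (a stand-in for a message such as RequestWorkerState).
  case object RequestState extends LocalMessage

  // Hypothetical ordinary message that may travel over the network.
  final case class Heartbeat(workerId: String)

  sealed trait Route
  case object InProcess extends Route
  case object OverTheWire extends Route

  // Rejects local-only messages that would be sent to a remote peer;
  // every other combination is passed through unchanged.
  def checkSend(message: Any, route: Route): Either[String, Any] =
    (message, route) match {
      case (_: LocalMessage, OverTheWire) =>
        Left(s"$message is a LocalMessage and must stay in-process")
      case _ =>
        Right(message)
    }
}
```

The check in `checkSend` goes beyond what was proposed (a trait plus a comment); it is shown only to suggest that a marker trait could also make the restriction enforceable.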

2015-09-18 15:10 GMT+08:00 Reynold Xin <rx...@databricks.com>:

> Maybe we should add some inline comment explaining why it is ok for that
> message to be not serializable.
>
>
> On Thu, Sep 17, 2015 at 4:08 AM, Huangguowei <hu...@huawei.com>
> wrote:
>
>> Thanks for your reply. I just want to do some monitors, never mind!
>>
>>
>>
>> *From:* Shixiong Zhu [mailto:zsxwing@gmail.com]
>> *Sent:* September 17, 2015 17:23
>> *To:* Huangguowei; dev@spark.apache.org
>> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>>
>>
>>
>> RequestWorkerState is an internal message between Worker and WorkerWebUI.
>> Since they are in the same process, that's fine. Actually, these are not
>> public APIs. Could you elaborate your use case?
>>
>>
>> Best Regards,
>>
>> Shixiong Zhu
>>
>>
>>
>> 2015-09-17 16:36 GMT+08:00 Huangguowei <hu...@huawei.com>:
>>
>>
>>
>> Is it possible to get Executors status when running an application?
>>
>>
>>
>> *From:* Sean Owen [mailto:sowen@cloudera.com]
>> *Sent:* September 17, 2015 15:54
>> *To:* Huangguowei; Dev
>> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>>
>>
>>
>> Did this cause an error for you?
>>
>>
>>
>> On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:
>>
>>
>>
>> In Worker.scala line 480:
>>
>>
>>
>>     case RequestWorkerState =>
>>
>>       sender ! WorkerStateResponse(host, port, workerId,
>> executors.values.toList,
>>
>>         finishedExecutors.values.toList, drivers.values.toList,
>>
>>         finishedDrivers.values.toList, activeMasterUrl, cores, memory,
>>
>>         coresUsed, memoryUsed, activeMasterWebUiUrl)
>>
>>
>>
>> The executors’s type is:
>>
>> val executors = new HashMap[String, ExecutorRunner]
>>
>>
>>
>> but ExecutorRunner cannot be Serialized, so if ask RequestWorkerState
>> will cause java.io.NotSerializableException.

Re: Reply: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Reynold Xin <rx...@databricks.com>.
Maybe we should add some inline comment explaining why it is ok for that
message to be not serializable.


On Thu, Sep 17, 2015 at 4:08 AM, Huangguowei <hu...@huawei.com> wrote:

> Thanks for your reply. I just want to do some monitors, never mind!
>
>
>
> *From:* Shixiong Zhu [mailto:zsxwing@gmail.com]
> *Sent:* September 17, 2015 17:23
> *To:* Huangguowei; dev@spark.apache.org
> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>
>
>
> RequestWorkerState is an internal message between Worker and WorkerWebUI.
> Since they are in the same process, that's fine. Actually, these are not
> public APIs. Could you elaborate your use case?
>
>
> Best Regards,
>
> Shixiong Zhu
>
>
>
> 2015-09-17 16:36 GMT+08:00 Huangguowei <hu...@huawei.com>:
>
>
>
> Is it possible to get Executors status when running an application?
>
>
>
> *From:* Sean Owen [mailto:sowen@cloudera.com]
> *Sent:* September 17, 2015 15:54
> *To:* Huangguowei; Dev
> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>
>
>
> Did this cause an error for you?
>
>
>
> On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:
>
>
>
> In Worker.scala line 480:
>
>
>
>     case RequestWorkerState =>
>
>       sender ! WorkerStateResponse(host, port, workerId,
> executors.values.toList,
>
>         finishedExecutors.values.toList, drivers.values.toList,
>
>         finishedDrivers.values.toList, activeMasterUrl, cores, memory,
>
>         coresUsed, memoryUsed, activeMasterWebUiUrl)
>
>
>
> The executors’s type is:
>
> val executors = new HashMap[String, ExecutorRunner]
>
>
>
> but ExecutorRunner cannot be Serialized, so if ask RequestWorkerState will
> cause java.io.NotSerializableException.

Reply: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Huangguowei <hu...@huawei.com>.
Thanks for your reply. I just wanted to do some monitoring; never mind!

From: Shixiong Zhu [mailto:zsxwing@gmail.com]
Sent: September 17, 2015 17:23
To: Huangguowei; dev@spark.apache.org
Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable

RequestWorkerState is an internal message between Worker and WorkerWebUI. Since they are in the same process, that's fine. Actually, these are not public APIs. Could you elaborate on your use case?


Best Regards,
Shixiong Zhu

2015-09-17 16:36 GMT+08:00 Huangguowei <hu...@huawei.com>:

Is it possible to get the executors' status while an application is running?

From: Sean Owen [mailto:sowen@cloudera.com]
Sent: September 17, 2015 15:54
To: Huangguowei; Dev
Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable


Did this cause an error for you?

On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:

In Worker.scala line 480:

    case RequestWorkerState =>
      sender ! WorkerStateResponse(host, port, workerId, executors.values.toList,
        finishedExecutors.values.toList, drivers.values.toList,
        finishedDrivers.values.toList, activeMasterUrl, cores, memory,
        coresUsed, memoryUsed, activeMasterWebUiUrl)

The type of executors is:
val executors = new HashMap[String, ExecutorRunner]

But ExecutorRunner is not serializable, so sending RequestWorkerState to the Worker with ask causes a java.io.NotSerializableException.




Re: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Shixiong Zhu <zs...@gmail.com>.
RequestWorkerState is an internal message between Worker and WorkerWebUI.
Since they are in the same process, that's fine. Actually, these are not
public APIs. Could you elaborate on your use case?

Best Regards,
Shixiong Zhu

2015-09-17 16:36 GMT+08:00 Huangguowei <hu...@huawei.com>:

>
>
> Is it possible to get Executors status when running an application?
>
>
>
> *From:* Sean Owen [mailto:sowen@cloudera.com]
> *Sent:* September 17, 2015 15:54
> *To:* Huangguowei; Dev
> *Subject:* Re: bug in Worker.scala, ExecutorRunner is not serializable
>
>
>
> Did this cause an error for you?
>
>
>
> On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:
>
>
>
> In Worker.scala line 480:
>
>
>
>     case RequestWorkerState =>
>
>       sender ! WorkerStateResponse(host, port, workerId,
> executors.values.toList,
>
>         finishedExecutors.values.toList, drivers.values.toList,
>
>         finishedDrivers.values.toList, activeMasterUrl, cores, memory,
>
>         coresUsed, memoryUsed, activeMasterWebUiUrl)
>
>
>
> The executors’s type is:
>
> val executors = new HashMap[String, ExecutorRunner]
>
>
>
> but ExecutorRunner cannot be Serialized, so if ask RequestWorkerState will
> cause java.io.NotSerializableException.

Re: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Huangguowei <hu...@huawei.com>.
Is it possible to get the executors' status while an application is running?

From: Sean Owen [mailto:sowen@cloudera.com]
Sent: September 17, 2015 15:54
To: Huangguowei; Dev
Subject: Re: bug in Worker.scala, ExecutorRunner is not serializable


Did this cause an error for you?

On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:

In Worker.scala line 480:

    case RequestWorkerState =>
      sender ! WorkerStateResponse(host, port, workerId, executors.values.toList,
        finishedExecutors.values.toList, drivers.values.toList,
        finishedDrivers.values.toList, activeMasterUrl, cores, memory,
        coresUsed, memoryUsed, activeMasterWebUiUrl)

The type of executors is:
val executors = new HashMap[String, ExecutorRunner]

But ExecutorRunner is not serializable, so sending RequestWorkerState to the Worker with ask causes a java.io.NotSerializableException.



Re: bug in Worker.scala, ExecutorRunner is not serializable

Posted by Sean Owen <so...@cloudera.com>.
Did this cause an error for you?

On Thu, Sep 17, 2015, 8:51 AM Huangguowei <hu...@huawei.com> wrote:

>
>
> In Worker.scala line 480:
>
>
>
>     case RequestWorkerState =>
>
>       sender ! WorkerStateResponse(host, port, workerId,
> executors.values.toList,
>
>         finishedExecutors.values.toList, drivers.values.toList,
>
>         finishedDrivers.values.toList, activeMasterUrl, cores, memory,
>
>         coresUsed, memoryUsed, activeMasterWebUiUrl)
>
>
>
> The executors’s type is:
>
> val executors = new HashMap[String, ExecutorRunner]
>
>
>
> but ExecutorRunner cannot be Serialized, so if ask RequestWorkerState will
> cause java.io.NotSerializableException.

bug in Worker.scala, ExecutorRunner is not serializable

Posted by Huangguowei <hu...@huawei.com>.
In Worker.scala line 480:

    case RequestWorkerState =>
      sender ! WorkerStateResponse(host, port, workerId, executors.values.toList,
        finishedExecutors.values.toList, drivers.values.toList,
        finishedDrivers.values.toList, activeMasterUrl, cores, memory,
        coresUsed, memoryUsed, activeMasterWebUiUrl)

The type of executors is:
val executors = new HashMap[String, ExecutorRunner]

But ExecutorRunner is not serializable, so sending RequestWorkerState to the Worker with ask causes a java.io.NotSerializableException.
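
A minimal sketch of this failure mode, assuming plain Java serialization is what the remote path uses. `RunnerLike` and `StateResponseLike` are hypothetical stand-ins for ExecutorRunner and WorkerStateResponse, not the real Spark classes. The point it tries to show is that a serializable message still fails to serialize as soon as one of the objects it holds is not serializable.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import scala.util.Try

object SerializationSketch {
  // Hypothetical stand-in for ExecutorRunner: a plain class that does not
  // extend java.io.Serializable.
  class RunnerLike(val id: String)

  // Hypothetical stand-in for WorkerStateResponse. Case classes are
  // Serializable, but Java serialization still walks every field.
  final case class StateResponseLike(host: String, runners: List[RunnerLike])

  // Attempts Java serialization in memory and captures any exception in a Try.
  def javaSerialize(value: AnyRef): Try[Array[Byte]] = Try {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(value)
    out.flush()
    bytes.toByteArray
  }
}
```

With an empty `runners` list the response serializes; with at least one `RunnerLike` in it, `javaSerialize` returns a failure wrapping java.io.NotSerializableException. Passing the same object between components in one process involves no serialization at all, so no error appears there.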



Re: how to send additional configuration to the RDD after it was lazily created

Posted by Romi Kuntsman <ro...@totango.com>.
What new information do you have after creating the RDD that you didn't
have at the time of its creation?
I think the whole point is that an RDD is immutable: you can't change it once
it has been created.
Perhaps you need to refactor your logic to know the parameters earlier, or
create a whole new RDD.

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com
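
The "create a whole new RDD" suggestion can be sketched in plain Scala without Spark. `LazySource` and `buildPipeline` below are hypothetical names invented for this example; they only model an immutable, lazily evaluated pipeline whose parameters are fixed when it is built. Under that model, late information is handled by calling the builder again rather than by mutating the existing pipeline.

```scala
object RebuildSketch {
  // Hypothetical immutable description of a lazy computation. Nothing runs
  // until compute() is called, and params cannot change after construction.
  final case class LazySource(records: Seq[String], params: Map[String, String]) {
    def compute(): Seq[String] = {
      val prefix = params.getOrElse("test", "")
      records.map(record => prefix + record)
    }
  }

  // Builds the whole pipeline (source plus one downstream transformation)
  // from its parameters and returns it as a deferred computation.
  def buildPipeline(records: Seq[String], params: Map[String, String]): () => Seq[String] = {
    val source = LazySource(records, params)
    () => source.compute().map(_.toUpperCase)
  }
}
```

Because building is cheap and evaluation is deferred, calling `buildPipeline` a second time once the new value is known costs little; the first pipeline is simply discarded without ever being computed.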

On Thu, Sep 17, 2015 at 10:07 AM, Gil Vernik <GI...@il.ibm.com> wrote:

> Hi,
>
> I have the following case, which i am not sure how to resolve.
>
> My code uses HadoopRDD and creates various RDDs on top of it
> (MapPartitionsRDD, and so on )
> After all RDDs were lazily created, my code "knows" some new information
> and i want that "compute" method of the HadoopRDD will be aware of it (at
> the point when "compute" method will be called).
> What is the possible way 'to send' some additional information to the
> compute method of the HadoopRDD after this RDD is lazily created?
> I tried to play with configuration, like to perform set("test","111") in
> the code and modify the compute method of HadoopRDD with get("test") - but
> of it's not working,  since SparkContext has only clone of the of the
> configuration and it can't be modified in run time.
>
> Any thoughts how can i make it?
>
> Thanks
> Gil.