Posted to user@spark.apache.org by Anahita Talebi <an...@gmail.com> on 2017/02/02 12:29:58 UTC

Running a spark code on multiple machines using google cloud platform

Dear all,

I am trying to run Spark code on multiple machines using Submit Job in
Google Cloud Platform.
The inputs of my code are a training dataset and a testing dataset.

When I use a small training dataset (around 10 KB), the code runs
successfully on Google Cloud, but when I use a large dataset (around
50 GB), I receive the following error:

17/02/01 19:08:06 ERROR org.apache.spark.scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerTaskEnd(2,0,ResultTask,TaskKilled,org.apache.spark.scheduler.TaskInfo@3101f3b3,null)

Can anyone give me a hint on how to solve this problem?

PS: I cannot use a smaller training dataset because my optimization
code needs to use all of the data.

I have to use Google Cloud Platform because I need to run the code on
multiple machines.

Thanks a lot,

Anahita

Re: Running a spark code on multiple machines using google cloud platform

Posted by Anahita Talebi <an...@gmail.com>.
Thanks for your answer.
Do you mean Amazon EMR?

On Thu, Feb 2, 2017 at 2:30 PM, Marco Mistroni <mm...@gmail.com> wrote:

> You can use EMR if you want to run on a cluster.
> Kind regards

Re: Running a spark code on multiple machines using google cloud platform

Posted by Marco Mistroni <mm...@gmail.com>.
You can use EMR if you want to run on a cluster.
Kind regards
