Posted to user@spark.apache.org by Walrus theCat <wa...@gmail.com> on 2014/07/15 05:27:10 UTC

truly bizarre behavior with local[n] on Spark 1.0.1

Hi,

I've got a socketTextStream through which I'm reading input.  I have three
DStreams, all of which are the same window operation over that
socketTextStream.  I have a four-core machine.  As we've been covering
lately, I have to give a "cores" parameter (the local[n] master URL) to my
StreamingContext:

val ssc = new StreamingContext("local[4]" /* TODO change once a cluster is up */,
      "AppName", Seconds(1))

Now, I have three DStreams, and all I ask them to do is print or count.  I
should preface this by saying that each of them works fine on its own.

dstream1 // 1 second window
dstream2 // 2 second window
dstream3 // 5 minute window
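[Editor's note: for reference, a minimal sketch of the setup being described. The stream names and window durations come from the thread; the host, port, and use of window() are assumptions, since the original code is not shown. It needs a Spark Streaming dependency and a socket server to actually run.]

```scala
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object Sketch {
  def main(args: Array[String]): Unit = {
    // local[4] on a four-core machine; at least one core is consumed by the receiver.
    val ssc = new StreamingContext("local[4]", "AppName", Seconds(1))

    // One receiver-backed input stream (host/port are placeholders).
    val lines = ssc.socketTextStream("localhost", 9999)

    // Three windowed views over the same input stream.
    val dstream1 = lines.window(Seconds(1))   // 1 second window
    val dstream2 = lines.window(Seconds(2))   // 2 second window
    val dstream3 = lines.window(Minutes(5))   // 5 minute window

    dstream1.print()
    dstream2.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```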


If I construct the ssc with "local[8]", and put these statements in this
order, I get prints on the first one, and zero counts on the second one:

ssc(local[8])  // hyperthread dat sheezy
dstream1.print // works
dstream2.count.print // always prints 0



If I do this, this happens:
ssc(local[4])
dstream1.print // doesn't work, just gives me the Time: .... ms message
dstream2.count.print // doesn't work, prints 0

ssc(local[6])
dstream1.print // doesn't work, just gives me the Time: .... ms message
dstream2.count.print // works, prints 1

Sometimes these results switch up, seemingly at random. How can I get
things to the point where I can develop and test my application locally?

Thanks

Re: truly bizarre behavior with local[n] on Spark 1.0.1

Posted by Walrus theCat <wa...@gmail.com>.
Tathagata determined that the reason it was failing was the accidental
creation of multiple input streams.  Thanks!
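
[Editor's note: for anyone hitting the same symptom, the failure mode identified here is creating several receiver-backed input streams when only one was intended. Each socketTextStream runs a long-lived receiver task that occupies a core, so accidental extra receivers can starve local[n] of processing slots and produce empty or erratic batches. A hedged sketch of the buggy versus fixed pattern (names and ports are illustrative, not from the thread):]

```scala
import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}

object InputStreamFix {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext("local[4]", "AppName", Seconds(1))

    // Buggy pattern: three separate input streams mean three receivers,
    // each pinning one of the local[n] cores:
    //   val dstream1 = ssc.socketTextStream("localhost", 9999).window(Seconds(1))
    //   val dstream2 = ssc.socketTextStream("localhost", 9999).window(Seconds(2))
    //   val dstream3 = ssc.socketTextStream("localhost", 9999).window(Minutes(5))

    // Fixed pattern: one input stream (one receiver), shared by all windows.
    val lines = ssc.socketTextStream("localhost", 9999)
    val dstream1 = lines.window(Seconds(1))
    val dstream2 = lines.window(Seconds(2))
    val dstream3 = lines.window(Minutes(5))

    dstream1.print()
    dstream2.count().print()
    dstream3.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```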



Re: truly bizarre behavior with local[n] on Spark 1.0.1

Posted by Walrus theCat <wa...@gmail.com>.
Will do.



Re: truly bizarre behavior with local[n] on Spark 1.0.1

Posted by Tathagata Das <ta...@gmail.com>.
This sounds really really weird. Can you give me a piece of code that I can
run to reproduce this issue myself?

TD



Re: truly bizarre behavior with local[n] on Spark 1.0.1

Posted by Walrus theCat <wa...@gmail.com>.
This is (obviously) spark streaming, by the way.

