Posted to user@spark.apache.org by James King <ja...@gmail.com> on 2015/03/25 14:01:12 UTC

NetworkWordCount + Spark standalone

I'm trying to run the Java NetworkWordCount example against a simple Spark
standalone cluster of one master and one worker.

But it doesn't seem to work: the text entered on the Netcat data server is
not picked up and printed to the Eclipse console output.

However, if I use conf.setMaster("local[2]"), it works: the correct text is
picked up and printed to the Eclipse console.

Any ideas why? Any pointers?

Re: NetworkWordCount + Spark standalone

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
You can open the Master UI, which runs on port 8080 of your Ubuntu machine,
and after submitting the job you can see there how many cores are being
used, among other things.
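
The number the Master UI shows for the application can be compared against the rule from the earlier reply in this thread: each receiver (NetworkWordCount's socket text stream starts one) keeps a core busy for as long as it runs, so the application needs more cores than receivers. Below is a minimal stdlib-only Java sketch of that check. The class and method names are hypothetical, not Spark API, and localThreads only decodes the plain "local", "local[N]" and "local[*]" master URLs; for a spark://host:7077 master the core count has to come from the cluster (e.g. the Master UI) instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamingCoreCheck {
    // Matches "local[4]" or "local[*]"; other forms (e.g. "local[4,2]") are not handled here.
    private static final Pattern LOCAL_N = Pattern.compile("local\\[(\\d+|\\*)\\]");

    /**
     * Threads that a local master URL gives the application: "local" = 1, "local[N]" = N,
     * "local[*]" = logical processors of this machine. Returns -1 for anything else,
     * e.g. "spark://host:7077", where the cluster decides how many cores are granted.
     */
    static int localThreads(String master) {
        if (master.equals("local")) {
            return 1;
        }
        Matcher m = LOCAL_N.matcher(master);
        if (!m.matches()) {
            return -1;
        }
        return m.group(1).equals("*")
                ? Runtime.getRuntime().availableProcessors()
                : Integer.parseInt(m.group(1));
    }

    /** Each receiver keeps one core busy, so at least one more is needed to process batches. */
    static boolean enoughCores(int cores, int receivers) {
        return cores > receivers;
    }

    public static void main(String[] args) {
        int receivers = 1; // one socket text stream, as in NetworkWordCount
        for (String master : new String[] {"local", "local[2]"}) {
            int cores = localThreads(master);
            System.out.println(master + " -> " + cores + " thread(s), enough: "
                    + enoughCores(cores, receivers));
        }
    }
}
```

For a standalone master, the cores figure shown for the running application in the Master UI can be passed to enoughCores directly.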

Thanks
Best Regards

On Wed, Mar 25, 2015 at 6:50 PM, James King <ja...@gmail.com> wrote:

Re: NetworkWordCount + Spark standalone

Posted by James King <ja...@gmail.com>.
Thanks Akhil,

Yes, indeed this is why it works when using local[2], but I'm unclear why
it doesn't work when using the standalone daemons.

Is there a way to check which cores are being seen when running against
the standalone daemons?

I'm running the master and worker on the same Ubuntu host. The driver
program is running on a Windows machine.

On the Ubuntu host, the command

    cat /proc/cpuinfo | grep processor | wc -l

gives 2.

On the Windows machine it is:
NumberOfCores=2
NumberOfLogicalProcessors=4
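
A JVM-side cross-check of those numbers is Runtime.getRuntime().availableProcessors(), which reports the logical processors visible to the JVM it runs in (Spark uses the same call to size local[*]). The minimal sketch below just prints it; running it on each machine shows what each side sees, so on the Windows machine above it would typically report 4 rather than 2. It says nothing about what the standalone master granted the application: a worker offers the cores set by SPARK_WORKER_CORES (all cores by default), and the Master UI shows the actual allocation. The class name is hypothetical.

```java
public class VisibleCores {
    /** Logical processors visible to this JVM (hyper-threads are counted individually). */
    static int visibleCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        // Print the OS alongside the count so results from the two machines are easy to tell apart.
        System.out.println(System.getProperty("os.name") + ": "
                + visibleCores() + " logical processor(s)");
    }
}
```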


On Wed, Mar 25, 2015 at 2:06 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

Re: NetworkWordCount + Spark standalone

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Spark Streaming requires a minimum of 2 cores: 1 for receiving your data
and the other for processing. So when you say local[2], it basically
initializes 2 threads on your local machine, 1 for receiving data from the
network and the other for your word count processing.
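
The effect of having too few threads can be reproduced without Spark. In this stdlib Java sketch (a simulation, not Spark's scheduler; all names are hypothetical), a fixed thread pool stands in for the cores, a task that blocks until stopped stands in for the receiver, and a second short task stands in for a word-count batch. With one thread the batch never gets to run, which mirrors a streaming application that has only one core: the receiver holds it and no batch is processed. With two threads the batch runs immediately.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReceiverStarvation {
    /**
     * Runs one long-lived "receiver" task and one short "processing" task on a fixed pool
     * of the given size; returns true if processing ran within the timeout.
     */
    static boolean processingRuns(int threads, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch stopReceiver = new CountDownLatch(1);
        CountDownLatch processed = new CountDownLatch(1);
        try {
            // Submitted first, so it takes the first thread and holds it until told to stop,
            // much like a receiver blocked reading from a socket.
            pool.submit(() -> {
                try {
                    stopReceiver.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // The "batch": it can only start if another thread is free.
            pool.submit(processed::countDown);
            return processed.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Release the receiver and tear the pool down whatever the outcome.
            stopReceiver.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("1 thread:  processing ran = " + processingRuns(1, 500));
        System.out.println("2 threads: processing ran = " + processingRuns(2, 500));
    }
}
```

The timeout only bounds how long the demo waits; with one thread the batch would stay queued indefinitely, since the receiver task never gives its thread back.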

Thanks
Best Regards

On Wed, Mar 25, 2015 at 6:31 PM, James King <ja...@gmail.com> wrote:
