Posted to user@spark.apache.org by Mich Talebzadeh <mi...@gmail.com> on 2016/06/12 19:23:36 UTC

What is the interpretation of Cores in Spark doc

Hi,

I was writing some docs on Spark P&T and came across this.

It is about the terminology used in the Spark docs and how it should be interpreted.

This is my understanding of cores and threads.

Cores are physical cores. Threads are virtual cores. A core with 2 threads
uses hyper-threading technology, so 2 threads per core let the core work on
two loads at the same time. In other words, every thread takes care of one
load.

Each core has its own memory. So if you have a dual-core CPU with
hyper-threading, each core works on 2 loads at the same time because of the 2
threads per core, but those 2 threads share the memory in that core.

Some vendors, as I am sure most of you are aware, charge licensing per core.

For example, on the same host where I have Spark, I have a SAP product that
checks the licensing and shuts the application down if the license does not
agree with the cores specified.

This is what it says

./cpuinfo
License hostid:        00e04c69159a 0050b60fd1e7
Detected 12 logical processor(s), 6 core(s), in 1 chip(s)

So here I have 12 logical processors, 6 cores and 1 chip. I call logical
processors threads, so do I have 12 threads?

Now if I go and start the worker process with
${SPARK_HOME}/sbin/start-slaves.sh, I see this on the GUI page

[image: Inline images 1]

It says 12 cores, but I gather these are threads?
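
Incidentally, a quick check in spark-shell (a minimal sketch on my part, not
anything from the Spark docs) shows the JVM reports 12 as well, which
suggests that figure is the logical processor count rather than the physical
core count:

// Minimal sketch: the JVM reports the number of logical processors it sees,
// which on this host is 12, the same as the "cores" figure in the GUI.
println(Runtime.getRuntime.availableProcessors())   // prints 12 here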


The Spark documentation
<http://spark.apache.org/docs/latest/submitting-applications.html> states,
and I quote


[image: Inline images 2]



OK, the line for local[k] adds: *set this to the number of cores on your
machine*.


But I know that it means threads, because if I set that to 6 it would be
only 6 threads as opposed to 12 threads.


The next line, for local[*], seems to state it correctly as it refers to
"logical cores", which in my understanding are threads.


I trust that I am not nitpicking here!


Cheers,



Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com

Re: What is the interpretation of Cores in Spark doc

Posted by Mich Talebzadeh <mi...@gmail.com>.
Great reply, everyone.

Just confining this to the current subject matter, Spark and the use of CPU
allocation, we have these spark-submit parameters:

Local mode

${SPARK_HOME}/bin/spark-submit \
                --num-executors 1 \
                --master local[2] \  ## two cores


And that --master local[k] value on my box comes from

cat /proc/cpuinfo|grep processor
processor       : 0
processor       : 1
processor       : 2
processor       : 3
processor       : 4
processor       : 5
processor       : 6
processor       : 7
processor       : 8
processor       : 9
processor       : 10
processor       : 11

so there are 12 processors, numbered 0-11

And 12 core id entries (6 distinct core IDs, each listed twice)

cat /proc/cpuinfo|grep 'core id'
core id         : 0
core id         : 1
core id         : 2
core id         : 8
core id         : 9
core id         : 10
core id         : 0
core id         : 1
core id         : 2
core id         : 8
core id         : 9
core id         : 10

So in spark-submit I can put

${SPARK_HOME}/bin/spark-submit \
                --num-executors 1 \
                --master local[12] \  ## Max cores

Actually this is what the Spark doc
<http://spark.apache.org/docs/latest/submitting-applications.html> says:

 *Run application locally on 8 cores*
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master local[8] \


That covers our usage.

Now, I mentioned the licensing charges earlier. If I run any SAP product,
they are going to charge us per core on this host for their software:

./cpuinfo
License hostid:        00e04c69159a 0050b60fd1e7
*Detected 12 logical processor(s), 6 core(s), in 1 chip(s)*

They charge by core(s), so we will have to pay for 6 cores, not 12 logical
processors. I am sure that if they could charge for 12 cores they would have
done it by now :)
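
As an aside, here is a rough Linux-only sketch of my own (not the vendor's
tool) that arrives at the same two numbers from /proc/cpuinfo:

import scala.io.Source

// Rough sketch: logical processors are the "processor" entries in
// /proc/cpuinfo, while physical cores are the distinct
// (physical id, core id) pairs.
val cpuinfo = Source.fromFile("/proc/cpuinfo").getLines().toList
def values(key: String): List[String] =
  cpuinfo.filter(_.startsWith(key)).map(_.split(":").last.trim)

val logicalProcessors = values("processor").size
val physicalCores = values("physical id").zip(values("core id")).distinct.size
println(s"$logicalProcessors logical processor(s), $physicalCores core(s)")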


Cheers

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 17 June 2016 at 12:01, Robin East <ro...@xense.co.uk> wrote:

> Agreed it’s a worthwhile discussion (and interesting IMO)
>
> This is a section from your original post:
>
> It is about the terminology or interpretation of that in Spark doc.
>>>>>
>>>>> This is my understanding of cores and threads.
>>>>>
>>>>>  Cores are physical cores. Threads are virtual cores.
>>>>>
>>>>
> At least as far as Spark doc is concerned Threads are not synonymous with
> virtual cores; they are closely related concepts of course. So any time we
> want to have a discussion about architecture, performance, tuning,
> configuration etc we do need to be clear about the concepts and how they
> are defined.
>
> Granted CPU hardware implementation can also refer to ’threads’. In fact
> Oracle/Sun seem unclear as to what they mean by thread - in various
> documents they define threads as:
>
> A software entity that can be executed on hardware (e.g. Oracle SPARC
> Architecture 2011)
>
> At other times as:
>
> A thread is a hardware strand. Each thread, or strand, enjoys a unique set
> of resources in support of its … (e.g. OpenSPARC T1 Microarchitecture
> Specification)
>
> So unless the documentation you are writing is very specific to your
> environment, and the idea that a thread is a logical processor is generally
> accepted, I would not be inclined to treat threads as if they are logical
> processors.
>
>
>
> On 16 Jun 2016, at 15:45, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
> Thanks all.
>
> I think we are diverging but IMO it is a worthwhile discussion
>
> Actually, threads are a hardware implementation - hence the whole notion
> of “multi-threaded cores”.   What happens is that the cores often have
> duplicate registers, etc. for holding execution state.   While it is
> correct that only a single process is executing at a time, a single core
> will have execution states of multiple processes preserved in these
> registers. In addition, it is the core (not the OS) that determines when
> the thread is executed. The approach often varies according to the CPU
> manufacturer, but the most simple approach is when one thread of execution
> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
> core simply stops processing that thread saves the execution state to a set
> of registers, loads instructions from the other set of registers and goes
> on.  On the Oracle SPARC chips, it will actually check the next thread to
> see if the reason it was ‘parked’ has completed and if not, skip it for the
> subsequent thread. The OS is only aware of what are cores and what are
> logical processors - and dispatches accordingly.  *Execution is up to the
> cores*. .
>
> Cheers
>
>
>
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>
>> Mich
>>
>> >> A core may have one or more threads
>> It would be more accurate to say that a core could *run* one or more
>> threads scheduled for execution. Threads are a software/OS concept that
>> represent executable code that is scheduled to run by the OS; A CPU, core
>> or virtual core/virtual processor execute that code. Threads are not CPUs
>> or cores whether physical or logical - any Spark documentation that implies
>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>> read it to mean that threads are logical processors.
>>
>> To go back to your original question, if you set local[6] and you have 12
>> logical processors then you are likely to have half your CPU resources
>> unused by Spark.
>>
>>
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>> I think it is slightly more than that.
>>
>> These days  software is licensed by core (generally speaking).   That is
>> the physical processor.   * A core may have one or more threads - or
>> logical processors*. Virtualization adds some fun to the mix.
>> Generally what they present is ‘virtual processors’.   What that equates to
>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>> virtual=logical.   In others, virtual=logical but they are constrained to
>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>> 3 full cores with 2 threads each.   Rational is due to the way OS
>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>> applications.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>
>>> I don't know what documentation you were referring to, but this is
>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>> too concerned about how any particular hardware vendor wants to refer to
>>> the specific components of their CPU architecture.  For us, a core is a
>>> logical execution unit, something on which a thread of execution can run.
>>> That can map in different ways to different physical or virtual hardware.
>>>
>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is not the issue of testing anything. I was referring to
>>>> documentation that clearly use the term "threads". As I said and showed
>>>> before, one line is using the term "thread" and the next one "logical
>>>> cores".
>>>>
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>
>>>>> Spark is a software product. In software a "core" is something that a
>>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>>> A "thread" is not something a process can run on.)
>>>>>
>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>> can test this easily.)
>>>>>
>>>>>
>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>
>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>
>>>>>> This is my understanding of cores and threads.
>>>>>>
>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>> care of one load.
>>>>>>
>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>
>>>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>>>
>>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>>> that checks the licensing and shuts the application down if the license
>>>>>> does not agree with the cores speced.
>>>>>>
>>>>>> This is what it says
>>>>>>
>>>>>> ./cpuinfo
>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>
>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>>> logical processors as threads so I have 12 threads?
>>>>>>
>>>>>> Now if I go and start worker process
>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>> it says 12 cores but I gather it is threads?
>>>>>>
>>>>>> Spark document
>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>> states and I quote
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>>
>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>> your machine*
>>>>>>
>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>
>>>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>>>> "logical cores" that in my understanding it is threads.
>>>>>>
>>>>>> I trust that I am not nitpicking here!
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Robin East <ro...@xense.co.uk>.
Agreed it’s a worthwhile discussion (and interesting IMO)

This is a section from your original post:

> It is about the terminology or interpretation of that in Spark doc.
> 
> This is my understanding of cores and threads.
> 
>  Cores are physical cores. Threads are virtual cores.

At least as far as the Spark doc is concerned, threads are not synonymous with virtual cores; they are closely related concepts, of course. So any time we want to have a discussion about architecture, performance, tuning, configuration etc. we do need to be clear about the concepts and how they are defined.

Granted, CPU hardware documentation can also refer to ’threads’. In fact Oracle/Sun seem unclear as to what they mean by thread - in various documents they define a thread as:

A software entity that can be executed on hardware (e.g. Oracle SPARC Architecture 2011)

At other times as:

A thread is a hardware strand. Each thread, or strand, enjoys a unique set of resources in support of its … (e.g. OpenSPARC T1 Microarchitecture Specification)

So unless the documentation you are writing is very specific to your environment, and the idea that a thread is a logical processor is generally accepted in that environment, I would not be inclined to treat threads as if they were logical processors.
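
As a minimal illustration of that distinction (the thread names are just for
the example): a program can happily create far more threads than there are
logical processors, and the OS schedules them onto whatever processors exist:

// Minimal sketch: software threads are not logical processors - you can
// create many more of them than the hardware provides, and the OS/JVM
// schedules them onto the available logical processors.
val logical = Runtime.getRuntime.availableProcessors()
val threads = (1 to logical * 4).map { i =>
  new Thread(new Runnable { def run(): Unit = Thread.sleep(100) }, s"worker-$i")
}
threads.foreach(_.start())
threads.foreach(_.join())
println(s"Ran ${threads.size} threads on $logical logical processor(s)")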



> On 16 Jun 2016, at 15:45, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> Thanks all.
> 
> I think we are diverging but IMO it is a worthwhile discussion
> 
> Actually, threads are a hardware implementation - hence the whole notion of “multi-threaded cores”.   What happens is that the cores often have duplicate registers, etc. for holding execution state.   While it is correct that only a single process is executing at a time, a single core will have execution states of multiple processes preserved in these registers. In addition, it is the core (not the OS) that determines when the thread is executed. The approach often varies according to the CPU manufacturer, but the most simple approach is when one thread of execution executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the core simply stops processing that thread saves the execution state to a set of registers, loads instructions from the other set of registers and goes on.  On the Oracle SPARC chips, it will actually check the next thread to see if the reason it was ‘parked’ has completed and if not, skip it for the subsequent thread. The OS is only aware of what are cores and what are logical processors - and dispatches accordingly.  Execution is up to the cores. .
> Cheers
> 
> 
> 
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 16 June 2016 at 13:02, Robin East <robin.east@xense.co.uk <ma...@xense.co.uk>> wrote:
> Mich
> 
> >> A core may have one or more threads
> It would be more accurate to say that a core could run one or more threads scheduled for execution. Threads are a software/OS concept that represent executable code that is scheduled to run by the OS; A CPU, core or virtual core/virtual processor execute that code. Threads are not CPUs or cores whether physical or logical - any Spark documentation that implies this is mistaken. I’ve looked at the documentation you mention and I don’t read it to mean that threads are logical processors.
> 
> To go back to your original question, if you set local[6] and you have 12 logical processors then you are likely to have half your CPU resources unused by Spark.
> 
> 
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
>> 
>> I think it is slightly more than that.
>> 
>> These days  software is licensed by core (generally speaking).   That is the physical processor.    A core may have one or more threads - or logical processors. Virtualization adds some fun to the mix.   Generally what they present is ‘virtual processors’.   What that equates to depends on the virtualization layer itself.   In some simpler VM’s - it is virtual=logical.   In others, virtual=logical but they are constrained to be from the same cores - e.g. if you get 6 virtual processors, it really is 3 full cores with 2 threads each.   Rational is due to the way OS dispatching works on ‘logical’ processors vs. cores and POSIX threaded applications.
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>  
>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>  
>> 
>> On 13 June 2016 at 18:17, Mark Hamstra <mark@clearstorydata.com <ma...@clearstorydata.com>> wrote:
>> I don't know what documentation you were referring to, but this is clearly an erroneous statement: "Threads are virtual cores."  At best it is terminology abuse by a hardware manufacturer.  Regardless, Spark can't get too concerned about how any particular hardware vendor wants to refer to the specific components of their CPU architecture.  For us, a core is a logical execution unit, something on which a thread of execution can run.  That can map in different ways to different physical or virtual hardware. 
>> 
>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
>> Hi,
>> 
>> It is not the issue of testing anything. I was referring to documentation that clearly use the term "threads". As I said and showed before, one line is using the term "thread" and the next one "logical cores".
>> 
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>  
>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>  
>> 
>> On 12 June 2016 at 23:57, Daniel Darabos <daniel.darabos@lynxanalytics.com <ma...@lynxanalytics.com>> wrote:
>> Spark is a software product. In software a "core" is something that a process can run on. So it's a "virtual core". (Do not call these "threads". A "thread" is not something a process can run on.)
>> 
>> local[*] uses java.lang.Runtime.availableProcessors() <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>. Since Java is software, this also returns the number of virtual cores. (You can test this easily.)
>> 
>> 
>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi,
>> 
>> I was writing some docs on Spark P&T and came across this.
>> 
>> It is about the terminology or interpretation of that in Spark doc.
>> 
>> This is my understanding of cores and threads.
>> 
>>  Cores are physical cores. Threads are virtual cores. Cores with 2 threads is called hyper threading technology so 2 threads per core makes the core work on two loads at same time. In other words, every thread takes care of one load.
>> 
>> Core has its own memory. So if you have a dual core with hyper threading, the core works with 2 loads each at same time because of the 2 threads per core, but this 2 threads will share memory in that core.
>> 
>> Some vendors as I am sure most of you aware charge licensing per core.
>> 
>> For example on the same host that I have Spark, I have a SAP product that checks the licensing and shuts the application down if the license does not agree with the cores speced.
>> 
>> This is what it says
>> 
>> ./cpuinfo
>> License hostid:        00e04c69159a 0050b60fd1e7
>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>> 
>> So here I have 12 logical processors  and 6 cores and 1 chip. I call logical processors as threads so I have 12 threads?
>> 
>> Now if I go and start worker process ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>> 
>> <image.png>
>> 
>> it says 12 cores but I gather it is threads?
>> 
>> Spark document <http://spark.apache.org/docs/latest/submitting-applications.html> states and I quote
>> 
>> <image.png>
>> 
>> 
>> OK the line local[k] adds  ..  set this to the number of cores on your machine
>> 
>> But I know that it means threads. Because if I went and set that to 6, it would be only 6 threads as opposed to 12 threads.
>> 
>> the next line local[*] seems to indicate it correctly as it refers to "logical cores" that in my understanding it is threads.
>> 
>> I trust that I am not nitpicking here!
>> 
>> Cheers,
>> 
>> 
>> Dr Mich Talebzadeh
>>  
>> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>>  
>> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>>  
>> 
>> 
>> 
>> 
> 
> 


Re: What is the interpretation of Cores in Spark doc

Posted by Deepak Goel <de...@gmail.com>.
Just wondering: if threads were purely a hardware implementation, then if my
application in Java had one thread and it was run on a multicore machine,
that thread could be split up into small parts and run on different cores
simultaneously. However, this would raise synchronization problems.

So while it is true that threads are implemented at the hardware level, there
are also threads at the software level.


Hey

Namaskara~Nalama~Guten Tag~Bonjour


   --
Keigu

Deepak
73500 12833
www.simtree.net, deepak@simtree.net
deicool@gmail.com

LinkedIn: www.linkedin.com/in/deicool
Skype: thumsupdeicool
Google talk: deicool
Blog: http://loveandfearless.wordpress.com
Facebook: http://www.facebook.com/deicool

"Contribute to the world, environment and more : http://www.gridrepublic.org
"

On Thu, Jun 16, 2016 at 8:15 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Thanks all.
>
> I think we are diverging but IMO it is a worthwhile discussion
>
> Actually, threads are a hardware implementation - hence the whole notion
> of “multi-threaded cores”.   What happens is that the cores often have
> duplicate registers, etc. for holding execution state.   While it is
> correct that only a single process is executing at a time, a single core
> will have execution states of multiple processes preserved in these
> registers. In addition, it is the core (not the OS) that determines when
> the thread is executed. The approach often varies according to the CPU
> manufacturer, but the most simple approach is when one thread of execution
> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
> core simply stops processing that thread saves the execution state to a set
> of registers, loads instructions from the other set of registers and goes
> on.  On the Oracle SPARC chips, it will actually check the next thread to
> see if the reason it was ‘parked’ has completed and if not, skip it for the
> subsequent thread. The OS is only aware of what are cores and what are
> logical processors - and dispatches accordingly.  *Execution is up to the
> cores*. .
>
> Cheers
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>
>> Mich
>>
>> >> A core may have one or more threads
>> It would be more accurate to say that a core could *run* one or more
>> threads scheduled for execution. Threads are a software/OS concept that
>> represent executable code that is scheduled to run by the OS; A CPU, core
>> or virtual core/virtual processor execute that code. Threads are not CPUs
>> or cores whether physical or logical - any Spark documentation that implies
>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>> read it to mean that threads are logical processors.
>>
>> To go back to your original question, if you set local[6] and you have 12
>> logical processors then you are likely to have half your CPU resources
>> unused by Spark.
>>
>>
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>> I think it is slightly more than that.
>>
>> These days  software is licensed by core (generally speaking).   That is
>> the physical processor.   * A core may have one or more threads - or
>> logical processors*. Virtualization adds some fun to the mix.
>> Generally what they present is ‘virtual processors’.   What that equates to
>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>> virtual=logical.   In others, virtual=logical but they are constrained to
>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>> 3 full cores with 2 threads each.   Rational is due to the way OS
>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>> applications.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>
>>> I don't know what documentation you were referring to, but this is
>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>> too concerned about how any particular hardware vendor wants to refer to
>>> the specific components of their CPU architecture.  For us, a core is a
>>> logical execution unit, something on which a thread of execution can run.
>>> That can map in different ways to different physical or virtual hardware.
>>>
>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is not the issue of testing anything. I was referring to
>>>> documentation that clearly use the term "threads". As I said and showed
>>>> before, one line is using the term "thread" and the next one "logical
>>>> cores".
>>>>
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>
>>>>> Spark is a software product. In software a "core" is something that a
>>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>>> A "thread" is not something a process can run on.)
>>>>>
>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>> can test this easily.)
>>>>>
>>>>>
>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>
>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>
>>>>>> This is my understanding of cores and threads.
>>>>>>
>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>> care of one load.
>>>>>>
>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>
>>>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>>>
>>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>>> that checks the licensing and shuts the application down if the license
>>>>>> does not agree with the cores speced.
>>>>>>
>>>>>> This is what it says
>>>>>>
>>>>>> ./cpuinfo
>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>
>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>>> logical processors as threads so I have 12 threads?
>>>>>>
>>>>>> Now if I go and start worker process
>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>> it says 12 cores but I gather it is threads?
>>>>>>
>>>>>> Spark document
>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>> states and I quote
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>>
>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>> your machine*
>>>>>>
>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>
>>>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>>>> "logical cores" that in my understanding it is threads.
>>>>>>
>>>>>> I trust that I am not nitpicking here!
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mark Hamstra <ma...@clearstorydata.com>.
>
> Actually, threads are a hardware implementation - hence the whole notion
> of “multi-threaded cores”.


No, a multi-threaded core is a core that supports multiple concurrent
threads of execution, not a core that has multiple threads.  The
terminology and marketing around multi-core processors, hyper threading and
virtualization are confusing enough without taking the further step of
misapplying software-specific terms to hardware components.

On Thu, Jun 16, 2016 at 7:45 AM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Thanks all.
>
> I think we are diverging but IMO it is a worthwhile discussion
>
> Actually, threads are a hardware implementation - hence the whole notion
> of “multi-threaded cores”.   What happens is that the cores often have
> duplicate registers, etc. for holding execution state.   While it is
> correct that only a single process is executing at a time, a single core
> will have execution states of multiple processes preserved in these
> registers. In addition, it is the core (not the OS) that determines when
> the thread is executed. The approach often varies according to the CPU
> manufacturer, but the most simple approach is when one thread of execution
> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
> core simply stops processing that thread saves the execution state to a set
> of registers, loads instructions from the other set of registers and goes
> on.  On the Oracle SPARC chips, it will actually check the next thread to
> see if the reason it was ‘parked’ has completed and if not, skip it for the
> subsequent thread. The OS is only aware of what are cores and what are
> logical processors - and dispatches accordingly.  *Execution is up to the
> cores*. .
>
> Cheers
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>
>> Mich
>>
>> >> A core may have one or more threads
>> It would be more accurate to say that a core could *run* one or more
>> threads scheduled for execution. Threads are a software/OS concept that
>> represent executable code that is scheduled to run by the OS; A CPU, core
>> or virtual core/virtual processor execute that code. Threads are not CPUs
>> or cores whether physical or logical - any Spark documentation that implies
>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>> read it to mean that threads are logical processors.
>>
>> To go back to your original question, if you set local[6] and you have 12
>> logical processors then you are likely to have half your CPU resources
>> unused by Spark.
>>
>>
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>> I think it is slightly more than that.
>>
>> These days  software is licensed by core (generally speaking).   That is
>> the physical processor.   * A core may have one or more threads - or
>> logical processors*. Virtualization adds some fun to the mix.
>> Generally what they present is ‘virtual processors’.   What that equates to
>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>> virtual=logical.   In others, virtual=logical but they are constrained to
>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>> 3 full cores with 2 threads each.   Rational is due to the way OS
>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>> applications.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>
>>> I don't know what documentation you were referring to, but this is
>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>> too concerned about how any particular hardware vendor wants to refer to
>>> the specific components of their CPU architecture.  For us, a core is a
>>> logical execution unit, something on which a thread of execution can run.
>>> That can map in different ways to different physical or virtual hardware.
>>>
>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is not the issue of testing anything. I was referring to
>>>> documentation that clearly use the term "threads". As I said and showed
>>>> before, one line is using the term "thread" and the next one "logical
>>>> cores".
>>>>
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>
>>>>> Spark is a software product. In software a "core" is something that a
>>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>>> A "thread" is not something a process can run on.)
>>>>>
>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>> can test this easily.)
>>>>>
>>>>>
>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>
>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>
>>>>>> This is my understanding of cores and threads.
>>>>>>
>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>> care of one load.
>>>>>>
>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>
>>>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>>>
>>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>>> that checks the licensing and shuts the application down if the license
>>>>>> does not agree with the cores speced.
>>>>>>
>>>>>> This is what it says
>>>>>>
>>>>>> ./cpuinfo
>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>
>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>>> logical processors as threads so I have 12 threads?
>>>>>>
>>>>>> Now if I go and start worker process
>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>> it says 12 cores but I gather it is threads?
>>>>>>
>>>>>> Spark document
>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>> states and I quote
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>>
>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>> your machine*
>>>>>>
>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>
>>>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>>>> "logical cores" that in my understanding it is threads.
>>>>>>
>>>>>> I trust that I am not nitpicking here!
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mark Hamstra <ma...@clearstorydata.com>.
I mean only that hardware-level threads and the processor's scheduling of
those threads are only one segment of the total space of threads and thread
scheduling, and that saying things like "cores have threads" or "only the
core schedules threads" can be more confusing than helpful.

On Thu, Jun 16, 2016 at 11:33 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> Well LOL
>
> Given a set of parameters one can argue from any angle.
>
> It is not obvious what you are trying to sate here? "It is not strictly
> true"  yeah OK
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 16 June 2016 at 19:07, Mark Hamstra <ma...@clearstorydata.com> wrote:
>
>> In addition, it is the core (not the OS) that determines when the thread
>>> is executed.
>>
>>
>> That's also not strictly true.  "Thread" is a concept that can exist at
>> multiple levels -- even concurrently at multiple levels for a single
>> running program.  Different entities will be responsible for scheduling the
>> execution of threads at these different levels, and the CPU is only in
>> direct control at the lowest level, that of so-called hardware threads.  Of
>> course, higher-level threads eventually need to be run as lower-level
>> hardware tasks, and the mappings between various types of application-level
>> threads and OS- and/or hardware-level threads can be complicated, but it is
>> still not helpful to think of the CPU as being the only entity responsible
>> for the scheduling of threads.
>>
>> On Thu, Jun 16, 2016 at 7:45 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Thanks all.
>>>
>>> I think we are diverging but IMO it is a worthwhile discussion
>>>
>>> Actually, threads are a hardware implementation - hence the whole notion
>>> of “multi-threaded cores”.   What happens is that the cores often have
>>> duplicate registers, etc. for holding execution state.   While it is
>>> correct that only a single process is executing at a time, a single core
>>> will have execution states of multiple processes preserved in these
>>> registers. In addition, it is the core (not the OS) that determines when
>>> the thread is executed. The approach often varies according to the CPU
>>> manufacturer, but the most simple approach is when one thread of execution
>>> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
>>> core simply stops processing that thread saves the execution state to a set
>>> of registers, loads instructions from the other set of registers and goes
>>> on.  On the Oracle SPARC chips, it will actually check the next thread to
>>> see if the reason it was ‘parked’ has completed and if not, skip it for the
>>> subsequent thread. The OS is only aware of what are cores and what are
>>> logical processors - and dispatches accordingly.  *Execution is up to
>>> the cores*. .
>>>
>>> Cheers
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>>>
>>>> Mich
>>>>
>>>> >> A core may have one or more threads
>>>> It would be more accurate to say that a core could *run* one or more
>>>> threads scheduled for execution. Threads are a software/OS concept that
>>>> represent executable code that is scheduled to run by the OS; A CPU, core
>>>> or virtual core/virtual processor execute that code. Threads are not CPUs
>>>> or cores whether physical or logical - any Spark documentation that implies
>>>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>>>> read it to mean that threads are logical processors.
>>>>
>>>> To go back to your original question, if you set local[6] and you have
>>>> 12 logical processors then you are likely to have half your CPU resources
>>>> unused by Spark.
>>>>
>>>>
>>>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>>>> wrote:
>>>>
>>>> I think it is slightly more than that.
>>>>
>>>> These days  software is licensed by core (generally speaking).   That
>>>> is the physical processor.   * A core may have one or more threads -
>>>> or logical processors*. Virtualization adds some fun to the mix.
>>>> Generally what they present is ‘virtual processors’.   What that equates to
>>>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>>>> virtual=logical.   In others, virtual=logical but they are constrained to
>>>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>>>> 3 full cores with 2 threads each.   Rational is due to the way OS
>>>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>>>> applications.
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>>>
>>>>> I don't know what documentation you were referring to, but this is
>>>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>>>> too concerned about how any particular hardware vendor wants to refer to
>>>>> the specific components of their CPU architecture.  For us, a core is a
>>>>> logical execution unit, something on which a thread of execution can run.
>>>>> That can map in different ways to different physical or virtual hardware.
>>>>>
>>>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It is not the issue of testing anything. I was referring to
>>>>>> documentation that clearly use the term "threads". As I said and showed
>>>>>> before, one line is using the term "thread" and the next one "logical
>>>>>> cores".
>>>>>>
>>>>>>
>>>>>> HTH
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>>>
>>>>>>> Spark is a software product. In software a "core" is something that
>>>>>>> a process can run on. So it's a "virtual core". (Do not call these
>>>>>>> "threads". A "thread" is not something a process can run on.)
>>>>>>>
>>>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>>>> can test this easily.)
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>>>
>>>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>>>
>>>>>>>> This is my understanding of cores and threads.
>>>>>>>>
>>>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>>>> care of one load.
>>>>>>>>
>>>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>>>
>>>>>>>> Some vendors as I am sure most of you aware charge licensing per
>>>>>>>> core.
>>>>>>>>
>>>>>>>> For example on the same host that I have Spark, I have a SAP
>>>>>>>> product that checks the licensing and shuts the application down if the
>>>>>>>> license does not agree with the cores speced.
>>>>>>>>
>>>>>>>> This is what it says
>>>>>>>>
>>>>>>>> ./cpuinfo
>>>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>>>
>>>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I
>>>>>>>> call logical processors as threads so I have 12 threads?
>>>>>>>>
>>>>>>>> Now if I go and start worker process
>>>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>>>
>>>>>>>> <image.png>
>>>>>>>>
>>>>>>>> it says 12 cores but I gather it is threads?
>>>>>>>>
>>>>>>>> Spark document
>>>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>>>> states and I quote
>>>>>>>>
>>>>>>>> <image.png>
>>>>>>>>
>>>>>>>>
>>>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>>>> your machine*
>>>>>>>>
>>>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>>>
>>>>>>>> the next line local[*] seems to indicate it correctly as it refers
>>>>>>>> to "logical cores" that in my understanding it is threads.
>>>>>>>>
>>>>>>>> I trust that I am not nitpicking here!
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mich Talebzadeh <mi...@gmail.com>.
Well LOL

Given a set of parameters one can argue from any angle.

It is not obvious what you are trying to state here. "It is not strictly
true" - yeah, OK.

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 16 June 2016 at 19:07, Mark Hamstra <ma...@clearstorydata.com> wrote:

> In addition, it is the core (not the OS) that determines when the thread
>> is executed.
>
>
> That's also not strictly true.  "Thread" is a concept that can exist at
> multiple levels -- even concurrently at multiple levels for a single
> running program.  Different entities will be responsible for scheduling the
> execution of threads at these different levels, and the CPU is only in
> direct control at the lowest level, that of so-called hardware threads.  Of
> course, higher-level threads eventually need to be run as lower-level
> hardware tasks, and the mappings between various types of application-level
> threads and OS- and/or hardware-level threads can be complicated, but it is
> still not helpful to think of the CPU as being the only entity responsible
> for the scheduling of threads.
>
> On Thu, Jun 16, 2016 at 7:45 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Thanks all.
>>
>> I think we are diverging but IMO it is a worthwhile discussion
>>
>> Actually, threads are a hardware implementation - hence the whole notion
>> of “multi-threaded cores”.   What happens is that the cores often have
>> duplicate registers, etc. for holding execution state.   While it is
>> correct that only a single process is executing at a time, a single core
>> will have execution states of multiple processes preserved in these
>> registers. In addition, it is the core (not the OS) that determines when
>> the thread is executed. The approach often varies according to the CPU
>> manufacturer, but the most simple approach is when one thread of execution
>> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
>> core simply stops processing that thread saves the execution state to a set
>> of registers, loads instructions from the other set of registers and goes
>> on.  On the Oracle SPARC chips, it will actually check the next thread to
>> see if the reason it was ‘parked’ has completed and if not, skip it for the
>> subsequent thread. The OS is only aware of what are cores and what are
>> logical processors - and dispatches accordingly.  *Execution is up to
>> the cores*. .
>>
>> Cheers
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>>
>>> Mich
>>>
>>> >> A core may have one or more threads
>>> It would be more accurate to say that a core could *run* one or more
>>> threads scheduled for execution. Threads are a software/OS concept that
>>> represent executable code that is scheduled to run by the OS; A CPU, core
>>> or virtual core/virtual processor execute that code. Threads are not CPUs
>>> or cores whether physical or logical - any Spark documentation that implies
>>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>>> read it to mean that threads are logical processors.
>>>
>>> To go back to your original question, if you set local[6] and you have
>>> 12 logical processors then you are likely to have half your CPU resources
>>> unused by Spark.
>>>
>>>
>>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>>> wrote:
>>>
>>> I think it is slightly more than that.
>>>
>>> These days  software is licensed by core (generally speaking).   That
>>> is the physical processor.   * A core may have one or more threads - or
>>> logical processors*. Virtualization adds some fun to the mix.
>>> Generally what they present is ‘virtual processors’.   What that equates to
>>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>>> virtual=logical.   In others, virtual=logical but they are constrained to
>>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>>> 3 full cores with 2 threads each.   Rational is due to the way OS
>>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>>> applications.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>>
>>>> I don't know what documentation you were referring to, but this is
>>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>>> too concerned about how any particular hardware vendor wants to refer to
>>>> the specific components of their CPU architecture.  For us, a core is a
>>>> logical execution unit, something on which a thread of execution can run.
>>>> That can map in different ways to different physical or virtual hardware.
>>>>
>>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It is not the issue of testing anything. I was referring to
>>>>> documentation that clearly use the term "threads". As I said and showed
>>>>> before, one line is using the term "thread" and the next one "logical
>>>>> cores".
>>>>>
>>>>>
>>>>> HTH
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>>
>>>>>> Spark is a software product. In software a "core" is something that a
>>>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>>>> A "thread" is not something a process can run on.)
>>>>>>
>>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>>> can test this easily.)
>>>>>>
>>>>>>
>>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>>
>>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>>
>>>>>>> This is my understanding of cores and threads.
>>>>>>>
>>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>>> care of one load.
>>>>>>>
>>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>>
>>>>>>> Some vendors as I am sure most of you aware charge licensing per
>>>>>>> core.
>>>>>>>
>>>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>>>> that checks the licensing and shuts the application down if the license
>>>>>>> does not agree with the cores speced.
>>>>>>>
>>>>>>> This is what it says
>>>>>>>
>>>>>>> ./cpuinfo
>>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>>
>>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>>>> logical processors as threads so I have 12 threads?
>>>>>>>
>>>>>>> Now if I go and start worker process
>>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>>
>>>>>>> <image.png>
>>>>>>>
>>>>>>> it says 12 cores but I gather it is threads?
>>>>>>>
>>>>>>> Spark document
>>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>>> states and I quote
>>>>>>>
>>>>>>> <image.png>
>>>>>>>
>>>>>>>
>>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>>> your machine*
>>>>>>>
>>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>>
>>>>>>> the next line local[*] seems to indicate it correctly as it refers
>>>>>>> to "logical cores" that in my understanding it is threads.
>>>>>>>
>>>>>>> I trust that I am not nitpicking here!
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>> Dr Mich Talebzadeh
>>>>>>>
>>>>>>>
>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>
>>>>>>>
>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mark Hamstra <ma...@clearstorydata.com>.
>
> In addition, it is the core (not the OS) that determines when the thread
> is executed.


That's also not strictly true.  "Thread" is a concept that can exist at
multiple levels -- even concurrently at multiple levels for a single
running program.  Different entities will be responsible for scheduling the
execution of threads at these different levels, and the CPU is only in
direct control at the lowest level, that of so-called hardware threads.  Of
course, higher-level threads eventually need to be run as lower-level
hardware threads, and the mappings between various types of application-level
threads and OS- and/or hardware-level threads can be complicated, but it is
still not helpful to think of the CPU as being the only entity responsible
for the scheduling of threads.
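
A minimal Scala sketch of that layering (illustrative only - the 4x pool size
and the println payload are arbitrary choices of mine, not anything Spark
itself does; it can be pasted into spark-shell or a plain Scala REPL):

import java.util.concurrent.{Executors, TimeUnit}

// what the JVM (software layer) reports: logical processors, i.e. hardware threads
val logical = Runtime.getRuntime.availableProcessors()
println(s"JVM sees $logical logical processors")

// application-level threads: we can create far more of them than the hardware can
// run at once; the JVM, the OS scheduler and ultimately the cores decide when each runs
val pool = Executors.newFixedThreadPool(logical * 4)
(1 to logical * 4).foreach { i =>
  pool.submit(new Runnable {
    override def run(): Unit = println(s"task $i on ${Thread.currentThread().getName}")
  })
}
pool.shutdown()
pool.awaitTermination(1, TimeUnit.MINUTES)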

On Thu, Jun 16, 2016 at 7:45 AM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Thanks all.
>
> I think we are diverging but IMO it is a worthwhile discussion
>
> Actually, threads are a hardware implementation - hence the whole notion
> of “multi-threaded cores”.   What happens is that the cores often have
> duplicate registers, etc. for holding execution state.   While it is
> correct that only a single process is executing at a time, a single core
> will have execution states of multiple processes preserved in these
> registers. In addition, it is the core (not the OS) that determines when
> the thread is executed. The approach often varies according to the CPU
> manufacturer, but the most simple approach is when one thread of execution
> executes a multi-cycle operation (e.g. a fetch from main memory, etc.), the
> core simply stops processing that thread saves the execution state to a set
> of registers, loads instructions from the other set of registers and goes
> on.  On the Oracle SPARC chips, it will actually check the next thread to
> see if the reason it was ‘parked’ has completed and if not, skip it for the
> subsequent thread. The OS is only aware of what are cores and what are
> logical processors - and dispatches accordingly.  *Execution is up to the
> cores*. .
>
> Cheers
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:
>
>> Mich
>>
>> >> A core may have one or more threads
>> It would be more accurate to say that a core could *run* one or more
>> threads scheduled for execution. Threads are a software/OS concept that
>> represent executable code that is scheduled to run by the OS; A CPU, core
>> or virtual core/virtual processor execute that code. Threads are not CPUs
>> or cores whether physical or logical - any Spark documentation that implies
>> this is mistaken. I’ve looked at the documentation you mention and I don’t
>> read it to mean that threads are logical processors.
>>
>> To go back to your original question, if you set local[6] and you have 12
>> logical processors then you are likely to have half your CPU resources
>> unused by Spark.
>>
>>
>> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
>> wrote:
>>
>> I think it is slightly more than that.
>>
>> These days  software is licensed by core (generally speaking).   That is
>> the physical processor.   * A core may have one or more threads - or
>> logical processors*. Virtualization adds some fun to the mix.
>> Generally what they present is ‘virtual processors’.   What that equates to
>> depends on the virtualization layer itself.   In some simpler VM’s - it is
>> virtual=logical.   In others, virtual=logical but they are constrained to
>> be from the same cores - e.g. if you get 6 virtual processors, it really is
>> 3 full cores with 2 threads each.   Rational is due to the way OS
>> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
>> applications.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>>
>>> I don't know what documentation you were referring to, but this is
>>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>>> too concerned about how any particular hardware vendor wants to refer to
>>> the specific components of their CPU architecture.  For us, a core is a
>>> logical execution unit, something on which a thread of execution can run.
>>> That can map in different ways to different physical or virtual hardware.
>>>
>>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> It is not the issue of testing anything. I was referring to
>>>> documentation that clearly use the term "threads". As I said and showed
>>>> before, one line is using the term "thread" and the next one "logical
>>>> cores".
>>>>
>>>>
>>>> HTH
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>>> daniel.darabos@lynxanalytics.com> wrote:
>>>>
>>>>> Spark is a software product. In software a "core" is something that a
>>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>>> A "thread" is not something a process can run on.)
>>>>>
>>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>>> can test this easily.)
>>>>>
>>>>>
>>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>>
>>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>>
>>>>>> This is my understanding of cores and threads.
>>>>>>
>>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>>> care of one load.
>>>>>>
>>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>>
>>>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>>>
>>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>>> that checks the licensing and shuts the application down if the license
>>>>>> does not agree with the cores speced.
>>>>>>
>>>>>> This is what it says
>>>>>>
>>>>>> ./cpuinfo
>>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>>
>>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>>> logical processors as threads so I have 12 threads?
>>>>>>
>>>>>> Now if I go and start worker process
>>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>> it says 12 cores but I gather it is threads?
>>>>>>
>>>>>> Spark document
>>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>>> states and I quote
>>>>>>
>>>>>> <image.png>
>>>>>>
>>>>>>
>>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>>> your machine*
>>>>>>
>>>>>> But I know that it means threads. Because if I went and set that to
>>>>>> 6, it would be only 6 threads as opposed to 12 threads.
>>>>>>
>>>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>>>> "logical cores" that in my understanding it is threads.
>>>>>>
>>>>>> I trust that I am not nitpicking here!
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mich Talebzadeh <mi...@gmail.com>.
Thanks all.

I think we are diverging, but IMO it is a worthwhile discussion.

Actually, threads are a hardware implementation - hence the whole notion of
“multi-threaded cores”. What happens is that the cores often have duplicate
registers, etc. for holding execution state. While it is correct that only a
single process is executing at a time, a single core will have the execution
states of multiple processes preserved in these registers. In addition, it is
the core (not the OS) that determines when the thread is executed. The approach
varies by CPU manufacturer, but the simplest approach is that when one thread of
execution performs a multi-cycle operation (e.g. a fetch from main memory), the
core simply stops processing that thread, saves its execution state to one set
of registers, loads instructions from the other set of registers and goes on.
On the Oracle SPARC chips, it will actually check the next thread to see whether
whatever ‘parked’ it has completed and, if not, skip it for the subsequent
thread. The OS is only aware of which are cores and which are logical
processors - and dispatches accordingly. *Execution is up to the cores*.
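
A rough way to see that effect from Scala (a back-of-the-envelope sketch only -
the iteration count and the multiplier are arbitrary, and the timings depend on
the JVM, other load on the box and the CPU itself):

import java.util.concurrent.{Callable, Executors}
import scala.collection.JavaConverters._

// CPU-bound busy loop; returning the accumulator keeps the JIT from discarding the work
def burn(iterations: Long): Long = {
  var acc = 0L
  var i = 0L
  while (i < iterations) { acc += i * 2654435761L; i += 1 }
  acc
}

// run `threads` copies of the same per-thread workload and time the whole batch in ms
def timeWith(threads: Int, perThread: Long): Long = {
  val pool  = Executors.newFixedThreadPool(threads)
  val tasks = List.fill(threads)(new Callable[Long] { def call(): Long = burn(perThread) })
  val start = System.nanoTime()
  pool.invokeAll(tasks.asJava)   // blocks until every task has finished
  pool.shutdown()
  (System.nanoTime() - start) / 1000000
}

// On 12 truly independent cores both runs would take roughly the same wall time; on
// 6 hyper-threaded cores the 12-thread run is noticeably slower, because each pair of
// logical processors shares one core's execution units.
println(" 6 threads: " + timeWith(6, 300000000L) + " ms")
println("12 threads: " + timeWith(12, 300000000L) + " ms")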

Cheers



Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 16 June 2016 at 13:02, Robin East <ro...@xense.co.uk> wrote:

> Mich
>
> >> A core may have one or more threads
> It would be more accurate to say that a core could *run* one or more
> threads scheduled for execution. Threads are a software/OS concept that
> represent executable code that is scheduled to run by the OS; A CPU, core
> or virtual core/virtual processor execute that code. Threads are not CPUs
> or cores whether physical or logical - any Spark documentation that implies
> this is mistaken. I’ve looked at the documentation you mention and I don’t
> read it to mean that threads are logical processors.
>
> To go back to your original question, if you set local[6] and you have 12
> logical processors then you are likely to have half your CPU resources
> unused by Spark.
>
>
> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
> I think it is slightly more than that.
>
> These days  software is licensed by core (generally speaking).   That is
> the physical processor.   * A core may have one or more threads - or
> logical processors*. Virtualization adds some fun to the mix.   Generally
> what they present is ‘virtual processors’.   What that equates to depends
> on the virtualization layer itself.   In some simpler VM’s - it is
> virtual=logical.   In others, virtual=logical but they are constrained to
> be from the same cores - e.g. if you get 6 virtual processors, it really is
> 3 full cores with 2 threads each.   Rational is due to the way OS
> dispatching works on ‘logical’ processors vs. cores and POSIX threaded
> applications.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:
>
>> I don't know what documentation you were referring to, but this is
>> clearly an erroneous statement: "Threads are virtual cores."  At best it is
>> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
>> too concerned about how any particular hardware vendor wants to refer to
>> the specific components of their CPU architecture.  For us, a core is a
>> logical execution unit, something on which a thread of execution can run.
>> That can map in different ways to different physical or virtual hardware.
>>
>> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> It is not the issue of testing anything. I was referring to
>>> documentation that clearly use the term "threads". As I said and showed
>>> before, one line is using the term "thread" and the next one "logical
>>> cores".
>>>
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>> On 12 June 2016 at 23:57, Daniel Darabos <
>>> daniel.darabos@lynxanalytics.com> wrote:
>>>
>>>> Spark is a software product. In software a "core" is something that a
>>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>>> A "thread" is not something a process can run on.)
>>>>
>>>> local[*] uses java.lang.Runtime.availableProcessors()
>>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>>> Since Java is software, this also returns the number of virtual cores. (You
>>>> can test this easily.)
>>>>
>>>>
>>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I was writing some docs on Spark P&T and came across this.
>>>>>
>>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>>
>>>>> This is my understanding of cores and threads.
>>>>>
>>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>>> threads is called hyper threading technology so 2 threads per core makes
>>>>> the core work on two loads at same time. In other words, every thread takes
>>>>> care of one load.
>>>>>
>>>>> Core has its own memory. So if you have a dual core with hyper
>>>>> threading, the core works with 2 loads each at same time because of the 2
>>>>> threads per core, but this 2 threads will share memory in that core.
>>>>>
>>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>>
>>>>> For example on the same host that I have Spark, I have a SAP product
>>>>> that checks the licensing and shuts the application down if the license
>>>>> does not agree with the cores speced.
>>>>>
>>>>> This is what it says
>>>>>
>>>>> ./cpuinfo
>>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>>
>>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>>> logical processors as threads so I have 12 threads?
>>>>>
>>>>> Now if I go and start worker process
>>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>>
>>>>> <image.png>
>>>>>
>>>>> it says 12 cores but I gather it is threads?
>>>>>
>>>>> Spark document
>>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>>> states and I quote
>>>>>
>>>>> <image.png>
>>>>>
>>>>>
>>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>>> your machine*
>>>>>
>>>>> But I know that it means threads. Because if I went and set that to 6,
>>>>> it would be only 6 threads as opposed to 12 threads.
>>>>>
>>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>>> "logical cores" that in my understanding it is threads.
>>>>>
>>>>> I trust that I am not nitpicking here!
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> Dr Mich Talebzadeh
>>>>>
>>>>>
>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>
>>>>>
>>>>> http://talebzadehmich.wordpress.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Robin East <ro...@xense.co.uk>.
Mich

>> A core may have one or more threads
It would be more accurate to say that a core could run one or more threads scheduled for execution. Threads are a software/OS concept that represents executable code scheduled to run by the OS; a CPU, core or virtual core/virtual processor executes that code. Threads are not CPUs or cores, whether physical or logical - any Spark documentation that implies this is mistaken. I’ve looked at the documentation you mention and I don’t read it to mean that threads are logical processors.

To go back to your original question, if you set local[6] and you have 12 logical processors then you are likely to have half your CPU resources unused by Spark.
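
As a small illustration (a sketch only, assuming a Spark 1.x-style API; the app
name and the use of defaultParallelism as the thing to inspect are my own
choices, not anything from the Spark docs):

import org.apache.spark.{SparkConf, SparkContext}

// local[6] would cap Spark at 6 task slots even though the JVM reports 12 logical
// processors; local[*] sizes the local scheduler from Runtime.availableProcessors()
val conf = new SparkConf().setAppName("core-check").setMaster("local[*]")
val sc = new SparkContext(conf)

println("Logical processors seen by the JVM : " + Runtime.getRuntime.availableProcessors())
println("Task slots Spark will actually use : " + sc.defaultParallelism)

sc.stop()

With setMaster("local[6]") on the 12-logical-processor box described in this
thread, the second number would drop to 6.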


> On 15 Jun 2016, at 23:08, Mich Talebzadeh <mi...@gmail.com> wrote:
> 
> I think it is slightly more than that.
> 
> These days  software is licensed by core (generally speaking).   That is the physical processor.    A core may have one or more threads - or logical processors. Virtualization adds some fun to the mix.   Generally what they present is ‘virtual processors’.   What that equates to depends on the virtualization layer itself.   In some simpler VM’s - it is virtual=logical.   In others, virtual=logical but they are constrained to be from the same cores - e.g. if you get 6 virtual processors, it really is 3 full cores with 2 threads each.   Rational is due to the way OS dispatching works on ‘logical’ processors vs. cores and POSIX threaded applications.
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 13 June 2016 at 18:17, Mark Hamstra <mark@clearstorydata.com <ma...@clearstorydata.com>> wrote:
> I don't know what documentation you were referring to, but this is clearly an erroneous statement: "Threads are virtual cores."  At best it is terminology abuse by a hardware manufacturer.  Regardless, Spark can't get too concerned about how any particular hardware vendor wants to refer to the specific components of their CPU architecture.  For us, a core is a logical execution unit, something on which a thread of execution can run.  That can map in different ways to different physical or virtual hardware. 
> 
> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
> Hi,
> 
> It is not the issue of testing anything. I was referring to documentation that clearly use the term "threads". As I said and showed before, one line is using the term "thread" and the next one "logical cores".
> 
> 
> HTH
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> On 12 June 2016 at 23:57, Daniel Darabos <daniel.darabos@lynxanalytics.com <ma...@lynxanalytics.com>> wrote:
> Spark is a software product. In software a "core" is something that a process can run on. So it's a "virtual core". (Do not call these "threads". A "thread" is not something a process can run on.)
> 
> local[*] uses java.lang.Runtime.availableProcessors() <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>. Since Java is software, this also returns the number of virtual cores. (You can test this easily.)
> 
> 
> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com <ma...@gmail.com>> wrote:
> 
> Hi,
> 
> I was writing some docs on Spark P&T and came across this.
> 
> It is about the terminology or interpretation of that in Spark doc.
> 
> This is my understanding of cores and threads.
> 
>  Cores are physical cores. Threads are virtual cores. Cores with 2 threads is called hyper threading technology so 2 threads per core makes the core work on two loads at same time. In other words, every thread takes care of one load.
> 
> Core has its own memory. So if you have a dual core with hyper threading, the core works with 2 loads each at same time because of the 2 threads per core, but this 2 threads will share memory in that core.
> 
> Some vendors as I am sure most of you aware charge licensing per core.
> 
> For example on the same host that I have Spark, I have a SAP product that checks the licensing and shuts the application down if the license does not agree with the cores speced.
> 
> This is what it says
> 
> ./cpuinfo
> License hostid:        00e04c69159a 0050b60fd1e7
> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
> 
> So here I have 12 logical processors  and 6 cores and 1 chip. I call logical processors as threads so I have 12 threads?
> 
> Now if I go and start worker process ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
> 
> <image.png>
> 
> it says 12 cores but I gather it is threads?
> 
> Spark document <http://spark.apache.org/docs/latest/submitting-applications.html> states and I quote
> 
> <image.png>
> 
> 
> OK the line local[k] adds  ..  set this to the number of cores on your machine
> 
> But I know that it means threads. Because if I went and set that to 6, it would be only 6 threads as opposed to 12 threads.
> 
> the next line local[*] seems to indicate it correctly as it refers to "logical cores" that in my understanding it is threads.
> 
> I trust that I am not nitpicking here!
> 
> Cheers,
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>
>  
> http://talebzadehmich.wordpress.com <http://talebzadehmich.wordpress.com/>
>  
> 
> 
> 
> 


Re: What is the interpretation of Cores in Spark doc

Posted by Mich Talebzadeh <mi...@gmail.com>.
I think it is slightly more than that.

These days software is licensed by core (generally speaking). That is the
physical processor. *A core may have one or more threads - or logical
processors*. Virtualization adds some fun to the mix. Generally what they
present is ‘virtual processors’. What that equates to depends on the
virtualization layer itself. In some simpler VMs it is virtual=logical. In
others, virtual=logical but they are constrained to be from the same cores -
e.g. if you get 6 virtual processors, it really is 3 full cores with 2 threads
each. The rationale is due to the way OS dispatching works on ‘logical’
processors vs. cores and POSIX threaded applications.

HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 13 June 2016 at 18:17, Mark Hamstra <ma...@clearstorydata.com> wrote:

> I don't know what documentation you were referring to, but this is clearly
> an erroneous statement: "Threads are virtual cores."  At best it is
> terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
> too concerned about how any particular hardware vendor wants to refer to
> the specific components of their CPU architecture.  For us, a core is a
> logical execution unit, something on which a thread of execution can run.
> That can map in different ways to different physical or virtual hardware.
>
> On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Hi,
>>
>> It is not the issue of testing anything. I was referring to documentation
>> that clearly use the term "threads". As I said and showed before, one line
>> is using the term "thread" and the next one "logical cores".
>>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 12 June 2016 at 23:57, Daniel Darabos <
>> daniel.darabos@lynxanalytics.com> wrote:
>>
>>> Spark is a software product. In software a "core" is something that a
>>> process can run on. So it's a "virtual core". (Do not call these "threads".
>>> A "thread" is not something a process can run on.)
>>>
>>> local[*] uses java.lang.Runtime.availableProcessors()
>>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>>> Since Java is software, this also returns the number of virtual cores. (You
>>> can test this easily.)
>>>
>>>
>>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I was writing some docs on Spark P&T and came across this.
>>>>
>>>> It is about the terminology or interpretation of that in Spark doc.
>>>>
>>>> This is my understanding of cores and threads.
>>>>
>>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>>> threads is called hyper threading technology so 2 threads per core makes
>>>> the core work on two loads at same time. In other words, every thread takes
>>>> care of one load.
>>>>
>>>> Core has its own memory. So if you have a dual core with hyper
>>>> threading, the core works with 2 loads each at same time because of the 2
>>>> threads per core, but this 2 threads will share memory in that core.
>>>>
>>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>>
>>>> For example on the same host that I have Spark, I have a SAP product
>>>> that checks the licensing and shuts the application down if the license
>>>> does not agree with the cores speced.
>>>>
>>>> This is what it says
>>>>
>>>> ./cpuinfo
>>>> License hostid:        00e04c69159a 0050b60fd1e7
>>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>>
>>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>>> logical processors as threads so I have 12 threads?
>>>>
>>>> Now if I go and start worker process
>>>> ${SPARK_HOME}/sbin/start-slaves.sh, I see this in GUI page
>>>>
>>>> [image: Inline images 1]
>>>>
>>>> it says 12 cores but I gather it is threads?
>>>>
>>>>
>>>> Spark document
>>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>>> states and I quote
>>>>
>>>>
>>>> [image: Inline images 2]
>>>>
>>>>
>>>>
>>>> OK the line local[k] adds  ..  *set this to the number of cores on
>>>> your machine*
>>>>
>>>>
>>>> But I know that it means threads. Because if I went and set that to 6,
>>>> it would be only 6 threads as opposed to 12 threads.
>>>>
>>>>
>>>> the next line local[*] seems to indicate it correctly as it refers to
>>>> "logical cores" that in my understanding it is threads.
>>>>
>>>>
>>>> I trust that I am not nitpicking here!
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>
>>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mark Hamstra <ma...@clearstorydata.com>.
I don't know what documentation you were referring to, but this is clearly
an erroneous statement: "Threads are virtual cores."  At best it is
terminology abuse by a hardware manufacturer.  Regardless, Spark can't get
too concerned about how any particular hardware vendor wants to refer to
the specific components of their CPU architecture.  For us, a core is a
logical execution unit, something on which a thread of execution can run.
That can map in different ways to different physical or virtual hardware.

On Mon, Jun 13, 2016 at 12:02 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> Hi,
>
> It is not the issue of testing anything. I was referring to documentation
> that clearly use the term "threads". As I said and showed before, one line
> is using the term "thread" and the next one "logical cores".
>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 12 June 2016 at 23:57, Daniel Darabos <daniel.darabos@lynxanalytics.com
> > wrote:
>
>> Spark is a software product. In software a "core" is something that a
>> process can run on. So it's a "virtual core". (Do not call these "threads".
>> A "thread" is not something a process can run on.)
>>
>> local[*] uses java.lang.Runtime.availableProcessors()
>> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
>> Since Java is software, this also returns the number of virtual cores. (You
>> can test this easily.)
>>
>>
>> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I was writing some docs on Spark P&T and came across this.
>>>
>>> It is about the terminology or interpretation of that in Spark doc.
>>>
>>> This is my understanding of cores and threads.
>>>
>>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>>> threads is called hyper threading technology so 2 threads per core makes
>>> the core work on two loads at same time. In other words, every thread takes
>>> care of one load.
>>>
>>> Core has its own memory. So if you have a dual core with hyper
>>> threading, the core works with 2 loads each at same time because of the 2
>>> threads per core, but this 2 threads will share memory in that core.
>>>
>>> Some vendors as I am sure most of you aware charge licensing per core.
>>>
>>> For example on the same host that I have Spark, I have a SAP product
>>> that checks the licensing and shuts the application down if the license
>>> does not agree with the cores speced.
>>>
>>> This is what it says
>>>
>>> ./cpuinfo
>>> License hostid:        00e04c69159a 0050b60fd1e7
>>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>>
>>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>>> logical processors as threads so I have 12 threads?
>>>
>>> Now if I go and start worker process ${SPARK_HOME}/sbin/start-slaves.sh,
>>> I see this in GUI page
>>>
>>> [image: Inline images 1]
>>>
>>> it says 12 cores but I gather it is threads?
>>>
>>>
>>> Spark document
>>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>>> states and I quote
>>>
>>>
>>> [image: Inline images 2]
>>>
>>>
>>>
>>> OK the line local[k] adds  ..  *set this to the number of cores on your
>>> machine*
>>>
>>>
>>> But I know that it means threads. Because if I went and set that to 6,
>>> it would be only 6 threads as opposed to 12 threads.
>>>
>>>
>>> the next line local[*] seems to indicate it correctly as it refers to
>>> "logical cores" that in my understanding it is threads.
>>>
>>>
>>> I trust that I am not nitpicking here!
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>>
>>
>>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

It is not the issue of testing anything. I was referring to documentation
that clearly uses the term "threads". As I said and showed before, one line
uses the term "thread" and the next one "logical cores".


HTH

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 12 June 2016 at 23:57, Daniel Darabos <da...@lynxanalytics.com>
wrote:

> Spark is a software product. In software a "core" is something that a
> process can run on. So it's a "virtual core". (Do not call these "threads".
> A "thread" is not something a process can run on.)
>
> local[*] uses java.lang.Runtime.availableProcessors()
> <https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
> Since Java is software, this also returns the number of virtual cores. (You
> can test this easily.)
>
>
> On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>>
>> Hi,
>>
>> I was writing some docs on Spark P&T and came across this.
>>
>> It is about the terminology or interpretation of that in Spark doc.
>>
>> This is my understanding of cores and threads.
>>
>>  Cores are physical cores. Threads are virtual cores. Cores with 2
>> threads is called hyper threading technology so 2 threads per core makes
>> the core work on two loads at same time. In other words, every thread takes
>> care of one load.
>>
>> Core has its own memory. So if you have a dual core with hyper threading,
>> the core works with 2 loads each at same time because of the 2 threads per
>> core, but this 2 threads will share memory in that core.
>>
>> Some vendors as I am sure most of you aware charge licensing per core.
>>
>> For example on the same host that I have Spark, I have a SAP product that
>> checks the licensing and shuts the application down if the license does not
>> agree with the cores speced.
>>
>> This is what it says
>>
>> ./cpuinfo
>> License hostid:        00e04c69159a 0050b60fd1e7
>> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>>
>> So here I have 12 logical processors  and 6 cores and 1 chip. I call
>> logical processors as threads so I have 12 threads?
>>
>> Now if I go and start worker process ${SPARK_HOME}/sbin/start-slaves.sh,
>> I see this in GUI page
>>
>> [image: Inline images 1]
>>
>> it says 12 cores but I gather it is threads?
>>
>>
>> Spark document
>> <http://spark.apache.org/docs/latest/submitting-applications.html>
>> states and I quote
>>
>>
>> [image: Inline images 2]
>>
>>
>>
>> OK the line local[k] adds  ..  *set this to the number of cores on your
>> machine*
>>
>>
>> But I know that it means threads. Because if I went and set that to 6, it
>> would be only 6 threads as opposed to 12 threads.
>>
>>
>> the next line local[*] seems to indicate it correctly as it refers to
>> "logical cores" that in my understanding it is threads.
>>
>>
>> I trust that I am not nitpicking here!
>>
>>
>> Cheers,
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>
>

Re: What is the interpretation of Cores in Spark doc

Posted by Daniel Darabos <da...@lynxanalytics.com>.
Spark is a software product. In software a "core" is something that a
process can run on. So it's a "virtual core". (Do not call these "threads".
A "thread" is not something a process can run on.)

local[*] uses java.lang.Runtime.availableProcessors()
<https://github.com/apache/spark/blob/v1.6.1/core/src/main/scala/org/apache/spark/SparkContext.scala#L2608>.
Since Java is software, this also returns the number of virtual cores. (You
can test this easily.)
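
For example, from spark-shell (or any Scala REPL) on the 6-core,
hyper-threaded host described below, you would expect something like:

scala> Runtime.getRuntime.availableProcessors()
res0: Int = 12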


On Sun, Jun 12, 2016 at 9:23 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

>
> Hi,
>
> I was writing some docs on Spark P&T and came across this.
>
> It is about the terminology or interpretation of that in Spark doc.
>
> This is my understanding of cores and threads.
>
>  Cores are physical cores. Threads are virtual cores. Cores with 2 threads
> is called hyper threading technology so 2 threads per core makes the core
> work on two loads at same time. In other words, every thread takes care of
> one load.
>
> Core has its own memory. So if you have a dual core with hyper threading,
> the core works with 2 loads each at same time because of the 2 threads per
> core, but this 2 threads will share memory in that core.
>
> Some vendors as I am sure most of you aware charge licensing per core.
>
> For example on the same host that I have Spark, I have a SAP product that
> checks the licensing and shuts the application down if the license does not
> agree with the cores speced.
>
> This is what it says
>
> ./cpuinfo
> License hostid:        00e04c69159a 0050b60fd1e7
> Detected 12 logical processor(s), 6 core(s), in 1 chip(s)
>
> So here I have 12 logical processors  and 6 cores and 1 chip. I call
> logical processors as threads so I have 12 threads?
>
> Now if I go and start worker process ${SPARK_HOME}/sbin/start-slaves.sh, I
> see this in GUI page
>
> [image: Inline images 1]
>
> it says 12 cores but I gather it is threads?
>
>
> Spark document
> <http://spark.apache.org/docs/latest/submitting-applications.html> states
> and I quote
>
>
> [image: Inline images 2]
>
>
>
> OK the line local[k] adds  ..  *set this to the number of cores on your
> machine*
>
>
> But I know that it means threads. Because if I went and set that to 6, it
> would be only 6 threads as opposed to 12 threads.
>
>
> the next line local[*] seems to indicate it correctly as it refers to
> "logical cores" that in my understanding it is threads.
>
>
> I trust that I am not nitpicking here!
>
>
> Cheers,
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>