Posted to user@spark.apache.org by "Mail.com" <pr...@mail.com> on 2016/07/26 00:18:49 UTC
Num of executors and cores
Hi All,
I have a directory which has 12 files. I want to read each file in its entirety, so I am reading the directory with wholeTextFiles(dirpath, numPartitions).
I run spark-submit as <all other stuff> --num-executors 12 --executor-cores 1, with numPartitions set to 12.
However, when I run the job, the stage that reads the directory has only 8 tasks, so some tasks read more than one file and take roughly twice as long.
What can I do so that the files are read by 12 tasks, i.e. one file per task?
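For context on why fewer tasks than the requested numPartitions can appear: wholeTextFiles is backed by a CombineFileInputFormat, and the numPartitions argument is only a lower-bound hint used to derive a maximum split size (roughly totalSize / minPartitions); whole files are then packed into splits up to that cap. A rough pure-Python sketch of that packing, under the assumption that the cap is computed this way:

```python
def pack_files(file_sizes, min_partitions):
    """Greedy sketch of CombineFileInputFormat-style packing:
    whole files are grouped into splits whose total size stays
    at or under total_size / min_partitions."""
    max_split = sum(file_sizes) / min_partitions
    splits, current, current_size = [], [], 0
    for size in file_sizes:
        if current and current_size + size > max_split:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits

# 12 equally sized files with min_partitions=12: one file per split.
print(len(pack_files([64] * 12, 12)))            # 12
# Mixed sizes: the small files get combined, yielding fewer splits.
print(len(pack_files([8, 8, 8, 8, 64, 64], 4)))  # 3
```

The real input format also takes block locality into account, so the exact grouping can differ; the point of the sketch is that minPartitions is a hint, not a guarantee.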
Thanks,
Pradeep
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: Num of executors and cores
Posted by "Mail.com" <pr...@mail.com>.
Hi,
In spark-submit, I specify --master yarn-client.
When I go to the Executors tab in the UI, I do see all 12 executors assigned. But when I drill down to the Tasks for that stage, I saw only 8 tasks, with indices 0-7.
I ran again, increasing the number of executors to 15, and I now see 12 tasks for the stage.
I would still like to understand why there were only 8 tasks for the stage even though 12 executors were available.
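One possible reason for seeing only 8 tasks despite a hint of 12: CombineFileInputFormat prefers to group blocks by host, so files that live on the same datanode can be combined into a single split. A hypothetical pure-Python illustration (the file and node names are invented for the example):

```python
from collections import defaultdict

def group_by_host(file_locations):
    """Toy model of host-aware combining: every file located on
    the same host lands in the same split."""
    splits = defaultdict(list)
    for filename, host in file_locations:
        splits[host].append(filename)
    return list(splits.values())

# 12 files spread across 8 datanodes collapse into 8 splits,
# hence 8 tasks, regardless of a higher partition hint.
files = [("part-%05d" % i, "node%d" % (i % 8)) for i in range(12)]
print(len(group_by_host(files)))  # 8
```

If one task per file is the goal, calling repartition(12) on the RDD returned by wholeTextFiles should spread the work across 12 tasks, at the cost of a shuffle.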
Thanks,
Pradeep
Re: Num of executors and cores
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,
Where's this yarn-client mode specified? When you said "However, when
I run the job I see that the stage which reads the directory has only
8 tasks" -- how do you see 8 tasks for a stage? It appears you're in
local[*] mode on an 8-core machine (like me), and that's why I'm
asking such basic questions.
Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski
Re: Num of executors and cores
Posted by "Mail.com" <pr...@mail.com>.
Mostly jars, files, and the app name. It runs in yarn-client mode.
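For completeness, a hypothetical spark-submit invocation matching that description; the jar, file, class, and app names below are placeholders, not details from this thread -- only the flags mirror what was reported:

```shell
# Placeholder names throughout; only the flags mirror the thread.
spark-submit \
  --master yarn-client \
  --name wholeTextFiles-demo \
  --num-executors 12 \
  --executor-cores 1 \
  --jars deps.jar \
  --files app.conf \
  --class example.ReadFiles \
  app.jar /path/to/dir 12
```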
Thanks,
Pradeep
Re: Num of executors and cores
Posted by Jacek Laskowski <ja...@japila.pl>.
Hi,
What's "<all other stuff>"? What master URL do you use?
Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski