Posted to user@spark.apache.org by Praveen Sripati <pr...@gmail.com> on 2014/11/26 13:24:02 UTC

Number of executors and tasks

Hi,

I am running Spark in standalone mode.

1) I have a 286MB file in HDFS (block size is 64MB), so it is split into
5 blocks. When the file is in HDFS, 5 tasks are generated and 5 files
appear in the output. My understanding is that there is a separate
partition for each block and a separate task for each partition, which
explains why I see 5 files in the output.
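A minimal Scala sketch of the kind of job in question (paths are
hypothetical; run in spark-shell, where sc is the SparkContext):

    val rdd = sc.textFile("hdfs:///user/praveen/input.txt")
    // one partition per HDFS block: 5 partitions for a 286MB file with 64MB blocks
    println(rdd.partitions.length)
    // saveAsTextFile writes one part-NNNNN file per partition, hence 5 output files
    rdd.saveAsTextFile("hdfs:///user/praveen/output")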

When I put the same file on the local file system (not HDFS), I see 9 files
in the output. I am curious why it is 9.

2) Whether the file is in HDFS or on the local file system, I see a single
CoarseGrainedExecutorBackend when I run the jps command. Why is there only
one executor process, and how do we configure the number of executor
processes?

Thanks,
Praveen

Re: Number of executors and tasks

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
This one should give you a better understanding:
http://stackoverflow.com/questions/24622108/apache-spark-the-number-of-cores-vs-the-number-of-executors

Thanks
Best Regards


Re: Number of executors and tasks

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
1. On HDFS, the file is split using the HDFS block size (~64MB here). When
you read the same file from the local file system (ext3/ext4), a different
split size is used (in your case it looks like ~32MB, and 286MB / 32MB
rounds up to 9 splits), which is why you are seeing 9 output files.
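
If you want a particular split count rather than whatever the underlying
block size gives you, you can pass a minimum partition count to textFile
(a sketch; the path is made up):

    // ask for at least 5 partitions regardless of the file system's block size
    val rdd = sc.textFile("file:///data/input.txt", 5)
    println(rdd.partitions.length)  // >= 5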

2. You could set --num-executors (a YARN option) to increase the number of
executor processes.
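
For example, when submitting to YARN (the class name and resource values
below are made up):

    ./bin/spark-submit --class com.example.MyApp \
      --master yarn \
      --num-executors 4 \
      --executor-cores 2 \
      --executor-memory 2g \
      myapp.jar

Note that --num-executors applies on YARN; in standalone mode an
application gets at most one executor per worker by default, so to get
more executor processes per machine you can set SPARK_WORKER_INSTANCES in
spark-env.sh to run multiple workers.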

Thanks
Best Regards
