Posted to user@spark.apache.org by Jia Zou <ja...@gmail.com> on 2016/01/22 05:05:30 UTC

Spark partition size tuning

Dear all!

When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB to reduce the
number of tasks?

Thank you very much!

Best Regards,
Jia

Re: Spark partition size tuning

Posted by Pavel Plotnikov <pa...@team.wrike.com>.
Hi,
Maybe sc.hadoopConfiguration.setInt("dfs.blocksize", blockSize) helps
you.
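
A minimal sketch of how that could be wired up (the 128MB value and the
file path are illustrative, and whether dfs.blocksize is honored for local
files depends on the input format and filesystem in use):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("block-size-tuning"))

  // Ask the Hadoop input layer to compute splits as if blocks were 128MB.
  sc.hadoopConfiguration.setInt("dfs.blocksize", 128 * 1024 * 1024)

  val rdd = sc.textFile("file:///data/input.txt") // hypothetical path
  println(s"partitions = ${rdd.partitions.length}")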

Best Regards,
Pavel

On Tue, Jan 26, 2016 at 7:13 AM Jia Zou <ja...@gmail.com> wrote:

> Dear all,
>
> First, an update: the local file system data partition size can be
> tuned by:
> sc.hadoopConfiguration().setLong("fs.local.block.size", blocksize)
>
> However, I also need to tune the Spark data partition size for input data
> that is stored in Tachyon (default is 512MB), but the above method doesn't
> work for Tachyon data.
>
> Do you have any suggestions? Thanks very much!
>
> Best Regards,
> Jia
>
>
> ---------- Forwarded message ----------
> From: Jia Zou <ja...@gmail.com>
> Date: Thu, Jan 21, 2016 at 10:05 PM
> Subject: Spark partition size tuning
> To: "user @spark" <us...@spark.apache.org>
>
>
> Dear all!
>
> When using Spark to read from the local file system, the default partition
> size is 32MB. How can I increase the partition size to 128MB to reduce the
> number of tasks?
>
> Thank you very much!
>
> Best Regards,
> Jia
>
>

Re: Spark partition size tuning

Posted by Jia Zou <ja...@gmail.com>.
Hi Gene,

Thanks for your suggestion.
However, even after I set tachyon.user.block.size.bytes=134217728, and I
can see that setting in the web console, the files that I load into
Tachyon via copyFromLocal still have a 512MB block size.
Do you have any more suggestions?

Best Regards,
Jia

On Tue, Jan 26, 2016 at 11:46 PM, Gene Pang <ge...@gmail.com> wrote:

> Hi Jia,
>
> If you want to change the Tachyon block size, you can set the
> tachyon.user.block.size.bytes.default parameter (
> http://tachyon-project.org/documentation/Configuration-Settings.html).
> You can set it via extraJavaOptions per job, or by adding it to
> tachyon-site.properties.
>
> I hope that helps,
> Gene
>
> On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou <ja...@gmail.com> wrote:
>
>> Dear all,
>>
>> First, an update: the local file system data partition size can be
>> tuned by:
>> sc.hadoopConfiguration().setLong("fs.local.block.size", blocksize)
>>
>> However, I also need to tune the Spark data partition size for input data
>> that is stored in Tachyon (default is 512MB), but the above method doesn't
>> work for Tachyon data.
>>
>> Do you have any suggestions? Thanks very much!
>>
>> Best Regards,
>> Jia
>>
>>
>> ---------- Forwarded message ----------
>> From: Jia Zou <ja...@gmail.com>
>> Date: Thu, Jan 21, 2016 at 10:05 PM
>> Subject: Spark partition size tuning
>> To: "user @spark" <us...@spark.apache.org>
>>
>>
>> Dear all!
>>
>> When using Spark to read from the local file system, the default partition
>> size is 32MB. How can I increase the partition size to 128MB to reduce the
>> number of tasks?
>>
>> Thank you very much!
>>
>> Best Regards,
>> Jia
>>
>>
>

Re: Spark partition size tuning

Posted by Gene Pang <ge...@gmail.com>.
Hi Jia,

If you want to change the Tachyon block size, you can set the
tachyon.user.block.size.bytes.default parameter (
http://tachyon-project.org/documentation/Configuration-Settings.html). You
can set it via extraJavaOptions per job, or by adding it to
tachyon-site.properties.
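
For the per-job route, a sketch of what that could look like (the 128MB
value and the tachyon:// URI are illustrative; this assumes the Tachyon
client reads the parameter as a JVM system property on the driver and
executors):

  import org.apache.spark.{SparkConf, SparkContext}

  // 128MB expressed in bytes; illustrative value.
  val blockSizeOpt = "-Dtachyon.user.block.size.bytes.default=134217728"

  val conf = new SparkConf()
    .setAppName("tachyon-block-size")
    // Executor JVMs have not launched yet, so this takes effect normally.
    .set("spark.executor.extraJavaOptions", blockSizeOpt)
  // The driver JVM is already running at this point, so its copy of the
  // option has to be supplied at submit time instead, e.g. via
  // --conf spark.driver.extraJavaOptions=... on spark-submit.

  val sc = new SparkContext(conf)
  val rdd = sc.textFile("tachyon://master:19998/data/input.txt") // hypothetical URI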

I hope that helps,
Gene

On Mon, Jan 25, 2016 at 8:13 PM, Jia Zou <ja...@gmail.com> wrote:

> Dear all,
>
> First, an update: the local file system data partition size can be
> tuned by:
> sc.hadoopConfiguration().setLong("fs.local.block.size", blocksize)
>
> However, I also need to tune the Spark data partition size for input data
> that is stored in Tachyon (default is 512MB), but the above method doesn't
> work for Tachyon data.
>
> Do you have any suggestions? Thanks very much!
>
> Best Regards,
> Jia
>
>
> ---------- Forwarded message ----------
> From: Jia Zou <ja...@gmail.com>
> Date: Thu, Jan 21, 2016 at 10:05 PM
> Subject: Spark partition size tuning
> To: "user @spark" <us...@spark.apache.org>
>
>
> Dear all!
>
> When using Spark to read from the local file system, the default partition
> size is 32MB. How can I increase the partition size to 128MB to reduce the
> number of tasks?
>
> Thank you very much!
>
> Best Regards,
> Jia
>
>

Fwd: Spark partition size tuning

Posted by Jia Zou <ja...@gmail.com>.
Dear all,

First, an update: the local file system data partition size can be tuned
by:
sc.hadoopConfiguration().setLong("fs.local.block.size", blocksize)
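
For example, a minimal sketch (the 128MB value and the file path are
illustrative; note that in the Scala API, hadoopConfiguration is accessed
without parentheses):

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("local-block-size"))

  // Treat local files as having 128MB blocks, so each input split
  // (and hence each partition) covers up to 128MB.
  sc.hadoopConfiguration.setLong("fs.local.block.size", 128L * 1024 * 1024)

  val rdd = sc.textFile("file:///data/input.txt") // hypothetical path
  println(s"partitions = ${rdd.partitions.length}")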

However, I also need to tune the Spark data partition size for input data
that is stored in Tachyon (default is 512MB), but the above method doesn't
work for Tachyon data.

Do you have any suggestions? Thanks very much!

Best Regards,
Jia


---------- Forwarded message ----------
From: Jia Zou <ja...@gmail.com>
Date: Thu, Jan 21, 2016 at 10:05 PM
Subject: Spark partition size tuning
To: "user @spark" <us...@spark.apache.org>


Dear all!

When using Spark to read from the local file system, the default partition
size is 32MB. How can I increase the partition size to 128MB to reduce the
number of tasks?

Thank you very much!

Best Regards,
Jia