Posted to user@spark.apache.org by Richard Hanson <rh...@mailbox.org> on 2017/04/17 10:18:08 UTC

Spark-shell's performance

I am playing with some data using a standalone spark-shell (Spark version 1.6.0), launched by executing `spark-shell`. The flow is simple, a bit like cp: basically moving 100k local files (the maximum size is 190k) to S3. Memory is configured as below:


export SPARK_DRIVER_MEMORY=8192M
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=8192M
export SPARK_EXECUTOR_CORES=4
export SPARK_EXECUTOR_MEMORY=2048M


But moving those files to S3 took roughly 30 minutes in total. The resident memory I see is roughly 3.820g (checked with top -p <pid>). It seems to me there is still room to speed this up, though this is only for testing purposes. So I would like to know whether there are any other parameters I can change to improve spark-shell's performance. Is the memory setup above correct?
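
A rough sketch of the kind of copy job described above, written for spark-shell (Scala). It is only illustrative: the directory and bucket names are placeholders, the s3a connector and its credentials are assumed to be configured, and the executors are assumed to see the same local filesystem as the driver.

```scala
// Sketch only: parallelise the upload across executor cores instead of
// copying files one by one. "/data/input" and "my-bucket" are hypothetical.
import java.io.File
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val localDir = "/data/input"   // hypothetical source directory
val files = new File(localDir).listFiles.map(_.getAbsolutePath).toSeq

sc.parallelize(files, 32).foreachPartition { paths =>
  // One FileSystem handle per partition keeps connection setup cost low.
  val fs = FileSystem.get(new URI("s3a://my-bucket"), new Configuration())
  paths.foreach { p =>
    fs.copyFromLocalFile(new Path(p), new Path("s3a://my-bucket/copied/" + new File(p).getName))
  }
}
```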


Thanks. 

Re: Spark-shell's performance

Posted by "颜发才 (Yan Facai)" <fa...@gmail.com>.
Hi, Hanson.
Perhaps I'm digressing here.
If I'm wrong or mistaken, please correct me.

SPARK_WORKER_* is the configuration for the whole cluster, and it's fine to
put those global variables in spark-env.sh.
However,
SPARK_DRIVER_* and SPARK_EXECUTOR_* are the configuration for the application
(your code), so it's probably better to pass those arguments to spark-shell
directly, like:
```bash
spark-shell --driver-memory 8G --executor-cores 4 --executor-memory 2G
```
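
For completeness, the worker-level variables would then stay in conf/spark-env.sh on each node; a minimal sketch, reusing the values from the original post:

```bash
# conf/spark-env.sh -- cluster-wide settings read when the worker daemon starts
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=8192M
```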

Tuning the configuration for the application is a good start, and passing the
settings to spark-shell directly makes them easier to test.
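
The same application settings can also be passed through the generic --conf flag; for example (driver memory is best left to --driver-memory, since the driver JVM is already starting when SparkConf values are applied):

```bash
spark-shell --driver-memory 8G \
  --conf spark.executor.cores=4 \
  --conf spark.executor.memory=2g
```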

For more details see:
+ `spark-shell -h`
+ http://spark.apache.org/docs/latest/submitting-applications.html
+ http://spark.apache.org/docs/latest/spark-standalone.html



