Posted to user@spark.apache.org by Aliaksei Litouka <al...@gmail.com> on 2014/06/10 23:38:10 UTC

How to specify executor memory in EC2 ?

I am testing my application in an EC2 cluster of m3.medium machines. By
default, only 512 MB of memory on each machine is used. I want to increase
this amount, and I'm trying to do it by passing the --executor-memory 2G option
to the spark-submit script, but it doesn't seem to work - each machine uses
only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I
increase the amount of memory?
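
For reference, the invocation looks roughly like this (the class name, jar
path, and master URL below are placeholders, not my real values):

./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://<master-hostname>:7077 \
  --executor-memory 2G \
  my-application.jar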

Re: How to specify executor memory in EC2 ?

Posted by Aliaksei Litouka <al...@gmail.com>.
Aaron,
spark.executor.memory is set to 2454m in my spark-defaults.conf, which is a
reasonable value for the EC2 instances I use (m3.medium machines). However,
it doesn't help - each executor still uses only 512 MB of memory. To figure
out why, I examined the spark-submit and spark-class scripts and found that
the Java options and the Java memory size are computed in the spark-class
script (see the OUR_JAVA_OPTS and OUR_JAVA_MEM variables in that script).
These values are then used to compose the following string:

JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"

Note that OUR_JAVA_MEM is appended to the end of the string. For some
reason I haven't found yet, OUR_JAVA_MEM is set to its default value of
512 MB. I was able to fix this only by setting the SPARK_MEM variable in
the spark-env.sh file:

export SPARK_MEM=2g

However, this variable is deprecated, so this doesn't seem to be a good
solution :)
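
To illustrate what I mean, the net effect seems to be something like the
following (a simplified sketch of the fallback, not the actual spark-class
code):

# If SPARK_MEM is not set anywhere, the launch scripts fall back to 512m
OUR_JAVA_MEM="${SPARK_MEM:-512m}"
JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"

which would explain why exporting SPARK_MEM in spark-env.sh changes the heap
size, while spark.executor.memory in spark-defaults.conf does not.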


On Thu, Jun 12, 2014 at 10:16 PM, Aaron Davidson <il...@gmail.com> wrote:

> The scripts for Spark 1.0 actually specify this property in
> /root/spark/conf/spark-defaults.conf
>
> I didn't know that this would override the --executor-memory flag, though;
> that's pretty odd.
>
>
> On Thu, Jun 12, 2014 at 6:02 PM, Aliaksei Litouka <
> aliaksei.litouka@gmail.com> wrote:
>
>> Yes, I am launching a cluster with the spark_ec2 script. I checked
>> /root/spark/conf/spark-env.sh on the master node and on slaves and it looks
>> like this:
>>
>> #!/usr/bin/env bash
>>> export SPARK_LOCAL_DIRS="/mnt/spark"
>>> # Standalone cluster options
>>> export SPARK_MASTER_OPTS=""
>>> export SPARK_WORKER_INSTANCES=1
>>> export SPARK_WORKER_CORES=1
>>> export HADOOP_HOME="/root/ephemeral-hdfs"
>>> export SPARK_MASTER_IP=ec2-54-89-95-238.compute-1.amazonaws.com
>>> export MASTER=`cat /root/spark-ec2/cluster-url`
>>> export
>>> SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"
>>> export
>>> SPARK_SUBMIT_CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:/root/ephemeral-hdfs/conf"
>>> # Bind Spark's web UIs to this machine's public EC2 hostname:
>>> export SPARK_PUBLIC_DNS=`wget -q -O -
>>> http://169.254.169.254/latest/meta-data/public-hostname`
>>> # Set a high ulimit for large shuffles
>>> ulimit -n 1000000
>>
>>
>> None of these variables seem to be related to memory size. Let me know if
>> I am missing something.
>>
>>
>> On Thu, Jun 12, 2014 at 7:17 PM, Matei Zaharia <ma...@gmail.com>
>> wrote:
>>
>>> Are you launching this using our EC2 scripts? Or have you set up a
>>> cluster by hand?
>>>
>>> Matei
>>>
>>> On Jun 12, 2014, at 2:32 PM, Aliaksei Litouka <
>>> aliaksei.litouka@gmail.com> wrote:
>>>
>>> spark-env.sh doesn't seem to contain any settings related to memory size
>>> :( I will continue searching for a solution and will post it if I find it :)
>>> Thank you, anyway
>>>
>>>
>>> On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <matei.zaharia@gmail.com
>>> > wrote:
>>>
>>>> It might be that conf/spark-env.sh on EC2 is configured to set it to
>>>> 512, and is overriding the application’s settings. Take a look in there and
>>>> delete that line if possible.
>>>>
>>>> Matei
>>>>
>>>> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <
>>>> aliaksei.litouka@gmail.com> wrote:
>>>>
>>>> > I am testing my application in an EC2 cluster of m3.medium machines. By
>>>> default, only 512 MB of memory on each machine is used. I want to increase
>>>> this amount, and I'm trying to do it by passing the --executor-memory 2G option
>>>> to the spark-submit script, but it doesn't seem to work - each machine uses
>>>> only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I
>>>> increase the amount of memory?
>>>>
>>>>
>>>
>>>
>>
>

Re: How to specify executor memory in EC2 ?

Posted by Aaron Davidson <il...@gmail.com>.
The scripts for Spark 1.0 actually specify this property in
/root/spark/conf/spark-defaults.conf

I didn't know that this would override the --executor-memory flag, though;
that's pretty odd.
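
If you want to double-check what the launch scripts generated, that file is a
plain key/value properties file; the relevant line should look something like
this (the value here is only an example):

# /root/spark/conf/spark-defaults.conf
spark.executor.memory   2g

Editing or removing that line on the master before running spark-submit might
be worth a try.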


On Thu, Jun 12, 2014 at 6:02 PM, Aliaksei Litouka <
aliaksei.litouka@gmail.com> wrote:

> Yes, I am launching a cluster with the spark_ec2 script. I checked
> /root/spark/conf/spark-env.sh on the master node and on slaves and it looks
> like this:
>
> #!/usr/bin/env bash
>> export SPARK_LOCAL_DIRS="/mnt/spark"
>> # Standalone cluster options
>> export SPARK_MASTER_OPTS=""
>> export SPARK_WORKER_INSTANCES=1
>> export SPARK_WORKER_CORES=1
>> export HADOOP_HOME="/root/ephemeral-hdfs"
>> export SPARK_MASTER_IP=ec2-54-89-95-238.compute-1.amazonaws.com
>> export MASTER=`cat /root/spark-ec2/cluster-url`
>> export
>> SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"
>> export
>> SPARK_SUBMIT_CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:/root/ephemeral-hdfs/conf"
>> # Bind Spark's web UIs to this machine's public EC2 hostname:
>> export SPARK_PUBLIC_DNS=`wget -q -O -
>> http://169.254.169.254/latest/meta-data/public-hostname`
>> # Set a high ulimit for large shuffles
>> ulimit -n 1000000
>
>
> None of these variables seem to be related to memory size. Let me know if
> I am missing something.
>
>
> On Thu, Jun 12, 2014 at 7:17 PM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
>> Are you launching this using our EC2 scripts? Or have you set up a
>> cluster by hand?
>>
>> Matei
>>
>> On Jun 12, 2014, at 2:32 PM, Aliaksei Litouka <al...@gmail.com>
>> wrote:
>>
>> spark-env.sh doesn't seem to contain any settings related to memory size
>> :( I will continue searching for a solution and will post it if I find it :)
>> Thank you, anyway
>>
>>
>> On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <ma...@gmail.com>
>> wrote:
>>
>>> It might be that conf/spark-env.sh on EC2 is configured to set it to
>>> 512, and is overriding the application’s settings. Take a look in there and
>>> delete that line if possible.
>>>
>>> Matei
>>>
>>> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <
>>> aliaksei.litouka@gmail.com> wrote:
>>>
>>> > I am testing my application in an EC2 cluster of m3.medium machines. By
>>> default, only 512 MB of memory on each machine is used. I want to increase
>>> this amount, and I'm trying to do it by passing the --executor-memory 2G option
>>> to the spark-submit script, but it doesn't seem to work - each machine uses
>>> only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I
>>> increase the amount of memory?
>>>
>>>
>>
>>
>

Re: How to specify executor memory in EC2 ?

Posted by Aliaksei Litouka <al...@gmail.com>.
Yes, I am launching a cluster with the spark_ec2 script. I checked
/root/spark/conf/spark-env.sh on the master node and on slaves and it looks
like this:

#!/usr/bin/env bash
> export SPARK_LOCAL_DIRS="/mnt/spark"
> # Standalone cluster options
> export SPARK_MASTER_OPTS=""
> export SPARK_WORKER_INSTANCES=1
> export SPARK_WORKER_CORES=1
> export HADOOP_HOME="/root/ephemeral-hdfs"
> export SPARK_MASTER_IP=ec2-54-89-95-238.compute-1.amazonaws.com
> export MASTER=`cat /root/spark-ec2/cluster-url`
> export
> SPARK_SUBMIT_LIBRARY_PATH="$SPARK_SUBMIT_LIBRARY_PATH:/root/ephemeral-hdfs/lib/native/"
> export
> SPARK_SUBMIT_CLASSPATH="$SPARK_CLASSPATH:$SPARK_SUBMIT_CLASSPATH:/root/ephemeral-hdfs/conf"
> # Bind Spark's web UIs to this machine's public EC2 hostname:
> export SPARK_PUBLIC_DNS=`wget -q -O -
> http://169.254.169.254/latest/meta-data/public-hostname`
> # Set a high ulimit for large shuffles
> ulimit -n 1000000


None of these variables seem to be related to memory size. Let me know if I
am missing something.
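
In case it helps anyone reproduce this, a quick way to scan the whole conf
directory for memory-related settings (paths assume the default spark-ec2
layout) is:

grep -riE "memory|spark_mem" /root/spark/conf/
cat /root/spark/conf/spark-defaults.conf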


On Thu, Jun 12, 2014 at 7:17 PM, Matei Zaharia <ma...@gmail.com>
wrote:

> Are you launching this using our EC2 scripts? Or have you set up a cluster
> by hand?
>
> Matei
>
> On Jun 12, 2014, at 2:32 PM, Aliaksei Litouka <al...@gmail.com>
> wrote:
>
> spark-env.sh doesn't seem to contain any settings related to memory size
> :( I will continue searching for a solution and will post it if I find it :)
> Thank you, anyway
>
>
> On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <ma...@gmail.com>
> wrote:
>
>> It might be that conf/spark-env.sh on EC2 is configured to set it to 512,
>> and is overriding the application’s settings. Take a look in there and
>> delete that line if possible.
>>
>> Matei
>>
>> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <al...@gmail.com>
>> wrote:
>>
>> > I am testing my application in an EC2 cluster of m3.medium machines. By
>> default, only 512 MB of memory on each machine is used. I want to increase
>> this amount, and I'm trying to do it by passing the --executor-memory 2G option
>> to the spark-submit script, but it doesn't seem to work - each machine uses
>> only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I
>> increase the amount of memory?
>>
>>
>
>

Re: How to specify executor memory in EC2 ?

Posted by Matei Zaharia <ma...@gmail.com>.
Are you launching this using our EC2 scripts? Or have you set up a cluster by hand?

Matei

On Jun 12, 2014, at 2:32 PM, Aliaksei Litouka <al...@gmail.com> wrote:

> spark-env.sh doesn't seem to contain any settings related to memory size :( I will continue searching for a solution and will post it if I find it :)
> Thank you, anyway
> 
> 
> On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <ma...@gmail.com> wrote:
> It might be that conf/spark-env.sh on EC2 is configured to set it to 512, and is overriding the application’s settings. Take a look in there and delete that line if possible.
> 
> Matei
> 
> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <al...@gmail.com> wrote:
> 
> > I am testing my application in an EC2 cluster of m3.medium machines. By default, only 512 MB of memory on each machine is used. I want to increase this amount, and I'm trying to do it by passing the --executor-memory 2G option to the spark-submit script, but it doesn't seem to work - each machine uses only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I increase the amount of memory?
> 
> 


Re: How to specify executor memory in EC2 ?

Posted by Aliaksei Litouka <al...@gmail.com>.
spark-env.sh doesn't seem to contain any settings related to memory size :(
I will continue searching for a solution and will post it if I find it :)
Thank you, anyway


On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia <ma...@gmail.com>
wrote:

> It might be that conf/spark-env.sh on EC2 is configured to set it to 512,
> and is overriding the application’s settings. Take a look in there and
> delete that line if possible.
>
> Matei
>
> On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <al...@gmail.com>
> wrote:
>
> > I am testing my application in an EC2 cluster of m3.medium machines. By
> default, only 512 MB of memory on each machine is used. I want to increase
> this amount, and I'm trying to do it by passing the --executor-memory 2G option
> to the spark-submit script, but it doesn't seem to work - each machine uses
> only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I
> increase the amount of memory?
>
>

Re: How to specify executor memory in EC2 ?

Posted by Matei Zaharia <ma...@gmail.com>.
It might be that conf/spark-env.sh on EC2 is configured to set it to 512, and is overriding the application’s settings. Take a look in there and delete that line if possible.
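
The line to look for would be something like this (just an example of the kind
of setting that could pin it at 512 MB, not necessarily what's in your file):

# in /root/spark/conf/spark-env.sh
export SPARK_MEM=512m

After deleting or raising it, restart the standalone daemons (e.g. with
sbin/stop-all.sh and sbin/start-all.sh) so the workers pick up the change.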

Matei

On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka <al...@gmail.com> wrote:

> I am testing my application in an EC2 cluster of m3.medium machines. By default, only 512 MB of memory on each machine is used. I want to increase this amount, and I'm trying to do it by passing the --executor-memory 2G option to the spark-submit script, but it doesn't seem to work - each machine uses only 512 MB instead of 2 gigabytes. What am I doing wrong? How do I increase the amount of memory?