Posted to dev@systemml.apache.org by Deron Eriksson <de...@gmail.com> on 2015/11/18 04:31:00 UTC

SystemML-config.xml in distributed Hadoop environment

Hello,

The SystemML binary release comes with a SystemML configuration file
(SystemML-config.xml) in its root directory. Are all the property
name/values in this file the recommended SystemML configuration settings
when running on a Hadoop cluster? Are any of these properties particularly
relevant for improving performance on the cluster?

For example, I have a 4-node cluster with 3 data nodes. Should I change
<numreducers> to 2x the number of data nodes, i.e., from 10 to 6?

Also, with regard to <optlevel>, what is being optimized, and how does this
affect performance?

Thanks!
Deron

Re: SystemML-config.xml in distributed Hadoop environment

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Deron,

A few additions and corrections to my previous email:
1. The number of reducers is automatically set for the instructions where it
dramatically affects performance (such as reblk, csvrblk, and cpmm).
However, setting the number of reducers (to 2x the number of nodes) in the
config file is still a good rule of thumb, as it affects GMR jobs.

2. We added an optimization level, so the levels are shifted:
- 0, 1, 2: same as described in the previous email.
- 3: enables the resource optimizer. For more details, please see
http://dl.acm.org/citation.cfm?id=2749432
- 4: GLOBAL TIME_MEMORY_BASED
- 5: DEBUG MODE

3. Since SystemML is based on a hybrid execution model, the parameters
related to CP (the control program) are extremely important for performance.
There are two key questions one needs to ask:
3.a Where to run CP?
- Spark users can think of CP as roughly analogous to the Spark "driver".
Setting dml.yarn.appmaster to true is similar to yarn-cluster mode, and
setting it to false is similar to yarn-client mode. Please note that the
former mode is especially useful if you have a small head node.

3.b What is the memory budget of CP?
- In Spark, setting the driver memory to an extremely small value (say, 2GB)
can severely limit the amount of data one can broadcast and collect.
Similarly, setting the CP memory to an extremely small value limits the
number of operations that can be performed in CP and hence directly limits
the optimization scope of SystemML.
- So, it is recommended to provide a reasonably high memory budget for CP
(through dml.yarn.appmaster.mem or "HADOOP_CLIENT_OPTS"; see the config
sketch after this list).

4. Other useful parameters I forgot to mention are:
- JVM heap size for mappers and reducers (dml.yarn.mapreduce.mem). Please
remember not to skew the degree of parallelism (i.e., the maximum number of
map tasks on the cluster) while setting them.
- "mapreduce.task.io.sort.mb": the recommended value is 3 times the HDFS
block size.
- Multithreaded operations: cp.parallel.matrixmult and cp.parallel.textio.
It is recommended to keep them turned on.
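
To make points 3 and 4 concrete, here is a minimal sketch of how those
entries might look in SystemML-config.xml. Only the property names come from
this thread; the enclosing <root> element and the memory values (in MB) are
assumptions, so please cross-check them against the config file shipped with
your release.

  <root>
    <!-- 3.a: run CP inside the YARN application master (similar to
         yarn-cluster); set to false to keep CP on the client
         (similar to yarn-client) -->
    <dml.yarn.appmaster>true</dml.yarn.appmaster>
    <!-- 3.b: memory budget for CP when it runs as the application master
         (illustrative value) -->
    <dml.yarn.appmaster.mem>4096</dml.yarn.appmaster.mem>
    <!-- 4: JVM heap for mappers and reducers (illustrative value) -->
    <dml.yarn.mapreduce.mem>2048</dml.yarn.mapreduce.mem>
    <!-- 4: keep multithreaded CP operations enabled -->
    <cp.parallel.matrixmult>true</cp.parallel.matrixmult>
    <cp.parallel.textio>true</cp.parallel.textio>
  </root>

Note that mapreduce.task.io.sort.mb is a Hadoop MapReduce property and is set
on the Hadoop side (e.g., in mapred-site.xml) rather than in
SystemML-config.xml, and that when CP runs on the client
(dml.yarn.appmaster=false) its heap is controlled via HADOOP_CLIENT_OPTS
rather than dml.yarn.appmaster.mem.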

Thanks,

Niketan Pansare



Re: SystemML-config.xml in distributed Hadoop environment

Posted by Deron Eriksson <de...@gmail.com>.
Thank you, Niketan. That information is very useful.

Deron



Re: SystemML-config.xml in distributed Hadoop environment

Posted by Niketan Pansare <np...@us.ibm.com>.
Hi Deron,

Please see the answers below:

> Are all the property name/values in this file the recommended SystemML
> configuration settings when running on a Hadoop cluster?
Yes, but some depend on the size of the cluster (for example, the number
of reducers), so the user might need to modify them accordingly.

> Are any of these properties particularly relevant for improving
> performance on the cluster?
Yes. Going back to the "number of reducers" example: on a 100-node
cluster, using the default of 10 reducers would leave the cluster
underutilized.

> For example, I have a 4-node cluster with 3 data nodes. Should I change
> <numreducers> to 2x the number of data nodes, i.e., from 10 to 6?
2x the number of nodes is a good rule of thumb for the number of reducers
for the "MR" backend; I verified this in performance experiments.
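
For the 3-data-node cluster in the question, the change amounts to a single
entry in SystemML-config.xml. A minimal sketch, assuming the flat property
layout of the shipped config file:

    <!-- 2x the number of data nodes: 3 data nodes -> 6 reducers
         (default is 10) -->
    <numreducers>6</numreducers>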

> Also, with regard to <optlevel>, what is being optimized, and how does
> this affect performance?
<optlevel> is a tuning flag for SystemML's runtime optimizer. I would
recommend using the default optlevel. Here is the documentation from the
optimizer:
 * Optimization Types for Compilation
 *
 *  O0 STATIC - Decisions for scheduling operations on CP/MR are based on a
 *  predefined set of rules, which check if the dimensions are below a
 *  fixed/static threshold (OLD method of choosing between CP and MR).
 *  The optimization scope is LOCAL, i.e., per statement block.
 *  Advanced rewrites like constant folding, common subexpression
 *  elimination, or inter-procedural analysis are NOT applied.
 *
 *  O1 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
 *  based on the amount of memory required to perform that operation.
 *  It does NOT take the execution time into account.
 *  The optimization scope is LOCAL, i.e., per statement block.
 *  Advanced rewrites like constant folding, common subexpression
 *  elimination, or inter-procedural analysis are NOT applied.
 *
 *  O2 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
 *  based on the amount of memory required to perform that operation.
 *  It does NOT take the execution time into account.
 *  The optimization scope is LOCAL, i.e., per statement block.
 *  All advanced rewrites are applied. This is the default optimization
 *  level of SystemML.
 *
 *  O3 GLOBAL TIME_MEMORY_BASED - Operation scheduling on CP or MR, as well
 *  as many other rewrites of data-flow properties such as block size,
 *  partitioning, replication, vectorization, etc., are done with the
 *  optimization objective of minimizing execution time under hard memory
 *  constraints per operation and execution context. The optimization scope
 *  is GLOBAL, i.e., program-wide.
 *  All advanced rewrites are applied. This optimization level requires more
 *  optimization time but has higher optimization potential.
 *
 *  O4 DEBUG MODE - All optimizations, global and local, which interfere
 *  with breakpoints are NOT applied. This optimization level is REQUIRED
 *  for the compiler running in debug mode.
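
For reference, this level is selected through the <optlevel> property in
SystemML-config.xml. A minimal sketch that simply keeps the default (O2)
described above, assuming the numeric value maps directly to the level:

    <!-- 2 = MEMORY_BASED with all advanced rewrites (the default);
         see O0-O4 above -->
    <optlevel>2</optlevel>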

Thanks,

Niketan Pansare
IBM Almaden Research Center
Phone (office): (408) 927 1740
E-mail: npansar@us.ibm.com
http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


