Posted to issues@spark.apache.org by "Charlie Tsai (JIRA)" <ji...@apache.org> on 2017/08/29 22:44:00 UTC
[jira] [Comment Edited] (SPARK-19307) SPARK-17387 caused ignorance of conf object passed to SparkContext:
[ https://issues.apache.org/jira/browse/SPARK-19307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146280#comment-16146280 ]
Charlie Tsai edited comment on SPARK-19307 at 8/29/17 10:43 PM:
----------------------------------------------------------------
Hi,
I am using 2.2.0 but find that command-line {{--conf}} arguments are still not available when the {{SparkConf()}} object is instantiated. As a result, I can't check in my driver what has already been set through the command-line {{--conf}} arguments before setting additional configuration with {{setIfMissing}}. Instead, {{setIfMissing}} effectively overwrites whatever is passed in through the CLI.
For example, if my job is:
{code}
# debug.py
import pyspark

if __name__ == '__main__':
    print(pyspark.SparkConf()._jconf)  # is `None` but should include `--conf` arguments
    default_conf = {
        "spark.dynamicAllocation.maxExecutors": "36",
        "spark.yarn.executor.memoryOverhead": "1500",
    }
    # these are supposed to be set only if not provided by the CLI args
    spark_conf = pyspark.SparkConf()
    for (k, v) in default_conf.items():
        spark_conf.setIfMissing(k, v)
{code}
Running:
{code}
spark-submit \
    --master yarn \
    --deploy-mode client \
    --conf spark.yarn.executor.memoryOverhead=2500 \
    --conf spark.dynamicAllocation.maxExecutors=128 \
    debug.py
{code}
In 1.6.2, the CLI args take precedence, whereas in 2.2.0, {{SparkConf().getAll()}} appears empty even though the {{--conf}} args were passed in.
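As a workaround, creating the {{SparkContext}} first makes the JVM-side conf (including the CLI {{--conf}} arguments) visible, so the effective values can at least be inspected. A minimal sketch (the file name is hypothetical; this only reads the effective values, it can't set defaults before the context exists):
{code}
# inspect_conf.py (hypothetical file name)
import pyspark

if __name__ == '__main__':
    sc = pyspark.SparkContext()  # launches/attaches the JVM gateway, which has the --conf args
    effective = dict(sc.getConf().getAll())
    for key in ("spark.dynamicAllocation.maxExecutors",
                "spark.yarn.executor.memoryOverhead"):
        print(key, effective.get(key))  # CLI-provided values show up here
{code}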
> SPARK-17387 caused ignorance of conf object passed to SparkContext:
> -------------------------------------------------------------------
>
> Key: SPARK-19307
> URL: https://issues.apache.org/jira/browse/SPARK-19307
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.1.0
> Reporter: yuriy_hupalo
> Assignee: Marcelo Vanzin
> Fix For: 2.1.1, 2.2.0
>
> Attachments: SPARK-19307.patch
>
>
> After the SPARK-17387 patch was applied, the SparkConf object is ignored when launching a SparkContext programmatically via Python from spark-submit:
> https://github.com/apache/spark/blob/master/python/pyspark/context.py#L128:
> When running a Python SparkContext(conf=xxx) from spark-submit, conf is set but conf._jconf is None, so the conf object passed as an argument is ignored (it is used only when launching the java_gateway).
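> A minimal sketch reproducing this (run it via spark-submit; the file name and conf value are only examples):
> {code}
> # repro.py
> import pyspark
>
> if __name__ == '__main__':
>     conf = pyspark.SparkConf().set("spark.executor.memory", "2g")
>     print(conf._jconf)  # None: the JVM gateway has not been launched yet
>     sc = pyspark.SparkContext(conf=conf)
>     # per this report, before the fix the value set above was dropped:
>     print(sc.getConf().get("spark.executor.memory", "unset"))
> {code}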
> How to fix (python/pyspark/context.py:132):
> {code:title=python/pyspark/context.py:132}
> if conf is not None and conf._jconf is not None:
>     # conf has been initialized in JVM properly, so use conf directly. This represents the
>     # scenario that JVM has been launched before SparkConf is created (e.g. SparkContext is
>     # created and then stopped, and we create a new SparkConf and new SparkContext again)
>     self._conf = conf
> else:
>     self._conf = SparkConf(_jvm=SparkContext._jvm)
> +    if conf:
> +        for key, value in conf.getAll():
> +            self._conf.set(key, value)
> +            print(key, value)
> {code}
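> With the fix applied, one quick check is to pass a value programmatically and read it back from the running context (a minimal sketch; the file name and the conf key are hypothetical):
> {code}
> # verify_fix.py
> import pyspark
>
> if __name__ == '__main__':
>     conf = pyspark.SparkConf().set("spark.test.marker", "from-python-conf")
>     sc = pyspark.SparkContext(conf=conf)
>     # expect "from-python-conf" once values from the passed conf are merged
>     print(sc.getConf().get("spark.test.marker", "missing"))
> {code}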