Posted to issues@spark.apache.org by "Yongjia Wang (JIRA)" <ji...@apache.org> on 2015/09/22 20:01:05 UTC

[jira] [Comment Edited] (SPARK-5152) Let metrics.properties file take an hdfs:// path

    [ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14902883#comment-14902883 ] 

Yongjia Wang edited comment on SPARK-5152 at 9/22/15 6:00 PM:
--------------------------------------------------------------

I voted for this. 
This would enable configuring metrics or log4j properties for all the workers from one place when submitting the job. Without it, you have to set them up on each worker individually. If hdfs:// can be supported, I assume s3n:// and s3a:// would be supported as well, since they all go through the same interface.
Alternatively, it would probably be even better if there were a way, specified through "--conf" spark properties on the spark-submit command line, to upload custom files to the executor's working directory before the executor process starts. The "spark.files" option uploads files lazily when the first task starts, which is too late for configuration.
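A minimal sketch of what I mean by "the same interface", assuming Hadoop's FileSystem API (loadMetricsProperties is just an illustrative helper, not actual Spark code):
{code}
// Hypothetical sketch: any Hadoop-supported scheme (hdfs://, s3n://, s3a://)
// resolves through the same FileSystem interface, so supporting one scheme
// should cover the others as well.
import java.util.Properties

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

def loadMetricsProperties(uri: String): Properties = {
  val path = new Path(uri)                          // e.g. hdfs://... or s3a://...
  val fs = path.getFileSystem(new Configuration())  // FileSystem picked by scheme
  val in = fs.open(path)
  val props = new Properties()
  try props.load(in) finally in.close()
  props
}
{code}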


was (Author: yongjiaw):
I voted for this. 
It enables configuring metrics or log4j properties for all the workers just from the driver. Without it, you have to set them up on each worker individually.
Alternatively, it would probably be even better if there were a way, specified through "conf" spark properties in the spark-submit command line, to upload custom files to the executor's working directory before the executor process starts. The "spark.files" option uploads files lazily when the first task starts, which is too late for configuration.

> Let metrics.properties file take an hdfs:// path
> ------------------------------------------------
>
>                 Key: SPARK-5152
>                 URL: https://issues.apache.org/jira/browse/SPARK-5152
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Ryan Williams
>
> From my reading of [the code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53], the {{spark.metrics.conf}} property must be a path that is resolvable on the local filesystem of each executor.
> Running a Spark job with {{--conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at java.io.FileInputStream.<init>(FileInputStream.java:101)
>         at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
>         at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
>         at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
>         at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one location on HDFS would be an improvement, right?
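(Illustrative sketch, not part of the original report: the failure above is what java.io.FileInputStream does with any URI-looking string, since it treats the whole thing as a local path and never parses the scheme.)
{code}
import java.io.FileInputStream

// FileInputStream hands the string to java.io.File, which normalizes it as a
// local path, hence a FileNotFoundException like the one in the log above:
new FileInputStream("hdfs://host1.domain.com/path/metrics.properties")
// java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties
// (No such file or directory)
{code}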



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org