Posted to issues@spark.apache.org by "John Zhuge (JIRA)" <ji...@apache.org> on 2018/06/14 03:22:00 UTC

[jira] [Comment Edited] (SPARK-5152) Let metrics.properties file take an hdfs:// path

    [ https://issues.apache.org/jira/browse/SPARK-5152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511724#comment-16511724 ] 

John Zhuge edited comment on SPARK-5152 at 6/14/18 3:21 AM:
------------------------------------------------------------

SPARK-7169 alleviated this issue; however, I still find the approach *spark.metrics.conf=s3://bucket/spark-metrics/graphite.properties* a little simpler and more convenient. Compared to *spark.metrics.conf.** entries in SparkConf, a metrics config file groups the properties together, separate from the rest of the Spark properties. In my case, there are 10 properties to configure. It is also easy to swap out the config file for different users or different purposes, especially in self-service environments. I wish spark-submit could accept multiple '--properties-file' options; an example of such a config file is sketched below.
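For illustration only, the kind of standalone metrics config file I have in mind might look like the following (host, port, and prefix are placeholder values, not taken from any real setup):
{code}
# graphite.properties -- example of a standalone metrics config, referenced via
# spark.metrics.conf=s3://bucket/spark-metrics/graphite.properties
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds
*.sink.graphite.prefix=spark

# Also report JVM metrics from every instance (master, worker, driver, executor).
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource
{code}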

Pretty simple change. Let me know whether I can post a PR.
{code:java}
- case Some(f) => new FileInputStream(f)
+ case Some(f) =>
+   val hadoopPath = new Path(Utils.resolveURI(f))
+   Utils.getHadoopFileSystem(hadoopPath.toUri, new Configuration()).open(hadoopPath)
{code}
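To make the intent of the diff above concrete, here is a rough, self-contained sketch of the proposed loading logic (the method name {{openMetricsConfig}} is mine; the real change would live inside MetricsConfig, where org.apache.spark.util.Utils is accessible):
{code:java}
import java.io.InputStream

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

import org.apache.spark.util.Utils

// Sketch only: resolve the configured value to a URI (bare paths fall back to
// file://), then open it through the Hadoop FileSystem for that scheme, so
// hdfs:// and s3:// locations work the same way local files do today.
def openMetricsConfig(path: String): InputStream = {
  val hadoopPath = new Path(Utils.resolveURI(path))
  Utils.getHadoopFileSystem(hadoopPath.toUri, new Configuration()).open(hadoopPath)
}
{code}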
 


was (Author: jzhuge):
SPARK-7169 alleviated this issue; however, I still find the approach *spark.metrics.conf=s3://bucket/spark-metrics/graphite.properties* a little more convenient and clean. Compared to *spark.metrics.conf.** in SparkConf, a metrics config file groups the properties together, separate from the rest of the Spark properties. In my case, there are 10 properties. It is easy to swap out the config file for different users or for different purposes, especially in a self-service environment. I wish spark-submit could accept multiple '--properties-file' options.

The downside is that this adds one more dependency on hadoop-client in spark-core, in addition to the history server.

Pretty simple change. Let me know whether I can post a PR.
{code:java}
- case Some(f) => new FileInputStream(f)
+ case Some(f) =>
+   val hadoopPath = new Path(Utils.resolveURI(f))
+   Utils.getHadoopFileSystem(hadoopPath.toUri, new Configuration()).open(hadoopPath)
{code}
 

> Let metrics.properties file take an hdfs:// path
> ------------------------------------------------
>
>                 Key: SPARK-5152
>                 URL: https://issues.apache.org/jira/browse/SPARK-5152
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Ryan Williams
>            Priority: Major
>
> From my reading of [the code|https://github.com/apache/spark/blob/06dc4b5206a578065ebbb6bb8d54246ca007397f/core/src/main/scala/org/apache/spark/metrics/MetricsConfig.scala#L53], the {{spark.metrics.conf}} property must be a path that is resolvable on the local filesystem of each executor.
> Running a Spark job with {{--conf spark.metrics.conf=hdfs://host1.domain.com/path/metrics.properties}} logs many errors (~1 per executor, presumably?) like:
> {code}
> 15/01/08 13:20:57 ERROR metrics.MetricsConfig: Error loading configure file
> java.io.FileNotFoundException: hdfs:/host1.domain.com/path/metrics.properties (No such file or directory)
>         at java.io.FileInputStream.open(Native Method)
>         at java.io.FileInputStream.<init>(FileInputStream.java:146)
>         at java.io.FileInputStream.<init>(FileInputStream.java:101)
>         at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:53)
>         at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:92)
>         at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:218)
>         at org.apache.spark.SparkEnv$.create(SparkEnv.scala:329)
>         at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:181)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:131)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:61)
>         at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:60)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>         at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:60)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:163)
>         at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> {code}
> which seems consistent with the idea that it's looking on the local filesystem and not parsing the "scheme" portion of the URL.
> Letting all executors get their {{metrics.properties}} files from one location on HDFS would be an improvement, right?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org