Posted to reviews@spark.apache.org by sarutak <gi...@git.apache.org> on 2014/09/29 08:49:49 UTC

[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

GitHub user sarutak opened a pull request:

    https://github.com/apache/spark/pull/2573

    [SPARK-3718] FsHistoryProvider should consider spark.eventLog.dir not only spark.history.fs.logDirectory

    It's a minor improvement.
    
    FsHistoryProvider reads event logs from the directory specified by spark.history.fs.logDirectory, but I think that directory is nearly always the same as the one specified by spark.eventLog.dir, so we should consider spark.eventLog.dir too.
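
The fallback proposed above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: a plain dict stands in for SparkConf, and the helper name resolve_log_dir is hypothetical. Only the two property names come from the pull request.

```python
# Hypothetical sketch: a plain dict stands in for SparkConf.
HISTORY_DIR_KEY = "spark.history.fs.logDirectory"
EVENT_LOG_DIR_KEY = "spark.eventLog.dir"


def resolve_log_dir(conf):
    """Return the directory the history provider would read, or None."""
    # Prefer the history-server-specific property; fall back to the
    # application-side property only when the first is missing or empty.
    for key in (HISTORY_DIR_KEY, EVENT_LOG_DIR_KEY):
        value = conf.get(key)
        if value:
            return value
    return None
```

With this ordering, spark.eventLog.dir is consulted only when spark.history.fs.logDirectory is unset, which is the "second candidate" behaviour the description asks for.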

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-3718

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2573
    
----
commit 2de89b42e642897315322bf1bbe761ee56073a7e
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2014-09-29T06:24:11Z

    Modified FsHistoryProvider.scala to consider spark.eventLog.dir property

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by WangTaoTheTonic <gi...@git.apache.org>.
Github user WangTaoTheTonic commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57285525
  
    Actually, HistoryServer can read application logs generated by Spark apps on another node, and `spark.eventLog.dir` could differ between the two. So in my opinion it is more flexible to keep the two configs separate.
    Also, `spark.eventLog.dir` takes effect only if `spark.eventLog.enabled` is true. If HistoryServer loads data from `spark.eventLog.dir`, would it also need to check the value of `spark.eventLog.enabled`?
    In short, the current solution is simple and loosely coupled.




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by WangTaoTheTonic <gi...@git.apache.org>.
Github user WangTaoTheTonic commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57131370
  
    Looks like `spark.history.fs.logDirectory` and `spark.eventLog.dir` are the same configuration item on different sides (the driver side and the HistoryServer side). I think keeping them distinct is better, so that HistoryServer stays independent.




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57202820
  
    Basically, I think it's a good idea to separate the configuration of the driver side and the HistoryServer side, but if we use HDFS as the storage for event logs, in most cases spark.history.fs.logDirectory and spark.eventLog.dir are set to the same value. So I think it's good to use spark.eventLog.dir as a second candidate for the event log directory.




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57128398
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20964/




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57128394
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20964/consoleFull) for PR 2573 at commit [`2de89b4`](https://github.com/apache/spark/commit/2de89b42e642897315322bf1bbe761ee56073a7e).
     * This patch **passes** unit tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by abraithwaite <gi...@git.apache.org>.
Github user abraithwaite commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-140622568
  
    Hello!
    
    I was reading the explanation, and I'm still not quite sure I understand the reasoning.  I spent a bit too long trying to figure out how to configure the executors to log to the correct HDFS directory.
    
    How exactly does a spark application connect _directly_ to a spark history server?  It's my understanding (correct me if I'm wrong) that the application logs to a directory and the history server reads that directory.  So even if you had two history servers, they'd presumably both only have one log directory configuration parameter, no?
    
    At the very least, the docs on the monitoring page should be clarified.  https://spark.apache.org/docs/latest/monitoring.html has no mention of spark.eventLog.dir (although it does mention spark.eventLog.enabled).  It seems intuitive that these would be the same property.
    
    /cc @andrewor14 




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57122876
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20964/consoleFull) for PR 2573 at commit [`2de89b4`](https://github.com/apache/spark/commit/2de89b42e642897315322bf1bbe761ee56073a7e).
     * This patch merges cleanly.




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by sarutak <gi...@git.apache.org>.
Github user sarutak closed the pull request at:

    https://github.com/apache/spark/pull/2573




[GitHub] spark pull request: [SPARK-3718] FsHistoryProvider should consider...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/2573#issuecomment-57513273
  
    Hey @sarutak, these two are distinct in that `spark.eventLog.dir` is application-specific, while `spark.history.fs.logDirectory` is not. You may have two history servers for instance, and some applications want to connect to the first and others the second. I don't really see a benefit in conflating these two in any way. Would you mind closing this?
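
To illustrate the distinction drawn above, here is a sketch of how the two properties might be set when two history servers are in use. The property names are the ones discussed in this thread; the host name and paths are made up for illustration, and the layout is an assumption rather than a recommended deployment.

```
# Configuration of history server A (hypothetical path)
spark.history.fs.logDirectory  hdfs://namenode/spark-logs/team-a

# Configuration of history server B (hypothetical path)
spark.history.fs.logDirectory  hdfs://namenode/spark-logs/team-b

# Configuration of one application: it writes its event log where
# history server A reads, so only server A will list it
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs://namenode/spark-logs/team-a
```

In this reading, an application "connects" to a history server only indirectly, by writing its event log into the directory that server is configured to read.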

