You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by twinkle-sachdeva <gi...@git.apache.org> on 2015/02/02 16:37:56 UTC

[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

GitHub user twinkle-sachdeva opened a pull request:

    https://github.com/apache/spark/pull/4311

    SPARK-4705:Creating different log directories for different app attempts...

    ... in case of yarn cluster mode

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/twinkle-sachdeva/spark SPARK-4705

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4311.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4311
    
----
commit 5d9eedf1731f8e91fdb3ac16e40a6523c453375e
Author: twinkle sachdeva <tw...@guavus.com>
Date:   2015-02-02T15:38:35Z

    SPARK-4705:Creating different log directories for different app attempts in case of yarn cluster mode

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75321476
  
      [Test build #27790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27790/consoleFull) for   PR 4311 at commit [`9477d5b`](https://github.com/apache/spark/commit/9477d5b653abf418a9a4ba051a554b8ccd61e858).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by twinkle-sachdeva <gi...@git.apache.org>.
Github user twinkle-sachdeva commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-76601050
  
    Hi,
    
    Thanks @andrewor14 , for the patience. 
    I have created another pull request https://github.com/apache/spark/pull/4845 as per the new structure of event logging etc.
    
    --Twinkle


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-72729132
  
    Hi @twinkle-sachdeva,
    
    I left some comments about your design in the bug. Also, in general, changes first go into the master branch, and then are backported to other branches if desired. I think that pattern should be followed here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by twinkle-sachdeva <gi...@git.apache.org>.
Github user twinkle-sachdeva closed the pull request at:

    https://github.com/apache/spark/pull/4311


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-76289825
  
    Ok, just ping us on your new patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73693081
  
      [Test build #27202 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27202/consoleFull) for   PR 4311 at commit [`9477d5b`](https://github.com/apache/spark/commit/9477d5b653abf418a9a4ba051a554b8ccd61e858).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73692210
  
      [Test build #27201 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27201/consoleFull) for   PR 4311 at commit [`37901a5`](https://github.com/apache/spark/commit/37901a5cc5bd1641c7e64590b97f3e30effd6c2d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73692021
  
      [Test build #27201 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27201/consoleFull) for   PR 4311 at commit [`37901a5`](https://github.com/apache/spark/commit/37901a5cc5bd1641c7e64590b97f3e30effd6c2d).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73692213
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27201/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73302969
  
      [Test build #26935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26935/consoleFull) for   PR 4311 at commit [`5d9eedf`](https://github.com/apache/spark/commit/5d9eedf1731f8e91fdb3ac16e40a6523c453375e).
     * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25105177
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryPage.scala ---
    @@ -31,15 +33,25 @@ private[spark] class HistoryPage(parent: HistoryServer) extends WebUIPage("") {
         val requestedPage = Option(request.getParameter("page")).getOrElse("1").toInt
         val requestedFirst = (requestedPage - 1) * pageSize
     
    -    val allApps = parent.getApplicationList()
    -    val actualFirst = if (requestedFirst < allApps.size) requestedFirst else 0
    -    val apps = allApps.slice(actualFirst, Math.min(actualFirst + pageSize, allApps.size))
    -
    +    val applicationNattemptsList = parent.getApplicationList()
    +    val (hasAttemptInfo, appToAttemptMap)  = getApplicationLevelList(applicationNattemptsList)
    +    val allAppsSize = if(hasAttemptInfo) appToAttemptMap.size else applicationNattemptsList.size
    +    val actualFirst = if (requestedFirst < allAppsSize) requestedFirst else 0
    +    val apps = applicationNattemptsList.slice(actualFirst, 
    +                                              Math.min(actualFirst + pageSize,
    +                                              allAppsSize))
    +    val appWithAttemptsDisplayList = appToAttemptMap.slice(actualFirst, 
    +                                                           Math.min(actualFirst + pageSize,
    +                                                           allAppsSize))
         val actualPage = (actualFirst / pageSize) + 1
    -    val last = Math.min(actualFirst + pageSize, allApps.size) - 1
    -    val pageCount = allApps.size / pageSize + (if (allApps.size % pageSize > 0) 1 else 0)
    +    val last = Math.min(actualFirst + pageSize, allAppsSize) - 1
    +    val pageCount = allAppsSize / pageSize + (if (allAppsSize % pageSize > 0) 1 else 0)
    + 
    +     val appTable = if(hasAttemptInfo ) UIUtils.listingTable(appWithAttemptHeader,
    --- End diff --
    
    val appTable =
      if (hasAttemptInfo) {
        UIUtils.listingTable(...)
      } else {
        UIUtils.listingTable(...)
      }


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25104871
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -166,21 +167,65 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
                 newLastModifiedTime = math.max(newLastModifiedTime, modTime)
                 modTime > lastModifiedTime
               } else {
    -            false
    +            val appLogStatus = fs.listStatus(new Path(dir.getPath().toUri()))
    +            val appAttemptsDirs = if (appLogStatus != null) appLogStatus.filter(_.isDir).toSeq 
    +                                  else Seq[FileStatus]()
    --- End diff --
    
    ```
    val appAttemptDirs =
      if (appLogStatus != null) {
        appLogStatus.filter(_.isDir).toSeq
      } else {
        Seq[FileStatus]()
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by twinkle-sachdeva <gi...@git.apache.org>.
Github user twinkle-sachdeva commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75903400
  
    Hi @andrewor14,
    
    I will create pull request for master branch soon. Actually my hard-disk got crashed, that had changes in master branch too. It will take me 3-4 days more. I will keep an eye on formatting also, this time.
    I hope this much delay is fine.
    
    Thanks,
    Twinkle


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75324157
  
    @twinkle-sachdeva I just noticed that there are a lot of style guide violations in this patch. Please look at how the rest of the code is formatted for reference. Also, this patch is currently highly tailored to the implementation in event log format in Spark 1.2, which should be very different from that used in Spark 1.3. I would imagine that it will be a non-trivial amount of work to bring this patch up to date to master. Would you still prefer to do this, or have one of @vanzin or me take over? In both cases we'll be sure to give you credit in the release notes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r24267784
  
    --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
    @@ -88,6 +88,10 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     
             // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up.
             System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString())
    +
    +       //Propagate the attempt if, so that in case of event logging, different attempt's logs gets created in different directory
    --- End diff --
    
    this line is too long and will fail tests


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73302414
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75321260
  
    @twinkle-sachdeva Would you mind creating an equivalent PR on the master branch? It will make it speed up the review/merge process for us. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73705114
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27202/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25105114
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -166,21 +167,65 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
                 newLastModifiedTime = math.max(newLastModifiedTime, modTime)
                 modTime > lastModifiedTime
               } else {
    -            false
    +            val appLogStatus = fs.listStatus(new Path(dir.getPath().toUri()))
    +            val appAttemptsDirs = if (appLogStatus != null) appLogStatus.filter(_.isDir).toSeq 
    +                                  else Seq[FileStatus]()
    +            var isValidApplicationLogToUpdate = false
    +            breakable {
    +                for (appAttemptDir <- appAttemptsDirs){
    +                    // There are multiple attempts inside this application logs
    +                    if (fs.isFile(new Path(appAttemptDir.getPath(),
    +                                           EventLoggingListener.APPLICATION_COMPLETE))) {
    +                      val modTime = getModificationTime(dir)
    +                      newLastModifiedTime = math.max(newLastModifiedTime, modTime)
    +                      isValidApplicationLogToUpdate = modTime > lastModifiedTime
    +                      break
    +                    }
    +                }
    +            }
    +            isValidApplicationLogToUpdate
               }
    -        }
    -        .flatMap { dir =>
    -          try {
    -            val (replayBus, appListener) = createReplayBus(dir)
    -            replayBus.replay()
    -            Some(new FsApplicationHistoryInfo(
    -              dir.getPath().getName(),
    -              appListener.appId.getOrElse(dir.getPath().getName()),
    -              appListener.appName.getOrElse(NOT_STARTED),
    -              appListener.startTime.getOrElse(-1L),
    -              appListener.endTime.getOrElse(-1L),
    -              getModificationTime(dir),
    -              appListener.sparkUser.getOrElse(NOT_STARTED)))
    +         }
    +         .flatMap { dir =>
    +           val appAttemptsApplicationHistoryInfo = 
    +             new scala.collection.mutable.ArrayBuffer[FsApplicationHistoryInfo]()
    +           try {
    +            if (!fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE))) {
    +              // There are multiple attempts inside this application logs
    +              val appLogStatus = fs.listStatus(new Path(dir.getPath().toUri()))
    +              val appAttemptSubDirs:Seq[FileStatus] = 
    +                if (appLogStatus != null) appLogStatus.filter(_.isDir).toSeq else Seq[FileStatus]()
    +               
    +              for (appAttemptDir <- appAttemptSubDirs){
    +               // There are multiple attempts inside this application logs
    +                 if (fs.isFile(new Path(appAttemptDir.getPath(), 
    +                                     EventLoggingListener.APPLICATION_COMPLETE))) {
    +                   val (replayBus, appListener) = createReplayBus(appAttemptDir)
    +                   replayBus.replay() 
    +                   appAttemptsApplicationHistoryInfo += new FsApplicationHistoryInfo( 
    +                        dir.getPath().getName() + "/" + appAttemptDir.getPath().getName(),
    +                        dir.getPath().getName() + "_attemptid_" + appAttemptDir.getPath().getName(),
    +                        appListener.appName.getOrElse(NOT_STARTED),
    +                        appListener.startTime.getOrElse(-1L),
    +                        appListener.endTime.getOrElse(-1L),
    +                        getModificationTime(dir),
    +                        appListener.sparkUser.getOrElse(NOT_STARTED)) 
    +                }
    +              }
    +              
    +            } else {
    +              val (replayBus, appListener) = createReplayBus(dir)
    +              replayBus.replay()
    +              appAttemptsApplicationHistoryInfo += new FsApplicationHistoryInfo(
    +                     dir.getPath().getName(),
    +                     appListener.appId.getOrElse(dir.getPath().getName()),
    +                     appListener.appName.getOrElse(NOT_STARTED),
    +                     appListener.startTime.getOrElse(-1L),
    +                     appListener.endTime.getOrElse(-1L),
    +                     getModificationTime(dir),
    +                     appListener.sparkUser.getOrElse(NOT_STARTED))
    --- End diff --
    
    this is indented too much. Please use 2 spaces instead


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73705104
  
      [Test build #27202 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27202/consoleFull) for   PR 4311 at commit [`9477d5b`](https://github.com/apache/spark/commit/9477d5b653abf418a9a4ba051a554b8ccd61e858).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25104555
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -348,6 +348,7 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       taskScheduler.start()
     
       val applicationId: String = taskScheduler.applicationId()
    +  val applicationAttemptId : String = taskScheduler.applicationAttemptId()
    --- End diff --
    
    no space before `:`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25104823
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -166,21 +167,65 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
                 newLastModifiedTime = math.max(newLastModifiedTime, modTime)
                 modTime > lastModifiedTime
               } else {
    -            false
    +            val appLogStatus = fs.listStatus(new Path(dir.getPath().toUri()))
    +            val appAttemptsDirs = if (appLogStatus != null) appLogStatus.filter(_.isDir).toSeq 
    +                                  else Seq[FileStatus]()
    +            var isValidApplicationLogToUpdate = false
    +            breakable {
    +                for (appAttemptDir <- appAttemptsDirs){
    +                    // There are multiple attempts inside this application logs
    +                    if (fs.isFile(new Path(appAttemptDir.getPath(),
    +                                           EventLoggingListener.APPLICATION_COMPLETE))) {
    +                      val modTime = getModificationTime(dir)
    +                      newLastModifiedTime = math.max(newLastModifiedTime, modTime)
    +                      isValidApplicationLogToUpdate = modTime > lastModifiedTime
    +                      break
    +                    }
    +                }
    +            }
    --- End diff --
    
    please do not use break here. While it's unnatural that Scala does not provide native support for breaks, we try to avoid breaks elsewhere in the code base because it requires us to wrap the code in a `breakable` block. Here you can just use a while loop with a boolean var instead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25105076
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -166,21 +167,65 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
                 newLastModifiedTime = math.max(newLastModifiedTime, modTime)
                 modTime > lastModifiedTime
               } else {
    -            false
    +            val appLogStatus = fs.listStatus(new Path(dir.getPath().toUri()))
    +            val appAttemptsDirs = if (appLogStatus != null) appLogStatus.filter(_.isDir).toSeq 
    +                                  else Seq[FileStatus]()
    +            var isValidApplicationLogToUpdate = false
    +            breakable {
    +                for (appAttemptDir <- appAttemptsDirs){
    +                    // There are multiple attempts inside this application logs
    +                    if (fs.isFile(new Path(appAttemptDir.getPath(),
    +                                           EventLoggingListener.APPLICATION_COMPLETE))) {
    +                      val modTime = getModificationTime(dir)
    +                      newLastModifiedTime = math.max(newLastModifiedTime, modTime)
    +                      isValidApplicationLogToUpdate = modTime > lastModifiedTime
    +                      break
    +                    }
    +                }
    +            }
    +            isValidApplicationLogToUpdate
               }
    -        }
    -        .flatMap { dir =>
    -          try {
    -            val (replayBus, appListener) = createReplayBus(dir)
    -            replayBus.replay()
    -            Some(new FsApplicationHistoryInfo(
    -              dir.getPath().getName(),
    -              appListener.appId.getOrElse(dir.getPath().getName()),
    -              appListener.appName.getOrElse(NOT_STARTED),
    -              appListener.startTime.getOrElse(-1L),
    -              appListener.endTime.getOrElse(-1L),
    -              getModificationTime(dir),
    -              appListener.sparkUser.getOrElse(NOT_STARTED)))
    +         }
    +         .flatMap { dir =>
    +           val appAttemptsApplicationHistoryInfo = 
    +             new scala.collection.mutable.ArrayBuffer[FsApplicationHistoryInfo]()
    --- End diff --
    
    we should just import `scala.collection.mutable.ArrayBuffer` at the top


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-76680225
  
    Mind closing this PR? at the least, this is not opened vs `master` anyway.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/4311#discussion_r25104604
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -364,7 +365,8 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
       private[spark] val eventLogger: Option[EventLoggingListener] = {
         if (isEventLogEnabled) {
           val logger =
    -        new EventLoggingListener(applicationId, eventLogDir.get, conf, hadoopConfiguration)
    +        new EventLoggingListener(applicationId, applicationAttemptId,
    +                                 eventLogDir.get, conf, hadoopConfiguration)
    --- End diff --
    
    prefer this style:
    ```
    val logger = new EventLoggingListener(
      applicationId, applicationAttemptId, eventLogDir.get, conf, hadoopConfiguration)
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73303308
  
    In this particular case we might actually need separate PRs for 1.2 and the Master because the event logs are produced differently there. I wonder if this also applies to standalone mode


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73303253
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26935/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75321142
  
    retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-72478171
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by twinkle-sachdeva <gi...@git.apache.org>.
Github user twinkle-sachdeva commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73693030
  
    Hi @vanzin,
    
    Will it be fine to review it in current state, or should I create a pull request for master branch first?
    Either way, i can merge it , as and when required.
    
    Thanks,
    Twinkle


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-73303249
  
      [Test build #26935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26935/consoleFull) for   PR 4311 at commit [`5d9eedf`](https://github.com/apache/spark/commit/5d9eedf1731f8e91fdb3ac16e40a6523c453375e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75337128
  
      [Test build #27790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27790/consoleFull) for   PR 4311 at commit [`9477d5b`](https://github.com/apache/spark/commit/9477d5b653abf418a9a4ba051a554b8ccd61e858).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: SPARK-4705:Creating different log directories ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/4311#issuecomment-75337134
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27790/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org