Posted to reviews@spark.apache.org by sharkdtu <gi...@git.apache.org> on 2017/05/12 08:30:44 UTC

[GitHub] spark pull request #17963: [SPARK-20722][Core][History Server] Replay newer ...

GitHub user sharkdtu opened a pull request:

    https://github.com/apache/spark/pull/17963

    [SPARK-20722][Core][History Server] Replay newer event log that hasn't be replayed in advance for request

    ## What changes were proposed in this pull request?
    
    The history server may replay logs slowly if the event logs in the current checking period are very large, and it can stay stuck for a while before entering the next checking period. If we request the history UI of a newer application during that time, we get an error like "Application application_1481785469354_934016 not found". This change lets the history server replay the newer event log ahead of the periodic check when such a request arrives.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sharkdtu/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17963.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17963
    
----
commit 3005c7cd7c57fd0c6a0ea318760dc2dc3010e3aa
Author: sharkdtu <sh...@tencent.com>
Date:   2017-05-12T07:50:44Z

    Replay event log that hasn't be replayed in current checking period in advance for request

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17963: [SPARK-20722][Core][History Server] Replay newer event l...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    Can one of the admins verify this patch?




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    @jerryshao The event log file will not be processed twice; you can review `FsHistoryProvider.checkForLogs` and `FsHistoryProvider.mergeApplicationListing`. In the next checking period, it checks the event log length against the corresponding app info in `fileToAppInfo`.
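
The length check described in the comment above can be sketched roughly as follows. This is a simplified Python model, not Spark's Scala code: `AppInfo`, `logs_to_replay`, and the sample paths and sizes are hypothetical names and values chosen for illustration; only `fileToAppInfo` comes from the discussion.

```python
# Illustrative model only: names mirror the discussion, not Spark's actual code.
from dataclasses import dataclass


@dataclass
class AppInfo:
    app_id: str
    file_size: int  # size of the event log when it was last replayed


def logs_to_replay(log_sizes, file_to_app_info):
    """Return paths whose log is new or has grown since the last replay."""
    pending = []
    for path, size in sorted(log_sizes.items()):
        known = file_to_app_info.get(path)
        if known is None or known.file_size < size:
            pending.append(path)
    return pending


# app_1 was already replayed at its current size; app_2 has never been seen.
file_to_app_info = {"app_1": AppInfo("app_1", 100)}
log_sizes = {"app_1": 100, "app_2": 50}
print(logs_to_replay(log_sizes, file_to_app_info))
```

Under this model, a log that was already replayed on request is skipped by the next periodic check as long as its recorded size is up to date.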




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    @jerryshao Thanks, I agree. This PR may be a temporary fix before SPARK-18085.




[GitHub] spark pull request #17963: [SPARK-20722][CORE] Replay newer event log that h...

Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu closed the pull request at:

    https://github.com/apache/spark/pull/17963




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    From my understanding, with your fix there is a chance that this event log file will be processed twice, which could be a big overhead if the event log is very large. Also, this PR looks more like a temporary fix than a thorough solution.
    
    [SPARK-18085](https://issues.apache.org/jira/browse/SPARK-18085) is working on a thorough fix for the performance of SHS.




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    @ajbozarth Yes, this case is a big issue in my production cluster, which runs nearly 20,000 applications every day.




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    I don't think so. Because `mergeApplicationListing` and `getAppUI` run in two different threads, there is a chance that these two methods process the same event file. This could happen when `mergeApplicationListing` is processing a large event file and `getAppUI` kicks in at the same time.
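
The interleaving described in the comment above can be illustrated with a small hand-scheduled model. This is a Python sketch under stated assumptions, not Spark's code: `needs_replay`, `replay`, and the two flags standing in for the background scan and the UI request are hypothetical, and the "threads" are simulated by ordering the calls explicitly so the result is deterministic.

```python
# Illustrative model only: a hand-scheduled interleaving, not Spark's actual code.
replayed = {}        # path -> size recorded after a completed replay
replay_count = {}    # path -> number of times the file was replayed


def needs_replay(path, size):
    """A file needs replaying if no completed replay covers its current size."""
    return replayed.get(path, 0) < size


def replay(path, size):
    """Replay the file, recording its size only once the replay has finished."""
    replay_count[path] = replay_count.get(path, 0) + 1
    replayed[path] = size


path, size = "app_1", 100

# Both callers run the size check before either replay has finished,
# so both see the file as not yet replayed.
background_scan_wants_it = needs_replay(path, size)
ui_request_wants_it = needs_replay(path, size)

if background_scan_wants_it:
    replay(path, size)
if ui_request_wants_it:
    replay(path, size)

print(replay_count[path])  # 2
```

In this model the size check alone does not prevent the duplicate work, because the size is recorded only after a replay completes; the longer the replay takes, the wider the window in which a second caller can pass the check.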




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    I haven't looked at the code, but since this won't make it into 2.2, and I'll be pushing hard to get SPARK-18085 into 2.3, this is throw-away code IMO. I'd rather spend resources on getting SPARK-18085 reviewed (hint hint hint) than on this.




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by sharkdtu <gi...@git.apache.org>.
Github user sharkdtu commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    cc @srowen @ajbozarth 




[GitHub] spark issue #17963: [SPARK-20722][CORE] Replay newer event log that hasn't b...

Posted by ajbozarth <gi...@git.apache.org>.
Github user ajbozarth commented on the issue:

    https://github.com/apache/spark/pull/17963
  
    Honestly, this feels a bit hack-y. Is this a big issue for you? This should only happen if you start up SHS on a large directory and then want to access the UI for an app that started after the SHS did. It just feels like too much of an edge case to justify a potential duplication of log processing. If this case is a big issue, though, I would agree this is the best "patch" while waiting for SHSv2. I'll defer to @srowen and @vanzin on this.

