You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by gengliangwang <gi...@git.apache.org> on 2017/11/20 07:13:04 UTC

[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/19786

    [SPARK-22559][CORE]history server: handle exception on opening corrupted listing.ldb

    ## What changes were proposed in this pull request?
    Currently history server v2 failed to start if `listing.ldb` is corrupted.
    This patch get rid of the corrupted `listing.ldb` and re-create it. 
    The exception handling follows [opening disk store for app](https://github.com/apache/spark/blob/0ffa7c488fa8156e2a1aa282e60b7c36b86d8af8/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L307)
    ## How was this patch tested?
    manual test
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark listingException

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19786
    
----
commit 054cd7e208c908aa50cc8030b15feee9d2ce71e9
Author: Wang Gengliang <lt...@gmail.com>
Date:   2017-11-20T06:59:40Z

    history server: handle exception on opening corrupted listing.ldb

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84015 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84015/testReport)** for PR 19786 at commit [`054cd7e`](https://github.com/apache/spark/commit/054cd7e208c908aa50cc8030b15feee9d2ce71e9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r151988844
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    No, the file listing.ldb is built from event logs, which stores list of applications(history server home page). It is used like cache.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19786


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    @vanzin Please help review this.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84015/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84021/testReport)** for PR 19786 at commit [`054cd7e`](https://github.com/apache/spark/commit/054cd7e208c908aa50cc8030b15feee9d2ce71e9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    retest this please.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84021/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152040491
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    Yes I thought about catch `DBException`
    But in this pr I just follows opening disk store for app:https://github.com/apache/spark/blob/0ffa7c488fa8156e2a1aa282e60b7c36b86d8af8/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala#L307


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84015 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84015/testReport)** for PR 19786 at commit [`054cd7e`](https://github.com/apache/spark/commit/054cd7e208c908aa50cc8030b15feee9d2ce71e9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r151983008
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    Is there a danger here that a transient error causes this to delete an otherwise valid file?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84078/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84098 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84098/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84109/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84109/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152322074
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    --- End diff --
    
    `is either old`, I think the old version problems is handled by the previous `case`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84021 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84021/testReport)** for PR 19786 at commit [`054cd7e`](https://github.com/apache/spark/commit/054cd7e208c908aa50cc8030b15feee9d2ce71e9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84098/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    ping @cloud-fan


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84109/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    **[Test build #84078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84078/testReport)** for PR 19786 at commit [`8cfa0b2`](https://github.com/apache/spark/commit/8cfa0b23cdf44c37452835beee88e1eb4e7f4cfc).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152037332
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    Shall we merge it into the previous case?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152039449
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    I see, shall we catch `DBException` instead of the general `Exception`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152038882
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    +        logWarning(s"Failed to load disk store $path for app listing.", e)
    --- End diff --
    
    @cloud-fan No. 
    Currently history server writes to a leveldb file for each application(if its page is accessed). And there will be one db called listing.ldb which stores the listing.
    If the version is not matched, then all the db files should be deleted. 
    In this PR, if only the listing db is corrupted, then just delete the listing db and recreate it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152347531
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -34,10 +34,10 @@ import org.apache.hadoop.fs.permission.FsAction
     import org.apache.hadoop.hdfs.DistributedFileSystem
     import org.apache.hadoop.hdfs.protocol.HdfsConstants
     import org.apache.hadoop.security.AccessControlException
    +import org.fusesource.leveldbjni.internal.NativeDB
     
     import org.apache.spark.{SecurityManager, SparkConf, SparkException}
     import org.apache.spark.deploy.SparkHadoopUtil
    -import org.apache.spark.deploy.history.config._
    --- End diff --
    
    this import is not used.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152338077
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -132,15 +132,20 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           AppStatusStore.CURRENT_VERSION, logDir.toString())
     
         try {
    -      open(new File(path, "listing.ldb"), metadata)
    +      open(dbPath, metadata)
         } catch {
           // If there's an error, remove the listing database and any existing UI database
           // from the store directory, since it's extremely likely that they'll all contain
           // incompatible information.
           case _: UnsupportedStoreVersionException | _: MetadataMismatchException =>
             logInfo("Detected incompatible DB versions, deleting...")
             path.listFiles().foreach(Utils.deleteRecursively)
    -        open(new File(path, "listing.ldb"), metadata)
    +        open(dbPath, metadata)
    +      case e: Exception =>
    +        // Get rid of the old data and re-create it. The store is either old or corrupted.
    --- End diff --
    
    Well, the comment is copied. I think Vanzin means compatibility with older version of Leveldb release.
    The previous case is about the version of how history server write into Leveldb.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19786: [SPARK-22559][CORE]history server: handle excepti...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19786#discussion_r152345297
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
    @@ -570,7 +575,6 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
         }
     
         val logPath = fileStatus.getPath()
    -    logInfo(s"Replaying log path: $logPath")
    --- End diff --
    
    I remove this line because in the function call `replay` below, there is another duplicated log "Replaying log path: ..."


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by gengliangwang <gi...@git.apache.org>.
Github user gengliangwang commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19786: [SPARK-22559][CORE]history server: handle exception on o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19786
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84098/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org