Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/07 18:52:21 UTC

[GitHub] [spark] thejdeep opened a new pull request #34829: [WIP][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

thejdeep opened a new pull request #34829:
URL: https://github.com/apache/spark/pull/34829


    ### What changes were proposed in this pull request?
   
    This PR aims to speed up serving the application list in the History Server by storing the application information that the listing requires in HDFS extended attributes, instead of parsing the event log file each time.
   
    ### Why are the changes needed?
   
    Improves the performance of the History Server listing page
   
    ### Does this PR introduce _any_ user-facing change?
   
    No.
   
    ### How was this patch tested?
    Unit tests will be added.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34829: [WIP][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34829:
URL: https://github.com/apache/spark/pull/34829#issuecomment-1055752373


   @LuciferYang Thanks for your comment. I have updated the description with performance numbers comparing runs with extended attributes disabled and enabled.




[GitHub] [spark] LuciferYang commented on a change in pull request #34829: [WIP][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #34829:
URL: https://github.com/apache/spark/pull/34829#discussion_r764527144



##########
File path: core/src/main/scala/org/apache/spark/deploy/history/EventLogFileWriters.scala
##########
@@ -229,6 +232,15 @@ class SingleEventLogFileWriter(
     writeLine(eventJson, flushLogger)
   }
 
+  override def writeToXAttr(attrName: String, attrValue: String): Unit = {
+    try {
+      fileSystem.setXAttr(new Path(inProgressPath), attrName, attrValue.getBytes())
+    } catch {
+      case _: IOException =>

Review comment:
       We may also need to handle `UnsupportedOperationException`: some file systems, such as `o.a.hadoop.fs.RawLocalFileSystem`, do not actually implement the `setXAttr` interface.
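
A minimal sketch of the handling suggested in this comment, not the PR's actual code. The names `XAttrSketch`, `XAttrResult`, `Written`, `Unsupported`, and `Failed` are hypothetical, and the attribute write is passed in as a function so the sketch compiles without Hadoop on the classpath. The idea is to treat `UnsupportedOperationException` as "this file system has no extended attributes" and fall back to the existing log-parsing path, while still reporting real I/O failures separately.

```scala
import java.io.IOException
import java.nio.charset.StandardCharsets

// Hypothetical helper for illustration only; none of these names come from the PR.
object XAttrSketch {
  // Outcome of a best-effort extended-attribute write.
  sealed trait XAttrResult
  case object Written extends XAttrResult
  case object Unsupported extends XAttrResult
  final case class Failed(cause: IOException) extends XAttrResult

  // `setXAttr` stands in for the real file system call, so callers decide
  // which file system and path are used.
  def writeToXAttr(attrName: String, attrValue: String)(
      setXAttr: (String, Array[Byte]) => Unit): XAttrResult = {
    try {
      // Encode explicitly instead of relying on the platform default charset.
      setXAttr(attrName, attrValue.getBytes(StandardCharsets.UTF_8))
      Written
    } catch {
      // The file system does not implement extended attributes at all.
      case _: UnsupportedOperationException => Unsupported
      // The file system supports them, but this particular write failed.
      case e: IOException => Failed(e)
    }
  }
}
```

With Hadoop available, the call site in the diff above could pass `(n, v) => fileSystem.setXAttr(new Path(inProgressPath), n, v)` as the second argument. Returning a result instead of swallowing the exception lets the caller log once and stop attempting attribute writes on a file system that reports `Unsupported`.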






[GitHub] [spark] AmplabJenkins commented on pull request #34829: [WIP][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34829:
URL: https://github.com/apache/spark/pull/34829#issuecomment-988185021


   Can one of the admins verify this patch?




[GitHub] [spark] LuciferYang commented on pull request #34829: [WIP][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #34829:
URL: https://github.com/apache/spark/pull/34829#issuecomment-988499023


   > Improves the performance of the History Server listing page
   
   Can you show a performance comparison before and after this PR?
   
   

