You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/09/23 17:04:58 UTC

[GitHub] [spark] thejdeep opened a new pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by ESS

thejdeep opened a new pull request #34079:
URL: https://github.com/apache/spark/pull/34079


   ### What changes were proposed in this pull request?
   Added a config `spark.yarn.shuffle.service.logs.namespace` which can be used to add a namespace suffix to log lines emitted by the External Shuffle Service.
   
   ### Why are the changes needed?
   Since many instances of ESS can be running on the same NM, it would be easier to distinguish between them.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   N/A


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715053724



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,9 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      if(logger != null) {
+        logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      }

Review comment:
       This can be null we use this for running a few tests in the `YarnShuffleServiceSuite`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715325832



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,6 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);

Review comment:
       I agree with @tgravescs, dropping the log message (particularly an error) would be missing out on very useful debugging information.
   One option would be to move it out of this class into some other util ?
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by ESS

Posted by GitBox <gi...@apache.org>.
mridulm commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-926014324


   +CC @tgravescs, @Ngone51 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r727299312



##########
File path: docs/running-on-yarn.md
##########
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for
+        emitting logs from the YARN shuffle service, like
+        <code>org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue</code>. Since some logging frameworks
+        may expect the logger name to look like a class name, it's generally recommended to provide a value which
+        would be a valid Java package or class name and not include spaces.

Review comment:
       Thanks, updated it and fixed the indentation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715237267



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,6 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);

Review comment:
       Cannot use the logger inside this static context. Therefore, ended up removing it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715053133



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -93,7 +93,7 @@
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger logger = null;

Review comment:
       Was a bad merge on my end, thanks. Fixed it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
tgravescs commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715206083



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,6 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);

Review comment:
       does something else print a useful error message?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-933914295


   @tgravescs @mridulm @xkrogen Addressed the comments, please review. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r721768729



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +206,24 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + "." + logsNamespace;
+    logger = LoggerFactory.getLogger(loggerName);

Review comment:
       Thanks, this is much cleaner.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715786353



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +204,15 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + " " + logsNamespace;
+    logger = LoggerFactory.getLogger(loggerName);
+    logger.info("Initialized Spark YARN shuffle service with configuration overlay from {}",

Review comment:
       We need a different message if `confOverlayUrl` is null, right? Maybe
   ```
   if (confOverlayUrl == null) {
     logger.info("Initialized Spark YARN shuffle service using base configuration");
   } else {
     logger.info(...); <- current logic here
   }
   ```

##########
File path: docs/running-on-yarn.md
##########
@@ -783,6 +783,14 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for 
+    emitting logs from the YARN shuffle service.
+  </td>

Review comment:
       Once we finalize on how we add the namespace to the logger name (per the comment I opened), we should provide an example here of what the final name will look like

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +204,15 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + " " + logsNamespace;

Review comment:
       Maybe we can use something slightly more structured than a space
   
   Right now assuming the namespace is "version311" we'll have something like:
   ```
   INFO network.yarn.YarnShuffleService version311 Initialized Spark YARN shuffle ...
   ```
   Maybe this would look better:
   ```
   INFO network.yarn.YarnShuffleService [version311] Initialized Spark YARN shuffle ...
   ```
   Or, since we're considering it part of the name and since frameworks may assume the name looks like a class,
   ```
   INFO network.yarn.YarnShuffleService.version311 Initialized Spark YARN shuffle ...
   ```
   (if we go with the last option because we think following the class-name-like convention is valuable, then we should probably log a warning if the namespace contains spaces)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-941853085


   @mridulm @tgravescs Please take a look, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r727299312



##########
File path: docs/running-on-yarn.md
##########
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for
+        emitting logs from the YARN shuffle service, like
+        <code>org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue</code>. Since some logging frameworks
+        may expect the logger name to look like a class name, it's generally recommended to provide a value which
+        would be a valid Java package or class name and not include spaces.

Review comment:
       Thanks, updated it and fixed the indentation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by ESS

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-925997104


   cc : @xkrogen @mridulm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #34079:
URL: https://github.com/apache/spark/pull/34079


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715782420



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,6 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);

Review comment:
       Perhaps we need to create a static logger (w/o prefix) to fall back to, in the case that we're unable to access the prefixed instance logger?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r722447922



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -93,7 +93,7 @@
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);

Review comment:
       To clarify, this should be `final` - the `defaultLogger` from earlier PR.
   Add a new `private Logger logger` - which is initialized to `defaultLogger` by default; and overridden in `serviceInit` as required.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r721478834



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +204,15 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + " " + logsNamespace;
+    logger = LoggerFactory.getLogger(loggerName);
+    logger.info("Initialized Spark YARN shuffle service with configuration overlay from {}",

Review comment:
       Added a default message, thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r721724785



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +206,24 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + "." + logsNamespace;
+    logger = LoggerFactory.getLogger(loggerName);

Review comment:
       Initialize `logger` to `staticLogger` and only create a custom logger in case `logsNamespace != ""` ?
   
   ```suggestion
       String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
       if (!logsNamespace.isEmpty()) {
         String loggerName = className + "." + logsNamespace;
         logger = LoggerFactory.getLogger(loggerName);
       }
   ```
   

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +206,24 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);

Review comment:
       Add this back

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -93,7 +93,8 @@
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger staticLogger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private Logger logger = null;

Review comment:
       ```suggestion
     private Logger logger = staticLogger;
   ```
   
   And use logger for all non-static methods.
   (See comment below in `serviceInit`).

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -178,7 +185,7 @@ public YarnShuffleService() {
     // because at this point in instantiation there is no Configuration object; it is not passed
     // until `serviceInit` is called, at which point it's too late to adjust the name.
     super("spark_shuffle");
-    logger.info("Initializing YARN shuffle service for Spark");
+    staticLogger.info("Initializing YARN shuffle service for Spark");

Review comment:
       We can revert this

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +206,24 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
-      logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
-          confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    String loggerName = logsNamespace.isEmpty() ? className : className + "." + logsNamespace;
+    logger = LoggerFactory.getLogger(loggerName);
+    if (loggerName.contains(" ")) {
+      logger.warn("Logging namespace contains a space which might " +
+              "not be parseable by some frameworks");
+    }
+    if (confOverlayUrl != null) {
+      logger.info("Initialized Spark YARN shuffle service with configuration overlay from {}",
+              confOverlayUrl);
+    } else {
+      logger.info("Initialized Spark YARN shuffle service with base configuration");
+    }

Review comment:
       We can remove this, and keep the original code.
   
   ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tgravescs commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-926217624


   conceptually looks fine to me too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r727279084



##########
File path: docs/running-on-yarn.md
##########
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for
+        emitting logs from the YARN shuffle service, like
+        <code>org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue</code>. Since some logging frameworks
+        may expect the logger name to look like a class name, it's generally recommended to provide a value which
+        would be a valid Java package or class name and not include spaces.

Review comment:
       based on other examples in this file, it looks like the indentation for these should be aligned with the first line




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r722403290



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +205,22 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
       logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
           confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    if (!logsNamespace.isEmpty()) {
+      logger = LoggerFactory.getLogger(className + "." + logsNamespace);
+    }
+    if (logsNamespace.contains(" ")) {
+      logger.warn("Logging namespace contains a space which might " +
+              "not be parseable by some frameworks");

Review comment:
       I think we can make the message a bit more actionable by including the config name (`SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY`) and explaining the suggested format ("... It is suggested to configure this to a simple string that looks like a Java package or class name to conform to standard logger name format")

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +205,22 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();
     if (confOverlayUrl != null) {
       logger.info("Initializing Spark YARN shuffle service with configuration overlay from {}",
           confOverlayUrl);
       _conf.addResource(confOverlayUrl);
     }
+
+    String logsNamespace = _conf.get(SPARK_SHUFFLE_SERVICE_LOGS_NAMESPACE_KEY, "");
+    if (!logsNamespace.isEmpty()) {
+      logger = LoggerFactory.getLogger(className + "." + logsNamespace);
+    }
+    if (logsNamespace.contains(" ")) {
+      logger.warn("Logging namespace contains a space which might " +
+              "not be parseable by some frameworks");

Review comment:
       Actually while thinking about this further, maybe it doesn't make sense to log a warning. Users may not be leveraging any framework which expects logger names to look like class names, and then they will get an unnecessary warning. Perhaps we can just rely on the note in the config documentation which says to be careful about using spaces. @mridulm what do you think?

##########
File path: docs/running-on-yarn.md
##########
@@ -793,6 +793,15 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for
+    emitting logs from the YARN shuffle service. This would throw a warning if the namespace value
+    provided contains a space, for example, "namespace test".

Review comment:
       Similar to my comment for the warning message, can we do more to explain why we encourage these naming conventions? Maybe:
   ```
       A namespace which will be appended to the class name when forming the logger name to use for
       emitting logs from the YARN shuffle service, like
       `org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue`. Since some logging frameworks
       may expect the logger name to look like a class name, it's generally recommended to provide a value which
       would be a valid Java package or class name and not include spaces.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715108450



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,9 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      if(logger != null) {
+        logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      }

Review comment:
       Oh wait, this is a static method - logger wont be available here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-933602594


   Updated the PR with the changes to account for logging in static cases and using period as a delimiter instead of a space. Please review @xkrogen @mridulm Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] thejdeep commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
thejdeep commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-941853085


   @mridulm @tgravescs Please take a look, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by ESS

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34079:
URL: https://github.com/apache/spark/pull/34079#issuecomment-925996863


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r728664139



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -93,7 +93,8 @@
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger defaultLogger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static Logger logger = defaultLogger;

Review comment:
       This should be an instance variable, not `static`.
   ```suggestion
     private Logger logger = defaultLogger;
   ```

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -199,11 +206,18 @@ protected void serviceInit(Configuration externalConf) throws Exception {
     _conf = new Configuration(externalConf);
     URL confOverlayUrl = Thread.currentThread().getContextClassLoader()
         .getResource(SHUFFLE_SERVICE_CONF_OVERLAY_RESOURCE_NAME);
+    String className = YarnShuffleService.class.getName();

Review comment:
       Move this into `if (!logsNamespace.isEmpty()) ` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] mridulm commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by ESS

Posted by GitBox <gi...@apache.org>.
mridulm commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r715009324



##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -93,7 +93,7 @@
  * This {@code classpath} configuration is only supported on YARN versions >= 2.9.0.
  */
 public class YarnShuffleService extends AuxiliaryService {
-  private static final Logger logger = LoggerFactory.getLogger(YarnShuffleService.class);
+  private static final Logger logger = null;

Review comment:
       This should not be `static` or `final` ?

##########
File path: common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java
##########
@@ -284,7 +293,9 @@ static MergedShuffleFileManager newMergedShuffleFileManagerInstance(TransportCon
       // will also need the transport configuration.
       return mergeManagerSubClazz.getConstructor(TransportConf.class).newInstance(conf);
     } catch (Exception e) {
-      logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      if(logger != null) {
+        logger.error("Unable to create an instance of {}", mergeManagerImplClassName);
+      }

Review comment:
       When can this be `null` ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] xkrogen commented on a change in pull request #34079: [SPARK-36834][SHUFFLE] Add support for namespacing log lines emitted by external shuffle service

Posted by GitBox <gi...@apache.org>.
xkrogen commented on a change in pull request #34079:
URL: https://github.com/apache/spark/pull/34079#discussion_r727279084



##########
File path: docs/running-on-yarn.md
##########
@@ -806,6 +806,17 @@ The following extra configuration options are available when the shuffle service
     NodeManager.
   </td>
 </tr>
+<tr>
+  <td><code>spark.yarn.shuffle.service.logs.namespace</code></td>
+  <td><code>(not set)</code></td>
+  <td>
+    A namespace which will be appended to the class name when forming the logger name to use for
+        emitting logs from the YARN shuffle service, like
+        <code>org.apache.spark.network.yarn.YarnShuffleService.logsNamespaceValue</code>. Since some logging frameworks
+        may expect the logger name to look like a class name, it's generally recommended to provide a value which
+        would be a valid Java package or class name and not include spaces.

Review comment:
       based on other examples in this file, it looks like the indentation for these should be aligned with the first line




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org