Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2023/01/04 02:38:40 UTC

[GitHub] [incubator-sedona] Kimahriman commented on pull request #743: [SEDONA-228] Standardize logging modules

Kimahriman commented on PR #743:
URL: https://github.com/apache/incubator-sedona/pull/743#issuecomment-1370424624

   > @Kimahriman I don't think it is a good idea to include logging dependencies in Sedona. Since Spark and Flink already package their own versions of log4j or slf4j, including these dependencies ourselves is likely to create conflicts in the future. This will introduce additional complexity to the project management and cause potential bugs.
   
   Nothing is added as a compile dependency; most of this standardizes which logging framework gets used during the tests. For example, several of the test base classes have log4j 1.x code to reduce the logging level for various packages. I don't _think_ these do anything for current Flink or Spark versions, because both use log4j 2.x as the logging implementation, but they include `log4j-1.2-api` as a dependency so the log4j 1.x code still compiles. I'm not sure what the limitations of this bridge API are.
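   
   For illustration, here's a minimal sketch of the kind of log4j 1.x call those test base classes make; the class and package names are made up, not the actual Sedona code. It compiles only because `log4j-1.2-api` is on the classpath, and that bridge forwards the 1.x calls to the log4j 2.x core that Spark and Flink actually ship:
   
   ```java
   // Hypothetical test base class using the log4j 1.x API via the bridge
   import org.apache.log4j.Level;
   import org.apache.log4j.Logger;
   
   public abstract class TestBase {
       static {
           // Reduce logging noise from chatty packages during tests
           // (package names are illustrative)
           Logger.getLogger("org.apache.spark").setLevel(Level.WARN);
           Logger.getLogger("org.apache.hadoop").setLevel(Level.WARN);
       }
   }
   ```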
   
   As I mentioned in the description, the real thing that needs to be done is to migrate all logging to SLF4J. Currently, if Spark or Flink dropped the `log4j-1.2-api` dependency, I believe Sedona would no longer work. This is kind of a step towards that, with two main parts (see the dependency sketch after this list):
   - The two test dependencies (`log4j-core` and `log4j-slf4j-impl`) are added to make sure all tests run with log4j 2.x, so that we can use a log4j2.properties file to configure them
   - `log4j-1.2-api` is added purely for older versions of Spark, since we exclude log4j 1.x dependencies to make sure 2.x is used for the tests. Because it provides the log4j 1.x API that the Sedona code currently uses, it has to be a provided dependency and not a test dependency. If Sedona only used SLF4J for logging, this wouldn't be necessary: the 1.x-to-2.x change would be seamless, and we wouldn't need to exclude the SLF4J dependency from Spark.
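   
   To make the scopes concrete, here's a hedged sketch of how these declarations could look in a module pom.xml; versions and exclusions are omitted, and the exact layout may differ from what the PR actually declares:
   
   ```xml
   <dependencies>
     <!-- Test-only: forces log4j 2.x as the backend so a log4j2.properties
          file controls logging in the module's tests -->
     <dependency>
       <groupId>org.apache.logging.log4j</groupId>
       <artifactId>log4j-core</artifactId>
       <scope>test</scope>
     </dependency>
     <dependency>
       <groupId>org.apache.logging.log4j</groupId>
       <artifactId>log4j-slf4j-impl</artifactId>
       <scope>test</scope>
     </dependency>
     <!-- Provided, not test: the main Sedona code still compiles against
          the log4j 1.x API, which this bridge supplies -->
     <dependency>
       <groupId>org.apache.logging.log4j</groupId>
       <artifactId>log4j-1.2-api</artifactId>
       <scope>provided</scope>
     </dependency>
   </dependencies>
   ```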
   
   Also part of this: the Flink module currently doesn't support any form of logging during the tests, because the Flink dependencies that are currently included don't provide any logging modules the way the Spark dependencies do.
   
   With this in place going forward, we could easily enable logging in the common module by adding `slf4j-api` as a provided dependency. That would let the code compile; with the two test dependencies mentioned above, common module tests could properly output logs, and at runtime `slf4j-api` would be provided by either Flink or Spark directly (a sketch follows below). This becomes especially valuable if/as more of the non-Spark-related core and sql module code gets ported to the common module.
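   
   A minimal sketch of what SLF4J-only logging in the common module would look like, assuming `slf4j-api` as a provided dependency; the class name here is hypothetical:
   
   ```java
   // Illustrative common-module class logging through the SLF4J API only;
   // the concrete backend (Spark's or Flink's log4j 2.x, or the test
   // dependencies above) is chosen at runtime, not at compile time.
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   public class SpatialIndexBuilder {  // hypothetical class name
       private static final Logger log = LoggerFactory.getLogger(SpatialIndexBuilder.class);
   
       public void build(int numPartitions) {
           log.info("Building spatial index with {} partitions", numPartitions);
       }
   }
   ```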
   
   Alternative options that change fewer things but still work toward that goal:
   - Only include these dependencies for Flink so that the Flink tests can actually produce logs
   - Either don't worry about configuring logging for the Spark tests across all the versions, or also add a log4j.properties file that mirrors the log4j2.properties file, so both old and new versions of Spark get similar settings applied (either reducing the verbosity of certain modules, or doing what I have done here, which is outputting the unit test logs to a file instead of the console; see the config sketch below).
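   
   For reference, a log4j2.properties along these lines would send unit test logs to a file and quiet specific packages; the appender name, file path, and levels are illustrative assumptions, not necessarily what this PR uses:
   
   ```properties
   # Illustrative log4j2.properties for tests (names/paths are assumptions)
   rootLogger.level = info
   rootLogger.appenderRef.file.ref = File
   
   # Write unit test logs to a file instead of the console
   appender.file.type = File
   appender.file.name = File
   appender.file.fileName = target/unit-tests.log
   appender.file.layout.type = PatternLayout
   appender.file.layout.pattern = %d{HH:mm:ss.SSS} %p %c: %m%n
   
   # Reduce verbosity of specific modules (package name illustrative)
   logger.spark.name = org.apache.spark
   logger.spark.level = warn
   ```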
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org