You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2023/01/02 18:52:34 UTC

[GitHub] [incubator-sedona] Kimahriman opened a new pull request, #743: [SEDONA-228] Standardize logging modules

Kimahriman opened a new pull request, #743:
URL: https://github.com/apache/incubator-sedona/pull/743

   
   ## Did you read the Contributor Guide?
   
   - Yes, I have read [Contributor Rules](https://sedona.apache.org/community/rule/) and [Contributor Development Guide](https://sedona.apache.org/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the assoicated JIRA ticket is https://issues.apache.org/jira/browse/SEDONA-228. The PR name follows the format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   Standardize on using log4j 2 for logging. Currently the logging modules are pulled in from Spark to use log4j 1 or 2 depending on the Spark version. The common module and flink module have no logging available to it. Both Flink and Spark use Log4j 2.x in their latest versions.
   
   Main points:
   - Add `log4j-1.2-api` as a provided dependency for all modules. Currently all the logging uses the log4j 1.x API, so this makes that compatible while still being able to run log4j 2.x (which is what actually happens at runtime with new flink and spark). Long term the logging should probably be switched to use SLF4J so the underlying implementation doesn't matter as much (this is also what flink and spark actually use I think)
   - Add `log4j-core` and `log4j-slf4j-impl` as test dependencies so they log4j 2.x gets used for logging for all tests
   - Add a log4j2.properties file as a test resource to all projects so logs get set to `target/unit-tests.log` to reduce the verbosity of tests when running locally and for finding what tests fail in the CI logs. This was copied from how the Spark repo does it.
   
   ## How was this patch tested?
   Just logging changes, existing tests.
   
   ## Did this PR include necessary documentation updates?
   
   - No, this PR does not affect any public API so no need to change the docs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-sedona] Kimahriman commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
Kimahriman commented on PR #743:
URL: https://github.com/apache/incubator-sedona/pull/743#issuecomment-1370424624

   > @Kimahriman I don't think it is a good idea to include logging dependencies in Sedona. Since Spark and Flink already package their own versions of log4j or slf4j, including these dependencies by ourselves are likely to create conflicts in the future. This will introduce additional complexity to the project management and cause potential bugs.
   
   Nothing is added as a compile dependency, most of this is to standardize what framework gets used during the tests. For example, several of the test base classes have log4j 1.x code to reduce the logging level for various packages. I don't _think_ these do anything for current Flink or Spark versions, because those both use log4j 2.x as the logging implementation, but they include `log4j-1.2-api` as a dependency so the log4j 1.x code still compiles. I'm not sure what the limitations of this bridge API are.
   
   I mentioned in the description, but the real thing that needs to be done is to migrate all logging to SLF4J. Currently if Spark  or Flink dropped the log4j-1.2-api dependency, Sedona would no longer work I believe. This is kind of a step towards that, with two main parts:
   - The two test dependencies (`log4j-core` and `log4j-slf4j-impl`) are added to make sure all tests run with log4j 2.x, so that we can use a log4j2.properties file to configure them
   - `log4j-1.2-api` is added purely for older versions of Spark since we exclude log4j 1.x dependencies to make sure 2.x is used for the tests. But because this provides the log4j 1.x API that the Sedona code currently uses, it has to be a provided dependency and not a test dependency. If Sedona only used SLF4J for logging this wouldn't be necessary, as the 1.x to 2.x change would be seamless and we wouldn't need to exclude the SLF4J dependency from Spark.
   
   Also part of this is that currently the Flink module doesn't support any form of logging during the tests, because the Flink dependencies don't provide any of the logging modules like Spark does for the dependencies that are currently included.
   
   With this going forward, we could easily enable logging in the common module by adding the slf4j-api as a provided dependency. This would enable the code to compile, and then with the two test dependencies mentioned above, common module tests could properly output logs, and at runtime the slf4j-api would be provided by either Flink or Spark directly. Especially if/as more of the non-Spark related core and sql module code get ported to the common module.
   
   Alternative options to change less things but still work toward that goal:
   - Only include these dependencies for Flink so that the Flink tests can actually produce logs
   - Either don't worry about configuring logging for the Spark tests across all the versions, or also add a log4j.properties file to mirror the log4j2.properties file so both old and new versions of Spark would have similar settings applied (either for reducing verbosity of certain modules or do what I have done here which is output the unit test logs to a file instead of the console
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-sedona] jiayuasu commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #743:
URL: https://github.com/apache/incubator-sedona/pull/743#issuecomment-1369390537

   @Kimahriman 
   I don't think it is a good idea to include logging dependencies in Sedona. Since Spark and Flink already package their own versions of log4j or slf4j, including these dependencies by ourselves are likely to create conflicts in the future. This will introduce additional complexity to the project management and cause potential bugs.
   
   For the common module, I think we should just use the regular java exception mechanism and let `sedona-spark` and `sedona-flink` to handle exceptions using log4j/slf4j packaged in Spark and Flink.
   
   Please correct me if I am wrong :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] jiayuasu commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #743:
URL: https://github.com/apache/sedona/pull/743#issuecomment-1371814275

   @Kimahriman Thank you for your great explanation!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] jiayuasu merged pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
jiayuasu merged PR #743:
URL: https://github.com/apache/sedona/pull/743


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] jiayuasu commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #743:
URL: https://github.com/apache/sedona/pull/743#issuecomment-1372632857

   Great explanation! Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] jiayuasu commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #743:
URL: https://github.com/apache/sedona/pull/743#issuecomment-1371827270

   @Kimahriman One thing I just realized: We probably should just print logging to the console (stdout and stderr) directly. Otherwise the users won't be able to see the logging unless they config log4j explicitly. It will be better if they can see those WARN/ERROR message directly. What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] Kimahriman commented on pull request #743: [SEDONA-228] Standardize logging modules

Posted by GitBox <gi...@apache.org>.
Kimahriman commented on PR #743:
URL: https://github.com/apache/sedona/pull/743#issuecomment-1372116194

   Do you mean for tests or for real use?
   
   For tests the logs should all go to `target/unit-tests.log`. Copied this from Spark so the stdout/err of the tests is less verbose and mostly just shows passed/failed tests. Some of the CI test logs were so verbose it's hard to find what test actually failed. I think the python tests are still probably pretty verbose.
   
   For real use, the logs should keep working exactly as they were automatically using the logging frameworks provided by Flink and Spark. If someone were to try to use the common module directly without Flink or Spark, then they would need to provide their own logging configs. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org