Posted to dev@manifoldcf.apache.org by Julien Massiera <ju...@francelabs.com> on 2022/01/18 15:29:00 UTC

Add logs to repository connectors

Hi all,

In order to improve the tracking of document processing status, in
particular when something goes wrong with MCF (such as processes hanging
without an obvious cause), I would like to propose adding specific logs to
each repository connector: one log at the beginning of the processing of each
documentIdentifier in the processDocuments method, and one at the end, so
that, at any time, we can easily tell which documents are being processed.

To implement that, I was thinking of a common class that would write the
logs with a custom log level (between INFO and DEBUG), so that the logs
could easily be isolated into a specific file through the log4j configuration.
The class would be stored in the org.apache.manifoldcf.crawler.system
package, like the Logging class.
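
For illustration, here is a minimal sketch of this proposal. It is a stand-in under stated assumptions, not MCF code: it uses java.util.logging so that it runs on a bare JDK, whereas MCF logs through log4j (where a custom level could be declared with Level.forName). The names DocProcessLogger, DOC_PROCESS and the logger name "docprocess" are hypothetical, and 600 is simply a value between JUL's INFO (800) and FINE (500), FINE being JUL's rough counterpart of DEBUG.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper class; nothing here exists in ManifoldCF.
class DocProcessLogger {

  // Custom level placed between INFO (800) and FINE (500).
  static final class DocLevel extends Level {
    static final Level DOC_PROCESS = new DocLevel("DOC_PROCESS", 600);

    private DocLevel(String name, int value) {
      super(name, value);
    }
  }

  // One dedicated logger name, so a configuration can target these lines alone.
  static final Logger LOG = Logger.getLogger("docprocess");

  static void start(String connectorName, String documentIdentifier) {
    LOG.log(DocLevel.DOC_PROCESS, "DOC_PROCESS_START|" + connectorName + "|" + documentIdentifier);
  }

  static void end(String connectorName, String documentIdentifier) {
    LOG.log(DocLevel.DOC_PROCESS, "DOC_PROCESS_END|" + connectorName + "|" + documentIdentifier);
  }

  // Stand-in for a connector's processDocuments loop (real signature differs).
  static void processDocuments(String connectorName, String[] documentIdentifiers) {
    for (String id : documentIdentifiers) {
      start(connectorName, id);
      try {
        // The connector's fetch-and-ingest work for this document would go here.
      } finally {
        // Emitted even if the work throws; a document that hangs never reaches it.
        end(connectorName, id);
      }
    }
  }
}
```

With this shape, a document whose start line has no matching end line is one that is still being processed (or is stuck). Note that with the default JUL configuration a level of 600 is below INFO and is therefore not printed until the logger's level is lowered.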

What do you think about that?

Regards,

Julien


RE: Add logs to repository connectors

Posted by Julien Massiera <ju...@francelabs.com>.
Hi Karl,

So can I add a log4j2 logger to each repository connector and log events as I described in the previous mail, using the DEBUG level?

Regards,
Julien

-----Original Message-----
From: Julien Massiera <ju...@francelabs.com>
Sent: Wednesday, January 19, 2022 12:53
To: dev@manifoldcf.apache.org
Subject: RE: Add logs to repository connectors

Hi Karl,

The goal of having a separate class to produce these logs is to be able either to keep them in manifoldcf.log (by setting the logs to DEBUG), or to write them to a separate file by targeting the specific class in logging.xml with a specific appender and a specific logger.
By default we would be in the first case, with those logs in manifoldcf.log in DEBUG mode; those who want them in a separate file will have to modify the logging.xml configuration on their own.

With the two choices you propose, this cannot be done because there is no way to specifically target these logs in the logging.xml configuration. If we use the existing Logging class (and modify it to add the connector class), then targeting this class in logging.xml will target every log generated with this class; with the other solution, which consists of having one native log4j logger in each connector, we would have to target every logger.

If I had to choose, it would be to have a native log4j logger in every connector, so that we can at least specifically target the logs, even if it requires more configuration.

Also, the log format would be like this:

[DOC_PROCESS_START|DOC_PROCESS_END]|CONNECTOR_NAME|DOCUMENT_IDENTIFIER

Regards,
Julien


RE: Add logs to repository connectors

Posted by Julien Massiera <ju...@francelabs.com>.
Hi Karl,

The goal of having a separate class to produce these logs is to be able either to keep them in manifoldcf.log (by setting the logs to DEBUG), or to write them to a separate file by targeting the specific class in logging.xml with a specific appender and a specific logger.
By default we would be in the first case, with those logs in manifoldcf.log in DEBUG mode; those who want them in a separate file will have to modify the logging.xml configuration on their own.

With the two choices you propose, this cannot be done because there is no way to specifically target these logs in the logging.xml configuration. If we use the existing Logging class (and modify it to add the connector class), then targeting this class in logging.xml will target every log generated with this class; with the other solution, which consists of having one native log4j logger in each connector, we would have to target every logger.

If I had to choose, it would be to have a native log4j logger in every connector, so that we can at least specifically target the logs, even if it requires more configuration.

Also, the log format would be like this:

[DOC_PROCESS_START|DOC_PROCESS_END]|CONNECTOR_NAME|DOCUMENT_IDENTIFIER
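
To make the format concrete, here is a small sketch that builds such lines and replays them to find documents that were started but never finished. The class and method names (DocProcessLines, format, inFlight) are hypothetical, and the sketch assumes each input line holds only the message part (no timestamp or thread prefix) and that connector names do not contain the | character.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for the proposed line format; not part of ManifoldCF.
class DocProcessLines {

  // Builds EVENT|CONNECTOR_NAME|DOCUMENT_IDENTIFIER.
  static String format(boolean start, String connectorName, String documentIdentifier) {
    String event = start ? "DOC_PROCESS_START" : "DOC_PROCESS_END";
    return event + "|" + connectorName + "|" + documentIdentifier;
  }

  // Returns the "CONNECTOR_NAME|DOCUMENT_IDENTIFIER" keys that have a start
  // line but no end line yet, in the order in which they were started.
  static Set<String> inFlight(List<String> lines) {
    Set<String> open = new LinkedHashSet<>();
    for (String line : lines) {
      // Limit of 3 keeps any '|' inside the document identifier intact.
      String[] parts = line.split("\\|", 3);
      if (parts.length != 3) {
        continue; // not a document-processing line
      }
      String key = parts[1] + "|" + parts[2];
      if (parts[0].equals("DOC_PROCESS_START")) {
        open.add(key);
      } else if (parts[0].equals("DOC_PROCESS_END")) {
        open.remove(key);
      }
    }
    return open;
  }
}
```

Putting the document identifier last is what allows the split with a limit of 3: identifiers such as URLs or paths may contain the separator, while the first two fields are under our control.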

Regards,
Julien

-----Original Message-----
From: Karl Wright <da...@gmail.com>
Sent: Tuesday, January 18, 2022 17:44
To: dev <de...@manifoldcf.apache.org>
Subject: Re: Add logs to repository connectors

Having a second log is pretty non-standard, and obscures the ordering of events that may be pertinent.  So I am not thrilled with that idea.
However, the current logging system does not make it easy to determine where a log message is coming from, nor can you filter log messages with grep easily.

It would be better to include enough information in the ONE manifoldcf.log file so that you know what connector the log message is coming from.

We already have individual loggers set up that are "catch all" buckets for events of various kinds, but the logger itself might want to have more information, e.g. the class that did the logging, as an optional argument.
Or - we could use log4j's native capabilities.  All that is needed to do that is to simply create a static logger in every connector class from which you wish to log, and write directly to that one.

Karl


Re: Add logs to repository connectors

Posted by Karl Wright <da...@gmail.com>.
Having a second log is pretty non-standard, and obscures the ordering of
events that may be pertinent.  So I am not thrilled with that idea.
However, the current logging system does not make it easy to determine
where a log message is coming from, nor can you filter log messages with
grep easily.

It would be better to include enough information in the ONE manifoldcf.log
file so that you know what connector the log message is coming from.

We already have individual loggers set up that are "catch all" buckets for
events of various kinds, but the logger itself might want to have more
information, e.g. the class that did the logging, as an optional argument.
Or - we could use log4j's native capabilities.  All that is needed to do
that is to simply create a static logger in every connector class from
which you wish to log, and write directly to that one.
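
As a rough sketch of that second option: each connector class owns a static logger named after the class, so every record carries the connector's class name. This is written against java.util.logging only so that it runs on a bare JDK; with log4j 2 the analogous declaration would be LogManager.getLogger(SomeConnector.class). The two connector classes, their processDocuments signature and the connector names in the messages are hypothetical, and FINE stands in for DEBUG.

```java
import java.util.logging.Logger;

// Hypothetical connectors; real MCF connectors have other names and signatures.
class FileShareConnector {
  // Static logger named after the class that does the logging.
  static final Logger LOG = Logger.getLogger(FileShareConnector.class.getName());

  void processDocuments(String[] documentIdentifiers) {
    for (String id : documentIdentifiers) {
      LOG.fine("DOC_PROCESS_START|FileShare|" + id);
      // The connector's work for this document would go here.
      LOG.fine("DOC_PROCESS_END|FileShare|" + id);
    }
  }
}

class JdbcConnector {
  static final Logger LOG = Logger.getLogger(JdbcConnector.class.getName());

  void processDocuments(String[] documentIdentifiers) {
    for (String id : documentIdentifiers) {
      LOG.fine("DOC_PROCESS_START|Jdbc|" + id);
      // The connector's work for this document would go here.
      LOG.fine("DOC_PROCESS_END|Jdbc|" + id);
    }
  }
}
```

Because each logger has its own name, a configuration that wants these lines treated specially has to name each connector's logger, or a package prefix that the connectors happen to share.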

Karl
