Posted to dev@tika.apache.org by Konstantin Gribov <gr...@gmail.com> on 2014/03/06 21:11:18 UTC

Inconsistent logging in current tika (1.5)

Hi, folks.

Tika-core is quite pure (it uses only java.util.logging), but tika-parsers
uses commons-logging 1.1.1 (through pdfbox), slf4j-api 1.5.6 (through
netcdf) and log4j 1.2.14 (through slf4j-log4j as a test-scoped dependency).
Also, some parsers (like pdfbox) log straight to stdout/stderr.
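For parsers that write directly to stdout/stderr, one possible application-side workaround is to capture the stream and turn each line into a log record. This is a minimal sketch, assuming java.util.logging as the target; `LineLoggingOutputStream` is a hypothetical helper, not part of Tika or PDFBox. Redirecting `System.err` affects the whole JVM, so it only makes sense in an application, never inside a library.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper (not part of Tika or PDFBox): an OutputStream that
 * turns every complete line written to it into a java.util.logging record.
 */
final class LineLoggingOutputStream extends OutputStream {
    private final Logger logger;
    private final Level level;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    LineLoggingOutputStream(Logger logger, Level level) {
        this.logger = logger;
        this.level = level;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            emitLine();        // a newline ends the current record
        } else {
            buffer.write(b);   // keep collecting bytes of the current line
        }
    }

    @Override
    public synchronized void flush() {
        // Intentionally empty: partial lines stay buffered until '\n' or close().
    }

    @Override
    public synchronized void close() {
        emitLine();            // log any trailing text that had no newline
    }

    private void emitLine() {
        String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        buffer.reset();
        if (line.endsWith("\r")) {                 // tolerate Windows line endings
            line = line.substring(0, line.length() - 1);
        }
        if (!line.isEmpty()) {
            logger.log(level, line);
        }
    }

    /** Wraps the stream in an auto-flushing UTF-8 PrintStream, e.g. for System.setErr. */
    static PrintStream printStream(Logger logger, Level level) {
        return new PrintStream(new LineLoggingOutputStream(logger, level), true, StandardCharsets.UTF_8);
    }
}
```

An application could call `System.setErr(LineLoggingOutputStream.printStream(Logger.getLogger("stderr"), Level.WARNING))` once at startup (the logger name is a placeholder). The logging backend's own console output must not go to the redirected `System.err`, or the two will feed each other.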

It's confusing.

Tika-core uses only JUL.
Tika-parsers uses JCL and log4j (in tests) and depends on slf4j-api.
Tika-app uses JCL, configures log4j at runtime (to change the verbosity
level) and depends on slf4j-log4j12.
Tika-server uses only JCL but depends on slf4j-api 1.7.5 (through cxf).
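To check which of these APIs, bridges and backends actually end up on a given module's classpath, a small diagnostic can try to load one marker class per jar. This is a sketch under assumptions: the marker class names are taken from the slf4j 1.x, commons-logging, log4j 1.2 and logback-classic jars as far as is known (slf4j 1.x bindings ship `org.slf4j.impl.StaticLoggerBinder`), and `LoggingClasspathProbe` is a hypothetical helper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical diagnostic: which logging jars are visible to a class loader? */
final class LoggingClasspathProbe {
    // Label -> a class that (we assume) only that jar provides. Verify against
    // the exact versions in use before relying on the result.
    private static final Map<String, String> MARKERS = new LinkedHashMap<>();
    static {
        MARKERS.put("JUL (JDK)", "java.util.logging.Logger");
        MARKERS.put("JCL API (commons-logging or jcl-over-slf4j)", "org.apache.commons.logging.LogFactory");
        MARKERS.put("jcl-over-slf4j bridge", "org.apache.commons.logging.impl.SLF4JLogFactory");
        MARKERS.put("slf4j-api", "org.slf4j.LoggerFactory");
        MARKERS.put("slf4j 1.x binding", "org.slf4j.impl.StaticLoggerBinder");
        MARKERS.put("jul-to-slf4j bridge", "org.slf4j.bridge.SLF4JBridgeHandler");
        MARKERS.put("log4j 1.2", "org.apache.log4j.Logger");
        MARKERS.put("logback-classic", "ch.qos.logback.classic.Logger");
    }

    private LoggingClasspathProbe() {}

    /** Returns label -> whether its marker class can be loaded (without initializing it). */
    static Map<String, Boolean> probe(ClassLoader loader) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : MARKERS.entrySet()) {
            result.put(e.getKey(), isPresent(e.getValue(), loader));
        }
        return result;
    }

    private static boolean isPresent(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // 'false' = do not run static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Printing `LoggingClasspathProbe.probe(Thread.currentThread().getContextClassLoader())` from each module's tests would show its mix. The JCL API marker is present whether `LogFactory` comes from commons-logging itself or from jcl-over-slf4j, which is why the bridge gets its own marker.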

What do you think about switching all logging to a current slf4j version
and excluding JCL from the dependencies entirely?

The first option group is whether to add slf4j-api to the tika-core
dependencies. If it's added, we won't use JUL. If it isn't, jul-to-slf4j
can be added to the tika-parsers dependencies.
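The jul-to-slf4j option amounts to installing a JUL `Handler` on the root logger that forwards every record to another API. In the real jar this is `org.slf4j.bridge.SLF4JBridgeHandler` (in slf4j 1.7, `SLF4JBridgeHandler.removeHandlersForRootLogger()` followed by `SLF4JBridgeHandler.install()`). The JDK-only sketch below shows the same mechanism with a hypothetical `ForwardingJulHandler` and a `BiConsumer` standing in for an SLF4J logger; it skips the level mapping and exception handling a real bridge performs.

```java
import java.util.function.BiConsumer;
import java.util.logging.Formatter;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/**
 * JDK-only sketch of a JUL bridge. The BiConsumer sink (logger name, text)
 * is a hypothetical stand-in for an SLF4J logger.
 */
final class ForwardingJulHandler extends Handler {
    private final BiConsumer<String, String> sink;
    private final Formatter formatter = new SimpleFormatter(); // used only to expand {0}-style parameters

    ForwardingJulHandler(BiConsumer<String, String> sink) {
        this.sink = sink;
    }

    @Override
    public void publish(LogRecord record) {
        if (record == null || !isLoggable(record)) {
            return;
        }
        // A real bridge maps the JUL level to the target API's level and
        // passes record.getThrown() along; this sketch forwards text only.
        String text = record.getLevel() + " " + formatter.formatMessage(record);
        sink.accept(record.getLoggerName(), text);
    }

    @Override public void flush() {}
    @Override public void close() {}

    /** Removes the root logger's existing handlers (e.g. the default ConsoleHandler) and installs a forwarder. */
    static ForwardingJulHandler installOnRoot(BiConsumer<String, String> sink) {
        Logger root = Logger.getLogger("");
        for (Handler existing : root.getHandlers()) {
            root.removeHandler(existing);
        }
        ForwardingJulHandler handler = new ForwardingJulHandler(sink);
        root.addHandler(handler);
        return handler;
    }
}
```

JUL's own level checks still run before a record reaches the handler, so with this approach the JUL logger levels have to be kept in line with the target backend's configuration.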

The second option group concerns commons-logging. We can:
- exclude it and force developers to add either jcl-over-slf4j or
commons-logging as a dependency,
- exclude it and add jcl-over-slf4j as a dependency, so anyone who uses JCL
will be forced to exclude jcl-over-slf4j,
- keep it and force developers either to use slf4j-jcl + commons-logging or
to exclude commons-logging and include jcl-over-slf4j.

I think the second way is preferable, because developers can use any slf4j
backend and only have to do something extra when they use JCL.

The third option group is the slf4j backend: log4j or logback. I prefer
logback-classic, but we can use either of them; both support changing the
log level at runtime.

I can refactor the Tika codebase to use logging consistently and then
create a pull request on GitHub or a JIRA ticket with a patch, if my
proposal for this issue is accepted.

By the way, I think we should also update edu.ucar:netcdf to 4.2.20, which
depends on the newer slf4j-api 1.6.1.

-- 
Best regards,
Konstantin Gribov.

Re: Inconsistent logging in current tika (1.5)

Posted by Konstantin Gribov <gr...@gmail.com>.
Hello, Nick.

I'll answer in reverse order.

> Can you open a jira for that upgrade? If you can also try it locally, and
> report on the jira if all the unit tests still pass, that'd be a help!

I tested it on a local build and opened a ticket for it:
https://issues.apache.org/jira/browse/TIKA-1258.

If we don't want to add a logging API to tika-core, then nothing in
tika-core needs to change. We should, however, drop log4j.properties from
tika-core/src/test/resources, because tika-core doesn't use log4j.

Libraries shouldn't ship any logging *setup*, because it can affect the
applications that use them. Test dependencies are the exception.
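A minimal illustration of that split, using java.util.logging only to keep it dependency-free (in Tika the application side would be tika-app or tika-server configuring log4j or logback instead). The hypothetical library class only obtains a logger and logs; the hypothetical application class alone decides verbosity and can change it at runtime, much as a verbosity switch would. The `org.example.sketch` logger names are placeholders.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical library class: it obtains a logger and logs, nothing more. */
final class SketchParser {
    // Placeholder logger name; a real library would typically use its class name.
    private static final Logger LOG = Logger.getLogger("org.example.sketch.SketchParser");

    String parse(String input) {
        // Debug-level detail; whether it is emitted is the application's decision.
        LOG.fine(() -> "parsing " + input.length() + " chars");
        return input.trim();
    }
}

/** Hypothetical application side: the only place where logging is configured. */
final class SketchApp {
    // JUL holds loggers only weakly; keeping a strong reference ensures the
    // level set below is not lost if the logger is garbage-collected.
    private static Logger configured;

    private SketchApp() {}

    /** Sets the threshold for loggers under the prefix that have no level of their own. */
    static synchronized void setVerbose(String loggerPrefix, boolean verbose) {
        configured = Logger.getLogger(loggerPrefix);
        configured.setLevel(verbose ? Level.FINE : Level.INFO);
    }
}
```

With JUL's default configuration the console handler is itself limited to INFO, so a real application would also lower its handler's level before FINE messages actually appear; the sketch only changes the logger threshold.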

Would you accept a patch that moves all logging in tika-parsers, tika-app
and tika-server onto a consistent system based on slf4j (which is already
present in each of these modules through their dependencies)? Would you
accept it if it excludes JCL (commons-logging) and brings in the
JCL-to-slf4j bridge?

I want to improve Tika so that other developers using it have fewer
headaches configuring its dependencies.

-- 
Best regards,
Konstantin Gribov.


2014-03-07 9:03 GMT+04:00 Nick Burch <ap...@gagravarr.org>:

> On Fri, 7 Mar 2014, Konstantin Gribov wrote:
>
>> Tika-core is quite pure (uses only java.util.logging) but tika-parsers
>> uses commons-logging 1.1.1 (through pdfbox), slf4j-api 1.5.6 (through
>> netcdf) and log4j 1.2.14 (through slf4j-log4j as test scope dependency).
>> Also some parsers (like pdfbox) logs just to stdout/stderr.
>>
>
> I think part of the issue is that many of the libraries that Tika depends
> on have their own chosen logging library / setup. IIRC, the Tika parsers
> often log in a similar manner to the underlying library they use.
>
> That's not to say that we can't tidy things up a bit, but it does restrict
> how much we can do where log messages come from underlying libraries
>
>
>  It's confusing.
>>
>> Tika-core use only JUL.
>>
>
> Tika-Core ideally shouldn't have any external depdencies, so I'm not sure
> what else it can use while maintaining that?
>
>
>  Tika-parsers use JCL and log4j (in tests) and depends on slf4j-api.
>> Tika-app use JCL, configures log4j in runtime (to change verbosity level)
>> and depends on slf4j-log4j12.
>> Tika-server use only JCL but depends on slj4j-api 1.7.5 (through cxf).
>>
>
> Potentially some of these could be rationalised, though maybe the best we
> can hope for is to ensure they only use whatever their underlying
> dependencies use
>
>
>  By the way, I think we also should update edu.ucar:netcdf to 4.2.20 that
>> depends on newer slf4j-api 1.6.1.
>>
>
> Can you open a jira for that upgrade? If you can also try it locally, and
> report on the jira if all the unit tests still pass, that'd be a help!
>
> Thanks
> Nick
>

Re: Inconsistent logging in current tika (1.5)

Posted by Nick Burch <ap...@gagravarr.org>.
On Fri, 7 Mar 2014, Konstantin Gribov wrote:
> Tika-core is quite pure (uses only java.util.logging) but tika-parsers 
> uses commons-logging 1.1.1 (through pdfbox), slf4j-api 1.5.6 (through 
> netcdf) and log4j 1.2.14 (through slf4j-log4j as test scope dependency). 
> Also some parsers (like pdfbox) log just to stdout/stderr.

I think part of the issue is that many of the libraries that Tika depends 
on have their own chosen logging library / setup. IIRC, the Tika parsers 
often log in a similar manner to the underlying library they use.

That's not to say that we can't tidy things up a bit, but it does restrict
how much we can do where log messages come from the underlying libraries.

> It's confusing.
>
> Tika-core uses only JUL.

Tika-Core ideally shouldn't have any external dependencies, so I'm not sure
what else it can use while maintaining that?

> Tika-parsers uses JCL and log4j (in tests) and depends on slf4j-api.
> Tika-app uses JCL, configures log4j in runtime (to change verbosity level)
> and depends on slf4j-log4j12.
> Tika-server uses only JCL but depends on slf4j-api 1.7.5 (through cxf).

Potentially some of these could be rationalised, though maybe the best we
can hope for is to ensure they only use whatever their underlying
dependencies use.

> By the way, I think we also should update edu.ucar:netcdf to 4.2.20 that 
> depends on newer slf4j-api 1.6.1.

Can you open a jira for that upgrade? If you can also try it locally, and 
report on the jira if all the unit tests still pass, that'd be a help!

Thanks
Nick