Posted to user@tika.apache.org by "P. Hill" <pa...@gmail.com> on 2011/12/06 20:27:07 UTC
Parallel Parsing with an AutoDetectParser
I am aware that the AutoDetectParser object is thread safe, but I wish
to ask the opposite question.
Why not simply create a new AutoDetectParser each time my (server)
code gets a request to parse a file?
If there is some overhead, what is it?
Is it just the overhead of checking whether a definition (-D or
environment variable) points to a tika-config.xml and possibly parsing
it? I don't actually have a config file. Is there other waste or
overhead? Comments?
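
To make the two options concrete, here is a rough sketch of what I am comparing. It does not use Tika at all: StubParser, handleWithShared and handleWithNew are made-up names, with StubParser standing in for a thread-safe parser such as AutoDetectParser, and the counter exists only to show how often the constructor (where any setup cost would be paid) runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ParserReuseSketch {

    /** Hypothetical stand-in for a thread-safe parser; this is NOT Tika's API. */
    static final class StubParser {
        // Counts constructor calls, i.e. how often any setup cost would be paid.
        static final AtomicInteger CONSTRUCTED = new AtomicInteger();

        StubParser() {
            // A real constructor might look up and load configuration here.
            CONSTRUCTED.incrementAndGet();
        }

        /** Stateless "parse": safe to call from many threads at once. */
        String parse(String document) {
            return document.trim();
        }
    }

    // Option A: one instance, created once and shared by every request.
    static final StubParser SHARED = new StubParser();

    static String handleWithShared(String document) {
        return SHARED.parse(document);
    }

    // Option B: a fresh instance for every request.
    static String handleWithNew(String document) {
        return new StubParser().parse(document);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            handleWithShared("  request " + i + "  ");
        }
        System.out.println("constructed after shared calls: "
                + StubParser.CONSTRUCTED.get());

        for (int i = 0; i < 3; i++) {
            handleWithNew("  request " + i + "  ");
        }
        System.out.println("constructed after per-request calls: "
                + StubParser.CONSTRUCTED.get());
    }
}
```

With option A the constructor runs once no matter how many requests arrive; with option B it runs once per request. My question is what that per-request construction actually costs in the real AutoDetectParser.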
(I noticed the discussion from Jan 2010 [
https://issues.apache.org/jira/browse/TIKA-374 ] about fixing a bug in
AutoDetectParser to make it even more thread safe. I checked the code,
and although the fix comment mentions only the one issue -- a new
SAXParser instance -- in the current code TikaConfig also no longer
holds the mime-types in a static member.)
-Paul