Posted to user@tika.apache.org by "P. Hill" <pa...@gmail.com> on 2011/12/06 20:27:07 UTC

Parallel Parsing with an AutoDetectParser

I am aware that the AutoDetectParser object is thread safe, but I wish 
to ask the opposite question.

Why not simply create a new AutoDetectParser each time my (server)
code gets a request to parse a file?

If there is some overhead, what is it?

Is it just the overhead of checking whether a definition (a -D system
property or an environment variable) points to a tika-config.xml and
possibly parsing it?  I don't actually have a config file.  Is there
other waste or overhead?  Comments?
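
One rough way to put a number on this would be to time the constructor
directly. The sketch below is only a generic harness, not anything from
Tika: ConstructionCost, averageNanos and StandInParser are hypothetical
names, and the stand-in class merely simulates some setup work. To
measure the real thing, one would pass AutoDetectParser::new to
averageNanos instead, with Tika on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical harness: times how long it takes to construct an object.
final class ConstructionCost {

    // Hypothetical stand-in for a parser whose constructor does setup work.
    // Replace with the real parser class to measure actual overhead.
    static final class StandInParser {
        final Map<String, String> types = new HashMap<>();

        StandInParser() {
            // Simulated setup: register 50 made-up type-to-parser mappings.
            for (int i = 0; i < 50; i++) {
                types.put("type/" + i, "parser" + i);
            }
        }
    }

    // Returns the average number of nanoseconds per factory.get() call.
    static <T> double averageNanos(Supplier<T> factory, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        Object last = null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            last = factory.get(); // keep a reference so the call is not trivially dead
        }
        long elapsed = System.nanoTime() - start;
        if (last == null) {
            throw new IllegalStateException("factory returned null");
        }
        return (double) elapsed / iterations;
    }

    public static void main(String[] args) {
        // Warm up first so JIT compilation does not dominate the measurement.
        averageNanos(StandInParser::new, 1_000);
        double avg = averageNanos(StandInParser::new, 10_000);
        System.out.printf("average construction time: %.0f ns%n", avg);
    }
}
```

A loop like this only gives a ballpark figure; it says nothing about
garbage-collection pressure under real server load.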

( I noticed the discussion from Jan 2010 [
https://issues.apache.org/jira/browse/TIKA-374 ] about fixing a bug in
AutoDetectParser to make it even more thread safe.  I checked the code,
and although the fix comment mentions only the one issue -- a new
SAXParser instance -- the current code has also changed TikaConfig so
that it no longer holds the mime-types in a static member. )

-Paul