Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2014/09/27 16:31:33 UTC
[jira] [Commented] (TIKA-1239) Using Spring and Tika together. Need to extract the content and metadata.
[ https://issues.apache.org/jira/browse/TIKA-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150616#comment-14150616 ]
Tyler Palsulich commented on TIKA-1239:
---------------------------------------
Hi [~iyersudheshna@gmail.com]. Did you end up with a working Spring configuration?
> Using Spring and Tika together. Need to extract the content and metadata.
> --------------------------------------------------------------------------
>
> Key: TIKA-1239
> URL: https://issues.apache.org/jira/browse/TIKA-1239
> Project: Tika
> Issue Type: Task
> Components: general, metadata, parser
> Reporter: sudheshna iyer
> Priority: Critical
>
> I need to use Spring with Tika. Is it thread-safe to use the following beans injected from the bean context? I am injecting parseContext, handler and parser into my class TikaImpl.
> ================
> <bean name="parseContext" class="org.apache.tika.parser.ParseContext"></bean>
> <bean name="parser" class="org.apache.tika.parser.AutoDetectParser"></bean>
> <bean name="handler" class="org.xml.sax.helpers.DefaultHandler"></bean>
>
> <bean id="tikaService" class="com.intech.tika.TikaImpl">
> <property name="parseContext" ref="parseContext"></property>
> <property name="parser" ref="parser"></property>
> <property name="handler" ref="handler"></property>
> <property name="resourcesize"><value>10485760</value></property>
> </bean>
> ===============
> My class has three methods: 1. retrieve metadata, 2. retrieve content, 3. retrieve both.
> 1. To retrieve metadata, I am using:
> parser.parse(stream, handler, metadata, parseContext);
> 2. To retrieve the content, I am using:
> Tika tika = new Tika();
> tika.setMaxStringLength(resourcesize);
> String content = tika.parseToString(stream);
> 3. To retrieve both, I am using:
> BodyContentHandler bodyContentHandler = new BodyContentHandler(resourcesize);
> Metadata metadata = new Metadata();
> parser.parse(TikaInputStream.get(stream), bodyContentHandler, metadata, parseContext);
> My question is: is this approach thread-safe? I introduced three methods because I assumed that retrieving only the metadata (method 1) is faster than retrieving both (method 3).
> I really need your suggestions. Thank you in advance.
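A minimal sketch of one way TikaImpl could be restructured, assuming that the parser instance itself is safe to share between threads (the usual guidance for AutoDetectParser), while ContentHandler, Metadata and ParseContext objects hold per-document state and should be created fresh on every call. The class name TikaService, the Extracted holder and the method names are hypothetical. With this layout only the parser and the size limit would need to be Spring beans; the shared handler and parseContext beans would go away.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical replacement for TikaImpl. The only shared fields are the
 * parser (assumed thread-safe, e.g. an AutoDetectParser) and an immutable
 * size limit; all per-document state is created inside each method.
 */
public class TikaService {

    /** Hypothetical holder for the "content and metadata" result. */
    public static final class Extracted {
        public final String content;
        public final Metadata metadata;

        Extracted(String content, Metadata metadata) {
            this.content = content;
            this.metadata = metadata;
        }
    }

    private final Parser parser;     // shared across threads
    private final int resourceSize;  // maximum characters of body text to collect

    public TikaService(Parser parser, int resourceSize) {
        this.parser = parser;
        this.resourceSize = resourceSize;
    }

    /** Method 1: metadata only; the DefaultHandler discards the extracted text. */
    public Metadata metadata(InputStream stream)
            throws IOException, SAXException, TikaException {
        Metadata metadata = new Metadata();
        parser.parse(stream, new DefaultHandler(), metadata, new ParseContext());
        return metadata;
    }

    /**
     * Method 2: body text only. If the text exceeds resourceSize, parse()
     * fails with a SAXException (Tika.parseToString instead returns the
     * truncated text).
     */
    public String content(InputStream stream)
            throws IOException, SAXException, TikaException {
        return both(stream).content;
    }

    /** Method 3: body text and metadata from a single parse. */
    public Extracted both(InputStream stream)
            throws IOException, SAXException, TikaException {
        // Fresh handler, Metadata and ParseContext per call: nothing mutable is shared.
        BodyContentHandler handler = new BodyContentHandler(resourceSize);
        Metadata metadata = new Metadata();
        parser.parse(stream, handler, metadata, new ParseContext());
        return new Extracted(handler.toString(), metadata);
    }
}
```

A note on the speed assumption: method 1 may not be much faster than method 3, since the DefaultHandler only ignores the SAX events and most parsers still read through the whole document to produce them. Also, AutoDetectParser wraps a plain InputStream in a TikaInputStream itself, so the explicit TikaInputStream.get(stream) in method 3 should be optional.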
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)