Posted to users@jackrabbit.apache.org by Robert Siska <ro...@magnolia-cms.com> on 2012/04/05 08:25:08 UTC

textFilterClasses deprecated. How to specify extractors?

Hello people,

My Jackrabbit 2.4 reports that the textFilterClasses parameter of the
SearchIndex element is deprecated and ignored. When I remove the
textFilterClasses parameter altogether, it indexes PDF, RTF, everything.

How does it know which binary files it should index when I'm not
specifying any extractors? How can I disable/enable them?

The API documentation says it's deprecated, but doesn't provide any
alternative.

Thank you for any info!

	Robert

Re: textFilterClasses deprecated. How to specify extractors?

Posted by Mark Herman <MH...@NBME.org>.

Robert Siska wrote
> 
> How does it know which binary files it should index when I'm not
> specifying any extractors? How can I disable/enable them?
> 

I'm not an expert, but what I do know is that JR uses Tika to extract text,
and it chooses the extractor based on the jcr:mimeType property. If you don't
supply a MIME type, it won't know how to extract the content (although I
wouldn't recommend that as a practice). I believe there is a way to supply JR
with a Tika config that might give you what you want.
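
As a rough sketch of what supplying a Tika config could look like: this
assumes the SearchIndex element accepts a tikaConfigPath parameter, which is
my recollection of Jackrabbit 2.x and should be checked against your version.
The file name my-tika-config.xml and its location are made up for
illustration.

```xml
<!-- workspace.xml (sketch): point the search index at a custom Tika config. -->
<SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
  <param name="path" value="${wsp.home}/index"/>
  <!-- Assumption: tikaConfigPath is supported by your Jackrabbit version.
       The file name below is hypothetical. -->
  <param name="tikaConfigPath" value="${wsp.home}/my-tika-config.xml"/>
</SearchIndex>
```

The referenced file would then list only the parsers you want for each MIME
type, in whatever format the bundled Tika version expects.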

Additionally, you can specify an indexing configuration in the
repository/workspace XML files to set rules on what gets indexed by Lucene,
and how.
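
A minimal sketch of such an indexing configuration, assuming it is referenced
from the SearchIndex element through an indexingConfiguration parameter and
that an index-rule limits indexing to the properties it lists. The file name
and the choice of properties are only an example; whether this fully
suppresses binary extraction in your setup is something to verify.

```xml
<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.2.dtd">
<!--
  Referenced from workspace.xml inside <SearchIndex> with something like:
    <param name="indexingConfiguration"
           value="${wsp.home}/indexing_configuration.xml"/>
  (the file name is hypothetical)
-->
<configuration xmlns:jcr="http://www.jcp.org/jcr/1.0"
               xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
  <!-- For nt:resource nodes, index only the listed properties.
       jcr:data (the binary content) is deliberately left out. -->
  <index-rule nodeType="nt:resource">
    <property>jcr:mimeType</property>
    <property>jcr:lastModified</property>
  </index-rule>
</configuration>
```

Existing content usually has to be re-indexed before a change like this takes
effect.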

