Posted to slide-user@jakarta.apache.org by Markus Maeder <Ma...@oase.ch> on 2004/11/01 13:54:09 UTC
Re: lucene, extractor and pdf - HOW? - Problem Solved
Sorry, I was looking at code that is no longer in use. :)
I have solved the PDF problem by upgrading to the new PDFBox version, 0.6.7a.
Regards
Markus
Quoting Markus Maeder <Ma...@oase.ch>:
> Quoting Unico Hommes <un...@hippo.nl>:
> > ...
> > <contentindexer classname="org.apache.slide.index.TextContentIndexer">
> > <parameter name="indexpath">store/index</parameter>
> > <parameter name="analyzer">org.apache.lucene.analysis.de.GermanAnalyzer</parameter>
> > </contentindexer>
>
> If I look at the CVS source, I see
>
> IndexWriter writer =
> new IndexWriter(indexDb, new StandardAnalyzer(), false);
>
> in LuceneIndexer.java
>
> As I am no Java expert, I might misunderstand this. But IMHO this will always
> use the StandardAnalyzer, regardless of the configured "analyzer" parameter...
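
For illustration only: a minimal sketch of how a class name taken from a
configuration parameter (such as the "analyzer" value quoted above) could be
turned into an instance by reflection, falling back to a default when the name
is missing or unusable. This is not Slide's actual code. The class
ConfiguredInstance and the method instantiate are hypothetical names, and the
sketch assumes the configured class has a public no-argument constructor.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Slide or Lucene.
final class ConfiguredInstance {

    private ConfiguredInstance() {
    }

    /**
     * Creates an instance of the class named by className.
     * Returns fallback.get() if the name is null or blank, the class cannot
     * be loaded or constructed, or it is not of the expected type.
     * Assumption: the configured class has a public no-argument constructor.
     */
    static <T> T instantiate(String className, Class<T> expectedType, Supplier<T> fallback) {
        if (className == null || className.trim().isEmpty()) {
            return fallback.get();
        }
        try {
            Class<?> loaded = Class.forName(className.trim());
            Object instance = loaded.getDeclaredConstructor().newInstance();
            // cast() throws ClassCastException if the type does not match
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            return fallback.get();
        }
    }
}
```

With Lucene on the classpath, an indexer could then call something like
instantiate(analyzerName, Analyzer.class, StandardAnalyzer::new) and pass the
result to the IndexWriter instead of a hard-coded new StandardAnalyzer().
Whether the current Slide indexer already does something similar is not
established here.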
>
> The problem with the PDF files not getting indexed still exists. The content
> gets extracted with PDFBox. I will trace LuceneIndexer now.
>
>
> Regards
> Markus
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: slide-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: slide-user-help@jakarta.apache.org