Posted to java-user@lucene.apache.org by Jay Hill <ja...@gmail.com> on 2007/12/26 20:23:56 UTC

Analyzer choices for indexing and searching multiple languages

I'm working on a project where we will be searching across several languages
with a single query. There will be different categories, each covering a
different group of languages to search (e.g. category "a": English, French,
Spanish; category "b": Spanish, Portuguese, Italian, etc.). Originally I began
indexing each language using a language-specific Analyzer, but I'm not sure
how to handle the QueryParser at search time, since I don't know which
Analyzer to pass to it.

Does anyone have any experience with indexing all the languages using the
StandardAnalyzer, and then using QueryParser with the StandardAnalyzer at
search time? Right now we only have European languages to index.

Or would it be better to analyze each language at index time using a
language-specific Analyzer, and then still use the QueryParser with the
StandardAnalyzer at search time? I've considered building a BooleanQuery
whose clauses come from several QueryParsers, each built with a
language-specific Analyzer, but that seems bound to be very slow.
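
A rough sketch of the fan-out idea above, in plain Java with no Lucene
dependency, so it only shows the shape of the expansion and not real analysis.
The category-to-language table, the language codes, and the per-language field
names (content_en, content_fr, ...) are assumptions made up for illustration.
In Lucene itself, each generated clause would correspond to one query parsed
by a QueryParser holding that language's Analyzer and added to a BooleanQuery
as an optional clause.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultiLanguageQuery {
    // Hypothetical category -> language codes table, based on the two
    // example categories in the post (category "b" may include more).
    static final Map<String, List<String>> CATEGORY_LANGUAGES = new LinkedHashMap<>();
    static {
        CATEGORY_LANGUAGES.put("a", Arrays.asList("en", "fr", "es"));
        CATEGORY_LANGUAGES.put("b", Arrays.asList("es", "pt", "it"));
    }

    // Hypothetical naming scheme: one indexed field per language.
    static String fieldFor(String language) {
        return "content_" + language;
    }

    // Expands one user query into one clause per language in the category,
    // joined with OR. The user query is NOT escaped here; a real
    // implementation would need to handle query-syntax characters.
    static String expand(String category, String userQuery) {
        List<String> languages = CATEGORY_LANGUAGES.get(category);
        if (languages == null) {
            throw new IllegalArgumentException("Unknown category: " + category);
        }
        return languages.stream()
                .map(lang -> fieldFor(lang) + ":(" + userQuery + ")")
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        // prints: content_en:(maison) OR content_fr:(maison) OR content_es:(maison)
        System.out.println(expand("a", "maison"));
    }
}
```

The number of clauses grows with the number of languages in the category, not
with the size of the index, which is the part I'm unsure about cost-wise.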

Any opinions or thoughts appreciated.

-Jay