Posted to java-user@lucene.apache.org by Scott Smith <SS...@MainstreamData.com> on 2004/01/13 01:59:43 UTC

Philosophy(??) question

I have some documents I'm indexing which have multiple languages in them
(i.e., some fields in the document are always English; other fields may be
other languages).  Now, I understand why a query against a certain field
must use the same analyzer as was used when that field was indexed
(stemming, stop words, etc.).  It seems like different fields could use
different analyzers and the world would still be a happy place.  However,
since the analyzer() is passed in as part of the IndexWriter, that can't
happen.  Is there a way to do this (other than having multiple indexes which
is a problem trying to do combined searches)?  Or am I missing something
more subtle?  Sorry if I'm plowing old ground.

Scott

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

Re: Philosophy(??) question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.

On Jan 12, 2004, at 7:59 PM, Scott Smith wrote:
> I have some documents I'm indexing which have multiple languages in them
> (i.e., some fields in the document are always English; other fields may be
> other languages).  Now, I understand why a query against a certain field
> must use the same analyzer as was used when that field was indexed
> (stemming, stop words, etc.).  It seems like different fields could use
> different analyzers and the world would still be a happy place.  However,
> since the analyzer() is passed in as part of the IndexWriter, that can't
> happen.  Is there a way to do this (other than having multiple indexes which
> is a problem trying to do combined searches)?  Or am I missing something
> more subtle?  Sorry if I'm plowing old ground.

The new PerFieldAnalyzerWrapper (in v. 1.3) allows you to specify 
different analyzers, as its name says, per field.  You simply specify 
which analyzer to use as a default and then any special ones for 
individual fields.
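
To make the idea concrete, here is a minimal sketch of the "default plus
per-field overrides" dispatch described above. It is a toy model in plain
Java, not Lucene's own classes: ToyAnalyzer, ToyWhitespaceAnalyzer,
ToyEnglishAnalyzer, ToyPerFieldWrapper and the field names "title" and
"body" are all made up for illustration, and the stop list is deliberately
tiny.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for an analyzer: turns a field's text into index terms.
interface ToyAnalyzer {
    List<String> analyze(String text);
}

// Language-neutral: split on whitespace and lowercase, nothing else.
class ToyWhitespaceAnalyzer implements ToyAnalyzer {
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}

// English-specific: same tokenization, then drop a tiny (assumed) stop list.
class ToyEnglishAnalyzer implements ToyAnalyzer {
    private static final Set<String> STOP =
            new HashSet<String>(Arrays.asList("the", "a", "of", "and"));
    private final ToyAnalyzer base = new ToyWhitespaceAnalyzer();

    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        for (String t : base.analyze(text)) {
            if (!STOP.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }
}

// One default analyzer, plus special ones registered for individual fields.
class ToyPerFieldWrapper {
    private final ToyAnalyzer defaultAnalyzer;
    private final Map<String, ToyAnalyzer> perField = new HashMap<String, ToyAnalyzer>();

    ToyPerFieldWrapper(ToyAnalyzer defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String fieldName, ToyAnalyzer analyzer) {
        perField.put(fieldName, analyzer);
    }

    // Pick the field's own analyzer if one was registered, else the default.
    List<String> analyze(String fieldName, String text) {
        ToyAnalyzer a = perField.get(fieldName);
        return (a != null ? a : defaultAnalyzer).analyze(text);
    }
}

class PerFieldDemo {
    public static void main(String[] args) {
        ToyPerFieldWrapper wrapper = new ToyPerFieldWrapper(new ToyWhitespaceAnalyzer());
        wrapper.addAnalyzer("title", new ToyEnglishAnalyzer()); // always-English field
        System.out.println(wrapper.analyze("title", "The History of Lucene"));
        System.out.println(wrapper.analyze("body", "Die Geschichte der Suche"));
    }
}
```

Because the wrapper is itself the single object handed to the writer, the
index stays in one piece and combined searches keep working; only the
per-field lookup changes.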

As for using the same analyzer for querying as for indexing - that is a 
deeper matter, and a rule I have yet to agree with.  There are some 
interesting reasons why you may want a different one - although the two 
must "cooperate" in some fashion.
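
One possible reading of "cooperate", offered only as an illustration and
not as the specific reasons hinted at above: the query-side analysis may
do different work, as long as the terms it produces have the same form as
the terms stored in the index. The sketch below is plain Java with made-up
names (CooperationSketch, indexTerms, cooperatingQueryTerm, rawQueryTerm);
it does not use Lucene.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CooperationSketch {
    // Index-time analysis (assumed): split on whitespace and lowercase.
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Query-time analysis that differs but cooperates: it also lowercases,
    // and additionally strips trailing punctuation a user might type.
    static String cooperatingQueryTerm(String raw) {
        return raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+$", "");
    }

    // Query-time analysis that does not cooperate: case is left untouched,
    // so the term can never equal a lowercased index term.
    static String rawQueryTerm(String raw) {
        return raw;
    }

    public static void main(String[] args) {
        Set<String> index = indexTerms("Lucene in Action");
        System.out.println(index.contains(cooperatingQueryTerm("LUCENE?"))); // true
        System.out.println(index.contains(rawQueryTerm("Lucene")));          // false
    }
}
```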

	Erik

Re: Philosophy(??) question

Posted by Morus Walter <mo...@tanto-xipolis.de>.

Scott Smith writes:
> I have some documents I'm indexing which have multiple languages in them
> (i.e., some fields in the document are always English; other fields may be
> other languages).  Now, I understand why a query against a certain field
> must use the same analyzer as was used when that field was indexed
> (stemming, stop words, etc.).  It seems like different fields could use
> different analyzers and the world would still be a happy place.  However,
> since the analyzer() is passed in as part of the IndexWriter, that can't
> happen.  Is there a way to do this (other than having multiple indexes which
> is a problem trying to do combined searches)?  Or am I missing something
> more subtle?  Sorry if I'm plowing old ground.
> 
AFAIK you need to write one analyzer that acts differently based on the
'fieldName' parameter of the tokenStream method.
I haven't done that though.
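
A rough, untested-against-Lucene sketch of what that could look like. In
Lucene the method takes a Reader and returns a TokenStream; to stay
self-contained this toy version takes a String and returns the terms
directly. The class name FieldSwitchingAnalyzer, the field name "title"
and the crude plural-stripping rule are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class FieldSwitchingAnalyzer {
    // Toy counterpart of tokenStream(fieldName, reader): branches on the
    // field name and returns the resulting terms as a list.
    List<String> tokenStream(String fieldName, String text) {
        boolean english = "title".equals(fieldName); // assumed English-only field
        List<String> terms = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (t.isEmpty()) {
                continue;
            }
            String term = t.toLowerCase(Locale.ROOT);
            // Crude English-only rule standing in for real stemming:
            // strip one trailing "s" from longer words.
            if (english && term.length() > 3 && term.endsWith("s")) {
                term = term.substring(0, term.length() - 1);
            }
            terms.add(term);
        }
        return terms;
    }

    public static void main(String[] args) {
        FieldSwitchingAnalyzer analyzer = new FieldSwitchingAnalyzer();
        System.out.println(analyzer.tokenStream("title", "Search Engines")); // [search, engine]
        System.out.println(analyzer.tokenStream("body", "Haus Maus"));       // [haus, maus]
    }
}
```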

HTH
	Morus
