Posted to java-user@lucene.apache.org by Clandes Tino <cl...@yahoo.co.uk> on 2004/02/23 12:49:43 UTC

Multilanguage and wildcard support

Hi, all.
I would like to describe a dilemma I have about
analysis.

2. Multilanguage and wildcard support
In Lucene 1.3 Final I found a very useful class,
PerFieldAnalyzerWrapper, which lets me specify a
separate analyzer for each field.
However, the full-text content (obtained by parsing Word,
Excel, XML or other kinds of documents) should be
searchable with stemming and should also
support wildcard queries.
I implemented this solution:
- The full content is indexed in two separate
fields, because wildcard queries do not pass through
the analyzer, as I have read in this mailing list archive.
Field1 ("stemmingbody") - the matching Snowball analyzer
is used.
Field2 ("plainbody") - the whitespace analyzer is used.
So, when a user searches for a term in an item's
content, I parse the query; if it contains a wildcard
character, the search is performed on "plainbody";
otherwise I search "stemmingbody", expecting better
search results that way.
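
To make the routing step concrete, here is a minimal sketch in plain Java (no Lucene calls). The class and method names are hypothetical, and it assumes that only the characters '*' and '?' count as wildcards; escaped wildcards are not handled.

```java
// Hypothetical helper, not part of Lucene: picks the field to search
// based on whether the user's query contains a wildcard character.
final class BodyFieldRouter {

    // Field names as used in this message.
    static final String STEMMED_FIELD = "stemmingbody";
    static final String PLAIN_FIELD = "plainbody";

    // Assumption: '*' and '?' are the only wildcard characters.
    static boolean hasWildcard(String query) {
        return query.indexOf('*') >= 0 || query.indexOf('?') >= 0;
    }

    // Wildcard queries go to the whitespace-analyzed field,
    // everything else goes to the stemmed field.
    static String chooseField(String query) {
        return hasWildcard(query) ? PLAIN_FIELD : STEMMED_FIELD;
    }

    public static void main(String[] args) {
        System.out.println(chooseField("connect*"));    // wildcard query
        System.out.println(chooseField("connections")); // plain term query
    }
}
```

The chosen field name would then be passed as the default field when the query is parsed.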
Is there a better way to do this, e.g. indexing the
full content in only one field instead of two (I
tokenize and index it, but do not store it)?

Thanks for any opinion or suggestion in advance!
Best regards
Milan Agatonovic 



	
	
		