Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2012/12/09 14:15:23 UTC

porting a custom Analyzer from 3.6 -> 4.0

I have a CustomAnalyzer which overrides "public final TokenStream tokenStream ( String fieldName, Reader reader )":
@Override
public final TokenStream tokenStream ( String fieldName, Reader reader )
{
	boolean fieldRequiresExactMatching = IndexManager.getInstance().isExactMatchField( fieldName );

	Reader localreader = reader;
	if ( !fieldRequiresExactMatching )
	{
		NormalizeCharMap charMap = new NormalizeCharMap();
		charMap.add(",", " ");
		<SNIP>
		// wrap/filter reader
		localreader = new MappingCharFilter( charMap, reader );
	}
	TokenStream t = new WhitespaceAnalyzer( IndexManager.CURRENT_LUCENE_VERSION ).tokenStream( fieldName, localreader );

	if ( !fieldRequiresExactMatching )
	{
		// apply stop word filter
		Set<String> stopWordSet = null;
		<SNIP>
		if ( stopWordSet != null )
		{
			// wrap/filter stream
			StopFilter stopFilter = new StopFilter( IndexManager.CURRENT_LUCENE_VERSION, t, stopWordSet, true );
			t = stopFilter;
		}
	}
	return t;
}

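So the chain is: MappingCharFilter -> whitespace analysis -> stop word filtering, where the char filter and the stop word filter are only applied if the field does not require exact matching.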
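
To make the intended behaviour of this chain concrete, here is a small plain-Java model of it that does not use Lucene at all. It is only an illustration: the class name PipelineModel, the method analyze and the stop words are made up for the example (the real stop word list and the remaining char mappings are snipped above), and splitting on "\\s+" only approximates what Lucene's whitespace tokenization does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class PipelineModel {
    // Hypothetical stop words; the real list is elided in the post.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a"));

    static List<String> analyze(String text, boolean exactMatch) {
        // Step 1: char mapping, skipped for exact-match fields.
        // Only the "," -> " " mapping is known from the post.
        String mapped = exactMatch ? text : text.replace(",", " ");

        List<String> tokens = new ArrayList<>();
        // Step 2: whitespace tokenization (always applied).
        for (String token : mapped.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields one empty string from split()
            }
            // Step 3: case-insensitive stop word removal,
            // skipped for exact-match fields.
            if (!exactMatch && STOP_WORDS.contains(token.toLowerCase(Locale.ROOT))) {
                continue;
            }
            tokens.add(token);
        }
        return tokens;
    }
}
```

With these assumptions, analyze("The quick,brown fox", false) drops "The" and splits at the comma, while the same call with exactMatch set to true leaves "quick,brown" as one token and keeps "The".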

As of Lucene 4.0, "protected TokenStreamComponents createComponents ( final String fieldName, final Reader reader )" has to be overridden and a TokenStreamComponents has to be returned. I don't see how to achieve this ... all I have is a TokenStream but no Tokenizer ...

Re: porting a custom Analyzer from 3.6 -> 4.0

Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,

take a look at StandardAnalyzer sources for an example:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.0.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java#StandardAnalyzer

In your case:
- remember that your analyzer has to be reusable!
- use WhitespaceTokenizer
- NormalizeCharMap is to be used with MappingCharFilter. You can
instantiate a NormalizeCharMap with NormalizeCharMap.Builder. Remember that
NormalizeCharMap.Builder consumes the map on every build() request.
- have a look at MappingCharFilterFactory (I don't really know how this
works :-) )
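
The points above could be put together roughly as follows. This is only a sketch against the Lucene 4.0 API, not a tested port of the original class: the char filter moves into initReader(), the tokenizer and the optional stop filter go into createComponents(), and the analyzer uses PerFieldReuseStrategy because the chain depends on the field name. The helper isExactMatchField() is a hypothetical stand-in for IndexManager.getInstance().isExactMatchField(), and the stop words are placeholders since the real ones are snipped in the question.

```java
import java.io.Reader;
import java.util.Arrays;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public final class CustomAnalyzer extends Analyzer {
    private static final Version MATCH_VERSION = Version.LUCENE_40;

    private final NormalizeCharMap charMap;
    private final CharArraySet stopWords;

    public CustomAnalyzer() {
        // The chain differs per field, so cache the components per field
        // instead of sharing one instance across all fields.
        super(new Analyzer.PerFieldReuseStrategy());

        // Build the char map once; it is shared by all char filters.
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add(",", " "); // further mappings are snipped in the question
        charMap = builder.build();

        // Placeholder stop words (assumption); "true" means ignore case,
        // like the boolean passed to StopFilter in the 3.6 code.
        stopWords = new CharArraySet(MATCH_VERSION, Arrays.asList("the", "a"), true);
    }

    // Hypothetical stand-in for IndexManager.getInstance().isExactMatchField().
    private static boolean isExactMatchField(String fieldName) {
        return fieldName.startsWith("exact_");
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Called for every new reader, also when components are reused.
        return isExactMatchField(fieldName)
                ? reader
                : new MappingCharFilter(charMap, reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // The reader passed in here has already gone through initReader().
        Tokenizer source = new WhitespaceTokenizer(MATCH_VERSION, reader);
        if (isExactMatchField(fieldName)) {
            return new TokenStreamComponents(source);
        }
        TokenStream result = new StopFilter(MATCH_VERSION, source, stopWords);
        return new TokenStreamComponents(source, result);
    }
}
```

The Tokenizer that seemed to be missing is the WhitespaceTokenizer itself; WhitespaceAnalyzer is no longer needed as a wrapper. Building the NormalizeCharMap once in the constructor also avoids calling build() for every token stream.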

Cheers,

Nicola

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org