Posted to java-user@lucene.apache.org by Clemens Wyss DEV <cl...@mysign.ch> on 2012/12/09 14:15:23 UTC
Porting a custom Analyzer from 3.6 -> 4.0
I have a CustomAnalyzer which overrides "public final TokenStream tokenStream ( String fieldName, Reader reader )":
@Override
public final TokenStream tokenStream ( String fieldName, Reader reader )
{
    boolean fieldRequiresExactMatching = IndexManager.getInstance().isExactMatchField( fieldName );
    Reader localreader = reader;
    if ( !fieldRequiresExactMatching )
    {
        NormalizeCharMap charMap = new NormalizeCharMap();
        charMap.add(",", " ");
        <SNIP>
        // wrap/filter reader
        localreader = new MappingCharFilter( charMap, reader );
    }
    TokenStream t = new WhitespaceAnalyzer( IndexManager.CURRENT_LUCENE_VERSION ).tokenStream( fieldName, localreader );
    if ( !fieldRequiresExactMatching )
    {
        // apply stop word filter
        Set<String> stopWordSet = null;
        <SNIP>
        if ( stopWordSet != null )
        {
            // wrap/filter stream
            StopFilter stopFilter = new StopFilter( IndexManager.CURRENT_LUCENE_VERSION, t, stopWordSet, true );
            t = stopFilter;
        }
    }
    return t;
}
MappingCharFilter -> whiteSpace analysis - <if condition given> -> stop word filtering
As of Lucene 4.0, "protected TokenStreamComponents createComponents ( final String fieldName, final Reader reader )" has to be overridden, and a TokenStreamComponents has to be returned. I don't see how to achieve this ... all I have is a TokenStream but no Tokenizer ...
Re: Porting a custom Analyzer from 3.6 -> 4.0
Posted by Nicola Buso <nb...@ebi.ac.uk>.
Hi,
Take a look at the StandardAnalyzer sources for an example:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-analyzers-common/4.0.0/org/apache/lucene/analysis/standard/StandardAnalyzer.java#StandardAnalyzer
In your case:
- remember your analyzer has to be reusable!
- WhitespaceTokenizer
- NormalizeCharMap to be used with MappingCharFilter. You can
instantiate a NormalizeCharMap with NormalizeCharMap.Builder. Remember
NormalizeCharMap.Builder is consuming the map at every build() request.
- have a look at MappingCharFilterFactory (I don't really know how this
works :-) )
Cheers,
Nicola
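For illustration, here is a minimal sketch of how the pieces mentioned in this thread might fit together against the Lucene 4.0 API. It is only one possible arrangement, not a confirmed port of the original class. Assumptions: the set of exact-match field names passed to the constructor is a hypothetical stand-in for IndexManager.isExactMatchField(), the stop words are handed in as a CharArraySet (which also carries the ignore-case flag that the 3.6 StopFilter took as a boolean), only the "," mapping from the original post is reproduced, and the terms() helper exists purely for trying the analyzer out. The char filter moves into initReader(), and a per-field reuse strategy is chosen because the chain differs per field.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.Version;

public final class CustomAnalyzer extends Analyzer {

  private static final Version MATCH_VERSION = Version.LUCENE_40;

  // Hypothetical stand-in for IndexManager.getInstance().isExactMatchField(...)
  private final Set<String> exactMatchFields;
  // May be null, in which case no stop word filtering is applied
  private final CharArraySet stopWords;
  // Built once and shared; the map itself is immutable after build()
  private final NormalizeCharMap charMap;

  public CustomAnalyzer(Set<String> exactMatchFields, CharArraySet stopWords) {
    // Cache one set of components per field, since the chain depends on the field
    super(new Analyzer.PerFieldReuseStrategy());
    this.exactMatchFields = exactMatchFields;
    this.stopWords = stopWords;
    NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
    builder.add(",", " ");
    this.charMap = builder.build();
  }

  // Called on every tokenStream() request, also when components are reused
  @Override
  protected Reader initReader(String fieldName, Reader reader) {
    if (exactMatchFields.contains(fieldName)) {
      return reader;
    }
    return new MappingCharFilter(charMap, reader);
  }

  // Receives the reader already wrapped by initReader()
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new WhitespaceTokenizer(MATCH_VERSION, reader);
    TokenStream result = source;
    if (!exactMatchFields.contains(fieldName) && stopWords != null) {
      result = new StopFilter(MATCH_VERSION, source, stopWords);
    }
    return new TokenStreamComponents(source, result);
  }

  // Convenience helper for experiments only; not part of the Lucene API
  public List<String> terms(String fieldName, String text) {
    List<String> out = new ArrayList<String>();
    try {
      TokenStream ts = tokenStream(fieldName, new StringReader(text));
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        out.add(term.toString());
      }
      ts.end();
      ts.close();
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
    return out;
  }
}
```

The Tokenizer that seemed to be missing is the WhitespaceTokenizer itself: TokenStreamComponents takes the tokenizer as the source and the last filter in the chain as the result. The sketch compiles only with the Lucene 4.0 core and analyzers-common jars on the classpath.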