Posted to java-user@lucene.apache.org by Nicolás Lichtmaier <ni...@wolfram.com.INVALID> on 2019/03/20 17:52:33 UTC

Help with token streams and graphs

I'm trying to get synonyms working correctly, and to do that I'm trying to 
better understand graphs in a token stream.

For that purpose I've built this code:

    Builder builder = CustomAnalyzer.builder();
    builder.withTokenizer(StandardTokenizerFactory.class);
    MySynonymGraphFilterFactory.registerSynonyms(Arrays.asList(
        Arrays.asList("go to", "navigate", "open")
    ));
    builder.addTokenFilter(MySynonymGraphFilterFactory.class, "synonyms", "unused");

MySynonymGraphFilterFactory is just a hack to pass a list of lists of 
synonyms. It expands everything, mapping every entry in a list to every 
other entry in that list.

    builder.addTokenFilter(FlattenGraphFilterFactory.class); // nothing changes with this!
    Analyzer analyzer = builder.build();
    TokenStream ts = analyzer.tokenStream("*", new StringReader("go to the webpage!"));

Then I call a function that just dumps terms, position increments and 
position lengths:

    System.out.println(LoggingFilter.tokenStreamToString(ts));

What I don't understand is this: I get the same output whether or not I 
include FlattenGraphFilter. This is the output:

    navigate<2> (0)open<2> (0)go  to  the  webpage

(angle brackets show the position length of the preceding term; parentheses 
show the position increment of the following term)
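
To make the graph in that output explicit, each token can be read as an arc 
from a start node to an end node. The following is only a minimal sketch in 
plain Java, not Lucene code: the Token class and the arcs and sample methods 
are hypothetical names made up for illustration, and it assumes that any 
position increment or position length not printed in the output is 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphDump {

    // Hypothetical stand-in for a token: its text plus the two position values.
    public static final class Token {
        final String term;
        final int positionIncrement;
        final int positionLength;

        Token(String term, int positionIncrement, int positionLength) {
            this.term = term;
            this.positionIncrement = positionIncrement;
            this.positionLength = positionLength;
        }
    }

    // Renders each token as "term: from -> to", where "from" is the running
    // sum of position increments and "to" is "from" plus the position length.
    public static List<String> arcs(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int position = -1; // so that a first increment of 1 lands on node 0
        for (Token t : tokens) {
            position += t.positionIncrement;
            out.add(t.term + ": " + position + " -> " + (position + t.positionLength));
        }
        return out;
    }

    // Values read off the output above; unmarked values are assumed to be 1.
    public static List<Token> sample() {
        return Arrays.asList(
            new Token("navigate", 1, 2),
            new Token("open", 0, 2),
            new Token("go", 0, 1),
            new Token("to", 1, 1),
            new Token("the", 1, 1),
            new Token("webpage", 1, 1));
    }

    public static void main(String[] args) {
        for (String arc : arcs(sample())) {
            System.out.println(arc);
        }
    }
}
```

Under those assumptions, "navigate" and "open" both span nodes 0 to 2, in 
parallel with "go" (0 to 1) followed by "to" (1 to 2).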

There's something I'm not understanding here. I had thought that 
flattening the stream meant that no token would have a position length 
greater than 1. Was I wrong? I would greatly appreciate any help with 
understanding this.

Thanks!

Nicolás.-