Posted to java-user@lucene.apache.org by Nicolás Lichtmaier <ni...@wolfram.com.INVALID> on 2019/03/21 15:15:58 UTC
Help with token streams and graphs (v2)
(When I sent this message earlier I had used HTML to make it clearer
and easier to read. I see now that the list software removed the
formatting, leaving an unreadable mess. I'm sending this again in case
somebody could be kind enough to guide me here a bit. =) )
------
I'm trying to make synonyms work correctly, and for that I'm trying to
better understand graphs in a token stream.
For that purpose I've written this code:
Builder builder = CustomAnalyzer.builder();
builder.withTokenizer(StandardTokenizerFactory.class);
MySynonymGraphFilterFactory.registerSynonyms(Arrays.asList(
    Arrays.asList("go to", "navigate", "open")
));
builder.addTokenFilter(MySynonymGraphFilterFactory.class, "synonyms", "unused");
(MySynonymGraphFilterFactory is just a hack to pass a list of lists of
synonyms. It expands each list, mapping every entry to every other entry.)
builder.addTokenFilter(FlattenGraphFilterFactory.class); // nothing changes with this!
Analyzer analyzer = builder.build();
TokenStream ts = analyzer.tokenStream("*", new StringReader("go to the webpage!"));
Then I call a function that just dumps terms, position increments and
position lengths:
System.out.println(LoggingFilter.tokenStreamToString(ts));
What I don't understand is this. I get the same output whether I include
FlattenGraphFilter or not. This is the output:
navigate<2> (0)open<2> (0)go to the webpage
(angle brackets show the position length of the preceding term;
parentheses show the position increment of the following term; a value
of 1 is not printed)
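To make the notation concrete, here is a minimal sketch in plain Java (no Lucene dependency) that reproduces the output line above and decodes it into graph arcs. TokenGraphDump and Token are hypothetical names used only for this illustration, not my real LoggingFilter. It assumes that an increment or length of 1 is left implicit, and that positions start at -1 so the first token lands on position 0:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models the dumped tokens, not real Lucene classes.
final class TokenGraphDump {

    /** Stand-in for one token: term, position increment, position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLen;

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** The tokens as I read them from the output line above. */
    static List<Token> sample() {
        return Arrays.asList(
            new Token("navigate", 1, 2),
            new Token("open", 0, 2),
            new Token("go", 0, 1),
            new Token("to", 1, 1),
            new Token("the", 1, 1),
            new Token("webpage", 1, 1));
    }

    /** Renders tokens as "(inc)term<len>", omitting inc and len when they are 1 (assumed). */
    static String format(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0) {
                sb.append(' ');
            }
            if (t.posInc != 1) {
                sb.append('(').append(t.posInc).append(')');
            }
            sb.append(t.term);
            if (t.posLen != 1) {
                sb.append('<').append(t.posLen).append('>');
            }
        }
        return sb.toString();
    }

    /** Turns increments and lengths into "term: from -> to" arcs between positions. */
    static List<String> arcs(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // assumed convention: the first increment of 1 gives position 0
        for (Token t : tokens) {
            pos += t.posInc;
            out.add(t.term + ": " + pos + " -> " + (pos + t.posLen));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Token> tokens = sample();
        System.out.println(format(tokens));
        for (String arc : arcs(tokens)) {
            System.out.println(arc);
        }
    }
}
```

Read this way, "navigate" and "open" each span positions 0 to 2, in parallel with "go" (0 to 1) followed by "to" (1 to 2), and "the" and "webpage" follow at 2 to 3 and 3 to 4.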
There's something I'm not understanding here. I'd thought that
flattening the stream meant that no token would have a position length
greater than 1. Was I wrong? I would greatly appreciate any help with
understanding this.
Thanks!
Nicolás.-