Posted to java-user@lucene.apache.org by Vladimir Svetov <vs...@gmail.com> on 2016/10/14 01:28:07 UTC
getBestFragments with SimpleSpanFragmenter
Hi all,
I have the following 2 indexed values for the field title_t_en:
"\"War and Peace\" by \"Leo Tolstoy\""
"\"Three sisters\" by \"Anton Chekhov\""
I am searching with: +((title_t_en:war) (title_t_en:sister))
For each matching doc's indexed value (value), the following code is called:
SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
QueryScorer queryScorer = new QueryScorer(luceneQuery);
Highlighter highlighter = new Highlighter(htmlFormatter, queryScorer);
SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(queryScorer, 5);
String bestFragments = highlighter.getBestFragments(tokenStream, value, 3, FRAGMENT_DELIMITER);
The code produces the following bestFragments for found values:
"\"<B>War</B> and Peace\" by \"Leo Tolstoy\""
"\"Three <B>sisters</B>\" by \"Anton Chekhov\""
Question:
Why does bestFragments contain more than 5 bytes?
Should getBestFragments() return 3 fragments separated by delimiters,
where each fragment does not exceed 5 bytes?
Regards,
Vlad
Re: getBestFragments with SimpleSpanFragmenter
Posted by lukes <ma...@gmail.com>.
If you open the source, you will see that it internally calls
this.getBestFragments(tokenStream, text, maxNumFragments), which in turn
calls
this.getBestTextFragments(tokenStream, text, true, maxNumFragments) (with
the flag set to true), which merges the fragments automatically.
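To make the merging step concrete, here is a toy model in plain Java. It is only a sketch and not Lucene's code: MergeSketch, Frag and mergeContiguous are hypothetical names invented for illustration, and the fragment boundaries are picked by hand. It shows how pieces that touch end to start collapse back into one longer fragment once merging is switched on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only, NOT Lucene code. All names here are hypothetical. */
final class MergeSketch {

    /** A fragment covering text[start, end) with a relevance score. */
    static final class Frag {
        final int start;
        final int end;
        final float score;

        Frag(int start, int end, float score) {
            this.start = start;
            this.end = end;
            this.score = score;
        }
    }

    /**
     * Merges neighbours where a fragment starts exactly where the previous
     * one ends. Assumes the input is already sorted by start offset.
     */
    static List<Frag> mergeContiguous(List<Frag> frags) {
        List<Frag> out = new ArrayList<>();
        for (Frag f : frags) {
            if (!out.isEmpty() && out.get(out.size() - 1).end == f.start) {
                Frag prev = out.remove(out.size() - 1);
                // The merged fragment spans both pieces and keeps the higher score.
                out.add(new Frag(prev.start, f.end, Math.max(prev.score, f.score)));
            } else {
                out.add(f);
            }
        }
        return out;
    }

    /** Cuts each fragment's substring out of the original text. */
    static List<String> render(String text, List<Frag> frags) {
        List<String> out = new ArrayList<>();
        for (Frag f : frags) {
            out.add(text.substring(f.start, f.end));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "War and Peace";
        // Three hand-picked pieces that touch end to start: "War ", "and ", "Peace".
        List<Frag> frags = Arrays.asList(
                new Frag(0, 4, 1.0f),
                new Frag(4, 8, 0.0f),
                new Frag(8, 13, 0.0f));
        System.out.println(render(text, frags));                  // three short pieces
        System.out.println(render(text, mergeContiguous(frags))); // one merged piece
    }
}
```

With merging applied, the three short pieces come back as a single fragment, which is longer than the size given to the fragmenter.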
Regards.
--
View this message in context: http://lucene.472066.n3.nabble.com/getBestFragments-with-SimpleSpanFragmenter-tp4301065p4301069.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: getBestFragments with SimpleSpanFragmenter
Posted by Vladimir Svetov <vs...@gmail.com>.
Thanks for the advice.
However, I was asking about the semantics of a different API:
String fragments = highlighter.getBestFragments(tokenStream, value, 3, FRAGMENT_DELIMITER);
when
fragmenter = new SimpleSpanFragmenter(queryScorer, 5);
highlighter.setTextFragmenter(fragmenter);
Regards
On Thu, Oct 13, 2016 at 7:37 PM, lukes <ma...@gmail.com> wrote:
> Please pass false to mergeContiguousFragments in
> getBestTextFragments(TokenStream tokenStream, String text, boolean
> mergeContiguousFragments, int maxNumFragments) and it should work as
> expected.
>
> Regards.
Re: getBestFragments with SimpleSpanFragmenter
Posted by lukes <ma...@gmail.com>.
Please pass false to mergeContiguousFragments in
getBestTextFragments(TokenStream tokenStream, String text, boolean
mergeContiguousFragments, int maxNumFragments) and it should work as
expected.
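Note that getBestTextFragments returns the fragments themselves rather than one joined string, so the caller has to drop the zero-score ones and join the rest with the delimiter. A rough sketch of that step in plain Java is below. Frag is a hypothetical stand-in for Lucene's TextFragment (which exposes getScore() and toString()), and joinScored is an invented helper, not a Lucene method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only, NOT Lucene code. All names here are hypothetical. */
final class UnmergedJoinSketch {

    /** Stand-in for a highlighted fragment: its marked-up text plus a score. */
    static final class Frag {
        final String text;
        final float score;

        Frag(String text, float score) {
            this.text = text;
            this.score = score;
        }
    }

    /** Keeps only non-null fragments with a positive score and joins them. */
    static String joinScored(List<Frag> frags, String separator) {
        List<String> kept = new ArrayList<>();
        for (Frag f : frags) {
            if (f != null && f.score > 0) {
                kept.add(f.text);
            }
        }
        return String.join(separator, kept);
    }

    public static void main(String[] args) {
        // Hand-written fragments; only the first and last contain a hit.
        List<Frag> frags = Arrays.asList(
                new Frag("<B>War</B>", 1.0f),
                new Frag("and", 0.0f),
                new Frag("<B>sisters</B>", 1.0f));
        System.out.println(joinScored(frags, " ... "));
        // prints: <B>War</B> ... <B>sisters</B>
    }
}
```

In real code the same loop would run over the array returned by getBestTextFragments with mergeContiguousFragments set to false.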
Regards.
--
View this message in context: http://lucene.472066.n3.nabble.com/getBestFragments-with-SimpleSpanFragmenter-tp4301065p4301066.html