Posted to java-user@lucene.apache.org by Vladimir Svetov <vs...@gmail.com> on 2016/10/14 01:28:07 UTC

getBestFragments with SimpleSpanFragmenter

Hi all,


I have the following two indexed values for the field title_t_en:

       "\"War and Peace\" by \"Leo Tolstoy\""
       "\"Three sisters\" by \"Anton Chekhov\""

I am searching by :  +((title_t_en:war) (title_t_en:sister))

For each matching document, the following code is called with the indexed field *value*:

   SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
   QueryScorer queryScorer = new QueryScorer(luceneQuery);
   Highlighter highlighter = new Highlighter(htmlFormatter, queryScorer);
   SimpleSpanFragmenter fragmenter = new SimpleSpanFragmenter(queryScorer, 5);
   String bestFragments = highlighter.getBestFragments(tokenStream, value, 3, FRAGMENT_DELIMITER);

  The code produces the following bestFragments for the found values:
                      "\"<B>War</B> and Peace\" by \"Leo Tolstoy\""
                      "\"Three <B>sisters</B>\" by \"Anton Chekhov\""

  Question:
                 Why does bestFragments contain more than 5 bytes?
                 Shouldn't getBestFragments() return 3 fragments separated by
the delimiter, where each fragment does not exceed 5 bytes?

Regards,
Vlad

Re: getBestFragments with SimpleSpanFragmenter

Posted by lukes <ma...@gmail.com>.
If you open the source, you will see that it internally calls

this.getBestFragments(tokenStream, text, maxNumFragments), which in turn
calls

this.getBestTextFragments(tokenStream, text, true, maxNumFragments) (*with
flag true*), which merges contiguous fragments automatically.
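
To illustrate what that merge step does, here is a rough sketch in plain Java. It is a simplified model, not Lucene's code: the class FragmentMergeSketch and its methods split, mergeContiguous and fragments are made up for this example, the splitter cuts at fixed character offsets (Lucene's fragmenters work on token boundaries), and it merges every adjacent pair instead of only the best-scoring fragments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of "merge contiguous fragments"; not part of Lucene.
final class FragmentMergeSketch {

    // Cuts the text into consecutive [start, end) ranges of at most `size` chars.
    // Assumption: fixed-offset cuts stand in for a real, token-aware fragmenter.
    static List<int[]> split(String text, int size) {
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            ranges.add(new int[] {start, Math.min(start + size, text.length())});
        }
        return ranges;
    }

    // Joins ranges whenever one starts exactly where the previous one ends.
    static List<int[]> mergeContiguous(List<int[]> ranges) {
        List<int[]> merged = new ArrayList<>();
        for (int[] r : ranges) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && last[1] == r[0]) {
                last[1] = r[1]; // extend the previous range over this one
            } else {
                merged.add(new int[] {r[0], r[1]});
            }
        }
        return merged;
    }

    // Returns the fragment strings, optionally merging contiguous ranges first.
    static List<String> fragments(String text, int size, boolean merge) {
        List<int[]> ranges = split(text, size);
        if (merge) {
            ranges = mergeContiguous(ranges);
        }
        List<String> out = new ArrayList<>();
        for (int[] r : ranges) {
            out.add(text.substring(r[0], r[1]));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Three sisters";
        System.out.println(fragments(text, 5, false)); // three separate pieces
        System.out.println(fragments(text, 5, true));  // one merged piece
    }
}
```

In this model, "Three sisters" with a size of 5 gives three pieces when merging is off and the whole string again when merging is on, which would be consistent with the output reported in the question.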

Regards.



--
View this message in context: http://lucene.472066.n3.nabble.com/getBestFragments-with-SimpleSpanFragmenter-tp4301065p4301069.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: getBestFragments with SimpleSpanFragmenter

Posted by Vladimir Svetov <vs...@gmail.com>.
Thanks for the advice.
However, I was asking about the semantics of a different API:

String fragments = highlighter.getBestFragments(tokenStream, value, 3, FRAGMENT_DELIMITER);

when

          fragmenter = new SimpleSpanFragmenter(queryScorer, 5);
          highlighter.setTextFragmenter(fragmenter);

Regards


On Thu, Oct 13, 2016 at 7:37 PM, lukes <ma...@gmail.com> wrote:

> Please pass false to mergeContiguousFragments in
> getBestTextFragments(TokenStream tokenStream, String text, boolean
> mergeContiguousFragments, int maxNumFragments) and it should work as
> expected.
>
> Regards.

Re: getBestFragments with SimpleSpanFragmenter

Posted by lukes <ma...@gmail.com>.
Please pass false to mergeContiguousFragments in
getBestTextFragments(TokenStream tokenStream, String text, boolean
mergeContiguousFragments, int maxNumFragments) and it should work as
expected.

Regards.


