Posted to java-user@lucene.apache.org by Devshree Sane <de...@gmail.com> on 2010/09/21 20:24:10 UTC

Using FastVectorHighlighter for snippets

I am using the FastVectorHighlighter for retrieving snippets from the index.


I am a bit confused about the parameters that are passed to the
FastVectorHighlighter.getBestFragments() method. One parameter is a document
id and another is the maximum number of fragments. Does it mean that at most
that many fragments will be retrieved from the document with the given
id (even if there are more fragments in the same document)?

Is there a way to get the maximum number of fragments over the whole
index (and not just one document)?

Re: Using FastVectorHighlighter for snippets

Posted by Devshree Sane <de...@gmail.com>.
One more observation.
The length of the snippet returned is not equal to the fragment length
specified.
Does anyone know why?

On Wed, Sep 22, 2010 at 3:05 PM, Devshree Sane <de...@gmail.com> wrote:

> Thanks for your reply Koji.
>
> On Wed, Sep 22, 2010 at 4:51 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:
>
>>  (10/09/22 3:24), Devshree Sane wrote:
>>
>>> I am a bit confused about the parameters that are passed to the
>>> FastVectorHighlighter.getBestFragments() method. One parameter is a
>>> document
>>> id and another is the maximum number of fragments. Does it mean that only
>>> the maximum number of fragments will be retrieved from document with
>>> given
>>> id (even if there are more fragments in the same document)?
>>>
>> Correct.
>>
>>
> I did a little experiment for this. Following are my observations.
> Changing the number of characters from 100 to 1000 decreased the number of
> fragments returned.
>
> Is this because the document text was covered with a few 1000 character
> fragments? If so, then this means that one fragment can contain more than
> one occurrence of the query term. Is this so? If yes, is there a way to find
> the number of occurrences of the query term inside a particular
> snippet/fragment?
>
> Also is there a way to get the beginning and ending positions/offsets in
> the document of the snippet/fragment being returned?

Re: Using FastVectorHighlighter for snippets

Posted by Devshree Sane <de...@gmail.com>.
Thanks for your reply Koji.

On Wed, Sep 22, 2010 at 4:51 AM, Koji Sekiguchi <ko...@r.email.ne.jp> wrote:

>  (10/09/22 3:24), Devshree Sane wrote:
>
>> I am a bit confused about the parameters that are passed to the
>> FastVectorHighlighter.getBestFragments() method. One parameter is a
>> document
>> id and another is the maximum number of fragments. Does it mean that only
>> the maximum number of fragments will be retrieved from document with given
>> id (even if there are more fragments in the same document)?
>>
> Correct.
>
>
I ran a small experiment on this. Here are my observations.
Increasing the fragment size from 100 to 1000 characters decreased the
number of fragments returned.

Is this because a few 1000-character fragments were enough to cover the
document text? If so, that would mean one fragment can contain more than
one occurrence of the query term. Is that right? If yes, is there a way to
find the number of occurrences of the query term inside a particular
snippet/fragment?

Also, is there a way to get the start and end offsets, within the document,
of the snippet/fragment being returned?
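
The fragment-size observation above can be illustrated with a toy model. The
sketch below is plain Java and does not call Lucene. The class
FragmentWindowSketch, its Fragment type, and buildFragments() are hypothetical
names, and the greedy windowing is an assumption made for illustration, not the
highlighter's actual algorithm. It opens a window of fragCharSize characters at
each term occurrence that is not already covered, and records the window's
start and end offsets together with the number of occurrences inside it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of fixed-size fragments. Illustration only; this is NOT Lucene's implementation. */
final class FragmentWindowSketch {

    /** A fragment described by character offsets into the text and the hits it covers. */
    static final class Fragment {
        final int startOffset; // inclusive
        final int endOffset;   // exclusive
        final int termCount;   // occurrences of the term lying entirely inside the window

        Fragment(int startOffset, int endOffset, int termCount) {
            this.startOffset = startOffset;
            this.endOffset = endOffset;
            this.termCount = termCount;
        }
    }

    /** Greedily opens a window of fragCharSize characters at each hit that no earlier window counted. */
    static List<Fragment> buildFragments(String text, String term, int fragCharSize) {
        if (term.isEmpty() || fragCharSize < term.length()) {
            throw new IllegalArgumentException("fragCharSize must be at least the (non-empty) term length");
        }
        List<Fragment> fragments = new ArrayList<>();
        int hit = text.indexOf(term);
        while (hit >= 0) {
            int start = hit;
            int end = Math.min(start + fragCharSize, text.length());
            int count = 0;
            // Count every occurrence that fits entirely inside [start, end).
            while (hit >= 0 && hit + term.length() <= end) {
                count++;
                hit = text.indexOf(term, hit + term.length());
            }
            fragments.add(new Fragment(start, end, count));
        }
        return fragments;
    }

    public static void main(String[] args) {
        String text = "lucene search, lucene index, and much later lucene again";
        for (int size : new int[] {10, 100}) {
            for (Fragment f : buildFragments(text, "lucene", size)) {
                System.out.println(size + ": [" + f.startOffset + ", " + f.endOffset + ") hits=" + f.termCount);
            }
        }
    }
}
```

In this model a larger fragCharSize pulls neighbouring occurrences into the
same window, so fewer fragments come back and each one can hold several hits.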

Re: Using FastVectorHighlighter for snippets

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
  (10/09/22 3:24), Devshree Sane wrote:
> I am using the FastVectorHighlighter for retrieving snippets from the index.
>
>
> I am a bit confused about the parameters that are passed to the
> FastVectorHighlighter.getBestFragments() method. One parameter is a document
> id and another is the maximum number of fragments. Does it mean that only
> the maximum number of fragments will be retrieved from document with given
> id (even if there are more fragments in the same document)?
>
Correct.

> Is there a way to get the maximum number of fragments over the whole
> index(and not just 1 document)?
>
You need to put getBestFragments() in a loop:

for( int docId : ids in whole index ){
   String[] fragments = fvh.getBestFragments( ..., docId, ... );
   // ...
}
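
As a rough, self-contained sketch of that loop: the code below does not use
Lucene itself. FragmentFetcher is a hypothetical stand-in for the per-document
getBestFragments() call, and PerDocumentFragments and collect() are likewise
made-up names. It also assumes the document ids are already available as a
list (for example from search hits), which the pseudocode above leaves open.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Collects fragments document by document. Illustration only; no Lucene classes are used. */
final class PerDocumentFragments {

    /** Hypothetical stand-in for a per-document call such as fvh.getBestFragments( ..., docId, ... ). */
    interface FragmentFetcher {
        String[] bestFragments(int docId, int maxNumFragments);
    }

    /** Asks for up to maxNumFragments per document and keeps only documents that returned something. */
    static Map<Integer, String[]> collect(List<Integer> docIds, int maxNumFragments, FragmentFetcher fetcher) {
        Map<Integer, String[]> byDoc = new LinkedHashMap<>();
        for (int docId : docIds) {
            String[] fragments = fetcher.bestFragments(docId, maxNumFragments);
            // Skip documents for which no fragment was produced.
            if (fragments != null && fragments.length > 0) {
                byDoc.put(docId, fragments);
            }
        }
        return byDoc;
    }
}
```

The limit stays per document: each call can return up to maxNumFragments
fragments, so the total over the index is bounded by that limit times the
number of documents visited.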

Koji

-- 
http://www.rondhuit.com/en/

