Posted to java-user@lucene.apache.org by Robert Alexander <ro...@gmail.com> on 2015/08/06 19:25:30 UTC

Standard highlighter returns whole document as a fragment

Hey everyone,

I ran into an issue with the standard highlighter in 4.10.4 and was hoping
that someone could help. I'm attempting to fragment a result based on a
SpanNearQuery. If the words in the query are next to each other, the
fragmenter will often return one large result containing the entire
document. If the words are farther apart, it returns fragments of the
expected size.

I have included an example here in a gist link. The sample creates an index
in RAM and adds a single document. If I search for "ken" within 3 of "lay",
I see the problem. If I search for "ken" within 3 of "office", the problem
goes away. If you debug with the Lucene source, you'll see that it seems as
if textFragmenter.isNewFragment() never returns true (although I understand
that this is the user group and not the dev group so this may be of limited
use).

Are there known issues with the standard highlighter and SpanNear queries?
I am only using the old highlighter because the FVH doesn't appear to
handle SpanNear queries at all.

Thanks for the help,

Rob

Sample Gist: https://gist.github.com/robalex/97a005f4ee23c71c48f6

Re: Standard highlighter returns whole document as a fragment

Posted by Duke DAI <du...@gmail.com>.
It seems we are encountering the same problem (thread: "bug of
highlighter/SimpleSpanFragmenter, returned longer fragment than expected?").
When debugging, is your fragmenter a SimpleSpanFragmenter? Does isNewFragment()
return true based on the logic below?

boolean isNewFrag = offsetAtt.endOffset() >= (fragmentSize * currentNumFrags)  // <-- true
        && (textSize - offsetAtt.endOffset()) >= (fragmentSize >>> 1);         // <-- FALSE
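
To make the two clauses of that condition easier to reason about, here is a minimal, stdlib-only Java sketch that evaluates just the quoted boolean expression. It is not the Lucene class itself: the class name NewFragmentCheck, the method signature, and the sample numbers (fragment size 100, text size 1000) are assumptions chosen for illustration.

```java
public class NewFragmentCheck {

    // Mirrors only the boolean expression quoted above.
    // All parameters are plain ints standing in for the fragmenter's state.
    static boolean isNewFrag(int endOffset, int fragmentSize,
                             int currentNumFrags, int textSize) {
        // Clause 1: the token ends at or beyond the current fragment boundary.
        boolean pastBoundary = endOffset >= (fragmentSize * currentNumFrags);
        // Clause 2: at least half a fragment of text remains after the token.
        boolean enoughTextLeft = (textSize - endOffset) >= (fragmentSize >>> 1);
        return pastBoundary && enoughTextLeft;
    }

    public static void main(String[] args) {
        int fragmentSize = 100;   // hypothetical fragment size
        int textSize = 1000;      // hypothetical document length in characters

        // Past the first boundary, plenty of text left: both clauses hold.
        System.out.println(isNewFrag(120, fragmentSize, 1, textSize)); // true

        // Past the boundary, but only 20 characters remain: clause 2 fails.
        System.out.println(isNewFrag(980, fragmentSize, 1, textSize)); // false

        // Not yet at the first boundary: clause 1 fails.
        System.out.println(isNewFrag(50, fragmentSize, 1, textSize));  // false
    }
}
```

With these hypothetical numbers, a token ending at offset 980 passes the first clause but fails the second, because fewer than fragmentSize / 2 characters remain. That is the same true/FALSE pattern as annotated above.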

I am pursuing input from the community instead of changing/maintaining code
by myself.

Best regards,
Duke
If not now, when? If not me, who?
