Posted to dev@lucene.apache.org by "Mike Sokolov (JIRA)" <ji...@apache.org> on 2011/06/22 15:24:49 UTC

[jira] [Commented] (LUCENE-3080) cutover highlighter to BytesRef

    [ https://issues.apache.org/jira/browse/LUCENE-3080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053245#comment-13053245 ] 

Mike Sokolov commented on LUCENE-3080:
--------------------------------------

There could be a good reason, though, for using byte offsets in highlighting. I have in mind an optimization that would pull in text from an external file or other source, enabling highlighting without stored fields. For best performance, the snippet should be read from the external source using random access to storage, and that requires byte offsets. I think this might be a big win for large field values.

This could only be done if the highlighter doesn't need to perform any text manipulation itself, so it's not really appropriate for Highlighter, as Robert said, but it might be possible to implement in FVH (FastVectorHighlighter). I'm looking at this, but before I get too deep, can anyone comment on the feasibility of using byte offsets? I'm unclear on what offsets are used for other than highlighting: would it cause problems to have a CharFilter that returns "corrected" offsets such that char positions in the analyzed text are translated into byte positions in the source text?
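
To make the offset translation concrete, here is a rough sketch in plain Java. It is not Lucene API: the class and method names (ByteOffsetSnippet, utf8ByteOffset, readSnippet) are made up for illustration, and it assumes the external source is a UTF-8 encoded file, that the text is well-formed UTF-16, and that offsets fall on code point boundaries. It shows the two pieces the idea needs: turning a char offset into a byte offset, and reading a snippet with a seek instead of loading the whole value.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene.
final class ByteOffsetSnippet {

    private ByteOffsetSnippet() {}

    /**
     * Returns the number of UTF-8 bytes needed to encode text[0, charOffset).
     * Assumes well-formed UTF-16 and an offset on a code point boundary.
     */
    static int utf8ByteOffset(CharSequence text, int charOffset) {
        int bytes = 0;
        int i = 0;
        while (i < charOffset) {
            int cp = Character.codePointAt(text, i);
            if (cp < 0x80) {
                bytes += 1;          // ASCII
            } else if (cp < 0x800) {
                bytes += 2;          // e.g. Latin-1 supplement, Greek, Cyrillic
            } else if (cp < 0x10000) {
                bytes += 3;          // rest of the Basic Multilingual Plane
            } else {
                bytes += 4;          // supplementary planes (surrogate pair in UTF-16)
            }
            i += Character.charCount(cp);
        }
        return bytes;
    }

    /**
     * Reads the bytes [startByte, endByte) from the file with a single seek
     * and decodes them as UTF-8. Both offsets must fall on character
     * boundaries in the encoded file.
     */
    static String readSnippet(Path file, long startByte, long endByte) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) (endByte - startByte)];
            raf.seek(startByte);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8);
        }
    }
}
```

In this sketch the translation is computed from the full text, which is only available at analysis time. The point of storing "corrected" offsets would be to do this once while indexing, so that at search time only the readSnippet step is needed.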

> cutover highlighter to BytesRef
> -------------------------------
>
>                 Key: LUCENE-3080
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3080
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Michael McCandless
>
> Highlighter still uses char[] terms (consumes tokens from the analyzer as char[] not as BytesRef), which is causing problems for merging SOLR-2497 to trunk.
