
Posted to dev@lucene.apache.org by "Luis Filipe Nassif (Created) (JIRA)" <ji...@apache.org> on 2012/02/11 15:11:59 UTC

[jira] [Created] (LUCENE-3772) Highlighter needs the whole text in memory to work

Highlighter needs the whole text in memory to work
--------------------------------------------------

                 Key: LUCENE-3772
                 URL: https://issues.apache.org/jira/browse/LUCENE-3772
             Project: Lucene - Java
          Issue Type: Improvement
          Components: modules/highlighter
    Affects Versions: 3.5
         Environment: Windows 7 Enterprise x64, JRE 1.6.0_25
            Reporter: Luis Filipe Nassif


The Highlighter methods getBestFragment(s) and getBestTextFragments only accept a String containing the whole text to highlight. When several very large documents are highlighted simultaneously, this can lead to heap consumption problems. It would be better if the API could additionally accept a Reader object, as Lucene Document Fields do.
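
To illustrate the kind of Reader-based processing requested above, here is a minimal sketch in plain Java. It is not Lucene code and not a proposed patch: the class StreamingFragmentSketch, the method bestFragment, and the fixed-size window scoring are all hypothetical and only show that a best fragment can be chosen while holding one window of text in memory at a time. It assumes simple case-insensitive substring counting in place of a real query scorer, and it will miss a match that straddles a window boundary.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;

/** Hypothetical helper, NOT part of Lucene: picks the best fixed-size window from a Reader. */
class StreamingFragmentSketch {

    /**
     * Reads the text in windows of at most windowSize chars and returns the
     * window with the most occurrences of term (case-insensitive).
     * Only the current window and the best window so far are kept in memory.
     * Returns an empty string when no window contains the term.
     */
    static String bestFragment(Reader reader, String term, int windowSize) throws IOException {
        String needle = term.toLowerCase(Locale.ROOT);
        char[] buf = new char[windowSize];
        String best = "";
        int bestScore = 0;
        int filled;
        while ((filled = fill(reader, buf)) > 0) {
            String window = new String(buf, 0, filled);
            int score = countOccurrences(window.toLowerCase(Locale.ROOT), needle);
            if (score > bestScore) {
                bestScore = score;
                best = window;
            }
        }
        return best;
    }

    /** Fills buf as far as possible; returns the number of chars read (0 at end of stream). */
    private static int fill(Reader reader, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = reader.read(buf, total, buf.length - total);
            if (n < 0) {
                break; // end of stream
            }
            total += n;
        }
        return total;
    }

    /** Counts non-overlapping occurrences of needle in text. */
    private static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In practice the Reader would wrap a file or stream, not an in-memory String.
        Reader reader = new StringReader("aaaa foo bbbb foo foo cccc");
        System.out.println(bestFragment(reader, "foo", 13));
    }
}
```

A real implementation would presumably need to feed the Reader through the Analyzer and keep the existing fragment scoring, but the memory profile would be similar: bounded by the fragment size rather than by the document size.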

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

[jira] [Commented] (LUCENE-3772) Highlighter needs the whole text in memory to work

Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476044#comment-13476044 ] 

Mark Harwood commented on LUCENE-3772:
--------------------------------------

For bigger-than-memory docs, is it not possible to use nested documents to represent subsections (e.g. a child doc for each of the chapters in a book) and then use BlockJoinQuery to select the best child docs?
Highlighting can then be applied to a more manageable subset of the original content, and Lucene's ranking algorithms are used to select the best "fragment" instead of relying on the highlighter's own attempts to reproduce this logic.

Obviously this depends on the shape of your content and queries, but books-and-chapters is probably a good fit for this approach.
                
> Highlighter needs the whole text in memory to work
> --------------------------------------------------
>
>                 Key: LUCENE-3772
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3772
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 3.5
>         Environment: Windows 7 Enterprise x64, JRE 1.6.0_25
>            Reporter: Luis Filipe Nassif
>              Labels: highlighter, improvement, memory
>
> Highlighter methods getBestFragment(s) and getBestTextFragments only accept a String object representing the whole text to highlight. When dealing with very large docs simultaneously, it can lead to heap consumption problems. It would be better if the API could accept a Reader object additionally, like Lucene Document Fields do.


[jira] [Commented] (LUCENE-3772) Highlighter needs the whole text in memory to work

Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LUCENE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475924#comment-13475924 ] 

Luis Filipe Nassif commented on LUCENE-3772:
--------------------------------------------

I think this improvement is still important.
                
