Posted to dev@lucene.apache.org by "Luis Filipe Nassif (Created) (JIRA)" <ji...@apache.org> on 2012/02/11 15:11:59 UTC
[jira] [Created] (LUCENE-3772) Highlighter needs the whole text in
memory to work
Highlighter needs the whole text in memory to work
--------------------------------------------------
Key: LUCENE-3772
URL: https://issues.apache.org/jira/browse/LUCENE-3772
Project: Lucene - Java
Issue Type: Improvement
Components: modules/highlighter
Affects Versions: 3.5
Environment: Windows 7 Enterprise x64, JRE 1.6.0_25
Reporter: Luis Filipe Nassif
Highlighter methods getBestFragment(s) and getBestTextFragments only accept a String object representing the whole text to highlight. When many very large docs are highlighted simultaneously, this can lead to heap consumption problems. It would be better if the API could also accept a Reader object, as Lucene Document Fields do.
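Until the API accepts a Reader, one possible workaround is to read the text in bounded windows and highlight each window separately. The sketch below is plain Java, not Lucene code: the class ChunkedTextSketch, the method processInWindows and the perWindow callback are hypothetical names, and perWindow stands in for whatever per-String highlighting call the application makes. Consecutive windows overlap by a fixed number of chars so that a match straddling a window boundary is not lost.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Illustration only (hypothetical helper, not part of Lucene). */
final class ChunkedTextSketch {

    /**
     * Reads the text in windows of at most windowSize chars and applies
     * perWindow to each one. Every window after the first begins with the
     * last `overlap` chars of the previous window. Only one window of text
     * is held in memory at a time, plus whatever perWindow returns.
     */
    static <T> List<T> processInWindows(Reader reader, int windowSize, int overlap,
                                        Function<String, T> perWindow) throws IOException {
        if (overlap < 0 || overlap >= windowSize) {
            throw new IllegalArgumentException("need 0 <= overlap < windowSize");
        }
        List<T> results = new ArrayList<>();
        char[] buf = new char[windowSize];
        int carried = 0; // chars copied over from the end of the previous window
        while (true) {
            int filled = carried;
            // Fill the buffer; a single read() call may return fewer chars than requested.
            while (filled < windowSize) {
                int n = reader.read(buf, filled, windowSize - filled);
                if (n < 0) {
                    break; // end of stream
                }
                filled += n;
            }
            if (filled == carried) {
                break; // nothing new was read, so the previous window was the last
            }
            results.add(perWindow.apply(new String(buf, 0, filled)));
            if (filled < windowSize) {
                break; // a short window means the stream is exhausted
            }
            // Keep the tail of this window as the head of the next one.
            System.arraycopy(buf, windowSize - overlap, buf, 0, overlap);
            carried = overlap;
        }
        return results;
    }
}
```

In an application, perWindow would typically return only the best fragment for its window (or null), so the result list stays small. Note that fragment scores computed per window are not necessarily comparable to scores computed over the whole text, and any offsets are relative to the window rather than to the document.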
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
[jira] [Commented] (LUCENE-3772) Highlighter needs the whole text
in memory to work
Posted by "Mark Harwood (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476044#comment-13476044 ]
Mark Harwood commented on LUCENE-3772:
--------------------------------------
For bigger-than-memory docs is it not possible to use nested documents to represent subsections (e.g. a child doc for each of the chapters in a book) and then use BlockJoinQuery to select the best child docs?
Highlighting can then be used on a more-manageable subset of the original content and Lucene's ranking algos are being used to select the best "fragment" rather than the highlighter's own attempts to reproduce this logic.
Obviously depends on the shape of your content/queries but books-and-chapters is probably a good fit for this approach.
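The idea of highlighting only the best subsection can be illustrated without any Lucene code. In the rough sketch below, everything is an assumption made for illustration: the class BestSectionSketch, the delimiter-line convention for marking sections, and the occurrence count, which is only a crude stand-in for Lucene's ranking of child docs. It streams the sections from a Reader and keeps just the current section and the best one seen so far, so the text eventually handed to the highlighter is one section rather than the whole document.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Locale;

/** Illustration only (hypothetical helper, not part of Lucene). */
final class BestSectionSketch {

    /** Counts non-overlapping, case-insensitive occurrences of term in text. */
    static int countOccurrences(String text, String term) {
        if (term.isEmpty()) {
            return 0;
        }
        String haystack = text.toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        for (int i = haystack.indexOf(needle); i >= 0;
                 i = haystack.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    /**
     * Streams sections separated by lines equal to `delimiter` and returns
     * the section with the most occurrences of `term` (the first one wins
     * on ties). Returns an empty string if no section contains the term.
     */
    static String bestSection(Reader reader, String delimiter, String term)
            throws IOException {
        BufferedReader lines = new BufferedReader(reader);
        StringBuilder current = new StringBuilder();
        String best = "";
        int bestScore = 0;
        while (true) {
            String line = lines.readLine();
            boolean endOfSection = (line == null) || line.equals(delimiter);
            if (!endOfSection) {
                current.append(line).append('\n');
                continue;
            }
            // A section just ended: score it and keep it only if it is the best so far.
            int score = countOccurrences(current.toString(), term);
            if (score > bestScore) {
                bestScore = score;
                best = current.toString();
            }
            current.setLength(0);
            if (line == null) {
                break; // end of input
            }
        }
        return best;
    }
}
```

With real nested documents, the index and the join query would take over the selection step, and only the stored text of the winning child doc would need to be loaded for highlighting.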
> Highlighter needs the whole text in memory to work
> --------------------------------------------------
>
> Key: LUCENE-3772
> URL: https://issues.apache.org/jira/browse/LUCENE-3772
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/highlighter
> Affects Versions: 3.5
> Environment: Windows 7 Enterprise x64, JRE 1.6.0_25
> Reporter: Luis Filipe Nassif
> Labels: highlighter, improvement, memory
>
> Highlighter methods getBestFragment(s) and getBestTextFragments only accept a String object representing the whole text to highlight. When many very large docs are highlighted simultaneously, this can lead to heap consumption problems. It would be better if the API could also accept a Reader object, as Lucene Document Fields do.
[jira] [Commented] (LUCENE-3772) Highlighter needs the whole text
in memory to work
Posted by "Luis Filipe Nassif (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/LUCENE-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475924#comment-13475924 ]
Luis Filipe Nassif commented on LUCENE-3772:
--------------------------------------------
I think this improvement is still important.