Posted to java-user@lucene.apache.org by Mirko Sertic <mi...@web.de> on 2014/03/24 15:02:59 UTC
Indexing and storing very large documents
Hi there
I am looking for a way to store very large documents in a Lucene 4.7 index and keep them available to the PostingsHighlighter for search result highlighting.
I do not want to read the whole document into memory, as this would consume too much memory or could cause an OutOfMemoryError (Java heap space). So I have to use a Reader. Unfortunately, I cannot pass a Reader to a Field whose FieldType.stored() is true. Any ideas? Code examples would be very cool :-)
Thanks in advance
Mirko
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Aw: RE: Indexing and storing very large documents
Posted by Alexandre Patry <al...@keatext.com>.
On 14-03-24 11:26 AM, Mirko Sertic wrote:
> Ah, ok, so I cannot use PostingsHighlighter, as it requires stored fields, right?
The field can be stored anywhere, not necessarily in the index. Here is
something that might work:
1. Store the first N characters of your field in a database.
2. Override PostingsHighlighter#loadFieldValues to load the field from
the database.
3. Specify your prefix length (N) in PostingsHighlighter's constructor.
However, it will then look for a snippet only in the first N characters of the document.
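The first two steps can be sketched without Lucene on the classpath. This is only an illustration under stated assumptions: PrefixReader and PrefixStore are hypothetical names, the HashMap stands in for the database, and load(...) merely mirrors the String[fields.length][docids.length] shape that, as far as I know, loadFieldValues(IndexSearcher, String[], int[], int) is expected to return in Lucene 4.x (please check the javadoc of your version). How a Lucene docid maps to a database key is left out here.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Lucene class): reads a bounded prefix from a Reader. */
final class PrefixReader {
    private PrefixReader() {}

    /** Reads at most maxLength chars, so the whole document never sits in memory. */
    static String readPrefix(Reader in, int maxLength) throws IOException {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        char[] buf = new char[Math.min(maxLength, 8192)];
        StringBuilder sb = new StringBuilder(buf.length);
        int remaining = maxLength;
        while (remaining > 0) {
            int n = in.read(buf, 0, Math.min(buf.length, remaining));
            if (n < 0) {
                break; // end of stream reached before maxLength chars
            }
            sb.append(buf, 0, n);
            remaining -= n;
        }
        return sb.toString();
    }
}

/** Hypothetical stand-in for the database that holds the prefixes. */
final class PrefixStore {
    // field name -> (document id -> stored prefix)
    private final Map<String, Map<Integer, String>> byField =
            new HashMap<String, Map<Integer, String>>();

    void put(int docId, String field, String prefix) {
        Map<Integer, String> docs = byField.get(field);
        if (docs == null) {
            docs = new HashMap<Integer, String>();
            byField.put(field, docs);
        }
        docs.put(docId, prefix);
    }

    /** Returns one row per field and one column per document id. */
    String[][] load(String[] fields, int[] docIds, int maxLength) {
        String[][] out = new String[fields.length][docIds.length];
        for (int f = 0; f < fields.length; f++) {
            Map<Integer, String> docs = byField.get(fields[f]);
            for (int d = 0; d < docIds.length; d++) {
                String value = (docs == null) ? null : docs.get(docIds[d]);
                if (value == null) {
                    value = ""; // this sketch uses an empty string for missing values
                } else if (value.length() > maxLength) {
                    value = value.substring(0, maxLength);
                }
                out[f][d] = value;
            }
        }
        return out;
    }
}
```

At indexing time you would call PrefixReader.readPrefix(reader, N) on a separate Reader over the same content and write the result to the store. A PostingsHighlighter subclass could then return store.load(fields, docids, maxLength) from its loadFieldValues override.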
Hope this helps,
Alexandre
--
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com
Aw: RE: Indexing and storing very large documents
Posted by Mirko Sertic <mi...@web.de>.
Ah, ok, so I cannot use PostingsHighlighter, as it requires stored fields, right?
Regards
Mirko
Sent: Monday, 24 March 2014, 16:01
From: "Uwe Schindler" <uw...@thetaphi.de>
To: java-user@lucene.apache.org
Subject: RE: Indexing and storing very large documents
Stored fields do not support Readers at the moment.
RE: Indexing and storing very large documents
Posted by Uwe Schindler <uw...@thetaphi.de>.
Stored fields do not support Readers at the moment.
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de