Posted to java-user@lucene.apache.org by Mirko Sertic <mi...@web.de> on 2014/03/24 15:02:59 UTC

Indexing and storing very large documents

Hi there
 
I am looking for a way to store very large documents in a Lucene 4.7 index so that the PostingsHighlighter can be used for search result highlighting.
 
I do not want to read the whole document into memory, as this would consume too much memory or could cause an OutOfMemoryError (Java heap space). So I have to use a Reader. Unfortunately, I cannot pass a Reader to a Field whose FieldType has stored() = true. Any ideas? Code examples would be very cool :-)
 
Thanks in advance
Mirko

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Aw: RE: Indexing and storing very large documents

Posted by Alexandre Patry <al...@keatext.com>.
On 14-03-24 11:26 AM, Mirko Sertic wrote:
> Ah, OK, so I cannot use PostingsHighlighter, as it requires stored fields, right?
The field can be stored anywhere, not necessarily in the index. Here is 
something that might work:

1. Store the first N characters of your field in a database.
2. Override PostingsHighlighter#loadFieldValues to load the field from 
the database.
3. Pass your prefix length (N) to PostingsHighlighter's constructor.

Note, however, that this finds snippets only within the first N 
characters of the document.
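
A minimal sketch of steps 1 and 2 above, kept free of Lucene classes so 
that it runs standalone. ExternalPrefixStore and its in-memory map are 
hypothetical stand-ins for the database. The sketch assumes that in 
Lucene 4.x loadFieldValues receives the field names, the docids and a 
maxLength, and returns a String[][] indexed by field first and document 
second; please check that against the Javadoc of your version.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical external store holding the first N characters per document and field. */
final class ExternalPrefixStore {
    // In-memory stand-in for a database table keyed by "docId/field".
    private final Map<String, String> prefixes = new HashMap<>();

    /** Step 1: remember the prefix of a field outside the index. */
    void put(int docId, String field, String prefix) {
        prefixes.put(docId + "/" + field, prefix);
    }

    /** Returns at most maxLength characters of the stored prefix, or "" if nothing is stored. */
    String get(int docId, String field, int maxLength) {
        String value = prefixes.getOrDefault(docId + "/" + field, "");
        return value.length() > maxLength ? value.substring(0, maxLength) : value;
    }

    /**
     * Step 2: build the array that an overridden loadFieldValues could return.
     * Assumed layout: contents[fieldIndex][docIndex].
     */
    String[][] loadFieldValues(String[] fields, int[] docIds, int maxLength) {
        String[][] contents = new String[fields.length][docIds.length];
        for (int f = 0; f < fields.length; f++) {
            for (int d = 0; d < docIds.length; d++) {
                contents[f][d] = get(docIds[d], fields[f], maxLength);
            }
        }
        return contents;
    }
}
```

In a real subclass, the override would presumably delegate to something 
like loadFieldValues above. Keep in mind that Lucene docids are internal 
and can change after merges, so a real store would more likely be keyed 
by an application-level ID than by the docid used here.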

Hope this helps,

Alexandre



-- 
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com




Aw: RE: Indexing and storing very large documents

Posted by Mirko Sertic <mi...@web.de>.
Ah, OK, so I cannot use PostingsHighlighter, as it requires stored fields, right?
 
Regards
Mirko
 
 

Sent: Monday, 24 March 2014 at 16:01
From: "Uwe Schindler" <uw...@thetaphi.de>
To: java-user@lucene.apache.org
Subject: RE: Indexing and storing very large documents
Stored fields do not support Readers at the moment.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de






RE: Indexing and storing very large documents

Posted by Uwe Schindler <uw...@thetaphi.de>.
Stored fields do not support Readers at the moment.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
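
Given that limitation, one possible workaround is to materialize only a 
bounded prefix of the document as a String and store that, while the 
full text is still indexed through a Reader in a non-stored field. The 
sketch below shows just the bounded read; the class and method names 
(PrefixReader, readPrefix) are hypothetical and not part of Lucene.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: reads a bounded prefix from a Reader. */
final class PrefixReader {
    private PrefixReader() {}

    /**
     * Reads at most maxChars characters from the reader.
     * Memory use is bounded by maxChars, regardless of the document size.
     */
    static String readPrefix(Reader reader, int maxChars) throws IOException {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must not be negative");
        }
        char[] buffer = new char[maxChars];
        int filled = 0;
        while (filled < maxChars) {
            int n = reader.read(buffer, filled, maxChars - filled);
            if (n < 0) {
                break; // end of stream reached before maxChars
            }
            filled += n;
        }
        return new String(buffer, 0, filled);
    }
}
```

The resulting String could then go into a stored field or an external 
store. The trade-off is that highlighting can only cover the part of 
the text that was kept.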



