Posted to solr-user@lucene.apache.org by "Bohnsack, Sven" <Sv...@shopping24.de> on 2011/06/07 22:25:46 UTC

How to deal with many files using solr external file field

Hi all,

we're using solr 1.4 and external file field ([1]) for sorting our search results. We have about 40,000 terms for which we use this sorting option.
Currently we're running into massive OutOfMemory problems and are not quite sure what's the matter. It seems that the garbage collector stops working or some processes are going wild; in any case, solr allocates more and more RAM until we experience the OutOfMemory exception.


We noticed the following:

For some terms one can see java.io.FileNotFoundExceptions in the solr log, when solr tries to load an external file for a term for which no such file exists, e.g. solr tries to load the external score file for "trousers" but there is none in the /solr/data folder.

Question: is it possible that those exceptions are responsible for the OutOfMemory problem, or could it be due to the large(?) number of 40k terms for which we want to sort the results via external file field?

I'm looking forward to your answers, suggestions and ideas :)


Regards
Sven


[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

Re: How to deal with many files using solr external file field

Posted by Chris Hostetter <ho...@fucit.org>.
: We took a deeper look at what happens when an "external-file-field" request is sent to SOLR:
: 
: * SOLR checks whether there is a file for the requested query, e.g. "trousers"

Something smells fishy here.

ExternalFileField is designed to let you load values for a field (for use 
in functions) from a file, where the file name is determined from the field 
name.


If ExternalFileField is trying to load a file named "external_trousers", 
that means your query is attempting to use "trousers" as a *field* ... 
that doesn't sound right.
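
For reference, the coupling by name works roughly like this (a simplified 
illustration; the "popularity" field and the file contents are made up):

    // A field named "popularity" is backed by a file in the index data
    // directory named "external_popularity", containing one
    // "<keyField>=<float>" line per document, e.g.:
    //
    //   doc1=4.5
    //   doc2=0.5
    //
    // So a request for "external_trousers" can only happen when "trousers"
    // is being used as a field name:
    String field = "trousers";
    String expectedFile = "external_" + field;  // the file Solr tries to open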

Based on your description of the memory blow-up you are seeing, it sounds 
like you are using the user's query string as a (dynamic?) field name and 
none of these external_${query} files exist -- that's not really the 
intended usage.

Can you clarify a bit more what exactly your goal is?  This smells like an 
XY Problem (my gut reaction is that you might actually be wanting to 
use QueryElevationComponent instead of ExternalFileField)...

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue.  Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss

Re: How to deal with many files using solr external file field

Posted by Martin Grotzke <ma...@googlemail.com>.
Hi,

as I'm also involved in this issue (on Sven's side), I created a patch
that replaces the float array with a map that stores the score by doc, so
it contains only as many entries as the external scoring file has lines,
and no more.
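
In simplified form the idea looks like this (a sketch of the approach, not
the actual patch code attached to SOLR-2583):

    import java.util.HashMap;
    import java.util.Map;

    // Keep scores only for the docs that actually appear in the external
    // file, instead of a dense float[maxDoc] that is mostly zeros.
    class SparseExternalScores {
        private final Map<Integer, Float> scoreByDoc =
                new HashMap<Integer, Float>();
        private final float defaultValue;

        SparseExternalScores(float defaultValue) {
            this.defaultValue = defaultValue;
        }

        void put(int docId, float score) {
            scoreByDoc.put(docId, score);
        }

        // Memory scales with the number of lines in the external file,
        // not with the total number of documents in the index.
        float get(int docId) {
            Float score = scoreByDoc.get(docId);
            return score != null ? score : defaultValue;
        }
    }

The trade-off is a boxed map lookup per document instead of a plain array
access, i.e. some sorting speed in exchange for the memory savings.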

I created an issue for this: https://issues.apache.org/jira/browse/SOLR-2583

It would be great if someone could have a look at it and comment.

Thanx for your feedback,
cheers,
Martin


On 06/08/2011 12:22 PM, Bohnsack, Sven wrote:
> [...]

-- 
Martin Grotzke
http://twitter.com/martin_grotzke


Re: How to deal with many files using solr external file field

Posted by "Bohnsack, Sven" <Sv...@shopping24.de>.
Hi,

I can't provide a stack trace, and IMHO it wouldn't provide much useful information anyway. But we've made good progress in the analysis.

We took a deeper look at what happens when an "external-file-field" request is sent to SOLR:

* SOLR checks whether there is a file for the requested query, e.g. "trousers"
* If so, SOLR loads the "trousers" file and generates a HashMap entry consisting of a FileFloatSource object and a float array whose size is the number of documents in the SOLR index. Every document matched by the query gets the score value provided in the external score file; for every(!) other document SOLR writes a zero into that float array (see the sketch after this list)
* If SOLR does not find a file for the query, it still generates a HashMap entry, with a score of zero for every document
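
In code, that dense caching amounts to roughly the following (a simplified
reconstruction for illustration, not the actual FileFloatSource source):

    import java.util.Map;

    class DenseExternalScores {
        // One float per document in the index, zero-filled by the JVM;
        // only docs listed in the external file get a real score.
        static float[] load(int maxDoc, Map<Integer, Float> fileEntries) {
            float[] scores = new float[maxDoc];
            for (Map.Entry<Integer, Float> e : fileEntries.entrySet()) {
                scores[e.getKey()] = e.getValue();
            }
            return scores;
        }
    }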

In our case we have about 8.5 million documents in our index, and one of those arrays occupies about 34MB of heap space. With e.g. 100 different queries using external file field for sorting, SOLR occupies about 3.4GB of heap space.
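
A quick back-of-the-envelope check of those numbers (assuming 4 bytes per
float and one dense array per cached query):

    public class HeapMath {
        public static void main(String[] args) {
            double perArray = 8500000.0 * 4;   // one float per doc
            double total = 100 * perArray;     // 100 cached queries
            System.out.println(perArray / 1e6 + " MB per array");   // 34.0
            System.out.println(total / 1e9 + " GB for 100 arrays"); // 3.4
        }
    }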

The problem might be related to the use of WeakHashMap [1]: an entry is removed only once its key is no longer referenced anywhere else, so as long as the keys stay in use, the garbage collector cannot clean up the entries and their large value arrays.
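
The following toy example shows that behavior (the plain Object key stands
in for whatever Solr actually keys the cache with):

    import java.util.Map;
    import java.util.WeakHashMap;

    public class WeakMapDemo {
        public static void main(String[] args) throws InterruptedException {
            Map<Object, float[]> cache = new WeakHashMap<Object, float[]>();
            Object key = new Object();
            cache.put(key, new float[8500000]);  // a ~34 MB value

            System.gc();
            Thread.sleep(100);
            System.out.println(cache.size());    // 1: key is still strongly held

            key = null;                          // drop the last strong reference
            System.gc();
            Thread.sleep(100);
            System.out.println(cache.size());    // usually 0 once GC has run
        }
    }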


What do you think could be a possible solution for this whole problem? (apart from "don't use external file fields" ;)


Regards
Sven


[1]: "A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations."

-----Original Message-----
From: mtnest46@gmail.com [mailto:mtnest46@gmail.com] On Behalf Of Simon Rosenthal
Sent: Wednesday, June 8, 2011 03:56
To: solr-user@lucene.apache.org
Subject: Re: How to deal with many files using solr external file field

Can you provide a stack trace for the OOM exception?

On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven <Sv...@shopping24.de> wrote:

> [...]

Re: How to deal with many files using solr external file field

Posted by Simon Rosenthal <si...@yahoo.com>.
Can you provide a stack trace for the OOM exception?

On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven <Sv...@shopping24.de> wrote:

> [...]