Posted to solr-user@lucene.apache.org by "Bohnsack, Sven" <Sv...@shopping24.de> on 2011/06/07 22:25:46 UTC
How to deal with many files using solr external file field
Hi all,
we're using Solr 1.4 and external file fields ([1]) for sorting our search results. We have about 40,000 terms for which we use this sorting option.
Currently we're running into massive OutOfMemory problems, and we're not quite sure what's the matter. It seems as if the garbage collector stops working or some process runs wild; in any case, Solr allocates more and more RAM until we hit the OutOfMemory exception.
We noticed the following:
For some terms one can see java.io.FileNotFoundExceptions in the Solr log when Solr tries to load an external file for a term for which no such file exists, e.g. Solr tries to load the external score file for "trousers" but there is none in the /solr/data folder.
Question: could those exceptions be responsible for the OutOfMemory problem, or could it be due to the large(?) number of 40k terms for which we want to sort the results via external file fields?
I'm looking forward to your answers, suggestions and ideas :)
Regards
Sven
[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html
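For context: an external file field is declared in schema.xml and backed by a plain text file named external_<fieldname> in the index data directory, one key=value line per document. A minimal sketch along the lines of the wiki example (field and file names here are illustrative, not Sven's actual setup):

```xml
<!-- schema.xml: an external file field type and one field using it -->
<fieldType name="file" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"
           stored="false" indexed="false"/>
<field name="popularity" type="file"/>
```

The matching data file would then live at <dataDir>/external_popularity and contain lines such as doc42=0.95; the values can then be used in function queries.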
Re: AW: How to deal with many files using solr external file field
Posted by Chris Hostetter <ho...@fucit.org>.
: We took a deeper look at what happened, when an "external-file-field"-Request is sent to SOLR:
:
: * SOLR looks if there is a file for the requested query, e.g. "trousers"
Something smells fishy here.
ExternalFileField is designed to let you load values for a field (for use
in functions) from a file whose name is derived from the field name.
If ExternalFileField is trying to load a file named "external_trousers",
that means your query is attempting to use "trousers" as a *field* ...
that doesn't sound right.
Based on your description of the memory blow-up you are seeing, it sounds
like you are using the user's query string as a (dynamic?) field name, and
none of these external_${query} files exist -- that's not really the
intended usage.
Can you clarify a bit more what exactly your goal is? This smells like an
XY Problem (my gut reaction is that you might actually want to
use QueryElevationComponent instead of ExternalFileField)...
http://people.apache.org/~hossman/#xyproblem
XY Problem
Your question appears to be an "XY Problem" ... that is: you are dealing
with "X", you are assuming "Y" will help you, and you are asking about "Y"
without giving more details about the "X" so that we can understand the
full issue. Perhaps the best solution doesn't involve "Y" at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341
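For completeness, a minimal QueryElevationComponent setup looks roughly like this (the component name, document ids, and query text below are invented examples):

```xml
<!-- solrconfig.xml: enable the component -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<!-- elevate.xml: pin specific documents to the top for a given query text -->
<elevate>
  <query text="trousers">
    <doc id="SKU-1234"/>
    <doc id="SKU-5678"/>
  </query>
</elevate>
```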
-Hoss
Re: AW: How to deal with many files using solr external file field
Posted by Martin Grotzke <ma...@googlemail.com>.
Hi,
as I'm also involved in this issue (on Sven's side) I created a
patch that replaces the float array with a map that stores scores by doc,
so it contains only as many entries as the external scoring file has
lines, and no more.
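The idea behind the patch can be sketched like this (a simplified illustration with hypothetical names, not the actual SOLR-2583 code):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the patch's idea: instead of a float[maxDoc] that stores a
 * value for *every* document in the index, keep a sparse map holding
 * only the docs actually listed in the external scoring file. All other
 * docs implicitly get the default value.
 */
class SparseExternalScores {
    private final Map<Integer, Float> scoreByDoc = new HashMap<>();
    private final float defaultValue;

    SparseExternalScores(float defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** Called once per line of the external file ("key=score"). */
    void put(int docId, float score) {
        scoreByDoc.put(docId, score);
    }

    /** Memory cost is proportional to entries, not to maxDoc. */
    float floatVal(int docId) {
        Float v = scoreByDoc.get(docId);
        return v != null ? v : defaultValue;
    }

    int size() {
        return scoreByDoc.size();
    }
}
```

The trade-off is per-entry boxing/overhead versus a dense primitive array, so it pays off when the external file covers far fewer docs than the index holds.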
I created an issue for this: https://issues.apache.org/jira/browse/SOLR-2583
It would be great if someone could have a look at it and comment.
Thanx for your feedback,
cheers,
Martin
On 06/08/2011 12:22 PM, Bohnsack, Sven wrote:
> Hi,
>
> I can't provide a stack trace, and IMHO it wouldn't yield much useful information. But we've made good progress in the analysis.
>
> We took a deeper look at what happens when an "external-file-field" request is sent to SOLR:
>
> * SOLR checks whether there is a file for the requested query, e.g. "trousers"
> * If so, SOLR loads the "trousers" file and creates a HashMap entry consisting of a FileFloatSource object and a float array whose size equals the number of documents in the SOLR index. Every document matched by the query gets the score value provided in the external score file; for every(!) other document SOLR writes a zero into that float array
> * If SOLR does not find a file for the query, SOLR still creates a HashMap entry with a score of zero for every document
>
> In our case we have about 8.5 million documents in our index, and one of those arrays occupies about 34 MB of heap space. With e.g. 100 different queries that use an external file field for sorting, SOLR occupies about 3.4 GB of heap space.
>
> The problem might lie in the use of WeakHashMap [1]: its entries are only removed once their keys are no longer strongly referenced, so as long as the FileFloatSource keys are still referenced the garbage collector cannot clean up the corresponding arrays.
>
>
> What do you think could be a possible solution for this whole problem? (apart from "don't use external file fields" ;)
>
>
> Regards
> Sven
>
>
> [1]: "A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations."
>
> -----Original Message-----
> From: mtnest46@gmail.com [mailto:mtnest46@gmail.com] On Behalf Of Simon Rosenthal
> Sent: Wednesday, June 8, 2011 03:56
> To: solr-user@lucene.apache.org
> Subject: Re: How to deal with many files using solr external file field
>
> Can you provide a stack trace for the OOM exception?
>
On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven <Sv...@shopping24.de> wrote:
>
>> [quoted original message trimmed; see the top of the thread]
--
Martin Grotzke
http://twitter.com/martin_grotzke
AW: How to deal with many files using solr external file field
Posted by "Bohnsack, Sven" <Sv...@shopping24.de>.
Hi,
I can't provide a stack trace, and IMHO it wouldn't yield much useful information. But we've made good progress in the analysis.
We took a deeper look at what happens when an "external-file-field" request is sent to SOLR:
* SOLR checks whether there is a file for the requested query, e.g. "trousers"
* If so, SOLR loads the "trousers" file and creates a HashMap entry consisting of a FileFloatSource object and a float array whose size equals the number of documents in the SOLR index. Every document matched by the query gets the score value provided in the external score file; for every(!) other document SOLR writes a zero into that float array
* If SOLR does not find a file for the query, SOLR still creates a HashMap entry with a score of zero for every document
In our case we have about 8.5 million documents in our index, and one of those arrays occupies about 34 MB of heap space. With e.g. 100 different queries that use an external file field for sorting, SOLR occupies about 3.4 GB of heap space.
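A quick back-of-the-envelope check of those numbers (a Java float is 4 bytes, and SOLR allocates one array slot per document in the index):

```java
public class ExternalFileFieldMemory {
    /** Size in bytes of a float[numDocs]: 4 bytes per element. */
    public static long bytesPerArray(long numDocs) {
        return numDocs * 4L;
    }

    public static void main(String[] args) {
        long numDocs = 8_500_000L;            // documents in the index
        long perArray = bytesPerArray(numDocs);
        long hundredQueries = perArray * 100;  // one array per cached query

        System.out.printf("one array:  %.1f MB%n", perArray / 1e6);
        System.out.printf("100 arrays: %.1f GB%n", hundredQueries / 1e9);
        // one array:  34.0 MB
        // 100 arrays: 3.4 GB
    }
}
```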
The problem might lie in the use of WeakHashMap [1]: its entries are only removed once their keys are no longer strongly referenced, so as long as the FileFloatSource keys are still referenced the garbage collector cannot clean up the corresponding arrays.
What do you think could be a possible solution for this whole problem? (apart from "don't use external file fields" ;)
Regards
Sven
[1]: "A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently than other Map implementations."
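The quoted Javadoc cuts both ways: a WeakHashMap only drops an entry once its key is no longer strongly referenced anywhere else. A minimal illustration (plain Java, not SOLR code):

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakKeyDemo {
    public static void main(String[] args) {
        Map<Object, float[]> cache = new WeakHashMap<>();

        Object key = new Object();     // stands in for a FileFloatSource
        cache.put(key, new float[8]);  // stands in for the big per-doc array

        // While 'key' is strongly referenced, the entry cannot be evicted,
        // no matter how often the garbage collector runs.
        System.gc();
        System.out.println(cache.containsKey(key)); // true

        // Only after the last strong reference to the key is gone *may* the
        // GC reclaim it and the map drop the entry; eviction is not prompt
        // or guaranteed, which is exactly why memory can pile up meanwhile.
        key = null;
        System.gc();
    }
}
```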
-----Original Message-----
From: mtnest46@gmail.com [mailto:mtnest46@gmail.com] On Behalf Of Simon Rosenthal
Sent: Wednesday, June 8, 2011 03:56
To: solr-user@lucene.apache.org
Subject: Re: How to deal with many files using solr external file field
Can you provide a stack trace for the OOM exception?
On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven <Sv...@shopping24.de> wrote:
> [quoted original message trimmed; see the top of the thread]
Re: How to deal with many files using solr external file field
Posted by Simon Rosenthal <si...@yahoo.com>.
Can you provide a stack trace for the OOM exception?
On Tue, Jun 7, 2011 at 4:25 PM, Bohnsack, Sven <Sv...@shopping24.de> wrote:
> [quoted original message trimmed; see the top of the thread]