Posted to solr-user@lucene.apache.org by Rahul R <ra...@gmail.com> on 2009/07/31 13:17:30 UTC

Re: Limiting facets for huge data - setting indexed=false in schema.xml

Erik,
I understand that caching is going to improve performance. In fact, we did a
PSR run with caches enabled and we got awesome results. But these wouldn't
really be representative, because the PSR scripts will be doing the same
searches again and again. These would be cached and there would be virtually
no evictions. This is not a practical case.
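
To illustrate the concern, here is a toy simulation in Python. It is only a sketch: the LRUCache class is a hypothetical stand-in and not Solr's cache implementation, and the cache size and query mixes are made-up numbers rather than figures from our PSR runs.

```python
from collections import OrderedDict
import random


class LRUCache:
    """Tiny LRU cache that counts hits, misses, and evictions (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def lookup(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
            self.evictions += 1


def run(queries, capacity=100):
    """Feed a sequence of query strings through a fresh cache and return it."""
    cache = LRUCache(capacity)
    for q in queries:
        cache.lookup(q)
    return cache


# Scripted load test: the same 20 searches replayed over and over.
scripted = run([f"query-{i % 20}" for i in range(10_000)])

# Assumed "varied" traffic: 10,000 searches drawn from 5,000 distinct queries.
rng = random.Random(42)
varied = run([f"query-{rng.randrange(5_000)}" for _ in range(10_000)])

for name, c in (("scripted", scripted), ("varied", varied)):
    print(name, "hits:", c.hits, "misses:", c.misses, "evictions:", c.evictions)
```

In the scripted run only the first 20 lookups miss and nothing is ever evicted, while the varied run overflows the cache and evicts constantly. How close real traffic is to either extreme depends on the actual query mix.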

My hardware (in the PSR environment where I am testing) is pretty good - 12
CPU, 24 G RAM, Ultrasparc III 1.2 GHz processors, Solaris 10. We have
allocated 3.2 GB RAM for Weblogic (JVM). This is the maximum that I am able
to allocate for one JVM.
I think I need to go back and check whether my query is still using all the
fields. I understand that setting indexed=false alone will not keep those
fields from participating in the query.

Thanks a lot for your response.

Regards
Rahul
On Fri, Jul 31, 2009 at 3:33 PM, Erik Hatcher <er...@ehatchersolutions.com> wrote:

>
> On Jul 31, 2009, at 2:35 AM, Rahul R wrote:
>
>> Hello,
>> We are trying to get Solr to work for a really huge parts database.
>> Details of the database:
>> - 55 million parts
>> - 3700 properties (facets) in total, but each record will not have a value
>>   for all properties.
>> - Most of these facets are defined as dynamic fields within the Solr index
>>
>> We were getting really unacceptable timing while doing faceting/searches
>> on an index created with this database.
>>
>
> Were you accounting for cache warming?  Were your caches sized
> appropriately?  What kind of hardware and RAM were you using?  What were the
> JVM settings?
>
> And certainly not least important - what version of Solr are you running?
> The difference in faceting performance and scalability between Solr 1.3 and
> what will be Solr 1.4 is quite dramatic.
>
>> We thought that by limiting the number of properties that are available for
>> faceting, the performance can be improved. To test this, we enabled only 6
>> properties for faceting by setting indexed=true (in schema.xml) for only
>> these properties. All other properties which are defined as dynamic
>> properties had indexed=false.
>>
>
> These settings won't matter - what matters in this case is what facets you
> request, not what is actually in the index.
>
>
>> My questions:
>> - Will reducing the number of facets improve faceting and search
>> performance?
>>
>
> Reducing what fields you request will, of course.  But what you actually
> index has no effect on performance until you request it.
>
>> - Is there a better way to reduce the number of facets?
>>
>
> Hard to say without doing a deeper analysis of your needs.
>
>> - Will having a large number of properties defined as dynamic fields
>> reduce performance?
>>
>
> Dynamic fields versus statically named fields have no effect on
> performance.
>
>        Erik
>
>

Re: Limiting facets for huge data - setting indexed=false in schema.xml

Posted by Rahul R <ra...@gmail.com>.
We are using 1.3.0. Thanks for the suggestion. I will see if I can try one of
the nightly builds.

On Fri, Jul 31, 2009 at 7:49 PM, Erik Hatcher <er...@ehatchersolutions.com> wrote:

> What version of Solr?   Try a nightly build if you're at Solr 1.3 or
> earlier and you'll be amazed at the difference.
>
>        Erik
>
>
> On Jul 31, 2009, at 10:00 AM, Rahul R wrote:
>
>> In a production environment, having the caches enabled makes a lot of sense.
>> And most definitely we will be enabling them. However, the primary idea of
>> this exercise is to verify if limiting the number of facets will actually
>> improve the performance.
>>
>> An update on this. I did verify, and it looks like although I set
>> indexed=false for most of the properties, I had not blocked them from
>> participating in the query. I have now enabled only 7 properties for
>> faceting, so at any given time a maximum of 7 facets will participate in
>> the query. Performance has improved from the earlier 60 seconds to around
>> 10 seconds.
>>
>> This really helped. Thanks a lot !
>>
>> Regards
>> Rahul
>>
>> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>>
>>
>>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>>
>>>> Erik,
>>>> I understand that caching is going to improve performance. In fact, we did
>>>> a PSR run with caches enabled and we got awesome results. But these
>>>> wouldn't really be representative, because the PSR scripts will be doing
>>>> the same searches again and again. These would be cached and there would
>>>> be virtually no evictions. This is not a practical case.
>>>>
>>>>
>>> I don't understand how this is not practical.  Why wouldn't having the
>>> caches warmed and filled with the facets be practical for your needs?
>>>
>>>      Erik
>>>
>>>
>>>
>

Re: Limiting facets for huge data - setting indexed=false in schema.xml

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
What version of Solr?   Try a nightly build if you're at Solr 1.3 or  
earlier and you'll be amazed at the difference.

	Erik

On Jul 31, 2009, at 10:00 AM, Rahul R wrote:

> In a production environment, having the caches enabled makes a lot of sense.
> And most definitely we will be enabling them. However, the primary idea of
> this exercise is to verify if limiting the number of facets will actually
> improve the performance.
>
> An update on this. I did verify, and it looks like although I set
> indexed=false for most of the properties, I had not blocked them from
> participating in the query. I have now enabled only 7 properties for
> faceting, so at any given time a maximum of 7 facets will participate in
> the query. Performance has improved from the earlier 60 seconds to around
> 10 seconds.
>
> This really helped. Thanks a lot !
>
> Regards
> Rahul
>
> On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
>>
>> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>>
>>> Erik,
>>> I understand that caching is going to improve performance. In fact, we did
>>> a PSR run with caches enabled and we got awesome results. But these
>>> wouldn't really be representative, because the PSR scripts will be doing
>>> the same searches again and again. These would be cached and there would
>>> be virtually no evictions. This is not a practical case.
>>>
>>
>> I don't understand how this is not practical.  Why wouldn't having  
>> the
>> caches warmed and filled with the facets be practical for your needs?
>>
>>       Erik
>>
>>


Re: Limiting facets for huge data - setting indexed=false in schema.xml

Posted by Rahul R <ra...@gmail.com>.
In a production environment, having the caches enabled makes a lot of sense.
And most definitely we will be enabling them. However, the primary idea of
this exercise is to verify if limiting the number of facets will actually
improve the performance.

An update on this. I did verify, and it looks like although I set
indexed=false for most of the properties, I had not blocked them from
participating in the query. I have now enabled only 7 properties for
faceting, so at any given time a maximum of 7 facets will participate in
the query. Performance has improved from the earlier 60 seconds to around
10 seconds.
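
For anyone following along, limiting the facets at request time could look roughly like the Python sketch below. The base URL assumes a default local Solr install, and the field names are hypothetical placeholders (not our real property names). The `facet`, `facet.field`, and `facet.mincount` request parameters are standard Solr faceting parameters; only the fields listed in `facet.field` are faceted on, regardless of what else is in the index.

```python
from urllib.parse import urlencode

# Assumed base URL of a default local Solr install; adjust for your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Hypothetical property names used purely for illustration.
ENABLED_FACETS = [
    "manufacturer", "category", "material", "color",
    "voltage", "package_type", "status",
]


def build_facet_query(user_query, facet_fields, rows=10):
    """Build a select URL that facets only on the given fields."""
    params = [
        ("q", user_query),
        ("rows", rows),
        ("facet", "true"),
        ("facet.mincount", 1),  # skip facet values with zero matches
    ]
    # One facet.field parameter per requested facet; nothing else is faceted.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_facet_query("resistor", ENABLED_FACETS)
print(url)
```

The point is that the cost comes from the `facet.field` parameters actually sent with the request, so trimming that list is what reduces the work per query.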

This really helped. Thanks a lot !

Regards
Rahul

On Fri, Jul 31, 2009 at 6:34 PM, Erik Hatcher <er...@ehatchersolutions.com> wrote:

>
> On Jul 31, 2009, at 7:17 AM, Rahul R wrote:
>
>> Erik,
>> I understand that caching is going to improve performance. In fact, we did a
>> PSR run with caches enabled and we got awesome results. But these wouldn't
>> really be representative, because the PSR scripts will be doing the same
>> searches again and again. These would be cached and there would be
>> virtually no evictions. This is not a practical case.
>>
>
> I don't understand how this is not practical.  Why wouldn't having the
> caches warmed and filled with the facets be practical for your needs?
>
>        Erik
>
>

Re: Limiting facets for huge data - setting indexed=false in schema.xml

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Jul 31, 2009, at 7:17 AM, Rahul R wrote:

> Erik,
> I understand that caching is going to improve performance. In fact, we did a
> PSR run with caches enabled and we got awesome results. But these wouldn't
> really be representative, because the PSR scripts will be doing the same
> searches again and again. These would be cached and there would be virtually
> no evictions. This is not a practical case.

I don't understand how this is not practical.  Why wouldn't having the  
caches warmed and filled with the facets be practical for your needs?

	Erik