Posted to solr-dev@lucene.apache.org by Mike Klaas <mi...@gmail.com> on 2007/05/25 22:41:20 UTC

solrconfig.xml defaults

Since auditing solrconfig.xml defaults is on the list of things for  
1.2, I thought I'd get the ball rolling:

Lazy field loading: seems like it would benefit more people to be  
enabled explicitly.  I've been using it successfully and some  
substantial gains have been reported on the lucene list.  The  
downsides don't really seem significant.

HashDocSet maxSize: perhaps consider increasing this, or making this  
by default a parameter which is tuned automatically (.5% of maxDocs,  
for instance)
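
A rough sketch of what an automatically tuned maxSize could look like, assuming a sentinel value of -1 means "derive it from the index size". The class and method names are hypothetical and are not part of Solr; only the .5% figure comes from the suggestion above.

```java
// Hypothetical helper, not Solr code: derives a HashDocSet maxSize from maxDoc.
final class AutoHashDocSetMaxSize {
    private AutoHashDocSetMaxSize() {}

    /**
     * Returns the configured value if one was given explicitly, otherwise
     * 0.5% of maxDoc. Integer division by 200 is used instead of
     * multiplying by 0.005 to avoid floating-point rounding.
     *
     * @param configured explicit maxSize, or -1 to tune automatically (assumed convention)
     * @param maxDoc     number of documents in the index
     */
    static int maxSize(int configured, int maxDoc) {
        if (configured >= 0) {
            return configured;
        }
        return maxDoc / 200;
    }
}
```

With this rule an index of 1M docs would get a maxSize of 5000, while an explicit setting such as 3000 is left untouched.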

Most people will start with the example solrconfig.xml, I suspect,  
and getting the performance-related settings right at the start will  
help the perception of Solr's performance.  I'd be tempted to  
increase the default filterCache size too, but that can have quite  
high memory requirements.

-Mike

Re: solrconfig.xml defaults

Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : What about commenting out most of the default parameters in the dismax
> : handler config, so it becomes more standard & usable (w/o editing its
> : config) after someone customizes their schema?
>
> i'm torn on this ... those defaults make sense for the example schema/data
> -- which is the main point of the whole example/solr/conf.  but i
> appreciate that people can be confused by errors from dismax when they
> change their schema (see pingQuery)
>
> perhaps the best solution is to remove the qf/pf/bf defaults for "dismax"
> and add them to "partitioned"

It's minor... I'm OK with either.
For a person just learning dismax though, it almost made sense to "build up"
by adding additional parameters to get more complex queries.

In any case, I changed pingQuery to be simple... IMO, it's not meant
to be a complex test or anything, but just that a query *can* be
issued.

-Yonik

Re: solrconfig.xml defaults

Posted by Chris Hostetter <ho...@fucit.org>.
: What about commenting out most of the default parameters in the dismax
: handler config, so it becomes more standard & usable (w/o editing its
: config) after someone customizes their schema?

i'm torn on this ... those defaults make sense for the example schema/data
-- which is the main point of the whole example/solr/conf.  but i
appreciate that people can be confused by errors from dismax when they
change their schema (see pingQuery)

perhaps the best solution is to remove the qf/pf/bf defaults for "dismax"
and add them to "partitioned"


-Hoss


Re: solrconfig.xml defaults

Posted by Mike Klaas <mi...@gmail.com>.
On 26-May-07, at 7:08 AM, Yonik Seeley wrote:

> On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
>>
>> Wasn't HashDocSet significantly optimized for intersection recently?
>
> More like optimized/simplified for storing lucene doc ids. Only about
> 8-10% speedup.
> OpenBitSet was more on the order of 2 to 4 times improvement over
> BitSet for intersections.
> Here's one data point from someone else:
>
> http://www.nabble.com/Aggregating-category-hits-tf1623611.html#a4831982

I'm convinced.  The current default is sensible.

-Mike

Re: solrconfig.xml defaults

Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
> On 25-May-07, at 2:09 PM, Yonik Seeley wrote:
>
> > On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
>
> >> HashDocSet maxSize: perhaps consider increasing this, or making this
> >> by default a parameter which is tuned automatically (.5% of maxDocs,
> >> for instance)
> >
> > I think when HashDocSet is large enough, it can be slower than
> > OpenBitSet for taking intersections, even when it still saves memory.
> > So it depends on what one is optimizing for.
> >
> > I picked 3000 long ago since it seemed the fastest for faceting
> > with one particular data set (between 500K and 1M docs), but that was
> > before OpenBitSet.  It also caps the max table size at 4096 entries
>
> Wasn't HashDocSet significantly optimized for intersection recently?

More like optimized/simplified for storing lucene doc ids. Only about
8-10% speedup.
OpenBitSet was more on the order of 2 to 4 times improvement over
BitSet for intersections.
Here's one data point from someone else:

http://www.nabble.com/Aggregating-category-hits-tf1623611.html#a4831982

-Yonik

Re: solrconfig.xml defaults

Posted by Mike Klaas <mi...@gmail.com>.
On 25-May-07, at 2:09 PM, Yonik Seeley wrote:

> On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:

>> HashDocSet maxSize: perhaps consider increasing this, or making this
>> by default a parameter which is tuned automatically (.5% of maxDocs,
>> for instance)
>
> I think when HashDocSet is large enough, it can be slower than
> OpenBitSet for taking intersections, even when it still saves memory.
> So it depends on what one is optimizing for.
>
> I picked 3000 long ago since it seemed the fastest for faceting
> with one particular data set (between 500K and 1M docs), but that was
> before OpenBitSet.  It also caps the max table size at 4096 entries

Wasn't HashDocSet significantly optimized for intersection recently?

> (16K RAM) (power of two hash table with a load factor of .75).  Does
> it make sense to go up to 8K entries?  Do you have any data on
> different sizes?

Unfortunately, I don't.  I'm using 20K right now for indices ranging  
in size from 3-8M docs, but that was based on advice on the wiki, and  
the memory savings seemed worth it (each bit filter is pushing 500KB  
to 1MB at that scale).  I might have time to run some experiments  
before 1.2 is released.  If not, 3000 seems like a well-founded default.
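
The bit filter sizes mentioned above follow from one bit per document. A small sketch of that arithmetic, assuming the bits are stored in 64-bit words and that a hash-based set needs at least 4 bytes per stored doc id (both are assumptions; the class name is hypothetical):

```java
// Hypothetical helper, not Solr code: rough memory estimates for doc sets.
final class DocSetMemory {
    private DocSetMemory() {}

    /** Bytes for a bit set covering maxDoc documents, rounded up to whole 64-bit words. */
    static long bitSetBytes(int maxDoc) {
        long words = (maxDoc + 63L) / 64L;
        return words * 8L;
    }

    /** Lower bound in bytes for storing n doc ids as 4-byte ints (ignores hash table slack). */
    static long rawIdBytes(int n) {
        return 4L * n;
    }
}
```

By this estimate a bit set costs 375,000 bytes at 3M docs and 1,000,000 bytes at 8M docs, while the raw ids of a 20,000-document set take 80,000 bytes before any hash table overhead.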

>> Most people will start with the example solrconfig.xml, I suspect,
>> and getting the  performance-related settings right at the start will
>> help the perception of Solr's performance.  I'd be tempted to
>> increase the default filterCache size too, but that can have quite
>> high memory requirements.
>
> Yeah, many people won't think to increase the VM heap size.
> Perhaps that's better as a documentation fix.

I just added a note to SolrPerformanceFactors.  Most of the  
information is already on the wiki.

> What about commenting out most of the default parameters in the dismax
> handler config, so it becomes more standard & usable (w/o editing its
> config) after someone customizes their schema?

Makes sense, but I agree with Hoss that it is nice for the user to be  
able to easily use the example OOB.

-Mike

Re: solrconfig.xml defaults

Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
> Since auditing solrconfig.xml defaults is on the list of things for
> 1.2, I thought I'd get the ball rolling:

Thanks, that was one of the things I was looking into now (hitting all
the new URLs and seeing what they looked like too)

> Lazy field loading: seems like it would benefit more people to be
> enabled explicitly.  I've been using it successfully and some
> substantial gains have been reported on the lucene list.  The
> downsides don't really seem significant.

Sounds fine.

> HashDocSet maxSize: perhaps consider increasing this, or making this
> by default a parameter which is tuned automatically (.5% of maxDocs,
> for instance)

I think when HashDocSet is large enough, it can be slower than
OpenBitSet for taking intersections, even when it still saves memory.
So it depends on what one is optimizing for.
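
To illustrate the trade-off, here is a minimal sketch of the two intersection strategies. It uses java.util.BitSet and a plain int array as stand-ins; it is not HashDocSet or OpenBitSet, and the class and method names are hypothetical. Probing costs one lookup per id in the small set, while the word-wise AND costs a pass over the whole bit set regardless of how many ids are set.

```java
import java.util.BitSet;

// Hypothetical illustration, not Solr code.
final class IntersectionSketch {
    private IntersectionSketch() {}

    /** Intersection size by probing: one bit lookup per id in the small set. */
    static int probeCount(int[] smallSet, BitSet bits) {
        int count = 0;
        for (int id : smallSet) {
            if (bits.get(id)) {
                count++;
            }
        }
        return count;
    }

    /** Intersection size by word-wise AND on a copy, leaving both inputs unchanged. */
    static int andCount(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }
}
```

Which approach wins depends on the size of the small set relative to the index, which is why the cutoff is a tuning parameter.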

I picked 3000 long ago since it seemed the fastest for faceting
with one particular data set (between 500K and 1M docs), but that was
before OpenBitSet.  It also caps the max table size at 4096 entries
(16K RAM) (power of two hash table with a load factor of .75).  Does
it make sense to go up to 8K entries?  Do you have any data on
different sizes?
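
The 4096-entry and 16K figures above can be reproduced with a little arithmetic. This is a sketch under the stated assumptions (power-of-two table, load factor .75, one 4-byte int per slot); the class name is hypothetical and the code is not taken from HashDocSet.

```java
// Hypothetical helper, not Solr code: hash table sizing for a given maxSize.
final class HashTableSizing {
    private HashTableSizing() {}

    /** Smallest power-of-two slot count that holds maxSize entries at load factor 0.75. */
    static int tableSize(int maxSize) {
        if (maxSize < 0 || maxSize > (1 << 29)) {
            throw new IllegalArgumentException("maxSize out of range: " + maxSize);
        }
        // ceil(maxSize / 0.75) computed in integer arithmetic as ceil(maxSize * 4 / 3)
        long needed = (maxSize * 4L + 2) / 3;
        int size = 1;
        while (size < needed) {
            size <<= 1;
        }
        return size;
    }

    /** Table memory in bytes, assuming one 4-byte int per slot. */
    static long tableBytes(int maxSize) {
        return 4L * tableSize(maxSize);
    }
}
```

For a maxSize of 3000 this gives 4096 slots and 16384 bytes. Going up to an 8K-entry table would correspond to a maxSize of up to 6144.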

> Most people will start with the example solrconfig.xml, I suspect,
> and getting the  performance-related settings right at the start will
> help the perception of Solr's performance.  I'd be tempted to
> increase the default filterCache size too, but that can have quite
> high memory requirements.

Yeah, many people won't think to increase the VM heap size.
Perhaps that's better as a documentation fix.

What about commenting out most of the default parameters in the dismax
handler config, so it becomes more standard & usable (w/o editing its
config) after someone customizes their schema?


-Yonik