Posted to solr-dev@lucene.apache.org by Mike Klaas <mi...@gmail.com> on 2007/05/25 22:41:20 UTC
solrconfig.xml defaults
Since auditing solrconfig.xml defaults is on the list of things for
1.2, I thought I'd get the ball rolling:
Lazy field loading: seems like it would benefit more people to be
enabled explicitly. I've been using it successfully and some
substantial gains have been reported on the lucene list. The
downsides don't really seem significant.
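Lazy field loading means a stored field is only read and decoded when something actually asks for it. The Python sketch below illustrates that idea only; it is not Lucene's or Solr's implementation, and the `LazyDocument` class is hypothetical. A request that returns just `id` never pays to decode a large `body` field.

```python
class LazyDocument:
    """Hypothetical document whose stored fields are decoded on first access."""

    def __init__(self, raw_fields):
        # raw_fields maps field name -> stored bytes (still undecoded)
        self._raw = dict(raw_fields)
        self._decoded = {}
        self.decode_count = 0  # how many fields were actually decoded

    def get(self, name):
        # Decode a field only the first time it is read, then memoize it.
        if name not in self._decoded:
            self._decoded[name] = self._raw[name].decode("utf-8")
            self.decode_count += 1
        return self._decoded[name]


doc = LazyDocument({"id": b"42", "body": b"a very long stored text field"})
print(doc.get("id"))     # 42
print(doc.decode_count)  # 1: "body" was never decoded
```

Because decoded values are memoized, reading the same field twice costs nothing extra in this sketch.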
HashDocSet maxSize: perhaps consider increasing this, or making this
by default a parameter which is tuned automatically (.5% of maxDocs,
for instance)
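The proposed automatic tuning is simple arithmetic. A minimal Python sketch, where `auto_max_size` is a hypothetical helper and not an existing Solr option:

```python
def auto_max_size(max_doc: int, per_mille: int = 5) -> int:
    """Hypothetical auto-tuned HashDocSet maxSize: 0.5% (5 per mille) of maxDoc.

    Integer arithmetic avoids float rounding surprises.
    """
    return max_doc * per_mille // 1000


for n in (500_000, 1_000_000, 3_000_000, 8_000_000):
    print(n, auto_max_size(n))
```

For example, 1,000,000 documents give 5000 and 8,000,000 give 40000. Whether such larger values are a win depends on the intersection-speed trade-off raised in the replies.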
Most people will start with the example solrconfig.xml, I suspect,
and getting the performance-related settings right at the start will
help the perception of Solr's performance. I'd be tempted to
increase the default filterCache size too, but that can have quite
high memory requirements.
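To put a number on those memory requirements, here is a rough worst-case estimate in Python. It assumes every cached filter is stored as a bitset with one bit per document (filters small enough to be HashDocSets take far less); the function name and the 512-entry cache size are hypothetical examples, not Solr defaults.

```python
def filter_cache_bytes(num_entries: int, max_doc: int) -> int:
    """Worst-case filterCache memory if every entry is a bitset over maxDoc.

    Each bitset needs one bit per document, rounded up to whole 64-bit words.
    """
    words_per_filter = (max_doc + 63) // 64
    return num_entries * words_per_filter * 8  # 8 bytes per 64-bit word


# A hypothetical 512-entry cache over 1M and 8M document indexes.
for max_doc in (1_000_000, 8_000_000):
    mb = filter_cache_bytes(512, max_doc) / 1_000_000
    print(f"{max_doc:,} docs: {mb:.0f} MB")
```

That works out to 64 MB and 512 MB respectively, before any other heap use.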
-Mike
Re: solrconfig.xml defaults
Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Chris Hostetter <ho...@fucit.org> wrote:
>
> : What about commenting out most of the default parameters in the dismax
> : handler config, so it becomes more standard & usable (w/o editing its
> : config) after someone customizes their schema?
>
> i'm torn on this ... those defaults make sense for the example schema/data
> -- which is the main point of the whole example/solr/conf. but i
> appreciate that people can be confused by errors from dismax when they
> change their schema (see pingQuery)
>
> perhaps the best solution is to remove the qf/pf/bf defaults for "dismax"
> and add them to "partitioned"
It's minor... I'm OK with either.
For a person just learning dismax though, it almost made sense to "build up"
by adding additional parameters to get more complex queries.
In any case, I changed pingQuery to be simple... IMO, it's not meant
to be a complex test or anything, but just that a query *can* be
issued.
-Yonik
Re: solrconfig.xml defaults
Posted by Chris Hostetter <ho...@fucit.org>.
: What about commenting out most of the default parameters in the dismax
: handler config, so it becomes more standard & usable (w/o editing its
: config) after someone customizes their schema?
i'm torn on this ... those defaults make sense for the example schema/data
-- which is the main point of the whole example/solr/conf. but i
appreciate that people can be confused by errors from dismax when they
change their schema (see pingQuery)
perhaps the best solution is to remove the qf/pf/bf defaults for "dismax"
and add them to "partitioned"
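Hoss's proposal only changes which handler carries the qf/pf/bf defaults. The Python sketch below models a handler's "defaults" as values that a request parameter overrides; the handler dictionaries and field boosts are hypothetical, not the contents of the example solrconfig.xml.

```python
def effective_params(handler_defaults: dict, request_params: dict) -> dict:
    """Layer request parameters over a handler's configured defaults.

    Sketch of "defaults" semantics: a value in the request wins.
    """
    merged = dict(handler_defaults)
    merged.update(request_params)
    return merged


# Hypothetical handler configs after the proposed change.
handlers = {
    "dismax": {"echoParams": "explicit"},
    "partitioned": {"echoParams": "explicit", "qf": "name^1.2 text^0.5"},
}

print(effective_params(handlers["dismax"], {"q": "ipod", "qf": "title"}))
print(effective_params(handlers["partitioned"], {"q": "ipod"}))
```

In the sketch, "dismax" requests no longer inherit field names tied to the example schema; only the "partitioned" handler does.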
-Hoss
Re: solrconfig.xml defaults
Posted by Mike Klaas <mi...@gmail.com>.
On 26-May-07, at 7:08 AM, Yonik Seeley wrote:
> On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
>>
>> Wasn't HashDocSet significantly optimized for intersection recently?
>
> More like optimized/simplified for storing lucene doc ids. Only about
> 8-10% speedup.
> OpenBitSet was more on the order of 2 to 4 times improvement over
> BitSet for intersections.
> Here's one data point from someone else:
>
> http://www.nabble.com/Aggregating-category-hits-tf1623611.html#a4831982
I'm convinced. The current default is sensible.
-Mike
Re: solrconfig.xml defaults
Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
> On 25-May-07, at 2:09 PM, Yonik Seeley wrote:
>
> > On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
>
> >> HashDocSet maxSize: perhaps consider increasing this, or making this
> >> by default a parameter which is tuned automatically (.5% of maxDocs,
> >> for instance)
> >
> > I think when HashDocSet is large enough, it can be slower than
> > OpenBitSet for taking intersections, even when it still saves memory.
> > So it depends on what one is optimizing for.
> >
> > I picked 3000 long ago since it seemed the fastest for faceting
> > with one particular data set (between 500K to 1M docs), but that was
> > before OpenBitSet. It also caps the max table size at 4096 entries
>
> Wasn't HashDocSet significantly optimized for intersection recently?
More like optimized/simplified for storing lucene doc ids. Only about
8-10% speedup.
OpenBitSet was more on the order of 2 to 4 times improvement over
BitSet for intersections.
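Both java.util.BitSet and OpenBitSet intersect by combining 64-bit words. The Python sketch below illustrates that word-at-a-time technique; it is not OpenBitSet's code, and `to_words` / `intersection_count` are hypothetical names. Its cost is proportional to maxDoc/64 words no matter how many documents match.

```python
WORD_BITS = 64


def to_words(doc_ids, max_doc):
    """Pack doc ids into a list of 64-bit words (bit i set => doc i present)."""
    words = [0] * ((max_doc + WORD_BITS - 1) // WORD_BITS)
    for d in doc_ids:
        words[d // WORD_BITS] |= 1 << (d % WORD_BITS)
    return words


def intersection_count(a, b):
    """Count docs present in both sets by ANDing one word at a time."""
    return sum(bin(x & y).count("1") for x, y in zip(a, b))


a = to_words(range(0, 1000, 2), 1000)  # even doc ids
b = to_words(range(0, 1000, 3), 1000)  # multiples of 3
print(intersection_count(a, b))        # 167 (multiples of 6 below 1000)
```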
Here's one data point from someone else:
http://www.nabble.com/Aggregating-category-hits-tf1623611.html#a4831982
-Yonik
Re: solrconfig.xml defaults
Posted by Mike Klaas <mi...@gmail.com>.
On 25-May-07, at 2:09 PM, Yonik Seeley wrote:
> On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
>> HashDocSet maxSize: perhaps consider increasing this, or making this
>> by default a parameter which is tuned automatically (.5% of maxDocs,
>> for instance)
>
> I think when HashDocSet is large enough, it can be slower than
> OpenBitSet for taking intersections, even when it still saves memory.
> So it depends on what one is optimizing for.
>
> I picked 3000 long ago since it seemed the fastest for faceting
> with one particular data set (between 500K to 1M docs), but that was
> before OpenBitSet. It also caps the max table size at 4096 entries
Wasn't HashDocSet significantly optimized for intersection recently?
> (16K RAM) (power of two hash table with a load factor of .75). Does
> it make sense to go up to 8K entries? Do you have any data on
> different sizes?
Unfortunately, I don't. I'm using 20K right now for indices ranging
in size from 3-8M docs, but that was based on advice on the wiki, and
the memory savings seemed worth it (each bit filter is pushing 500KB
to 1MB at that scale). I might have time to run some experiments
before 1.2 is released. If not, 3000 seems like a well-founded default.
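The sizes in this exchange can be checked with a little arithmetic. The Python sketch below assumes, as the quoted description says, a power-of-two table of 4-byte ints kept at or below a 0.75 load factor, and compares it with a one-bit-per-document filter; the function names are hypothetical.

```python
def hash_table_entries(max_size: int, load_factor: float = 0.75) -> int:
    """Smallest power-of-two table that holds max_size ids at the load factor."""
    entries = 1
    while entries * load_factor < max_size:
        entries *= 2
    return entries


def hash_table_bytes(max_size: int) -> int:
    # 4 bytes per int slot, matching "4096 entries (16K RAM)" in the thread.
    return 4 * hash_table_entries(max_size)


def bitset_bytes(max_doc: int) -> int:
    # One bit per document in the index.
    return (max_doc + 7) // 8


print(hash_table_entries(3000), hash_table_bytes(3000))    # 4096 16384
print(hash_table_entries(6144), hash_table_bytes(6144))    # 8192 32768
print(hash_table_entries(20000), hash_table_bytes(20000))  # 32768 131072
print(bitset_bytes(3_000_000), bitset_bytes(8_000_000))    # 375000 1000000
```

So a 20K HashDocSet tops out around 128KB, while a bitset over 3-8M documents is roughly 375KB to 1MB.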
>> Most people will start with the example solrconfig.xml, I suspect,
>> and getting the performance-related settings right at the start will
>> help the perception of Solr's performance. I'd be tempted to
>> increase the default filterCache size too, but that can have quite
>> high memory requirements.
>
> Yeah, many people won't think to increase the VM heap size.
> Perhaps that's better as a documentation fix.
I just added a note to SolrPerformanceFactors. Most of the
information is already on the wiki.
> What about commenting out most of the default parameters in the dismax
> handler config, so it becomes more standard & usable (w/o editing its
> config) after someone customizes their schema?
Makes sense, but I agree with Hoss that it is nice for the user to be
able to use the example easily out of the box.
-Mike
Re: solrconfig.xml defaults
Posted by Yonik Seeley <yo...@apache.org>.
On 5/25/07, Mike Klaas <mi...@gmail.com> wrote:
> Since auditing solrconfig.xml defaults is on the list of things for
> 1.2, I thought I'd get the ball rolling:
Thanks, that was one of the things I was looking into (hitting all
the new URLs and seeing what they looked like, too).
> Lazy field loading: seems like it would benefit more people to be
> enabled explicitly. I've been using it successfully and some
> substantial gains have been reported on the lucene list. The
> downsides don't really seem significant.
Sounds fine.
> HashDocSet maxSize: perhaps consider increasing this, or making this
> by default a parameter which is tuned automatically (.5% of maxDocs,
> for instance)
I think when HashDocSet is large enough, it can be slower than
OpenBitSet for taking intersections, even when it still saves memory.
So it depends on what one is optimizing for.
I picked 3000 long ago since it seemed the fastest for faceting
with one particular data set (between 500K to 1M docs), but that was
before OpenBitSet. It also caps the max table size at 4096 entries
(16K RAM) (power of two hash table with a load factor of .75). Does
it make sense to go up to 8K entries? Do you have any data on
different sizes?
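The structure described here, and why a large HashDocSet can lose to a bitset on intersections, can be illustrated with a small Python sketch. This is not Solr's code: the class name, linear probing, and the -1 empty sentinel are assumptions. Intersection probes every id of the smaller set, so its cost grows with the set sizes, while a bitset AND costs a fixed maxDoc/64 word operations.

```python
EMPTY = -1  # sentinel for an unused slot; Lucene doc ids are non-negative


class OpenHashDocSet:
    """Hypothetical HashDocSet-like set of doc ids.

    Power-of-two int table kept at or below a 0.75 load factor; collisions
    are resolved with linear probing (an assumption, not Solr's scheme).
    """

    def __init__(self, doc_ids, load_factor=0.75):
        ids = list(doc_ids)
        size = 1
        while size * load_factor < len(ids):  # smallest power of two that fits
            size *= 2
        self.mask = size - 1
        self.table = [EMPTY] * size
        self.count = 0
        for d in ids:
            if self._insert(d):
                self.count += 1

    def _slot(self, d):
        # Walk the probe sequence until we find d or an empty slot.
        i = d & self.mask
        while self.table[i] != EMPTY and self.table[i] != d:
            i = (i + 1) & self.mask
        return i

    def _insert(self, d):
        i = self._slot(d)
        if self.table[i] == d:
            return False  # already present
        self.table[i] = d
        return True

    def __contains__(self, d):
        return self.table[self._slot(d)] == d

    def __len__(self):
        return self.count

    def __iter__(self):
        return (d for d in self.table if d != EMPTY)

    def intersection_count(self, other):
        # Probe each id of the smaller set in the larger one: cost grows with
        # set size, unlike a bitset AND whose cost grows with maxDoc.
        small, big = (self, other) if len(self) <= len(other) else (other, self)
        return sum(1 for d in small if d in big)


a = OpenHashDocSet(range(0, 1000, 2))  # even doc ids
b = OpenHashDocSet(range(0, 1000, 3))  # multiples of 3
print(len(OpenHashDocSet(range(3000)).table))  # 4096 slots for 3000 ids
print(a.intersection_count(b))                 # 167
```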
> Most people will start with the example solrconfig.xml, I suspect,
> and getting the performance-related settings right at the start will
> help the perception of Solr's performance. I'd be tempted to
> increase the default filterCache size too, but that can have quite
> high memory requirements.
Yeah, many people won't think to increase the VM heap size.
Perhaps that's better as a documentation fix.
What about commenting out most of the default parameters in the dismax
handler config, so it becomes more standard & usable (w/o editing its
config) after someone customizes their schema?
-Yonik