Posted to solr-user@lucene.apache.org by Michael Imbeault <mi...@sympatico.ca> on 2006/09/09 20:29:42 UTC
Got it working! And some questions
First of all, in reference to
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html ,
I got it working! The problems were coming from solPHP; the
implementation in the wiki doesn't really work, to be honest, at least
for me. I had to modify it significantly in several places to get it
working (Tomcat 5.5, WAMP and Windows XP).
The main problem was that addIndex was sending 1 doc at a time to Solr;
it would cause a problem after a few thousand docs because I was running
out of resources. I modified solr_update.php to handle batch queries,
and I'm now sending batches of 1000 docs at a time. Great indexing speed.
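A minimal sketch of the batching idea, written in Java to match the patches later in this thread rather than PHP. It assumes the Solr 1.x XML update format, where one <add> message may carry many <doc> elements. BatchAddXml, toAddMessages and the batch size are hypothetical names chosen for illustration, not part of solPHP or Solr, and actually POSTing each message to the update URL is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: packs documents into Solr XML <add> messages of at most
// batchSize documents each, so one HTTP POST carries a whole batch.
class BatchAddXml {

    // Escape the five XML special characters in field names and values.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Returns one <add>...</add> message per batch, preserving document order.
    static List<String> toAddMessages(List<Map<String, String>> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> messages = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            StringBuilder sb = new StringBuilder("<add>");
            for (Map<String, String> doc : docs.subList(start, end)) {
                sb.append("<doc>");
                for (Map.Entry<String, String> field : doc.entrySet()) {
                    sb.append("<field name=\"").append(escapeXml(field.getKey())).append("\">")
                      .append(escapeXml(field.getValue())).append("</field>");
                }
                sb.append("</doc>");
            }
            messages.add(sb.append("</add>").toString());
        }
        return messages;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            Map<String, String> doc = new LinkedHashMap<>(); // keeps field order stable
            doc.put("id", Integer.toString(i));
            doc.put("title", "Article " + i);
            docs.add(doc);
        }
        List<String> messages = toAddMessages(docs, 1000);
        System.out.println(messages.size() + " POSTs for " + docs.size() + " docs"); // 3 POSTs for 2500 docs
    }
}
```

Each message would then go out as the raw POST body, which is what the CURLOPT_POSTFIELDS approach described just below does in PHP.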
I had a slight problem with the curl function of solr_update.php: the
custom HTTP header wasn't recognized. I now use curl_setopt($ch,
CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
which is much simpler, and now everything works!
So far I have indexed 15,000,000 documents (basically my whole
collection) and the performance I'm getting is INCREDIBLE (sub-100 ms
query time without warmup and no optimization at all on a 7 GB index,
and with the cache it gets stupid fast)! Seriously, Solr amazes me every
time I use it. I increased the HashDocSet maxSize to 75000 and will keep
tuning this value; it helped a great deal. I will try the DisMax handler
soon too; right now the standard one is great. And I will index with a
better stopword file; the default one could really use improvements.
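Related to the HashDocSet tuning above and the loadFactor question below: for a hash set that is built once at a known size and never grows, a load factor mostly just decides how large the table is. The sketch assumes a power-of-two open-addressing table of int doc ids, sized so that occupancy stays at or below the load factor. That sizing rule is an illustrative assumption, not a copy of Solr's HashDocSet code, and LoadFactorSizing is a hypothetical name.

```java
// Hypothetical sizing rule for a fixed-size open-addressing hash set of doc ids.
class LoadFactorSizing {

    // Smallest power of two >= n, for 1 <= n <= 2^30.
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int p = 1;
        while (p < n) {
            p <<= 1;
        }
        return p;
    }

    // Table slots needed so that size / slots <= loadFactor.
    static int tableSize(int size, float loadFactor) {
        if (loadFactor <= 0f || loadFactor > 1f) {
            throw new IllegalArgumentException("loadFactor must be in (0, 1]");
        }
        int minSlots = (int) Math.ceil(size / (double) loadFactor);
        return nextPowerOfTwo(Math.max(minSlots, 1));
    }

    public static void main(String[] args) {
        for (float lf : new float[] {0.5f, 0.75f}) {
            int slots = tableSize(3000, lf);
            // each slot holds one int doc id, i.e. 4 bytes
            System.out.println("loadFactor " + lf + ": " + slots + " slots, " + (4 * slots) + " bytes");
        }
    }
}
```

Under this assumption, a lower load factor buys shorter probe sequences at the cost of more memory per cached set, which fits the later remark that it probably has little effect on performance.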
Some questions (couldn't find the answer in the docs):
- Is the Solr PHP code in the wiki working out of the box for anyone? If
not, we could modify the wiki...
- What is the loadFactor variable of HashDocSet? Should I optimize it too?
- What are the units of the size value of the caches? Megs, number of
queries, kilobytes? It's not described anywhere.
- Is there any way to programmatically change the OR/AND preference of the
query parser? I set it to AND by default for user queries, but I'd like to
set it to OR for some server-side queries I must do (find related
articles, order by score).
- What's the difference between the 2 commit types, blocking and
non-blocking? I didn't see any difference at all; I tried both.
- Every time I do an <optimize> command, I get the following in my
catalina logs - should I do anything about it?
9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.io.EOFException: no more
data available - expected end tag </optimize> to close start tag
<optimize> from line 1, parser stopped on START_TAG seen <optimize>... @1:10
- Are there any benefits to setting the allowed memory for Tomcat higher?
Right now I'm allocating 384 MB.
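Two of the questions above concern the update messages themselves. The log line says the parser ran out of data right after a bare <optimize> start tag (position 1:10), which suggests the request body was not well-formed XML; sending the self-closing <optimize/> is likely to avoid the error, though that is an inference from the log rather than a confirmed diagnosis. As for blocking vs non-blocking commits, as I understand the Solr 1.x XML update format, the difference lies in the waitFlush and waitSearcher attributes on <commit/> and <optimize/>. UpdateCommands and its methods are hypothetical client-side helpers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical client-side helpers for Solr 1.x <commit/> and <optimize/> messages.
class UpdateCommands {

    // Builds e.g. <commit waitFlush="false" waitSearcher="false"/>; self-closing,
    // so the body is complete XML on its own.
    static String build(String command, boolean waitFlush, boolean waitSearcher) {
        if (!command.equals("commit") && !command.equals("optimize")) {
            throw new IllegalArgumentException("expected commit or optimize, got " + command);
        }
        return "<" + command + " waitFlush=\"" + waitFlush
                + "\" waitSearcher=\"" + waitSearcher + "\"/>";
    }

    // True if the request body parses as well-formed XML.
    static boolean isWellFormed(String body) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrows fatal errors, prints nothing
            builder.parse(new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<optimize>"));   // false: start tag never closed
        System.out.println(isWellFormed("<optimize/>"));  // true
        System.out.println(build("commit", false, false));
    }
}
```

Checking the body client-side like this is cheap and separates "the client sent broken XML" from "the server mishandled it".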
Can't wait to try the new faceted queries... seriously, Solr is really,
really awesome so far. Thanks for all your work, and sorry for all
the questions!
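On the cache-size question in the list above: the answer given later in this thread is that the size counts entries, not bytes. A minimal sketch of what an entry-bounded LRU cache means, using java.util.LinkedHashMap in access order; it illustrates the unit only and is not Solr's own cache class. A very large cached result and a tiny one each count as one entry, so actual memory use depends on what gets cached.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: an LRU cache whose capacity is a count of entries, not bytes.
class EntryBoundedCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxEntries;

    EntryBoundedCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true: get() marks an entry as recently used
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each put; evicts the least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        EntryBoundedCache<String, String> cache = new EntryBoundedCache<>(2);
        cache.put("q=a", "small result");
        cache.put("q=b", "very large result"); // still counts as exactly one entry
        cache.get("q=a");                       // q=a is now the most recently used
        cache.put("q=c", "another result");     // evicts q=b
        System.out.println(cache.keySet());     // [q=a, q=c]
    }
}
```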
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Got it working! And some questions
Posted by James liu <li...@gmail.com>.
- Is the solr php in the wiki working out of the box for anyone?
Can you show your php.ini? Have you tuned your PHP configuration for performance?
2006/9/10, Brian Lucas <bl...@gmail.com>:
>
> Hi Michael,
>
> I apologize for the lack of testing on the SolPHP. I had to "strip" it
> down significantly to turn it into a general class that would be usable,
> and the version up there has not been extensively tested yet (I'm almost
> ready to get back to that and "revise" it), plus much of my coding is done
> in Rails at the moment. However...
>
> If you have a new version, could you send it over my way or just upload it
> to the wiki? I'd like to take a look at the changes and throw your revised
> version up there or integrate both versions into a cleaner revision of the
> version already there.
>
> With respect to batch queries, it's already designed to do that (that's why
> you see "array($array)" in the example, because it accepts an array of
> updates) but I'd definitely like to see how you revised it.
>
> Thanks,
> Brian
Re: Got it working! And some questions
Posted by Yonik Seeley <yo...@apache.org>.
On 9/9/06, Michael Imbeault <mi...@sympatico.ca> wrote:
> The main problem was that addIndex was sending 1 doc at a time to Solr;
> it would cause a problem after a few thousand docs because I was running
> out of resources.
Sending one doc at a time should be fine... you shouldn't run out of
resources.
There must be a bug somewhere...
-Yonik
Re: Got it working! And some questions
Posted by Chris Hostetter <ho...@fucit.org>.
: First of all, it seems the mailing list is having some trouble? Some of
: my posts end up in the wrong thread (even new threads I post), I don't
: receive them in my mail, and they're present only in the 'date archive'
: of http://www.mail-archive.com, and not in the 'thread' one? I also don't
: receive some of other people's posts in my mail; the problems started
: last week, I think.
I haven't noticed any problems with mail not making it through; some mail
clients (gmail for example) seem to suppress messages they can tell you
sent yourself, maybe that's what's happening on your end? As for
threads you start not showing up on the "thread" list ... according to
my mailbox, all but one message I've received from you included a
"References:" header (if not an In-Reply-To header), which causes some mail
archivers to assume it's part of an existing thread (this thread for
instance is considered part of the "Double Solr Installation on Single
Tomcat (or Double Index)" thread) ... you may want to experiment with
your mail client (off list) to see if you can figure out when/why this is
happening.
: Secondly, Chris, thanks for all the useful answers, everything is much
: clearer now. This info should be added to the wiki I think; should I do
feel free ... that's why it's a wiki.
: it? I'm still a little disappointed that I can't change the OR/AND
: parsing by just changing some parameter (like I can do for the number of
: results returned, for example); adding an OR between each word in the
: text I want to compare sounds suboptimal, but I'll probably do it that
: way; it's a very minor nitpick, Solr is awesome, as I said before.
It would be a fairly simple option to add, just like changing the
default field (patches welcome!), but as I said, typically if you don't
want the default behavior you are programmatically generating the query
anyway and already adding some markup, so a little more doesn't make it
less optimal.
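A sketch of that suggestion in Java: for machine-generated queries (such as the "related articles" case), spelling out OR between terms makes the result independent of the schema's default operator. The escaped character set roughly follows the characters Lucene's query syntax treats specially; ExplicitOrQuery and its methods are hypothetical names. One limitation: a term that is literally AND, OR or NOT would still be read as an operator, which this sketch does not handle.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds an explicit OR query so the result does not
// depend on the schema's default operator.
class ExplicitOrQuery {

    // Characters with special meaning in Lucene query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefixes every special character with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length());
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Joins the non-empty terms with " OR ", escaping each one.
    static String orQuery(List<String> terms) {
        StringJoiner joiner = new StringJoiner(" OR ");
        for (String term : terms) {
            if (!term.isEmpty()) {
                joiner.add(escape(term));
            }
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(orQuery(List.of("protein", "kinase", "c++"))); // protein OR kinase OR c\+\+
    }
}
```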
-Hoss
Re: Got it working! And some questions
Posted by Chris Hostetter <ho...@fucit.org>.
: Maybe something like q.op or q.oper if it *only* applies to q. Which
: begs the question... what *does* it apply to? At first blush, it
: doesn't seem like it should apply to other queries like fq, facet
: queries, and esp queries defined in solrconfig.xml. I think that
: would be very surprising.
Agreed, note the comment I put into SolrPluginUtils.parseFilterQueries when
I added fq support to StandardRequestHandler...
/* Ignore SolrParams.DF - could have init param FQs assuming the
* schema default with query param DF intended to only affect Q.
* If user doesn't want schema default, they should be explicit in the FQ.
*/
... I would think a "do" or "op" or "q.op" param should *definitely* only
influence the "q" param.
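A client-side sketch of that scoping, assuming the q.op parameter name from Erik's patch in this thread, a hypothetical base URL and a hypothetical "year" field: q.op sits next to q and is meant to affect only the main query, while the fq in the same request is just there to show a second query that q.op is not meant to touch. SelectUrl is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical client helper: builds a select URL from ordered parameters.
class SelectUrl {

    static String build(String base, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "protein kinase"); // main query: q.op applies here
        params.put("q.op", "OR");
        params.put("fq", "year:2006");     // filter query: q.op is not meant to affect it
        System.out.println(build("http://localhost:8080/solr/select", params));
        // http://localhost:8080/solr/select?q=protein+kinase&q.op=OR&fq=year%3A2006
    }
}
```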
-Hoss
Re: Got it working! And some questions
Posted by Chris Hostetter <ho...@fucit.org>.
: SolrQueryParser now knows nothing about the default operator, it is
: set from QueryParsing.parseQuery() when passed a SolrParams.
I didn't test it, but it looks clean to me.
The only other thing I would do is beef up the javadocs for
SolrQueryParser (to clarify that IndexSchema is only used for determining
field format) and QueryParsing.parseQuery (to clarify that it *does* use
IndexSearcher to get extra parsing options).
: QueryParsing.parseQuery() methods could be simplified, perhaps even
...
: It could even get the "q" parameter from there, but there is code
: that passes expressions that don't come from "q". Maybe we could
...yeah, its utility for simple queries regardless of the "primary"
language of a request handler is key.
: have two parseQuery() methods: parseQuery(String expression,
: SolrQueryRequest req) and parseQuery(SolrQueryRequest req), and for
: the latter the "q" parameter is pulled from the request and used as
: the expression.
That sounds good to me ... but it doesn't seem critical ... clean house as
much as you want, but I don't think anybody else will mind a bit of dust
on the window sills.
-Hoss
Re: Got it working! And some questions
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 12, 2006, at 4:47 PM, Chris Hostetter wrote:
> : I've implemented the ability to override the default operator with
> : q.op=AND|OR. The patch is pasted below for your review.
>
> if I'm reading that right, one subtlety is that "new
> SolrQueryParser(schema,field)" no longer pays attention to
> schema.getQueryParserDefaultOperator() -- that only becomes
> applicable when using QueryParsing.parseQuery
>
> ...I am very okay with this change, I wasn't really a fan of the fact that
> the SolrQueryParser pulled that info out of the IndexSchema in its
> constructor previously, I just wanted to point out that this patch would
> change that.
>
> Perhaps the constructor for SolrQueryParser shouldn't be aware of the op
> at all (either from the schema or from the SolrParams) -- and setting it
> should be left to QueryParsing.parseQuery (or some other utility in the
> QueryParsing class) ... personally I'm a fan of leaving SolrQueryParser as
> much like QueryParser as possible -- with the only real change being the
> knowledge of the individual field formats.
I've reworked it based on your feedback. The patch is pasted below.
SolrQueryParser now knows nothing about the default operator, it is
set from QueryParsing.parseQuery() when passed a SolrParams.
QueryParsing.parseQuery() methods could be simplified, perhaps even
into a single method, that took a query expression and a
SolrQueryRequest, where it can get the SolrParams and IndexSchema.
It could even get the "q" parameter from there, but there is code
that passes expressions that don't come from "q". Maybe we could
have two parseQuery() methods: parseQuery(String expression,
SolrQueryRequest req) and parseQuery(SolrQueryRequest req), and for
the latter the "q" parameter is pulled from the request and used as
the expression.
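A rough, self-contained sketch of the two-method shape suggested here, with stand-ins instead of Solr classes: the request is modeled as a Map of parameters plus the schema's default operator string, and the result just records the expression and the resolved operator instead of building a Lucene Query. All names are hypothetical. The operator logic copies the patch: q.op wins when present, otherwise the schema default applies, and anything other than exactly "AND" falls back to OR (so the comparison is case-sensitive).

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for SolrParams/SolrQueryRequest/QueryParser.Operator; names are hypothetical.
class ParseQuerySketch {

    enum Operator { AND, OR }

    static final String OP = "q.op";

    // Placeholder for a parsed Lucene Query: records what the parser would receive.
    static final class ParsedQuery {
        final String expression;
        final Operator defaultOperator;

        ParsedQuery(String expression, Operator defaultOperator) {
            this.expression = expression;
            this.defaultOperator = defaultOperator;
        }
    }

    // Same rule as the patch: q.op overrides the schema default; only "AND" selects AND.
    static Operator resolveOperator(Map<String, String> params, String schemaDefault) {
        String opParam = params.getOrDefault(OP, schemaDefault);
        return "AND".equals(opParam) ? Operator.AND : Operator.OR;
    }

    // parseQuery(expression, request): for expressions that do not come from "q".
    static ParsedQuery parseQuery(String expression, Map<String, String> params, String schemaDefault) {
        return new ParsedQuery(expression, resolveOperator(params, schemaDefault));
    }

    // parseQuery(request): takes the expression from the "q" parameter.
    static ParsedQuery parseQuery(Map<String, String> params, String schemaDefault) {
        return parseQuery(params.getOrDefault("q", ""), params, schemaDefault);
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("q", "protein kinase");
        System.out.println(parseQuery(params, "AND").defaultOperator); // AND (schema default)
        params.put(OP, "OR");
        System.out.println(parseQuery(params, "AND").defaultOperator); // OR (q.op override)
    }
}
```

As in the patch, the explicit-expression form applies q.op to whatever it is given, so a caller parsing something other than q would have to keep q.op out of it if q.op should only affect q.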
As it is, the patch below works fine and I'm happy to commit it, but
am happy to rework this sort of thing to get it as clean as others like.
Erik
Index: src/java/org/apache/solr/search/SolrQueryParser.java
===================================================================
--- src/java/org/apache/solr/search/SolrQueryParser.java (revision 442772)
+++ src/java/org/apache/solr/search/SolrQueryParser.java (working copy)
@@ -37,7 +37,6 @@
super(defaultField == null ? schema.getDefaultSearchFieldName() : defaultField, schema.getQueryAnalyzer());
this.schema = schema;
setLowercaseExpandedTerms(false);
- setDefaultOperator("AND".equals(schema.getQueryParserDefaultOperator()) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
}
protected Query getFieldQuery(String field, String queryText) throws ParseException {
Index: src/java/org/apache/solr/search/QueryParsing.java
===================================================================
--- src/java/org/apache/solr/search/QueryParsing.java (revision 442772)
+++ src/java/org/apache/solr/search/QueryParsing.java (working copy)
@@ -19,6 +19,7 @@
import org.apache.lucene.search.*;
import org.apache.solr.search.function.*;
import org.apache.lucene.queryParser.ParseException;
+import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.solr.core.SolrCore;
@@ -26,6 +27,7 @@
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.FieldType;
+import org.apache.solr.request.SolrParams;
import java.util.ArrayList;
import java.util.regex.Pattern;
@@ -37,6 +39,7 @@
* @version $Id$
*/
public class QueryParsing {
+ public static final String OP = "q.op";
public static Query parseQuery(String qs, IndexSchema schema) {
return parseQuery(qs, null, schema);
@@ -58,8 +61,26 @@
}
}
+ public static Query parseQuery(String qs, String defaultField, SolrParams params, IndexSchema schema) {
+ try {
+ String opParam = params.get(OP, schema.getQueryParserDefaultOperator());
+ QueryParser.Operator defaultOperator = "AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
+ SolrQueryParser parser = new SolrQueryParser(schema, defaultField);
+ parser.setDefaultOperator(defaultOperator);
+ Query query = parser.parse(qs);
+ if (SolrCore.log.isLoggable(Level.FINEST)) {
+ SolrCore.log.finest("After QueryParser:" + query);
+ }
+ return query;
+
+ } catch (ParseException e) {
+ SolrCore.log(e);
+ throw new SolrException(400,"Error parsing Lucene query",e);
+ }
+ }
+
/***
* SortSpec encapsulates a Lucene Sort and a count of the number of documents
* to return.
Index: src/java/org/apache/solr/request/StandardRequestHandler.java
===================================================================
--- src/java/org/apache/solr/request/StandardRequestHandler.java (revision 442772)
+++ src/java/org/apache/solr/request/StandardRequestHandler.java (working copy)
@@ -105,7 +105,7 @@
List<String> commands = StrUtils.splitSmart(sreq,';');
String qs = commands.size() >= 1 ? commands.get(0) : "";
- Query query = QueryParsing.parseQuery(qs, defaultField, req.getSchema());
+ Query query = QueryParsing.parseQuery(qs, defaultField, p, req.getSchema());
// If the first non-query, non-filter command is a simple sort on an indexed field, then
// we can use the Lucene sort ability.
Re: Got it working! And some questions
Posted by Chris Hostetter <ho...@fucit.org>.
: I've implemented the ability to override the default operator with
: q.op=AND|OR. The patch is pasted below for your review.
If I'm reading that right, one subtlety is that "new
SolrQueryParser(schema,field)" no longer pays attention to
schema.getQueryParserDefaultOperator() -- that only becomes
applicable when using QueryParsing.parseQuery
...I am very okay with this change, I wasn't really a fan of the fact that
the SolrQueryParser pulled that info out of the IndexSchema in its
constructor previously, I just wanted to point out that this patch would
change that.
Perhaps the constructor for SolrQueryParser shouldn't be aware of the op
at all (either from the schema or from the SolrParams) -- and setting it
should be left to QueryParsing.parseQuery (or some other utility in the
QueryParsing class) ... personally I'm a fan of leaving SolrQueryParser as
much like QueryParser as possible -- with the only real change being the
knowledge of the individual field formats.
-Hoss
Re: Got it working! And some questions
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 11, 2006, at 2:52 PM, Yonik Seeley wrote:
> On 9/11/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>>
>> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
>> > I'm still a little disappointed that I can't change the OR/AND
>> > parsing by just changing some parameter (like I can do for the
>> > number of results returned, for example); adding an OR between each
>> > word in the text I want to compare sounds suboptimal, but I'll
>> > probably do it that way; it's a very minor nitpick, Solr is awesome,
>> > as I said before.
>>
>> I'm the one that added support for controlling the default operator
>> of Solr's query parser, and I hadn't considered the use case of
>> controlling that setting from a request parameter. It should be easy
>> enough to add. I'll take a look at adding that support and commit it
>> once I have it working.
>>
>> What parameter name should be used for this? do=[AND|OR] (for
>> default operator)? We have df for default field.
>
> Maybe something like q.op or q.oper if it *only* applies to q. Which
> begs the question... what *does* it apply to? At first blush, it
> doesn't seem like it should apply to other queries like fq, facet
> queries, and esp queries defined in solrconfig.xml. I think that
> would be very surprising.
I've implemented the ability to override the default operator with
q.op=AND|OR. The patch is pasted below for your review.
The one thing I don't like is that QueryParsing.parseQuery(String qs,
String defaultField, SolrParams params, IndexSchema schema) is a bit
redundant in that it takes defaultField which can also be gleaned
from params, but StandardRequestHandler uses "df" for highlighting also.
I'm happy to commit this if there are no objections or suggestions
for improvement (and of course update the wiki documentation for the
parameters).
Erik
Index: src/java/org/apache/solr/search/SolrQueryParser.java
===================================================================
--- src/java/org/apache/solr/search/SolrQueryParser.java (revision 442689)
+++ src/java/org/apache/solr/search/SolrQueryParser.java (working copy)
@@ -34,10 +34,14 @@
protected final IndexSchema schema;
public SolrQueryParser(IndexSchema schema, String defaultField) {
+ this(schema, defaultField, QueryParser.Operator.OR);
+ }
+
+ public SolrQueryParser(IndexSchema schema, String defaultField, QueryParser.Operator defaultOperator) {
super(defaultField == null ? schema.getDefaultSearchFieldName() : defaultField, schema.getQueryAnalyzer());
this.schema = schema;
setLowercaseExpandedTerms(false);
- setDefaultOperator("AND".equals(schema.getQueryParserDefaultOperator()) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
+ setDefaultOperator(defaultOperator);
}
protected Query getFieldQuery(String field, String queryText) throws ParseException {
Index: src/java/org/apache/solr/search/QueryParsing.java
===================================================================
--- src/java/org/apache/solr/search/QueryParsing.java (revision 442689)
+++ src/java/org/apache/solr/search/QueryParsing.java (working copy)
@@ -19,6 +19,7 @@
import org.apache.lucene.search.*;
import org.apache.solr.search.function.*;
import org.apache.lucene.queryParser.ParseException;
+import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.solr.core.SolrCore;
@@ -26,6 +27,7 @@
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.FieldType;
+import org.apache.solr.request.SolrParams;
import java.util.ArrayList;
import java.util.regex.Pattern;
@@ -37,6 +39,7 @@
* @version $Id$
*/
public class QueryParsing {
+ public static final String OP = "q.op";
public static Query parseQuery(String qs, IndexSchema schema) {
return parseQuery(qs, null, schema);
@@ -58,8 +61,24 @@
}
}
+ public static Query parseQuery(String qs, String defaultField, SolrParams params, IndexSchema schema) {
+ try {
+ String opParam = params.get(OP, schema.getQueryParserDefaultOperator());
+ QueryParser.Operator defaultOperator = "AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
+ Query query = new SolrQueryParser(schema, defaultField, defaultOperator).parse(qs);
+ if (SolrCore.log.isLoggable(Level.FINEST)) {
+ SolrCore.log.finest("After QueryParser:" + query);
+ }
+ return query;
+
+ } catch (ParseException e) {
+ SolrCore.log(e);
+ throw new SolrException(400,"Error parsing Lucene query",e);
+ }
+ }
+
/***
* SortSpec encapsulates a Lucene Sort and a count of the number of documents
* to return.
Index: src/java/org/apache/solr/request/StandardRequestHandler.java
===================================================================
--- src/java/org/apache/solr/request/StandardRequestHandler.java (revision 442689)
+++ src/java/org/apache/solr/request/StandardRequestHandler.java (working copy)
@@ -94,7 +94,7 @@
List<String> commands = StrUtils.splitSmart(sreq,';');
String qs = commands.size() >= 1 ? commands.get(0) : "";
- Query query = QueryParsing.parseQuery(qs, defaultField, req.getSchema());
+ Query query = QueryParsing.parseQuery(qs, defaultField, p, req.getSchema());
// If the first non-query, non-filter command is a simple sort on an indexed field, then
// we can use the Lucene sort ability.
Re: Got it working! And some questions
Posted by Yonik Seeley <yo...@apache.org>.
On 9/11/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
> > I'm still a little disappointed that I can't change the OR/AND
> > parsing by just changing some parameter (like I can do for the
> > number of results returned, for example); adding an OR between each
> > word in the text I want to compare sounds suboptimal, but I'll
> > probably do it that way; it's a very minor nitpick, Solr is awesome,
> > as I said before.
>
> I'm the one that added support for controlling the default operator
> of Solr's query parser, and I hadn't considered the use case of
> controlling that setting from a request parameter. It should be easy
> enough to add. I'll take a look at adding that support and commit it
> once I have it working.
>
> What parameter name should be used for this? do=[AND|OR] (for
> default operator)? We have df for default field.
Maybe something like q.op or q.oper if it *only* applies to q. Which
begs the question... what *does* it apply to? At first blush, it
doesn't seem like it should apply to other queries like fq, facet
queries, and esp queries defined in solrconfig.xml. I think that
would be very surprising.
-Yonik
Re: Got it working! And some questions
Posted by Michael Imbeault <mi...@sympatico.ca>.
Hello Erik,
Thanks for adding that feature! "do" is fine with me, if "op" is already
used (not sure about this one).
Erik Hatcher wrote:
>
> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
>> I'm still a little disappointed that I can't change the OR/AND
>> parsing by just changing some parameter (like I can do for the number
>> of results returned, for example); adding an OR between each word in
>> the text I want to compare sounds suboptimal, but I'll probably do it
>> that way; it's a very minor nitpick, Solr is awesome, as I said before.
>
> I'm the one that added support for controlling the default operator of
> Solr's query parser, and I hadn't considered the use case of
> controlling that setting from a request parameter. It should be easy
> enough to add. I'll take a look at adding that support and commit it
> once I have it working.
>
> What parameter name should be used for this? do=[AND|OR] (for
> default operator)? We have df for default field.
>
> Erik
>
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Got it working! And some questions
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
> I'm still a little disappointed that I can't change the OR/AND
> parsing by just changing some parameter (like I can do for the
> number of results returned, for example); adding an OR between each
> word in the text I want to compare sounds suboptimal, but I'll
> probably do it that way; it's a very minor nitpick, Solr is awesome,
> as I said before.
I'm the one that added support for controlling the default operator
of Solr's query parser, and I hadn't considered the use case of
controlling that setting from a request parameter. It should be easy
enough to add. I'll take a look at adding that support and commit it
once I have it working.
What parameter name should be used for this? do=[AND|OR] (for
default operator)? We have df for default field.
Erik
Re: Got it working! And some questions
Posted by Michael Imbeault <mi...@sympatico.ca>.
First of all, it seems the mailing list is having some trouble? Some of
my posts end up in the wrong thread (even new threads I post), I don't
receive them in my mail, and they're present only in the 'date archive'
of http://www.mail-archive.com, and not in the 'thread' one? I also don't
receive some of other people's posts in my mail; the problems started
last week, I think.
Secondly, Chris, thanks for all the useful answers, everything is much
clearer now. This info should be added to the wiki I think; should I do
it? I'm still a little disappointed that I can't change the OR/AND
parsing by just changing some parameter (like I can do for the number of
results returned, for example); adding a OR between each word in the
text i want to compare sounds suboptimal, but i'll probably do it that
way; its a very minor nitpick, solr is awesome, as I said before.
@ Brian Lucas: Don't worry, SolPHP was still 99.9% functional; great
work. Sending one doc at a time was partly my fault: I was following the
exact sequence (add to array, submit) shown in the docs. The only thing
I'd add is a big "//TODO: change this code" before the sections you have
to change to make it work with a particular schema. I'm pretty sure the
custom-header curl submit works for everyone but me; I'm on a Windows
test box running WAMP, so that may be the cause. I'll send you the
changes I made to the code tomorrow anyway; as I said, nothing major.
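For comparison, the plain-POST approach from the original post (`CURLOPT_POST` plus `CURLOPT_POSTFIELDS`) amounts to putting the XML update message in the request body. A minimal standard-library Python sketch of the same idea follows. The update URL and the `text/xml` content type are assumptions about a typical Solr 1.x setup, and `build_update_request` is a hypothetical helper.

```python
import urllib.request

# Assumed location of the XML update handler; adjust for your deployment.
SOLR_UPDATE = "http://localhost:8080/solr/update"

def build_update_request(xml_body):
    """Prepare a plain POST whose body is the XML update message."""
    return urllib.request.Request(
        SOLR_UPDATE,
        data=xml_body.encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

req = build_update_request("<commit/>")
# urllib.request.urlopen(req) would actually send it; not done here.
print(req.get_method())  # POST
```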
--
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212
Re: Got it working! And some questions
Posted by Chris Hostetter <ho...@fucit.org>.
: - What is the loadFactor variable of HashDocSet? Should I optimize it too?
this is the same as the loadFactor in a HashMap constructor -- but I don't
think it has much effect on performance since the HashDocSets never
"grow".
I personally have never tuned the loadFactor :)
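To make the "never grow" point concrete: in a fixed-size hash table, the load factor only decides how much headroom the table gets when it is created. Lower values mean more empty slots (more memory) in exchange for shorter probe sequences. The sketch below is generic hash-table arithmetic, not Solr's HashDocSet source, which may size its table differently. `table_size` is a hypothetical helper.

```python
def table_size(n_docs, load_factor=0.75):
    """Smallest power-of-two slot count that keeps n_docs at or below load_factor.

    Generic hash-table sizing for illustration; not Solr's HashDocSet code.
    """
    if not 0 < load_factor <= 1:
        raise ValueError("load_factor must be in (0, 1]")
    needed = n_docs / load_factor
    size = 1
    while size < needed:
        size *= 2
    return size

print(table_size(75000))       # 131072 slots (75000 / 0.75 = 100000 needed)
print(table_size(75000, 0.5))  # 262144 slots (150000 needed)
```

Because the set is built once at its final size, this trade-off is fixed at construction time, which is consistent with loadFactor mattering little in practice.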
: - What are the units of the size value of the caches? Megabytes, number
: of queries, kilobytes? It's not described anywhere.
"entries" ... the number of items allowed in the cache.
: - Any way to programmatically change the OR/AND preference of the query
: parser? I set it to AND by default for user queries, but I'd like to set
: it to OR for some server-side queries I must do (find related articles,
: order by score).
You mean using StandardRequestHandler? ... not that I can think of off the
top of my head, but typically it makes sense to just configure what you
want for your "users" in the schema, and then make any machine-generated
queries explicit.
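Following that advice, a machine-generated "related articles" query can spell out the OR itself instead of relying on the schema default. The sketch below escapes the characters that Lucene's query syntax treats specially, plus `/`, which newer Lucene versions also treat as special; escaping it is harmless either way. It also lowercases bare AND/OR/NOT words so the parser does not read them as operators. `explicit_or_query` is a hypothetical helper.

```python
import re

# Characters with special meaning in Lucene query syntax (plus "/").
# A backslash before each one makes it a literal character.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')

def explicit_or_query(text):
    """Join the words of `text` with explicit ORs so the query does not
    depend on the schema's default operator."""
    terms = []
    for word in text.split():
        word = _SPECIAL.sub(r"\\\1", word)
        if word in ("AND", "OR", "NOT"):
            word = word.lower()  # bare uppercase words would be parsed as operators
        terms.append(word)
    return " OR ".join(terms)

print(explicit_or_query("HIV-1 protease: inhibitors AND resistance"))
# HIV\-1 OR protease\: OR inhibitors OR and OR resistance
```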
: - What's the difference between the two commit types, blocking and
: non-blocking? I didn't see any difference at all; I tried both.
Do you mean the waitFlush and waitSearcher options?
If either of those is true, you shouldn't get a response back from the
server until the operation has finished. If they are false, the server
should respond immediately, even if it takes several seconds (or maybe even
minutes) to complete the operation (optimizes can take a while in some
cases -- as can opening new searchers if you have a lot of cache warming
configured).
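Both options are set on the update message itself. The small sketch below builds `<commit/>` or `<optimize/>` with both flags spelled out. It assumes the Solr 1.x XML update syntax, where `waitFlush` and `waitSearcher` are attributes of those elements. `update_command` is a hypothetical helper.

```python
def update_command(command="commit", wait_flush=True, wait_searcher=True):
    """Build a Solr 1.x XML <commit/> or <optimize/> message with explicit flags."""
    if command not in ("commit", "optimize"):
        raise ValueError("command must be 'commit' or 'optimize'")

    def flag(value):
        return "true" if value else "false"

    return '<%s waitFlush="%s" waitSearcher="%s"/>' % (
        command, flag(wait_flush), flag(wait_searcher))

# Blocking: the server answers only after the flush and the new searcher are done.
print(update_command())
# Non-blocking: the server may answer before a long optimize has finished.
print(update_command("optimize", wait_flush=False, wait_searcher=False))
```

With a small index and no cache warming, both forms return quickly, which may explain why no difference was visible.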
: - Every time I do an <optimize> command, I get the following in my
: Catalina logs; should I do anything about it?
The optimize command needs to be well-formed XML; try "<optimize/>"
instead of just "<optimize>".
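The EOFException in the Catalina log quoted in the original post is what a parser reports for an unclosed `<optimize>` start tag. A quick client-side check with Python's standard XML parser would catch this before the message is sent. `is_well_formed` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as a complete XML document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<optimize>"))   # False: the start tag is never closed
print(is_well_formed("<optimize/>"))  # True: empty-element tag
```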
: - Any benefits to setting the allowed memory for Tomcat higher? Right
: now I'm allocating 384 MB.
The more memory you've got, the more caching you can support ... but if
your index changes so frequently, relative to the rate of *unique*
queries you get, that your caches never fill up, it may not matter.
-Hoss
RE: Got it working! And some questions
Posted by Brian Lucas <bl...@gmail.com>.
Hi Michael,
I apologize for the lack of testing on SolPHP. I had to "strip" it down
significantly to turn it into a usable general class, and the version up
there has not been extensively tested yet (I'm almost ready to get back to
it and "revise" it); plus, much of my coding is done in Rails at the
moment. However...
If you have a new version, could you send it over my way or just upload it
to the wiki? I'd like to take a look at the changes and throw your revised
version up there or integrate both versions into a cleaner revision of the
version already there.
With respect to batch updates, it's already designed to handle them (that's
why you see "array($array)" in the example: it accepts an array of
updates), but I'd definitely like to see how you revised it.
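At the XML level, a batched update is one `<add>` element that holds many `<doc>` elements. The sketch below splits documents into batches of 1000, the size Michael used, and builds one message per batch with Python's ElementTree, which also escapes characters such as `&` and `<` in field values. The field names `id` and `title` are assumptions about the schema, and `batches` and `add_xml` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

def batches(docs, batch_size=1000):
    """Yield consecutive slices of at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        yield docs[start:start + batch_size]

def add_xml(docs):
    """Build one <add> message holding several <doc> elements."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")

# Hypothetical documents; real field names come from your schema.xml.
docs = [{"id": i, "title": "article %d" % i} for i in range(2500)]
messages = [add_xml(batch) for batch in batches(docs)]
print(len(messages))  # 3 messages: 1000 + 1000 + 500 docs
```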
Thanks,
Brian
-----Original Message-----
From: Michael Imbeault [mailto:michael.imbeault@sympatico.ca]
Sent: Saturday, September 09, 2006 12:30 PM
To: solr-user@lucene.apache.org
Subject: Got it working! And some questions
First of all, in reference to
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html ,
I got it working! The problems were coming from SolPHP; to be honest, the
implementation in the wiki isn't really working, at least for me. I had
to modify it significantly in multiple places to get it working (Tomcat
5.5, WAMP and Windows XP).
The main problem was that addIndex was sending one doc at a time to Solr;
after a few thousand docs I would run out of resources. I modified
solr_update.php to handle batch updates, and I'm now sending batches of
1000 docs at a time. Great indexing speed.
I had a slight problem with the curl function of solr_update.php: the
custom HTTP header wasn't recognized. I now use curl_setopt($ch,
CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
which is much simpler, and now everything works!
So far I have indexed 15,000,000 documents (basically my whole
collection), and the performance I'm getting is INCREDIBLE: sub-100 ms
query times without warmup and with no optimization at all on a 7 GB
index, and with the cache it gets stupid fast! Seriously, Solr amazes me
every time I use it. I increased the HashDocSet maxSize to 75000 and will
keep tuning this value; it helped a great deal. I will try disMaxHandler
soon too; right now the standard one is great. I will also index with a
better stopword file; the default one could really use improvement.
Some questions (I couldn't find the answers in the docs):
- Is the SolPHP code in the wiki working out of the box for anyone? If
not, we could modify the wiki...
- What is the loadFactor variable of HashDocSet? Should I optimize it too?
- What are the units of the size value of the caches? Megabytes, number
of queries, kilobytes? It's not described anywhere.
- Any way to programmatically change the OR/AND preference of the query
parser? I set it to AND by default for user queries, but I'd like to set
it to OR for some server-side queries I must do (find related articles,
order by score).
- What's the difference between the two commit types, blocking and
non-blocking? I didn't see any difference at all; I tried both.
- Every time I send an <optimize> command, I get the following in my
Catalina logs; should I do anything about it?
9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.io.EOFException: no more
data available - expected end tag </optimize> to close start tag
<optimize> from line 1, parser stopped on START_TAG seen <optimize>... @1:10
- Any benefits to setting the allowed memory for Tomcat higher? Right
now I'm allocating 384 MB.
I can't wait to try the new faceted queries... Seriously, Solr has been
really, really awesome so far. Thanks for all your work, and sorry for
all the questions!