Posted to solr-user@lucene.apache.org by Michael Imbeault <mi...@sympatico.ca> on 2006/09/09 20:29:42 UTC

Got it working! And some questions

First of all, in reference to 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg00808.html , 
I got it working! The problems were coming from SolPHP; the 
implementation in the wiki isn't really working, to be honest, at least 
for me. I had to modify it significantly in multiple places to get it 
working. My setup is Tomcat 5.5, WAMP and Windows XP.

The main problem was that addIndex was sending 1 doc at a time to Solr; 
it would cause a problem after a few thousand docs because I was running 
out of resources. I modified solr_update.php to handle batch queries, 
and I'm now sending batches of 1000 docs at a time. Great indexing speed.
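
A rough illustration of the batching idea described above, written in Java rather than PHP. This is not the modified solr_update.php (that code is not shown in this thread); the class name AddBatcher and its helper methods are hypothetical, and the sketch only builds the <add> messages without sending them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups documents into Solr <add> messages
// containing at most batchSize documents each.
final class AddBatcher {

    // Escape the characters that are significant in XML text and attribute values.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    // Render one document (field name -> value) as a <doc> element.
    static String toDocXml(Map<String, String> doc) {
        StringBuilder sb = new StringBuilder("<doc>");
        for (Map.Entry<String, String> f : doc.entrySet()) {
            sb.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
              .append(escapeXml(f.getValue()))
              .append("</field>");
        }
        return sb.append("</doc>").toString();
    }

    // Split docs into consecutive batches and wrap each batch in one <add> element.
    static List<String> toAddMessages(List<Map<String, String>> docs, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<String> messages = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            StringBuilder sb = new StringBuilder("<add>");
            for (Map<String, String> doc : docs.subList(start, end)) {
                sb.append(toDocXml(doc));
            }
            messages.add(sb.append("</add>").toString());
        }
        return messages;
    }
}
```

With a batch size of 1000, a list of 2500 documents would yield three messages, each of which can then be POSTed to the update handler in a single request.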

Had a slight problem with the curl function of solr_update.php; the 
custom HTTP header wasn't recognized; I now use curl_setopt($ch, 
CURLOPT_POST, 1); curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string); - 
much simpler, and now everything works!

So far I have indexed 15,000,000 documents (my whole collection, 
basically) and the performance I'm getting is INCREDIBLE (sub-100ms 
query time without warmup and no optimization at all on a 7 GB index - 
and with the cache, it gets stupid fast)! Seriously, Solr amazes me every 
time I use it. I increased the HashDocSet maxSize to 75000 and will continue 
to tune this value - it helped a great deal. I will try disMaxHandler 
soon too; right now the standard one is great. And I will index with a 
better stopword file; the default one could really use improvements.

Some questions (couldn't find the answer in the docs):

- Is the Solr PHP code in the wiki working out of the box for anyone? If 
not, we could update the wiki...

- What is the loadFactor variable of HashDocSet? Should I tune it too?

- What are the units of the size value of the caches? Megs, number of 
queries, kilobytes? Not described anywhere.

- Any way to programmatically change the OR/AND preference of the query 
parser? I set it to AND by default for user queries, but I'd like to set 
it to OR for some server-side queries I must do (find related articles, 
order by score).

- What's the difference between the two commit types, blocking and 
non-blocking? I tried both and didn't see any difference at all.

- Every time I do an <optimize> command, I get the following in my 
catalina logs - should I do anything about it?

 9-Sep-2006 2:24:40 PM org.apache.solr.core.SolrException log
SEVERE: Exception during commit/optimize:java.io.EOFException: no more 
data available - expected end tag </optimize> to close start tag 
<optimize> from line 1, parser stopped on START_TAG seen <optimize>... @1:10

- Any benefit to setting the allowed memory for Tomcat higher? Right 
now I'm allocating 384 MB.

Can't wait to try the new Faceted Queries... seriously, Solr is really, 
really awesome so far. Thanks for all your work, and sorry for all 
the questions!
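
Judging only from the log message quoted above ("expected end tag </optimize> to close start tag <optimize>"), the request body appears to have been an unclosed <optimize> element. Assuming that is the cause, a quick well-formedness check like the sketch below would flag the problem before the command is sent. The class name UpdateCommandCheck is hypothetical, and the check uses the JDK's own XML parser rather than the one Solr uses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks that an update command is a complete XML document.
final class UpdateCommandCheck {

    // Returns true if the command parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Here isWellFormed("<optimize>") is false, while the self-closing <optimize/> and the explicit <optimize></optimize> both pass.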

-- 
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212


Re: Got it working! And some questions

Posted by James liu <li...@gmail.com>.
- Is the solr php in the wiki working out of the box for anyone?
Can you show your php.ini? Have you tuned PHP for performance?




2006/9/10, Brian Lucas <bl...@gmail.com>:
>
> Hi Michael,
>
> I apologize for the lack of testing on the SolPHP.  I had to "strip" it down
> significantly to turn it into a general class that would be usable and the
> version up there has not been extensively tested yet (I'm almost ready to
> get back to that and "revise" it), plus much of my coding is done in Rails
> at the moment.  However...
>
> If you have a new version, could you send it over my way or just upload it
> to the wiki?  I'd like to take a look at the changes and throw your revised
> version up there or integrate both versions into a cleaner revision of the
> version already there.
>
> With respect to batch queries, it's already designed to do that (that's why
> you see "array($array)" in the example, because it accepts an array of
> updates) but I'd definitely like to see how you revised it.
>
> Thanks,
> Brian

Re: Got it working! And some questions

Posted by Yonik Seeley <yo...@apache.org>.
On 9/9/06, Michael Imbeault <mi...@sympatico.ca> wrote:
> The main problem was that addIndex was sending 1 doc at a time to solr;
> it would cause a problem after a few thousand docs because i was running
> out of resources.

Sending one doc at a time should be fine... you shouldn't run out of
resources.
There must be a bug somewhere...

-Yonik

Re: Got it working! And some questions

Posted by Chris Hostetter <ho...@fucit.org>.
: First of all, it seems the mailing list is having some troubles? Some of
: my posts end up in the wrong thread (even new threads I post), I don't
: receive them in my mail, and they're present only in the 'date archive'
: of http://www.mail-archive.com, and not in the 'thread' one? I don't
: receive some of the other people's posts in my mail too, problems started
: last week I think.

i haven't noticed any problems with mail not making it through - some mail
clients (gmail for example) seem to suppress messages they can tell you
sent, maybe that's what's happening on your end?  As for
threads you start not showing up on the "thread" list ... according to
my mailbox, all but one message i've received from you included a
"References:" header (if not an In-Reply-To header) which causes some mail
archivers to assume it's part of an existing thread (this thread for
instance is considered part of the "Double Solr Installation on Single
Tomcat (or Double Index)" thread) ... you may want to experiment with
your mail client (off list) to see if you can figure out when/why this
is happening.

: Secondly, Chris, thanks for all the useful answers, everything is much
: clearer now. This info should be added to the wiki I think; should I do

feel free ... that's why it's a wiki.

: it? I'm still a little disappointed that I can't change the OR/AND
: parsing by just changing some parameter (like I can do for the number of
: results returned, for example); adding an OR between each word in the
: text I want to compare sounds suboptimal, but I'll probably do it that
: way; it's a very minor nitpick, Solr is awesome, as I said before.

it would be a fairly simple option to add just like changing the
default field (patches welcome!) but as i said -- typically if you don't
want the default behavior you are programmatically generating the query
anyway, and already adding some markup, a little more doesn't make it less
optimal.
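
For illustration, generating such a query programmatically can be as small as the sketch below. It is a minimal example under stated assumptions: the class and method names are hypothetical, words are split on whitespace only, and no query-syntax characters are escaped (so a word such as AND or a stray quote in the text would still be interpreted by the query parser).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: makes the OR between words explicit so the result
// does not depend on the query parser's default operator.
final class OrQueryBuilder {

    // Splits the text on whitespace and joins the words with " OR ".
    static String joinWithOr(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                terms.add(word);
            }
        }
        return String.join(" OR ", terms);
    }
}
```

For example, joinWithOr("solr lucene search") gives "solr OR lucene OR search", which can be sent as the query string regardless of whether the schema default is AND.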





-Hoss


Re: Got it working! And some questions

Posted by Chris Hostetter <ho...@fucit.org>.
: Maybe something like q.op or q.oper if it *only* applies to q.  Which
: begs the question... what *does* it apply to?  At first blush, it
: doesn't seem like it should apply to other queries like fq, facet
: queries, and esp queries defined in solrconfig.xml.  I think that
: would be very surprising.

agreed ... note the comment i put into SolrPluginUtils.parseFilterQueries when
i added fq support to StandardRequestHandler...

    /* Ignore SolrParams.DF - could have init param FQs assuming the
     * schema default with query param DF intented to only affect Q.
     * If user doesn't want schema default, they should be explicit in the FQ.
     */

... i would think a "do" or "op" or "q.op" param should *definitely* only
influence the "q" param.





-Hoss


Re: Got it working! And some questions

Posted by Chris Hostetter <ho...@fucit.org>.
: SolrQueryParser now knows nothing about the default operator, it is
: set from QueryParsing.parseQuery() when passed a SolrParams.

i didn't test it, but it looks clean to me.

the only other thing i would do is beef up the javadocs for
SolrQueryParser (to clarify that IndexSchema is only used for determining
field format) and QueryParsing.parseQuery (to clarify that it *does* use
IndexSearcher to get extra parsing options).

: QueryParsing.parseQuery() methods could be simplified, perhaps even
	...
: It could even get the "q" parameter from there, but there is code
: that passes expressions that don't come from "q".  Maybe we could

...yeah, its utility for simple queries, regardless of the "primary"
language of a request handler, is key.

: have two parseQuery() methods:  parseQuery(String expression,
: SolrQueryRequest req) and parseQuery(SolrQueryRequest req), and for
: the latter the "q" parameter is pulled from the request and used as
: the expression.

That sounds good to me ... but it doesn't seem critical ... clean house as
much as you want, but i don't think anybody else will mind a bit of dust
on the window sills.



-Hoss


Re: Got it working! And some questions

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 12, 2006, at 4:47 PM, Chris Hostetter wrote:
> : I've implemented the ability to override the default operator with
> : q.op=AND|OR.  The patch is pasted below for your review.
>
> if i'm reading that right, one subtlety is that "new
> SolrQueryParser(schema,field)" no longer pays attention to
> schema.getQueryParserDefaultOperator() -- that only becomes
> applicable when using QueryParsing.parseQuery
>
> ...i am very okay with this change, i wasn't really a fan of the fact that
> the SolrQueryParser pulled that info out of the IndexSchema in its
> constructor previously, i just wanted to point out that this patch would
> change that.
>
> Perhaps the constructor for SolrQueryParser shouldn't be aware of the op
> at all (either from the schema or from the SolrParams) -- and setting it
> should be left to QueryParsing.parseQuery (or some other utility in the
> QueryParsing class) ... personally i'm a fan of leaving SolrQueryParser as
> much like QueryParser as possible -- with the only real change being the
> knowledge of the individual field formats.

I've reworked it based on your feedback.  The patch is pasted below.

SolrQueryParser now knows nothing about the default operator, it is  
set from QueryParsing.parseQuery() when passed a SolrParams.

QueryParsing.parseQuery() methods could be simplified, perhaps even  
into a single method, that took a query expression and a  
SolrQueryRequest, where it can get the SolrParams and  IndexSchema.   
It could even get the "q" parameter from there, but there is code  
that passes expressions that don't come from "q".  Maybe we could  
have two parseQuery() methods:  parseQuery(String expression,  
SolrQueryRequest req) and parseQuery(SolrQueryRequest req), and for  
the latter the "q" parameter is pulled from the request and used as  
the expression.

As it is, the patch below works fine and I'm happy to commit it, but  
am happy to rework this sort of thing to get it as clean as others like.
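
The two-overload idea mentioned above could look roughly like the following. This is a self-contained sketch, not Solr code: Request and Parsed are hypothetical stand-ins for SolrQueryRequest and a Lucene Query, and the schema default is left out, so a missing q.op simply falls back to OR here.

```java
import java.util.Map;

final class ParseQueryOverloads {

    // Hypothetical stand-in for SolrQueryRequest: only carries request parameters.
    static final class Request {
        final Map<String, String> params;

        Request(Map<String, String> params) {
            this.params = params;
        }
    }

    // Hypothetical stand-in for a parsed query: the expression plus the operator applied.
    static final class Parsed {
        final String expression;
        final String operator;

        Parsed(String expression, String operator) {
            this.expression = expression;
            this.operator = operator;
        }
    }

    // General form: the caller supplies the expression, which need not come from "q".
    static Parsed parseQuery(String expression, Request req) {
        String op = "AND".equals(req.params.get("q.op")) ? "AND" : "OR";
        return new Parsed(expression, op);
    }

    // Convenience form: the expression is pulled from the request's "q" parameter.
    static Parsed parseQuery(Request req) {
        return parseQuery(req.params.get("q"), req);
    }
}
```

Keeping the convenience form as a thin wrapper means code that parses expressions from somewhere other than "q" still goes through the same path.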

	Erik


Index: src/java/org/apache/solr/search/SolrQueryParser.java
===================================================================
--- src/java/org/apache/solr/search/SolrQueryParser.java	(revision 442772)
+++ src/java/org/apache/solr/search/SolrQueryParser.java	(working copy)
@@ -37,7 +37,6 @@
      super(defaultField == null ? schema.getDefaultSearchFieldName() : defaultField, schema.getQueryAnalyzer());
      this.schema = schema;
      setLowercaseExpandedTerms(false);
-    setDefaultOperator("AND".equals(schema.getQueryParserDefaultOperator()) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
    }
    protected Query getFieldQuery(String field, String queryText) throws ParseException {
Index: src/java/org/apache/solr/search/QueryParsing.java
===================================================================
--- src/java/org/apache/solr/search/QueryParsing.java	(revision 442772)
+++ src/java/org/apache/solr/search/QueryParsing.java	(working copy)
@@ -19,6 +19,7 @@
import org.apache.lucene.search.*;
import org.apache.solr.search.function.*;
import org.apache.lucene.queryParser.ParseException;
+import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.solr.core.SolrCore;
@@ -26,6 +27,7 @@
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.FieldType;
+import org.apache.solr.request.SolrParams;
import java.util.ArrayList;
import java.util.regex.Pattern;
@@ -37,6 +39,7 @@
   * @version $Id$
   */
public class QueryParsing {
+  public static final String OP = "q.op";
    public static Query parseQuery(String qs, IndexSchema schema) {
      return parseQuery(qs, null, schema);
@@ -58,8 +61,26 @@
      }
    }
+  public static Query parseQuery(String qs, String defaultField, SolrParams params, IndexSchema schema) {
+    try {
+      String opParam = params.get(OP, schema.getQueryParserDefaultOperator());
+      QueryParser.Operator defaultOperator = "AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
+      SolrQueryParser parser = new SolrQueryParser(schema, defaultField);
+      parser.setDefaultOperator(defaultOperator);
+      Query query = parser.parse(qs);
+      if (SolrCore.log.isLoggable(Level.FINEST)) {
+        SolrCore.log.finest("After QueryParser:" + query);
+      }
+      return query;
+
+    } catch (ParseException e) {
+      SolrCore.log(e);
+      throw new SolrException(400,"Error parsing Lucene query",e);
+    }
+  }
+
    /***
     * SortSpec encapsulates a Lucene Sort and a count of the number of documents
     * to return.
Index: src/java/org/apache/solr/request/StandardRequestHandler.java
===================================================================
--- src/java/org/apache/solr/request/StandardRequestHandler.java	(revision 442772)
+++ src/java/org/apache/solr/request/StandardRequestHandler.java	(working copy)
@@ -105,7 +105,7 @@
        List<String> commands = StrUtils.splitSmart(sreq,';');
        String qs = commands.size() >= 1 ? commands.get(0) : "";
-      Query query = QueryParsing.parseQuery(qs, defaultField, req.getSchema());
+      Query query = QueryParsing.parseQuery(qs, defaultField, p, req.getSchema());
        // If the first non-query, non-filter command is a simple sort on an indexed field, then
        // we can use the Lucene sort ability.
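
The operator lookup in the patch above boils down to "request parameter first, schema default second". The standalone sketch below mirrors that logic so it can be run without Solr on the classpath; a plain Map stands in for SolrParams, a String stands in for the value of schema.getQueryParserDefaultOperator(), and the class and enum names are hypothetical.

```java
import java.util.Map;

// Standalone sketch of the q.op fallback logic; not part of the patch.
final class DefaultOperatorSketch {

    enum Operator { AND, OR }

    static final String OP = "q.op";

    // A request-level q.op wins; otherwise the schema default is used.
    // Anything other than the exact string "AND" resolves to OR, as in the patch.
    static Operator resolve(Map<String, String> params, String schemaDefault) {
        String opParam = params.getOrDefault(OP, schemaDefault);
        return "AND".equals(opParam) ? Operator.AND : Operator.OR;
    }
}
```

One consequence of the exact string comparison is that a lowercase q.op=and resolves to OR, so callers would need to send the value in upper case.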


Re: Got it working! And some questions

Posted by Chris Hostetter <ho...@fucit.org>.
: I've implemented the ability to override the default operator with
: q.op=AND|OR.  The patch is pasted below for your review.

if i'm reading that right, one subtlety is that "new
SolrQueryParser(schema,field)" no longer pays attention to
schema.getQueryParserDefaultOperator() -- that only becomes
applicable when using QueryParsing.parseQuery

...i am very okay with this change, i wasn't really a fan of the fact that
the SolrQueryParser pulled that info out of the IndexSchema in its
constructor previously, i just wanted to point out that this patch would
change that.

Perhaps the constructor for SolrQueryParser shouldn't be aware of the op
at all (either from the schema or from the SolrParams) -- and setting it
should be left to QueryParsing.parseQuery (or some other utility in the
QueryParsing class) ... personally i'm a fan of leaving SolrQueryParser as
much like QueryParser as possible -- with the only real change being the
knowledge of the individual field formats.





-Hoss


Re: Got it working! And some questions

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 11, 2006, at 2:52 PM, Yonik Seeley wrote:

> On 9/11/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>>
>> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
>> >  I'm still a little disappointed that I can't change the OR/AND
>> > parsing by just changing some parameter (like I can do for the
>> > number of results returned, for example); adding an OR between each
>> > word in the text I want to compare sounds suboptimal, but I'll
>> > probably do it that way; it's a very minor nitpick, Solr is awesome,
>> > as I said before.
>>
>> I'm the one that added support for controlling the default operator
>> of Solr's query parser, and I hadn't considered the use case of
>> controlling that setting from a request parameter.  It should be easy
>> enough to add.  I'll take a look at adding that support and commit it
>> once I have it working.
>>
>> What parameter name should be used for this?    do=[AND|OR] (for
>> default operator)?  We have df for default field.
>
> Maybe something like q.op or q.oper if it *only* applies to q.  Which
> begs the question... what *does* it apply to?  At first blush, it
> doesn't seem like it should apply to other queries like fq, facet
> queries, and esp queries defined in solrconfig.xml.  I think that
> would be very surprising.

I've implemented the ability to override the default operator with  
q.op=AND|OR.  The patch is pasted below for your review.

The one thing I don't like is that QueryParsing.parseQuery(String qs,  
String defaultField, SolrParams params, IndexSchema schema) is a bit  
redundant in that it takes defaultField which can also be gleaned  
from params, but StandardRequestHandler uses "df" for highlighting also.

I'm happy to commit this if there are no objections or suggestions  
for improvement (and of course update the wiki documentation for the  
parameters).

	Erik



Index: src/java/org/apache/solr/search/SolrQueryParser.java
===================================================================
--- src/java/org/apache/solr/search/SolrQueryParser.java	(revision 442689)
+++ src/java/org/apache/solr/search/SolrQueryParser.java	(working copy)
@@ -34,10 +34,14 @@
    protected final IndexSchema schema;
    public SolrQueryParser(IndexSchema schema, String defaultField) {
+    this(schema, defaultField, QueryParser.Operator.OR);
+  }
+
+  public SolrQueryParser(IndexSchema schema, String defaultField, QueryParser.Operator defaultOperator) {
      super(defaultField == null ? schema.getDefaultSearchFieldName() : defaultField, schema.getQueryAnalyzer());
      this.schema = schema;
      setLowercaseExpandedTerms(false);
-    setDefaultOperator("AND".equals(schema.getQueryParserDefaultOperator()) ? QueryParser.Operator.AND : QueryParser.Operator.OR);
+    setDefaultOperator(defaultOperator);
    }
    protected Query getFieldQuery(String field, String queryText) throws ParseException {
Index: src/java/org/apache/solr/search/QueryParsing.java
===================================================================
--- src/java/org/apache/solr/search/QueryParsing.java	(revision 442689)
+++ src/java/org/apache/solr/search/QueryParsing.java	(working copy)
@@ -19,6 +19,7 @@
import org.apache.lucene.search.*;
import org.apache.solr.search.function.*;
import org.apache.lucene.queryParser.ParseException;
+import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.solr.core.SolrCore;
@@ -26,6 +27,7 @@
import org.apache.solr.schema.IndexSchema;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.schema.FieldType;
+import org.apache.solr.request.SolrParams;
import java.util.ArrayList;
import java.util.regex.Pattern;
@@ -37,6 +39,7 @@
   * @version $Id$
   */
public class QueryParsing {
+  public static final String OP = "q.op";
    public static Query parseQuery(String qs, IndexSchema schema) {
      return parseQuery(qs, null, schema);
@@ -58,8 +61,24 @@
      }
    }
+  public static Query parseQuery(String qs, String defaultField, SolrParams params, IndexSchema schema) {
+    try {
+      String opParam = params.get(OP, schema.getQueryParserDefaultOperator());
+      QueryParser.Operator defaultOperator = "AND".equals(opParam) ? QueryParser.Operator.AND : QueryParser.Operator.OR;
+      Query query = new SolrQueryParser(schema, defaultField, defaultOperator).parse(qs);
+      if (SolrCore.log.isLoggable(Level.FINEST)) {
+        SolrCore.log.finest("After QueryParser:" + query);
+      }
+      return query;
+
+    } catch (ParseException e) {
+      SolrCore.log(e);
+      throw new SolrException(400,"Error parsing Lucene query",e);
+    }
+  }
+
    /***
     * SortSpec encapsulates a Lucene Sort and a count of the number of documents
     * to return.
Index: src/java/org/apache/solr/request/StandardRequestHandler.java
===================================================================
--- src/java/org/apache/solr/request/StandardRequestHandler.java	(revision 442689)
+++ src/java/org/apache/solr/request/StandardRequestHandler.java	(working copy)
@@ -94,7 +94,7 @@
        List<String> commands = StrUtils.splitSmart(sreq,';');
        String qs = commands.size() >= 1 ? commands.get(0) : "";
-      Query query = QueryParsing.parseQuery(qs, defaultField, req.getSchema());
+      Query query = QueryParsing.parseQuery(qs, defaultField, p, req.getSchema());
        // If the first non-query, non-filter command is a simple sort on an indexed field, then
        // we can use the Lucene sort ability.


Re: Got it working! And some questions

Posted by Yonik Seeley <yo...@apache.org>.
On 9/11/06, Erik Hatcher <er...@ehatchersolutions.com> wrote:
>
> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
> >  I'm still a little disappointed that I can't change the OR/AND
> > parsing by just changing some parameter (like I can do for the
> > number of results returned, for example); adding an OR between each
> > word in the text I want to compare sounds suboptimal, but I'll
> > probably do it that way; it's a very minor nitpick, Solr is awesome,
> > as I said before.
>
> I'm the one that added support for controlling the default operator
> of Solr's query parser, and I hadn't considered the use case of
> controlling that setting from a request parameter.  It should be easy
> enough to add.  I'll take a look at adding that support and commit it
> once I have it working.
>
> What parameter name should be used for this?    do=[AND|OR] (for
> default operator)?  We have df for default field.

Maybe something like q.op or q.oper if it *only* applies to q.  Which
begs the question... what *does* it apply to?  At first blush, it
doesn't seem like it should apply to other queries like fq, facet
queries, and esp queries defined in solrconfig.xml.  I think that
would be very surprising.

-Yonik

Re: Got it working! And some questions

Posted by Michael Imbeault <mi...@sympatico.ca>.
Hello Erik,

Thanks for adding that feature! "do" is fine with me, if "op" is already 
used (not sure about this one).

Erik Hatcher wrote:
>
> On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
>>  I'm still a little disappointed that I can't change the OR/AND 
>> parsing by just changing some parameter (like I can do for the number 
>> of results returned, for example); adding an OR between each word in 
>> the text I want to compare sounds suboptimal, but I'll probably do it 
>> that way; it's a very minor nitpick, Solr is awesome, as I said before.
>
> I'm the one that added support for controlling the default operator of 
> Solr's query parser, and I hadn't considered the use case of 
> controlling that setting from a request parameter.  It should be easy 
> enough to add.  I'll take a look at adding that support and commit it 
> once I have it working.
>
> What parameter name should be used for this?    do=[AND|OR] (for 
> default operator)?  We have df for default field.
>
>     Erik
>
-- 
Michael Imbeault
CHUL Research Center (CHUQ)
2705 boul. Laurier
Ste-Foy, QC, Canada, G1V 4G2
Tel: (418) 654-2705, Fax: (418) 654-2212


Re: Got it working! And some questions

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Sep 10, 2006, at 10:47 PM, Michael Imbeault wrote:
>  I'm still a little disappointed that I can't change the OR/AND  
> parsing by just changing some parameter (like I can do for the  
> number of results returned, for example); adding an OR between each  
> word in the text I want to compare sounds suboptimal, but I'll  
> probably do it that way; it's a very minor nitpick, Solr is awesome,  
> as I said before.

I'm the one that added support for controlling the default operator  
of Solr's query parser, and I hadn't considered the use case of  
controlling that setting from a request parameter.  It should be easy  
enough to add.  I'll take a look at adding that support and commit it  
once I have it working.

What parameter name should be used for this?    do=[AND|OR] (for  
default operator)?  We have df for default field.

	Erik


Re: Got it working! And some questions

Posted by Michael Imbeault <mi...@sympatico.ca>.
First of all, is the mailing list having some trouble? Some of my posts
end up in the wrong thread (even new threads I start), I don't receive
them in my mail, and they show up only in the 'date archive' of
http://www.mail-archive.com, not in the 'thread' one. I'm also not
receiving some of other people's posts; the problems started last week,
I think.

Secondly, Chris, thanks for all the useful answers; everything is much
clearer now. This info should be added to the wiki, I think; should I do
it? I'm still a little disappointed that I can't change the OR/AND
parsing just by changing a parameter (like I can for the number of
results returned, for example); adding an OR between each word in the
text I want to compare sounds suboptimal, but I'll probably do it that
way. It's a very minor nitpick; Solr is awesome, as I said before.

@ Brian Lucas: Don't worry, solrPHP was still 99.9% functional, great
work; sending one doc at a time was partly my fault, since I was following
the exact sequence (add to array, submit) shown in the docs. The
only thing that could be added is a big "//TODO: change this code"
before the sections you have to change to make it work for a particular
schema. I'm pretty sure the custom-header curl submit works for everyone
but me; I'm on a Windows test box with WAMP on it, so that may be
the cause. I'll send you the changes I made to the code tomorrow
anyway; as I said, nothing major.



Re: Got it working! And some questions

Posted by Chris Hostetter <ho...@fucit.org>.
: - What is the loadFactor variable of HashDocSet? Should I optimize it too?

this is the same as the loadFactor in a HashMap constructor -- but i don't
think it has much effect on performance since the HashDocSets never
"grow".

I personally have never tuned the loadFactor :)

: - What's the units on the size value of the caches? Megs, number of
: queries, kilobytes? Not described anywhere.

"entries" ... the number of items allowed in the cache.

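To illustrate what a size counted in "entries" means, here is a minimal Python sketch of a least-recently-used cache whose limit is a number of items rather than bytes. It is not Solr's cache implementation; the class name EntryBoundedCache and its methods are made up for this example.

```python
from collections import OrderedDict


class EntryBoundedCache:
    """Toy LRU cache whose size limit counts entries, not bytes."""

    def __init__(self, size):
        self.size = size              # maximum number of entries kept
        self._items = OrderedDict()   # oldest entry first, newest last

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.size:
            self._items.popitem(last=False)  # evict the least recently used

    def __len__(self):
        return len(self._items)
```

With size=2 the cache holds two entries whether each value is a few bytes or several megabytes, so the memory a given size setting uses depends on what is being cached.
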
: - Any way to programmatically change the OR/AND preference of the query
: parser? I set it to AND by default for user queries, but i'd like to set
: it to OR for some server-side queries I must do (find related articles,
: order by score).

you mean using StandardRequestHandler? ... not that i can think of off the
top of my head, but typically it makes sense to just configure what you
want for your "users" in the schema, and then make any machine generated
queries be explicit.
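
A machine-generated query can spell out the operator between every term, so it does not depend on the configured default. The following is a rough Python sketch under some assumptions: the helper name explicit_or_query is hypothetical, terms are plain word tokens, and lowercasing is used only to keep words such as "AND" or "OR" in the input text from being read as operators (it assumes the target field is analyzed case-insensitively).

```python
import re


def explicit_or_query(text, field=None):
    """Join the words of `text` with an explicit OR between each term."""
    # Keep only word characters so query-syntax characters never leak in;
    # lowercase so literal "AND"/"OR"/"NOT" in the text become plain terms.
    terms = [t.lower() for t in re.findall(r"\w+", text)]
    if field:
        terms = [f"{field}:{t}" for t in terms]
    return " OR ".join(terms)
```

For example, explicit_or_query("Gene expression in mice") gives "gene OR expression OR in OR mice", which can be sent as the q parameter while user queries keep the AND default.
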

: - What's the difference between the 2 commit types? Blocking and
: non-blocking. Didn't see any differences at all, tried both.

do you mean the waitFlush and waitSearcher options?
if either of those is true, you shouldn't get a response back from the
server until they have finished.  if they are false, then the server
should respond instantly even if it takes several seconds (or maybe even
minutes) to complete the operation (optimizes can take a while in some
cases -- as can opening newSearchers if you have a lot of cache warming
configured)
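
As a small sketch of how the two options appear in the update message, the Python helper below builds the XML with both attributes set explicitly. The function name commit_message is made up for this example; only the waitFlush and waitSearcher attribute names come from the thread, and sending the message to the server is left out.

```python
def commit_message(command, wait_flush, wait_searcher):
    """Build a <commit/> or <optimize/> message with explicit wait flags."""
    if command not in ("commit", "optimize"):
        raise ValueError("command must be 'commit' or 'optimize'")
    # XML attribute values are written as lowercase "true"/"false".
    flush = str(bool(wait_flush)).lower()
    searcher = str(bool(wait_searcher)).lower()
    return f'<{command} waitFlush="{flush}" waitSearcher="{searcher}"/>'
```

Passing False for both asks for the non-blocking behaviour described above; passing True for either means the response should only arrive once that step has finished.
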

: - Every time I do an <optimize> command, I get the following in my
: catalina logs - should I do anything about it?

the optimize command needs to be well formed XML, try "<optimize/>"
instead of just "<optimize>"
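
One way to catch this before posting is to run the message through an XML parser on the client side. A minimal Python sketch using the standard library (the helper name is_well_formed is just for illustration):

```python
import xml.etree.ElementTree as ET


def is_well_formed(message):
    """Return True if `message` parses as a complete XML document."""
    try:
        ET.fromstring(message)
    except ET.ParseError:
        return False
    return True
```

Here is_well_formed("<optimize/>") is True, while is_well_formed("<optimize>") is False because the start tag is never closed.
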

: - Any benefits of setting the allowed memory for Tomcat higher? Right
: now I'm allocating 384 megs.

the more memory you've got, the more caching you can support .. but if
your index changes so often, relative to the rate of *unique* queries you
get, that your caches never fill up, it may not matter.




-Hoss


RE: Got it working! And some questions

Posted by Brian Lucas <bl...@gmail.com>.
Hi Michael,

I apologize for the lack of testing on the SolPHP.  I had to "strip" it down
significantly to turn it into a general class that would be usable and the
version up there has not been extensively tested yet (I'm almost ready to
get back to that and "revise" it), plus much of my coding is done in Rails
at the moment.  However...

If you have a new version, could you send it over my way or just upload it
to the wiki?  I'd like to take a look at the changes and throw your revised
version up there or integrate both versions into a cleaner revision of the
version already there.

With respect to batch queries, it's already designed to do that (that's why
you see "array($array)" in the example, because it accepts an array of
updates) but I'd definitely like to see how you revised it.
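
For readers following along, the batching idea can be sketched roughly as below, in Python rather than PHP. This is not the SolPHP code: the function names add_message and batches and the field names in the usage note are hypothetical, and posting the message to Solr is left out.

```python
from xml.sax.saxutils import escape, quoteattr


def add_message(docs):
    """Build one <add> message holding every document in `docs`.

    Each document is a dict mapping field name to value.
    """
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # Escape values so characters like & and < keep the XML well formed.
            parts.append(
                f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def batches(docs, size=1000):
    """Yield consecutive slices of `docs`, each with at most `size` items."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]
```

With a list of 2500 dicts such as {"id": 1, "title": "a & b"}, looping over batches(docs, 1000) and calling add_message on each slice produces three messages of 1000, 1000 and 500 documents, so the server sees three requests instead of 2500.
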

Thanks,
Brian

