You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Mohammad Norouzi <mn...@gmail.com> on 2007/08/15 07:18:28 UTC

query question

Hi

I am using WhitespaceAnalyzer and the query is " icdCode:H* " but there is
no result however I know that there are many documents with this field value
such as H20, H20.5 etc.     this field is tokenized and indexed what is
wrong with this?
when I test this query with Luke it will return no result as well.  I am not
sure whether the query parser change the word into lowercase internally or
not. but I have extended my own query parser and I am sure no change on
input query will be made.

any help would be appreciated

-- 
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/

Re: query question

Posted by Erick Erickson <er...@gmail.com>.
Mohammad:

See below....

On 8/19/07, Mohammad Norouzi <mn...@gmail.com> wrote:
>
> Erick,
> I am using WhitespaceAnalyzer, and yes it's mixed case, in my application
> I
> never change the entered information to lowercase because of some reasons,


I've found it waaaaaay easier to index things two different ways
rather than have to endlessly worry about case and special
characters. Especially since whatever you do will be wrong some
of the time. For instance, if you do index with case, "ca" wouldn't
match "Ca". And if I search for "ca" I'd get a  (potentially)
completely different set of responses than searching for "Ca",
which would confuse the users and result in endless bug reports.

Indexing the same data twice, once for search and once for
display isn't, I believe, any more expensive than indexing AND
storing the data. That is, say I'm indexing the text "This is some
text". I just add two fields to the doc, one stored but not indexed
and one indexed but not stored.

doc.add("field_search", "This is some text", Field.Store.NO,
        Field.Index.TOKENIZED);
doc.add("field_display", "This is some text", Field.Store.YES,
        Field.Index.NO);

With the appropriate analyzer and/or pre-processing, field_search
will be transformed into a "canonical" form, but the field_display
will be exactly what was entered, capitalization, punctuation,
etc. all in place.

I believe that this consumes pretty much the same resources
as indexing into a single field Field.Store.YES,
Field.Index.TOKENIZED. This makes your search behavior much
simpler. You "canonicalize" your  searchable
field. For instance, remove all punctuation, lowercase it, fold
characters perhaps (see below). My point here is that at *both*
index and search time, I "massage" the data to provide a better
user experience. Not to mention have to field fewer "Why didn't
my search return..... questions <G>.

But I still have my field_display which contains exactly what
was originally entered when I need it.

Of course I don't know whether this works for you, since your
problem space undoubtedly has its own constraints, but it's
something you should consider if possible.


BELOW: <G>> Folding: I have an English based application that
nevertheless has a very few foreign-language books. By folding all the
accented characters into their low-ASCII counterparts for indexing
and searching, but *displaying* the original text in the results, users
get what they expect.

the thing that I didn't consider was the punctuation in the indexes, but in
> query I didn't use any punctuation.  now using Luke, when I put Ca\. (with
> escaping dot) the result is 5 documents however I expect many more, the
> question is do I have to remove all dots and special characters from the
> indexed information while indexing?


See above. But I'd *start* by assuming that your searchable
fields should have all the extraneous stuff removed at *both*
index and search time. Which is pretty easy if you use the
same analyzer for the searchable fields during both operations.


>>And if you only knew how many times I've said something similar to ...
> been totally wrong
>
> Erick, I have to use this because we are writing an API to use object as
> the
> source of indexes and we have to map objects to documents and vice versa,
> would you tell me to make this what other way we should take?


What I was recommending is NOT that you do things a
different way in your *finished* application, but rather that
you simplify your use of Lucene, perhaps in a test or pilot
project, until you get the results you expect from indexing
and searching. Only when you start getting what you expect
from the simple cases should you try to get fancy.

Until then, my experience has been that I'm never sure
whether my problems are in my code or just that
I don't understand how the tool works.....

Best
Erick


On 8/18/07, Erick Erickson <er...@gmail.com> wrote:
> >
> > I think you'll get much farther much faster if you concentrate on
> > a very simple test case for searching until you get the results you
> > expect.
> >
> > It's particularly telling that you can't get your results from Luke.
> > All the rest of your code is irrelevant until you get what you expect
> > from Luke with a simple analyzer or with a stupid-simple bit of
> > test code. Until then, the rest of your code, in which bugs may
> > lurk, just gets in your way.
> >
> > For instance.... you have colons in your term text. I believe you have
> > to escape these for query parsing to work correctly. You have mixed
> > case. Are you absolutely sure that the casing is consistent between
> > indexing and querying? You have other punctuation. Are you also sure
> > that it's not stripped by the query ananlyzers? The fragment above
> > doesn't show us what analyzer you use. I flat guarantee that if it's
> > StandardAnalyzer, lots of punctuation is stripped and the term text is
> > lowercased. Some innocent-seeming bit of code can mess you up in
> > any of these cases.
> >
> > You'll get a log of mileage out of query.toString(), which shows you
> > exactly what the query you send to the searcher looks like. Just
> > copying this into Luke and playing around with it has been very helpful
> > to me.
> >
> > I can't emphasize enough that I've been well served by simplifying the
> > code until it worked. Usually this results for me in a forehead-slapping
> > moment and after that putting the complexity back in is easy. And the
> > total time spent is MUCH shorter than trying to debug the complex case.
> >
> > And if you only knew how many times I've said something similar to
> >
> > "in following code, Context and Dispatcher are parts of interceptor
> > pattern
> > in which I change the given values if they are number and has nothing to
> > do
> > with queries with string values"
> >
> > and been totally wrong <G>.....
> >
> > Best
> > Erick
> >
> > On 8/18/07, Mohammad Norouzi <mn...@gmail.com> wrote:
> > >
> > > testn,
> > >
> > > here is my code but the thing is strange is that by Luke I can't reach
> > my
> > > goal as well,
> > >
> > > look, I have a field (Indexed, Tokenized and Stored) this field has a
> > wide
> > > variety of values from numbers to characters, I give the query
> > > patientResult:oxalate but the result is no document (using
> > > WhitespaceAnalyzer) but I expect to have values like Ca. Oxalate:few
> and
> > > Ca.
> > > Oxalate:many
> > >
> > > in following code, Context and Dispatcher are parts of interceptor
> > pattern
> > > in which I change the given values if they are number and has nothing
> to
> > > do
> > > with queries with string values
> > >
> > >
> > > public class ExtendedQueryParser extends MultiFieldQueryParser {
> > >     private Log logger = LogFactory.getLog(ExtendedQueryParser.class);
> > >     /**
> > >      * if true, overrides the getRangeQuery() method and treat with
> > dates
> > > just like other strings, but
> > >      * if false, everything will normally proceed just like its super
> > > class.
> > >
> > >      */
> > >     private boolean asString;
> > >     private Class clazz;
> > >
> > >     public ExtendedQueryParser(String[] fields,Analyzer analyzer,Class
> > > clazz) {
> > >         super(fields,analyzer);
> > >         //this.asString = asString;
> > >         this.clazz = clazz;
> > >     }
> > >
> > >     @Override
> > >     protected org.apache.lucene.search.Query getRangeQuery(String
> field,
> > > String part1, String part2, boolean inclusive) throws ParseException {
> > >         String val1 = part1;
> > >         String val2 = part2;
> > >         String fieldName = field;
> > >         try {
> > >             Dispatcher dispatcher = Dispatcher.getInstance();
> > >             Context c = new Context();
> > >             c.setClazz(clazz);
> > >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> > >             c.setValue(val1);
> > >             dispatcher.beforeQuery(c);
> > >             val1 = c.getWorkingValue();
> > >
> > >             c.setValue(val2);
> > >             dispatcher.beforeQuery(c);
> > >             val2 = c.getWorkingValue();
> > >             fieldName = c.getChangedFieldName();
> > >             logger.debug("Query text translated to
> > "+fieldName+":["+val1+
> > > "
> > > TO " + val2+"]");
> > >
> > >         } catch (Exception e) {
> > >             e.printStackTrace();
> > >         }
> > >
> > >         BooleanQuery.setMaxClauseCount(5120);//5 * 1024
> > >         return new RangeQuery(new Term(fieldName, val1),new
> > > Term(fieldName,
> > > val2),inclusive);
> > >     }
> > >
> > >     @Override
> > >     protected org.apache.lucene.search.Query getFieldQuery(String
> field,
> > > String queryText) throws ParseException {
> > >         logger.debug("FieldQuery no slop:"+queryText);
> > >         String val = queryText;
> > >         String fieldName = field;
> > >         try {
> > >             Dispatcher dispatcher = Dispatcher.getInstance();
> > >             Context c = new Context();
> > >             c.setClazz(clazz);
> > >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> > >             c.setValue(val);
> > >             dispatcher.beforeQuery(c);
> > >             val = c.getWorkingValue();
> > >             fieldName = c.getChangedFieldName();
> > >             logger.debug("Query text translated to "+fieldName+ ":" +
> > > val);
> > >
> > >         } catch (Exception e) {
> > >             e.printStackTrace();
> > >         }
> > >
> > >         logger.debug("TermQuery...");
> > >         setLowercaseExpandedTerms(false);
> > >         TermQuery termQuery = new TermQuery(new Term(fieldName, val));
> > >
> > >         return termQuery;//(field,val);
> > >     }
> > >
> > >     @Override
> > >     protected org.apache.lucene.search.Query getFuzzyQuery(String
> arg0,
> > > String arg1, float arg2) throws ParseException {
> > >         logger.debug("FuzzyQuery Text:"+arg1);
> > >         return super.getFuzzyQuery(arg0, arg1, arg2);
> > >     }
> > >
> > >     @Override
> > >     protected org.apache.lucene.search.Query getPrefixQuery(String
> > field,
> > > String text) throws ParseException {
> > >         logger.debug("PrefixQuery Text:"+text);
> > >         //PrefixQuery prefixQuery = new PrefixQuery(new
> > Term(field,text));
> > >         setLowercaseExpandedTerms(false);
> > >         return super.getPrefixQuery(field,text);
> > >     }
> > >
> > >     @Override
> > >     protected org.apache.lucene.search.Query getWildcardQuery(String
> > > field,
> > > String text) throws ParseException {
> > >         logger.debug("WildcardQuery:"+text);
> > >         setLowercaseExpandedTerms(false);
> > >         //WildcardQuery doesn't need to perform any translation on its
> > > numbers
> > >         return super.getWildcardQuery(field, text);
> > >     }
> > >
> > >     @Override
> > >     protected Query getFieldQuery(String field, String queryText, int
> > > slop)
> > > throws ParseException {
> > >         logger.debug("PhraseQuery :"+queryText+" with slop:"+slop);
> > >         String val = queryText;
> > >         String fieldName = field;
> > >         try {
> > >             Dispatcher dispatcher = Dispatcher.getInstance();
> > >             Context c = new Context();
> > >             c.setClazz(clazz);
> > >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> > >             c.setValue(val);
> > >             dispatcher.beforeQuery(c);
> > >             val = c.getWorkingValue();
> > >             fieldName = c.getChangedFieldName();
> > >             logger.debug("Query text translated to
> > > "+fieldName+":"+val+"");
> > >
> > >         } catch (Exception e) {
> > >             e.printStackTrace();
> > >         }
> > >         PhraseQuery phraseQuery = new PhraseQuery();
> > >         phraseQuery.add(new Term(fieldName, val));
> > >         phraseQuery.setSlop(slop);
> > >         return phraseQuery;
> > >     }
> > >
> > >
> > > }
> > > --------------------------
> > >
> > > On 8/16/07, testn <te...@doramail.com> wrote:
> > > >
> > > >
> > > > Can you post your code? Make sure that when you use wildcard in your
> > > > custom
> > > > query parser, it will generate either WildcardQuery or PrefixQuery
> > > > correctly.
> > > >
> > > >
> > > > is_maximum wrote:
> > > > >
> > > > > Yes karl, when I explore the index by Luke I can see the terms
> > > > > for example I have a field namely, patientResult, it contains
> value
> > > "Ca.
> > > > > Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
> > > > >
> > > > > the problems are when I put this query: patientResult:(Ca.
> > > Oxalate:few)
> > > > > the result is
> > > > > 84329 Ca. Oxalate:few
> > > > > 112519 Ca. Oxalate:many
> > > > > 139141 Ca. Oxalate:many
> > > > > 394321 Ca. Oxalate:few
> > > > > 397671 Ca. Oxalate:nod
> > > > > 387549 Ca. Oxalate: mod
> > > > >
> > > > > however this is not the required result but another problem is
> when
> > I
> > > > put
> > > > > patientResult:Oxalate or patientResult:Oxalate* no result will
> > > return!!!
> > > > >
> > > > > let me tell you that I am extended MultiFieldQueryParser to
> override
> > > its
> > > > > methods and in getFieldQuery(...) method I return TermQuery
> > > > >
> > > > > I don't know what I was made wrong?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 8/15/07, karl wettin <ka...@gmail.com> wrote:
> > > > >>
> > > > >>
> > > > >> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
> > > > >>
> > > > >> > I am using WhitespaceAnalyzer and the query is " icdCode:H* "
> but
> > > > >> > there is
> > > > >> > no result however I know that there are many documents with
> this
> > > > >> > field value
> > > > >> > such as H20, H20.5 etc.     this field is tokenized and indexed
> > > > >> > what is
> > > > >> > wrong with this?
> > > > >> > when I test this query with Luke it will return no result as
> > well.
> > > > >>
> > > > >> Can you also use Luke to inspect documents you know should
> contain
> > > > these
> > > > >> terms and make sure it really is in there?
> > > > >>
> > > > >> --
> > > > >> karl
> > > > >>
> > > > >>
> > ---------------------------------------------------------------------
> > > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Mohammad
> > > > > --------------------------
> > > > > see my blog: http://brainable.blogspot.com/
> > > > > another in Persian: http://fekre-motefavet.blogspot.com/
> > > > >
> > > > >
> > > >
> > > > --
> > > > View this message in context:
> > > > http://www.nabble.com/query-question-tf4271198.html#a12185271
> > > > Sent from the Lucene - Java Users mailing list archive at Nabble.com
> .
> > > >
> > > >
> > > >
> ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > > Mohammad
> > > --------------------------
> > > see my blog: http://brainable.blogspot.com/
> > > another in Persian: http://fekre-motefavet.blogspot.com/
> > >
> >
>
>
>
> --
> Regards,
> Mohammad
> --------------------------
> see my blog: http://brainable.blogspot.com/
> another in Persian: http://fekre-motefavet.blogspot.com/
>

Re: query question

Posted by Mohammad Norouzi <mn...@gmail.com>.
Erick,
I am using WhitespaceAnalyzer, and yes it's mixed case, in my application I
never change the entered information to lowercase because of some reasons,

the thing that I didn't consider was the punctuation in the indexes, but in
query I didn't use any punctuation.  now using Luke, when I put Ca\. (with
escaping dot) the result is 5 documents however I expect many more, the
question is do I have to remove all dots and special characters from the
indexed information while indexing?

>>And if you only knew how many times I've said something similar to ...
been totally wrong

Erick, I have to use this because we are writing an API to use object as the
source of indexes and we have to map objects to documents and vice versa,
would you tell me to make this what other way we should take?


On 8/18/07, Erick Erickson <er...@gmail.com> wrote:
>
> I think you'll get much farther much faster if you concentrate on
> a very simple test case for searching until you get the results you
> expect.
>
> It's particularly telling that you can't get your results from Luke.
> All the rest of your code is irrelevant until you get what you expect
> from Luke with a simple analyzer or with a stupid-simple bit of
> test code. Until then, the rest of your code, in which bugs may
> lurk, just gets in your way.
>
> For instance.... you have colons in your term text. I believe you have
> to escape these for query parsing to work correctly. You have mixed
> case. Are you absolutely sure that the casing is consistent between
> indexing and querying? You have other punctuation. Are you also sure
> that it's not stripped by the query ananlyzers? The fragment above
> doesn't show us what analyzer you use. I flat guarantee that if it's
> StandardAnalyzer, lots of punctuation is stripped and the term text is
> lowercased. Some innocent-seeming bit of code can mess you up in
> any of these cases.
>
> You'll get a log of mileage out of query.toString(), which shows you
> exactly what the query you send to the searcher looks like. Just
> copying this into Luke and playing around with it has been very helpful
> to me.
>
> I can't emphasize enough that I've been well served by simplifying the
> code until it worked. Usually this results for me in a forehead-slapping
> moment and after that putting the complexity back in is easy. And the
> total time spent is MUCH shorter than trying to debug the complex case.
>
> And if you only knew how many times I've said something similar to
>
> "in following code, Context and Dispatcher are parts of interceptor
> pattern
> in which I change the given values if they are number and has nothing to
> do
> with queries with string values"
>
> and been totally wrong <G>.....
>
> Best
> Erick
>
> On 8/18/07, Mohammad Norouzi <mn...@gmail.com> wrote:
> >
> > testn,
> >
> > here is my code but the thing is strange is that by Luke I can't reach
> my
> > goal as well,
> >
> > look, I have a field (Indexed, Tokenized and Stored) this field has a
> wide
> > variety of values from numbers to characters, I give the query
> > patientResult:oxalate but the result is no document (using
> > WhitespaceAnalyzer) but I expect to have values like Ca. Oxalate:few and
> > Ca.
> > Oxalate:many
> >
> > in following code, Context and Dispatcher are parts of interceptor
> pattern
> > in which I change the given values if they are number and has nothing to
> > do
> > with queries with string values
> >
> >
> > public class ExtendedQueryParser extends MultiFieldQueryParser {
> >     private Log logger = LogFactory.getLog(ExtendedQueryParser.class);
> >     /**
> >      * if true, overrides the getRangeQuery() method and treat with
> dates
> > just like other strings, but
> >      * if false, everything will normally proceed just like its super
> > class.
> >
> >      */
> >     private boolean asString;
> >     private Class clazz;
> >
> >     public ExtendedQueryParser(String[] fields,Analyzer analyzer,Class
> > clazz) {
> >         super(fields,analyzer);
> >         //this.asString = asString;
> >         this.clazz = clazz;
> >     }
> >
> >     @Override
> >     protected org.apache.lucene.search.Query getRangeQuery(String field,
> > String part1, String part2, boolean inclusive) throws ParseException {
> >         String val1 = part1;
> >         String val2 = part2;
> >         String fieldName = field;
> >         try {
> >             Dispatcher dispatcher = Dispatcher.getInstance();
> >             Context c = new Context();
> >             c.setClazz(clazz);
> >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> >             c.setValue(val1);
> >             dispatcher.beforeQuery(c);
> >             val1 = c.getWorkingValue();
> >
> >             c.setValue(val2);
> >             dispatcher.beforeQuery(c);
> >             val2 = c.getWorkingValue();
> >             fieldName = c.getChangedFieldName();
> >             logger.debug("Query text translated to
> "+fieldName+":["+val1+
> > "
> > TO " + val2+"]");
> >
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >
> >         BooleanQuery.setMaxClauseCount(5120);//5 * 1024
> >         return new RangeQuery(new Term(fieldName, val1),new
> > Term(fieldName,
> > val2),inclusive);
> >     }
> >
> >     @Override
> >     protected org.apache.lucene.search.Query getFieldQuery(String field,
> > String queryText) throws ParseException {
> >         logger.debug("FieldQuery no slop:"+queryText);
> >         String val = queryText;
> >         String fieldName = field;
> >         try {
> >             Dispatcher dispatcher = Dispatcher.getInstance();
> >             Context c = new Context();
> >             c.setClazz(clazz);
> >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> >             c.setValue(val);
> >             dispatcher.beforeQuery(c);
> >             val = c.getWorkingValue();
> >             fieldName = c.getChangedFieldName();
> >             logger.debug("Query text translated to "+fieldName+ ":" +
> > val);
> >
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >
> >         logger.debug("TermQuery...");
> >         setLowercaseExpandedTerms(false);
> >         TermQuery termQuery = new TermQuery(new Term(fieldName, val));
> >
> >         return termQuery;//(field,val);
> >     }
> >
> >     @Override
> >     protected org.apache.lucene.search.Query getFuzzyQuery(String arg0,
> > String arg1, float arg2) throws ParseException {
> >         logger.debug("FuzzyQuery Text:"+arg1);
> >         return super.getFuzzyQuery(arg0, arg1, arg2);
> >     }
> >
> >     @Override
> >     protected org.apache.lucene.search.Query getPrefixQuery(String
> field,
> > String text) throws ParseException {
> >         logger.debug("PrefixQuery Text:"+text);
> >         //PrefixQuery prefixQuery = new PrefixQuery(new
> Term(field,text));
> >         setLowercaseExpandedTerms(false);
> >         return super.getPrefixQuery(field,text);
> >     }
> >
> >     @Override
> >     protected org.apache.lucene.search.Query getWildcardQuery(String
> > field,
> > String text) throws ParseException {
> >         logger.debug("WildcardQuery:"+text);
> >         setLowercaseExpandedTerms(false);
> >         //WildcardQuery doesn't need to perform any translation on its
> > numbers
> >         return super.getWildcardQuery(field, text);
> >     }
> >
> >     @Override
> >     protected Query getFieldQuery(String field, String queryText, int
> > slop)
> > throws ParseException {
> >         logger.debug("PhraseQuery :"+queryText+" with slop:"+slop);
> >         String val = queryText;
> >         String fieldName = field;
> >         try {
> >             Dispatcher dispatcher = Dispatcher.getInstance();
> >             Context c = new Context();
> >             c.setClazz(clazz);
> >             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
> >             c.setValue(val);
> >             dispatcher.beforeQuery(c);
> >             val = c.getWorkingValue();
> >             fieldName = c.getChangedFieldName();
> >             logger.debug("Query text translated to
> > "+fieldName+":"+val+"");
> >
> >         } catch (Exception e) {
> >             e.printStackTrace();
> >         }
> >         PhraseQuery phraseQuery = new PhraseQuery();
> >         phraseQuery.add(new Term(fieldName, val));
> >         phraseQuery.setSlop(slop);
> >         return phraseQuery;
> >     }
> >
> >
> > }
> > --------------------------
> >
> > On 8/16/07, testn <te...@doramail.com> wrote:
> > >
> > >
> > > Can you post your code? Make sure that when you use wildcard in your
> > > custom
> > > query parser, it will generate either WildcardQuery or PrefixQuery
> > > correctly.
> > >
> > >
> > > is_maximum wrote:
> > > >
> > > > Yes karl, when I explore the index by Luke I can see the terms
> > > > for example I have a field namely, patientResult, it contains value
> > "Ca.
> > > > Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
> > > >
> > > > the problems are when I put this query: patientResult:(Ca.
> > Oxalate:few)
> > > > the result is
> > > > 84329 Ca. Oxalate:few
> > > > 112519 Ca. Oxalate:many
> > > > 139141 Ca. Oxalate:many
> > > > 394321 Ca. Oxalate:few
> > > > 397671 Ca. Oxalate:nod
> > > > 387549 Ca. Oxalate: mod
> > > >
> > > > however this is not the required result but another problem is when
> I
> > > put
> > > > patientResult:Oxalate or patientResult:Oxalate* no result will
> > return!!!
> > > >
> > > > let me tell you that I am extended MultiFieldQueryParser to override
> > its
> > > > methods and in getFieldQuery(...) method I return TermQuery
> > > >
> > > > I don't know what I was made wrong?
> > > >
> > > >
> > > >
> > > >
> > > > On 8/15/07, karl wettin <ka...@gmail.com> wrote:
> > > >>
> > > >>
> > > >> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
> > > >>
> > > >> > I am using WhitespaceAnalyzer and the query is " icdCode:H* " but
> > > >> > there is
> > > >> > no result however I know that there are many documents with this
> > > >> > field value
> > > >> > such as H20, H20.5 etc.     this field is tokenized and indexed
> > > >> > what is
> > > >> > wrong with this?
> > > >> > when I test this query with Luke it will return no result as
> well.
> > > >>
> > > >> Can you also use Luke to inspect documents you know should contain
> > > these
> > > >> terms and make sure it really is in there?
> > > >>
> > > >> --
> > > >> karl
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >>
> > > >>
> > > >
> > > >
> > > > --
> > > > Regards,
> > > > Mohammad
> > > > --------------------------
> > > > see my blog: http://brainable.blogspot.com/
> > > > another in Persian: http://fekre-motefavet.blogspot.com/
> > > >
> > > >
> > >
> > > --
> > > View this message in context:
> > > http://www.nabble.com/query-question-tf4271198.html#a12185271
> > > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > Regards,
> > Mohammad
> > --------------------------
> > see my blog: http://brainable.blogspot.com/
> > another in Persian: http://fekre-motefavet.blogspot.com/
> >
>



-- 
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/

Re: query question

Posted by Erick Erickson <er...@gmail.com>.
I think you'll get much farther much faster if you concentrate on
a very simple test case for searching until you get the results you
expect.

It's particularly telling that you can't get your results from Luke.
All the rest of your code is irrelevant until you get what you expect
from Luke with a simple analyzer or with a stupid-simple bit of
test code. Until then, the rest of your code, in which bugs may
lurk, just gets in your way.

For instance.... you have colons in your term text. I believe you have
to escape these for query parsing to work correctly. You have mixed
case. Are you absolutely sure that the casing is consistent between
indexing and querying? You have other punctuation. Are you also sure
that it's not stripped by the query ananlyzers? The fragment above
doesn't show us what analyzer you use. I flat guarantee that if it's
StandardAnalyzer, lots of punctuation is stripped and the term text is
lowercased. Some innocent-seeming bit of code can mess you up in
any of these cases.

You'll get a log of mileage out of query.toString(), which shows you
exactly what the query you send to the searcher looks like. Just
copying this into Luke and playing around with it has been very helpful
to me.

I can't emphasize enough that I've been well served by simplifying the
code until it worked. Usually this results for me in a forehead-slapping
moment and after that putting the complexity back in is easy. And the
total time spent is MUCH shorter than trying to debug the complex case.

And if you only knew how many times I've said something similar to

"in following code, Context and Dispatcher are parts of interceptor pattern
in which I change the given values if they are number and has nothing to do
with queries with string values"

and been totally wrong <G>.....

Best
Erick

On 8/18/07, Mohammad Norouzi <mn...@gmail.com> wrote:
>
> testn,
>
> here is my code but the thing is strange is that by Luke I can't reach my
> goal as well,
>
> look, I have a field (Indexed, Tokenized and Stored) this field has a wide
> variety of values from numbers to characters, I give the query
> patientResult:oxalate but the result is no document (using
> WhitespaceAnalyzer) but I expect to have values like Ca. Oxalate:few and
> Ca.
> Oxalate:many
>
> in following code, Context and Dispatcher are parts of interceptor pattern
> in which I change the given values if they are number and has nothing to
> do
> with queries with string values
>
>
> public class ExtendedQueryParser extends MultiFieldQueryParser {
>     private Log logger = LogFactory.getLog(ExtendedQueryParser.class);
>     /**
>      * if true, overrides the getRangeQuery() method and treat with dates
> just like other strings, but
>      * if false, everything will normally proceed just like its super
> class.
>
>      */
>     private boolean asString;
>     private Class clazz;
>
>     public ExtendedQueryParser(String[] fields,Analyzer analyzer,Class
> clazz) {
>         super(fields,analyzer);
>         //this.asString = asString;
>         this.clazz = clazz;
>     }
>
>     @Override
>     protected org.apache.lucene.search.Query getRangeQuery(String field,
> String part1, String part2, boolean inclusive) throws ParseException {
>         String val1 = part1;
>         String val2 = part2;
>         String fieldName = field;
>         try {
>             Dispatcher dispatcher = Dispatcher.getInstance();
>             Context c = new Context();
>             c.setClazz(clazz);
>             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
>             c.setValue(val1);
>             dispatcher.beforeQuery(c);
>             val1 = c.getWorkingValue();
>
>             c.setValue(val2);
>             dispatcher.beforeQuery(c);
>             val2 = c.getWorkingValue();
>             fieldName = c.getChangedFieldName();
>             logger.debug("Query text translated to "+fieldName+":["+val1+
> "
> TO " + val2+"]");
>
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>
>         BooleanQuery.setMaxClauseCount(5120);//5 * 1024
>         return new RangeQuery(new Term(fieldName, val1),new
> Term(fieldName,
> val2),inclusive);
>     }
>
>     @Override
>     protected org.apache.lucene.search.Query getFieldQuery(String field,
> String queryText) throws ParseException {
>         logger.debug("FieldQuery no slop:"+queryText);
>         String val = queryText;
>         String fieldName = field;
>         try {
>             Dispatcher dispatcher = Dispatcher.getInstance();
>             Context c = new Context();
>             c.setClazz(clazz);
>             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
>             c.setValue(val);
>             dispatcher.beforeQuery(c);
>             val = c.getWorkingValue();
>             fieldName = c.getChangedFieldName();
>             logger.debug("Query text translated to "+fieldName+ ":" +
> val);
>
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>
>         logger.debug("TermQuery...");
>         setLowercaseExpandedTerms(false);
>         TermQuery termQuery = new TermQuery(new Term(fieldName, val));
>
>         return termQuery;//(field,val);
>     }
>
>     @Override
>     protected org.apache.lucene.search.Query getFuzzyQuery(String arg0,
> String arg1, float arg2) throws ParseException {
>         logger.debug("FuzzyQuery Text:"+arg1);
>         return super.getFuzzyQuery(arg0, arg1, arg2);
>     }
>
>     @Override
>     protected org.apache.lucene.search.Query getPrefixQuery(String field,
> String text) throws ParseException {
>         logger.debug("PrefixQuery Text:"+text);
>         //PrefixQuery prefixQuery = new PrefixQuery(new Term(field,text));
>         setLowercaseExpandedTerms(false);
>         return super.getPrefixQuery(field,text);
>     }
>
>     @Override
>     protected org.apache.lucene.search.Query getWildcardQuery(String
> field,
> String text) throws ParseException {
>         logger.debug("WildcardQuery:"+text);
>         setLowercaseExpandedTerms(false);
>         //WildcardQuery doesn't need to perform any translation on its
> numbers
>         return super.getWildcardQuery(field, text);
>     }
>
>     @Override
>     protected Query getFieldQuery(String field, String queryText, int
> slop)
> throws ParseException {
>         logger.debug("PhraseQuery :"+queryText+" with slop:"+slop);
>         String val = queryText;
>         String fieldName = field;
>         try {
>             Dispatcher dispatcher = Dispatcher.getInstance();
>             Context c = new Context();
>             c.setClazz(clazz);
>             c.setFieldData(MetadataHelper.getIndexField(clazz,field));
>             c.setValue(val);
>             dispatcher.beforeQuery(c);
>             val = c.getWorkingValue();
>             fieldName = c.getChangedFieldName();
>             logger.debug("Query text translated to
> "+fieldName+":"+val+"");
>
>         } catch (Exception e) {
>             e.printStackTrace();
>         }
>         PhraseQuery phraseQuery = new PhraseQuery();
>         phraseQuery.add(new Term(fieldName, val));
>         phraseQuery.setSlop(slop);
>         return phraseQuery;
>     }
>
>
> }
> --------------------------
>
> On 8/16/07, testn <te...@doramail.com> wrote:
> >
> >
> > Can you post your code? Make sure that when you use wildcard in your
> > custom
> > query parser, it will generate either WildcardQuery or PrefixQuery
> > correctly.
> >
> >
> > is_maximum wrote:
> > >
> > > Yes karl, when I explore the index by Luke I can see the terms
> > > for example I have a field namely, patientResult, it contains value
> "Ca.
> > > Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
> > >
> > > the problems are when I put this query: patientResult:(Ca.
> Oxalate:few)
> > > the result is
> > > 84329 Ca. Oxalate:few
> > > 112519 Ca. Oxalate:many
> > > 139141 Ca. Oxalate:many
> > > 394321 Ca. Oxalate:few
> > > 397671 Ca. Oxalate:nod
> > > 387549 Ca. Oxalate: mod
> > >
> > > however this is not the required result but another problem is when I
> > put
> > > patientResult:Oxalate or patientResult:Oxalate* no result will
> return!!!
> > >
> > > let me tell you that I am extended MultiFieldQueryParser to override
> its
> > > methods and in getFieldQuery(...) method I return TermQuery
> > >
> > > I don't know what I was made wrong?
> > >
> > >
> > >
> > >
> > > On 8/15/07, karl wettin <ka...@gmail.com> wrote:
> > >>
> > >>
> > >> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
> > >>
> > >> > I am using WhitespaceAnalyzer and the query is " icdCode:H* " but
> > >> > there is
> > >> > no result however I know that there are many documents with this
> > >> > field value
> > >> > such as H20, H20.5 etc.     this field is tokenized and indexed
> > >> > what is
> > >> > wrong with this?
> > >> > when I test this query with Luke it will return no result as well.
> > >>
> > >> Can you also use Luke to inspect documents you know should contain
> > these
> > >> terms and make sure it really is in there?
> > >>
> > >> --
> > >> karl
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> > >
> > >
> > > --
> > > Regards,
> > > Mohammad
> > > --------------------------
> > > see my blog: http://brainable.blogspot.com/
> > > another in Persian: http://fekre-motefavet.blogspot.com/
> > >
> > >
> >
> > --
> > View this message in context:
> > http://www.nabble.com/query-question-tf4271198.html#a12185271
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
>
> --
> Regards,
> Mohammad
> --------------------------
> see my blog: http://brainable.blogspot.com/
> another in Persian: http://fekre-motefavet.blogspot.com/
>

Re: query question

Posted by Mohammad Norouzi <mn...@gmail.com>.
testn,

here is my code but the thing is strange is that by Luke I can't reach my
goal as well,

look, I have a field (Indexed, Tokenized and Stored) this field has a wide
variety of values from numbers to characters, I give the query
patientResult:oxalate but the result is no document (using
WhitespaceAnalyzer) but I expect to have values like Ca. Oxalate:few and Ca.
Oxalate:many

in following code, Context and Dispatcher are parts of interceptor pattern
in which I change the given values if they are number and has nothing to do
with queries with string values


public class ExtendedQueryParser extends MultiFieldQueryParser {
    private Log logger = LogFactory.getLog(ExtendedQueryParser.class);
    /**
     * if true, overrides the getRangeQuery() method and treat with dates
just like other strings, but
     * if false, everything will normally proceed just like its super class.

     */
    private boolean asString;
    private Class clazz;

    public ExtendedQueryParser(String[] fields,Analyzer analyzer,Class
clazz) {
        super(fields,analyzer);
        //this.asString = asString;
        this.clazz = clazz;
    }

    @Override
    protected org.apache.lucene.search.Query getRangeQuery(String field,
String part1, String part2, boolean inclusive) throws ParseException {
        String val1 = part1;
        String val2 = part2;
        String fieldName = field;
        try {
            Dispatcher dispatcher = Dispatcher.getInstance();
            Context c = new Context();
            c.setClazz(clazz);
            c.setFieldData(MetadataHelper.getIndexField(clazz,field));
            c.setValue(val1);
            dispatcher.beforeQuery(c);
            val1 = c.getWorkingValue();

            c.setValue(val2);
            dispatcher.beforeQuery(c);
            val2 = c.getWorkingValue();
            fieldName = c.getChangedFieldName();
            logger.debug("Query text translated to "+fieldName+":["+val1+ "
TO " + val2+"]");

        } catch (Exception e) {
            e.printStackTrace();
        }

        BooleanQuery.setMaxClauseCount(5120);//5 * 1024
        return new RangeQuery(new Term(fieldName, val1),new Term(fieldName,
val2),inclusive);
    }

    @Override
    protected org.apache.lucene.search.Query getFieldQuery(String field,
String queryText) throws ParseException {
        logger.debug("FieldQuery no slop:"+queryText);
        String val = queryText;
        String fieldName = field;
        try {
            Dispatcher dispatcher = Dispatcher.getInstance();
            Context c = new Context();
            c.setClazz(clazz);
            c.setFieldData(MetadataHelper.getIndexField(clazz,field));
            c.setValue(val);
            dispatcher.beforeQuery(c);
            val = c.getWorkingValue();
            fieldName = c.getChangedFieldName();
            logger.debug("Query text translated to "+fieldName+ ":" + val);

        } catch (Exception e) {
            e.printStackTrace();
        }

        logger.debug("TermQuery...");
        setLowercaseExpandedTerms(false);
        TermQuery termQuery = new TermQuery(new Term(fieldName, val));

        return termQuery;//(field,val);
    }

    @Override
    protected org.apache.lucene.search.Query getFuzzyQuery(String arg0,
String arg1, float arg2) throws ParseException {
        logger.debug("FuzzyQuery Text:"+arg1);
        return super.getFuzzyQuery(arg0, arg1, arg2);
    }

    @Override
    protected org.apache.lucene.search.Query getPrefixQuery(String field,
String text) throws ParseException {
        logger.debug("PrefixQuery Text:"+text);
        //PrefixQuery prefixQuery = new PrefixQuery(new Term(field,text));
        setLowercaseExpandedTerms(false);
        return super.getPrefixQuery(field,text);
    }

    @Override
    protected org.apache.lucene.search.Query getWildcardQuery(String field,
String text) throws ParseException {
        logger.debug("WildcardQuery:"+text);
        setLowercaseExpandedTerms(false);
        //WildcardQuery doesn't need to perform any translation on its
numbers
        return super.getWildcardQuery(field, text);
    }

    @Override
    protected Query getFieldQuery(String field, String queryText, int slop)
throws ParseException {
        logger.debug("PhraseQuery :"+queryText+" with slop:"+slop);
        String val = queryText;
        String fieldName = field;
        try {
            Dispatcher dispatcher = Dispatcher.getInstance();
            Context c = new Context();
            c.setClazz(clazz);
            c.setFieldData(MetadataHelper.getIndexField(clazz,field));
            c.setValue(val);
            dispatcher.beforeQuery(c);
            val = c.getWorkingValue();
            fieldName = c.getChangedFieldName();
            logger.debug("Query text translated to "+fieldName+":"+val+"");

        } catch (Exception e) {
            e.printStackTrace();
        }
        PhraseQuery phraseQuery = new PhraseQuery();
        phraseQuery.add(new Term(fieldName, val));
        phraseQuery.setSlop(slop);
        return phraseQuery;
    }


}
--------------------------

On 8/16/07, testn <te...@doramail.com> wrote:
>
>
> Can you post your code? Make sure that when you use wildcard in your
> custom
> query parser, it will generate either WildcardQuery or PrefixQuery
> correctly.
>
>
> is_maximum wrote:
> >
> > Yes karl, when I explore the index by Luke I can see the terms
> > for example I have a field namely, patientResult, it contains value "Ca.
> > Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
> >
> > the problems are when I put this query: patientResult:(Ca. Oxalate:few)
> > the result is
> > 84329 Ca. Oxalate:few
> > 112519 Ca. Oxalate:many
> > 139141 Ca. Oxalate:many
> > 394321 Ca. Oxalate:few
> > 397671 Ca. Oxalate:nod
> > 387549 Ca. Oxalate: mod
> >
> > however this is not the required result but another problem is when I
> put
> > patientResult:Oxalate or patientResult:Oxalate* no result will return!!!
> >
> > let me tell you that I am extended MultiFieldQueryParser to override its
> > methods and in getFieldQuery(...) method I return TermQuery
> >
> > I don't know what I was made wrong?
> >
> >
> >
> >
> > On 8/15/07, karl wettin <ka...@gmail.com> wrote:
> >>
> >>
> >> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
> >>
> >> > I am using WhitespaceAnalyzer and the query is " icdCode:H* " but
> >> > there is
> >> > no result however I know that there are many documents with this
> >> > field value
> >> > such as H20, H20.5 etc.     this field is tokenized and indexed
> >> > what is
> >> > wrong with this?
> >> > when I test this query with Luke it will return no result as well.
> >>
> >> Can you also use Luke to inspect documents you know should contain
> these
> >> terms and make sure it really is in there?
> >>
> >> --
> >> karl
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
> >
> > --
> > Regards,
> > Mohammad
> > --------------------------
> > see my blog: http://brainable.blogspot.com/
> > another in Persian: http://fekre-motefavet.blogspot.com/
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/query-question-tf4271198.html#a12185271
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/

Re: query question

Posted by testn <te...@doramail.com>.
Can you post your code? Make sure that when you use wildcard in your custom
query parser, it will generate either WildcardQuery or PrefixQuery
correctly. 


is_maximum wrote:
> 
> Yes karl, when I explore the index by Luke I can see the terms
> for example I have a field namely, patientResult, it contains value "Ca.
> Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.
> 
> the problems are when I put this query: patientResult:(Ca. Oxalate:few)
> the result is
> 84329 Ca. Oxalate:few
> 112519 Ca. Oxalate:many
> 139141 Ca. Oxalate:many
> 394321 Ca. Oxalate:few
> 397671 Ca. Oxalate:nod
> 387549 Ca. Oxalate: mod
> 
> however this is not the required result but another problem is when I put
> patientResult:Oxalate or patientResult:Oxalate* no result will return!!!
> 
> let me tell you that I am extended MultiFieldQueryParser to override its
> methods and in getFieldQuery(...) method I return TermQuery
> 
> I don't know what I was made wrong?
> 
> 
> 
> 
> On 8/15/07, karl wettin <ka...@gmail.com> wrote:
>>
>>
>> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
>>
>> > I am using WhitespaceAnalyzer and the query is " icdCode:H* " but
>> > there is
>> > no result however I know that there are many documents with this
>> > field value
>> > such as H20, H20.5 etc.     this field is tokenized and indexed
>> > what is
>> > wrong with this?
>> > when I test this query with Luke it will return no result as well.
>>
>> Can you also use Luke to inspect documents you know should contain these
>> terms and make sure it really is in there?
>>
>> --
>> karl
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
> 
> 
> -- 
> Regards,
> Mohammad
> --------------------------
> see my blog: http://brainable.blogspot.com/
> another in Persian: http://fekre-motefavet.blogspot.com/
> 
> 

-- 
View this message in context: http://www.nabble.com/query-question-tf4271198.html#a12185271
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: query question

Posted by Mohammad Norouzi <mn...@gmail.com>.
Yes karl, when I explore the index by Luke I can see the terms
for example I have a field namely, patientResult, it contains value "Ca.
Oxalate:many" and also other values such as "Ca. Oxalate:few" etc.

the problems are when I put this query: patientResult:(Ca. Oxalate:few)
the result is
84329 Ca. Oxalate:few
112519 Ca. Oxalate:many
139141 Ca. Oxalate:many
394321 Ca. Oxalate:few
397671 Ca. Oxalate:nod
387549 Ca. Oxalate: mod

however this is not the required result but another problem is when I put
patientResult:Oxalate or patientResult:Oxalate* no result will return!!!

let me tell you that I am extended MultiFieldQueryParser to override its
methods and in getFieldQuery(...) method I return TermQuery

I don't know what I was made wrong?




On 8/15/07, karl wettin <ka...@gmail.com> wrote:
>
>
> 15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:
>
> > I am using WhitespaceAnalyzer and the query is " icdCode:H* " but
> > there is
> > no result however I know that there are many documents with this
> > field value
> > such as H20, H20.5 etc.     this field is tokenized and indexed
> > what is
> > wrong with this?
> > when I test this query with Luke it will return no result as well.
>
> Can you also use Luke to inspect documents you know should contain these
> terms and make sure it really is in there?
>
> --
> karl
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Regards,
Mohammad
--------------------------
see my blog: http://brainable.blogspot.com/
another in Persian: http://fekre-motefavet.blogspot.com/

Re: query question

Posted by karl wettin <ka...@gmail.com>.
15 aug 2007 kl. 07.18 skrev Mohammad Norouzi:

> I am using WhitespaceAnalyzer and the query is " icdCode:H* " but  
> there is
> no result however I know that there are many documents with this  
> field value
> such as H20, H20.5 etc.     this field is tokenized and indexed  
> what is
> wrong with this?
> when I test this query with Luke it will return no result as well.

Can you also use Luke to inspect documents you know should contain these
terms and make sure it really is in there?

-- 
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org