Posted to java-user@lucene.apache.org by Fred Toth <ft...@synernet.com> on 2004/09/24 18:26:40 UTC

Keyword query confusion

Hi all,

I'm trying to understand what's going on with the query parser
and keyword fields.

I've got a large subset of my documents which are "publications".
So as to be able to query these, I've got this in the indexer:

doc.add(Field.Keyword("is_pub", "1"));

However, if I run a query:

	is_pub:1

I get no hits. If I find a document by other means and dump the
fields, the "is_pub" keyword is there, with value of "1".

Now, I've learned that if I change the field to contain the value "true"
instead of the string "1", this query:

	is_pub:true

works just fine.

So, I'm pretty sure I'm running afoul of the analyzer, right? The doc says
specifically that I should add keyword query clauses programmatically,
and I'm guessing that's what's wrong.

But can someone explain this? It sure is useful to be able to test this
sort of thing with the query parser. What is going on with the standard
analyzer that makes "true" work and "1" not work?

Is there a way around this other than by writing code to create the
query? This also applies to other types of query, like "pub_date:2004".

Hoping for enlightenment...

Thanks,

Fred


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Keyword query confusion

Posted by Erik Hatcher <er...@hatcher.net>.
On Sep 25, 2004, at 5:59 AM, Erik Hatcher wrote:
> On Sep 24, 2004, at 12:26 PM, Fred Toth wrote:
>> I'm trying to understand what's going on with the query parser
>> and keyword fields.
>
> It's a confusing situation, for sure.
>
>> I've got a large subset of my documents which are "publications".
>> So as to be able to query these, I've got this in the indexer:
>>
>> doc.add(Field.Keyword("is_pub", "1"));
>>
>> However, if I run a query:
>>
>> 	is_pub:1
>>
>> I get no hits. If I find a document by other means and dump the
>> fields, the "is_pub" keyword is there, with value of "1".
>
> As already stated - it is the analyzer eating the "1".  Every field is 
> analyzed by QueryParser, but during indexing Field.Keyword fields are 
> not indexed.

Typo correction: that should read "during indexing Field.Keyword fields 
are not *analyzed*".




Re: Keyword query confusion

Posted by Erik Hatcher <er...@hatcher.net>.
On Sep 24, 2004, at 12:26 PM, Fred Toth wrote:
> I'm trying to understand what's going on with the query parser
> and keyword fields.

It's a confusing situation, for sure.

> I've got a large subset of my documents which are "publications".
> So as to be able to query these, I've got this in the indexer:
>
> doc.add(Field.Keyword("is_pub", "1"));
>
> However, if I run a query:
>
> 	is_pub:1
>
> I get no hits. If I find a document by other means and dump the
> fields, the "is_pub" keyword is there, with value of "1".

As already stated - it is the analyzer eating the "1".  Every field is 
analyzed by QueryParser, but during indexing Field.Keyword fields are 
not indexed.

Search the archives for discussion on a KeywordAnalyzer and how to use 
it with PerFieldAnalyzerWrapper.  Also, the info here is valuable:

	http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Visualizing what an analyzer does and using Query.toString are both 
techniques to clearly point out what is happening.

> Now, I've learned that if I change the field to contain the value 
> "true"
> instead of the string "1", this query:
>
> 	is_pub:true
>
> works just fine.
>
> So, I'm pretty sure I'm running afoul of the analyzer, right? The doc 
> says
> specifically that I should add keyword query clauses programmatically,
> and I'm guessing that's what's wrong.

It really depends on your needs.  I personally wouldn't want end-users 
knowing to type "is_pub:true" into a search box.  Designing the most 
appropriate search interface for your situation is highly recommended.  
In this case I'd use a checkbox for "Is published?" that translates into a 
TermQuery behind the scenes (likely combined with other pieces, perhaps 
a QueryParser-parsed piece, using BooleanQuery).  TermQuery text is not 
analyzed, so you'd be safe there.
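
To illustrate why an unanalyzed term lookup is safe here, the following is a toy model only, not Lucene code. It assumes a keyword-style field is stored as a single verbatim term and that a term query is an exact lookup of that term; combining two required clauses is modeled as a set intersection. The class ExactTermLookup, its methods, and the "topic" field are all hypothetical names made up for this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (not Lucene): an index that maps "field:value" to document ids.
class ExactTermLookup {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Models a keyword-style field: the value is stored verbatim as one term.
    void addKeyword(int docId, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new HashSet<>()).add(docId);
    }

    // Models a term query: exact lookup, the text is never run through an analyzer.
    Set<Integer> term(String field, String text) {
        return new HashSet<>(postings.getOrDefault(field + ":" + text, new HashSet<>()));
    }

    // Models two required clauses: a document must match both.
    static Set<Integer> both(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new HashSet<>(a);
        out.retainAll(b);
        return out;
    }

    public static void main(String[] args) {
        ExactTermLookup index = new ExactTermLookup();
        index.addKeyword(1, "is_pub", "1");
        index.addKeyword(1, "topic", "lucene");
        index.addKeyword(2, "topic", "lucene");
        index.addKeyword(3, "is_pub", "1");

        // The "Is published?" checkbox becomes an exact lookup on is_pub:1,
        // combined with whatever else the user asked for.
        Set<Integer> published = index.term("is_pub", "1");
        Set<Integer> hits = both(published, index.term("topic", "lucene"));
        System.out.println("published: " + published.size() + ", hits: " + hits);
    }
}
```

Because the lookup text never passes through an analyzer in this model, the value "1" is matched exactly as it was stored.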

> But can someone explain this? It sure is useful to be able to test this
> sort of thing with the query parser. What is going on with the standard
> analyzer that makes "true" work and "1" not work?

Numbers get axed, that is what happens.
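
One way to picture this is the rough sketch below. It is not Lucene's actual code: it assumes the analyzer in play keeps only runs of letters and lowercases them, which may not match every analyzer or version. LetterOnlyTokens and tokens() are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an analyzer that keeps runs of letters and lowercases them.
class LetterOnlyTokens {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("1"));     // prints []
        System.out.println(tokens("true"));  // prints [true]
        System.out.println(tokens("2004"));  // prints []
    }
}
```

Under that assumption the query text "1" yields no tokens at all, so there is nothing left to search for, while "true" survives intact and matches the stored keyword.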

> Is there a way around this other than by writing code to create the
> query? This also applies to other types of query, like "pub_date:2004".

A PerFieldAnalyzerWrapper using WhitespaceAnalyzer for the "is_pub" 
field would do the trick in this case.
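
The per-field idea can be sketched as follows. This is a toy model in plain Java, not the Lucene classes themselves: PerFieldAnalysis, lettersOnly and whitespace are hypothetical names, and the letters-only default is an assumption standing in for whatever analyzer drops the digits. The "body" field is also just an example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Toy model of per-field analysis: a default analyzer plus per-field overrides.
class PerFieldAnalysis {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    PerFieldAnalysis(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Pick the override for this field if there is one, else the default.
    List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Assumed default: lowercase, keep runs of letters only (digits are dropped).
    static List<String> lettersOnly(String text) {
        return nonEmpty(text.toLowerCase().split("[^\\p{L}]+"));
    }

    // Override: split on whitespace only, so "1" and "2004" pass through unchanged.
    static List<String> whitespace(String text) {
        return nonEmpty(text.trim().split("\\s+"));
    }

    private static List<String> nonEmpty(String[] parts) {
        List<String> out = new ArrayList<>();
        for (String p : parts) {
            if (!p.isEmpty()) {
                out.add(p);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldAnalysis analysis = new PerFieldAnalysis(PerFieldAnalysis::lettersOnly);
        analysis.addAnalyzer("is_pub", PerFieldAnalysis::whitespace);
        System.out.println(analysis.analyze("is_pub", "1")); // prints [1]
        System.out.println(analysis.analyze("body", "1"));   // prints []
    }
}
```

The point of the design is that only the fields holding exact values get the gentler treatment; free-text fields keep the normal analysis.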

Again, users typing "pub_date:2004" seems awkward to me - make a year 
drop-down box if they need to select a year.

> Hoping for enlightenment...

Now that's a tall order... or is it?!  It's surrounding us all - we 
simply have to breathe it in.

	Erik




RE: Keyword query confusion

Posted by Aviran <am...@infosciences.com>.
The StandardAnalyzer removes the "1" as it is a stop word.
There are two ways you can work around this problem:
1. As you mentioned, create a Query object programmatically.
2. Use WhitespaceAnalyzer instead of StandardAnalyzer.

Aviran


