Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2012/02/01 22:32:44 UTC

When does Query Parser do its analysis ?

I subclassed QueryParser and gave it the query

dug up

Debugging shows that it then calls getFieldQuery(String field, String
queryText, boolean quoted) twice,
once with

queryText=dug

and once with

queryText=up

But when I run it with the query dúg up, the first call is

queryText=dúg

even though the analyzer I use removes accents.

So it seems that the parser simply breaks the text up at spaces and does
the text analysis within getFieldQuery(). But how can it assume that
text should only be broken at whitespace?
This seems to be confirmed when I pass it the query 'dug/up': it is
passed as one string, but then it seems to get converted to 'dug up'
within getFieldQuery().


Sorry I don't get it.

Paul





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: When does Query Parser do its analysis ?

Posted by Chris Hostetter <ho...@fucit.org>.

: So it seems like it just broke the text up at spaces, and does text analysis
: within getFieldQuery(), but how can it make the assumption that text should
: only be broken at whitespace ?

whitespace is a significant metacharacter to the QueryParser - it is used 
to distinguish multiple clauses of a BooleanQuery.
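
As a rough illustration of that splitting step, here is a toy model in
plain Java. It is not Lucene's actual grammar, and the names
ClauseSplitter and split are made up for this sketch. It cuts the query
into clauses at whitespace unless the whitespace is backslash-escaped or
sits inside double quotes, and it does no analysis at all, which is why
the accent in dúg is still there at this stage.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only (hypothetical names): splits a query into clauses at whitespace. */
final class ClauseSplitter {

    /** Returns the raw clause texts; no analysis (lowercasing, accent removal) happens here. */
    public static List<String> split(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Backslash escape: take the next character literally, even a space.
                current.append(query.charAt(++i));
            } else if (c == '"') {
                // Double quotes toggle "phrase" mode; the quotes themselves are dropped.
                inQuotes = !inQuotes;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                // Unescaped, unquoted whitespace ends the current clause.
                flush(current, clauses);
            } else {
                current.append(c);
            }
        }
        flush(current, clauses);
        return clauses;
    }

    private static void flush(StringBuilder current, List<String> clauses) {
        if (current.length() > 0) {
            clauses.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("dug up"));       // [dug, up]
        System.out.println(split("dug\\ up"));     // [dug up]
        System.out.println(split("\"dug up\""));   // [dug up]
        System.out.println(split("d\u00FAg up"));  // accent untouched: two clauses, first is d\u00FAg
    }
}
```

Each element of the returned list stands in for one queryText value that
would be handed on for analysis.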

if you want whitespace to be treated as a literal part of the query, you 
need to either escape it, or quote it...

  dug\ up
  "dug up"

: This seems to be confirmed when I pass it the query 'dug/up': it just passes
: it as one string, but then it seems to get converted to 'dug up' within
: getFieldQuery()

getFieldQuery is responsible for calling the analyzer - so in your 
'dug/up' example the analyzer you are using in your QueryParser instance 
is evidently tokenizing on "/"
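
To make the second stage concrete, here is a stand-in for the analysis
step, again in plain Java rather than the Lucene API. The analyzer
actually used in this thread is not shown, so ToyAnalyzer and analyze
are assumptions for illustration: the sketch strips accents, lowercases,
and splits on anything that is not a letter or digit. Fed one clause at
a time, it turns 'dug/up' into two tokens and removes the accent from
dúg.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Stand-in analyzer (hypothetical): accent folding, lowercasing, split on non-alphanumerics. */
final class ToyAnalyzer {

    /** Analyzes the text of a single clause and returns its tokens. */
    public static List<String> analyze(String text) {
        // Decompose accented characters (NFD), then drop the combining marks.
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String folded = decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);

        List<String> tokens = new ArrayList<>();
        // Any run of characters that are neither letters nor digits separates tokens.
        for (String piece : folded.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("d\u00FAg"));  // [dug]
        System.out.println(analyze("dug/up"));    // [dug, up]
    }
}
```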


-Hoss


Re: When does Query Parser do its analysis ?

Posted by Paul Taylor <pa...@fastmail.fm>.

On 02/02/2012 07:27, Doron Cohen wrote:
>
>     In my particular case I add the album catalog number to my index
>     as a keyword field, but of course if the catalog number contains a
>     space, as they often do (e.g. cad 6), there is a mismatch. I've now
>     changed my indexing to index the value as 'cad6', removing spaces.
>     Now if the query sent to the query parser is just
>
>     cad 6
>
>     there is the issue that it breaks it up into two separate
>     terms, but I thought that if the query sent to the parser was
>
>     "cad 6"
>
>     then the complete string would be passed through the analyzer. But
>     it doesn't seem to quite work: it creates a TermQuery instead of a
>     PhraseQuery, yet the explain shows the query to have the value
>
>     catno:cad 6
>
>     rather than
>
>     catno:cad6
>
>     and I don't get a match. What does that mean?
>
>
> Seems like at query time a KeywordAnalyzer was applied, while at 
> indexing time additional logic of removing spaces was (first) applied, 
> therefore the different results at indexing and search.
>
> Doron
Hi, sort of. I had an error in the reusableTokenStream() method of my
analyzer, so it wasn't doing the full analysis at query time. It's
working now.

Thanks, Paul

Re: When does Query Parser do its analysis ?

Posted by Doron Cohen <cd...@gmail.com>.

>
> In my particular case I add the album catalog number to my index as a
> keyword field, but of course if the catalog number contains a space, as
> they often do (e.g. cad 6), there is a mismatch. I've now changed my
> indexing to index the value as 'cad6', removing spaces. Now if the query
> sent to the query parser is just
>
> cad 6
>
> there is the issue that it breaks it up into two separate terms, but
> I thought that if the query sent to the parser was
>
> "cad 6"
>
> then the complete string would be passed through the analyzer. But it
> doesn't seem to quite work: it creates a TermQuery instead of a
> PhraseQuery, yet the explain shows the query to have the value
>
> catno:cad 6
>
> rather than
>
> catno:cad6
>
> and I don't get a match. What does that mean?


Seems like at query time a KeywordAnalyzer was applied, while at indexing
time additional logic of removing spaces was (first) applied, therefore the
different results at indexing and search.
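
A small self-contained sketch of that mismatch, using a plain Java set
in place of a real index (CatalogNumberIndex and its methods are
hypothetical names, not Lucene classes): the value is stored with spaces
removed, so a lookup that uses the query text untouched misses, while a
lookup that applies the same normalization as indexing finds it.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a keyword field (hypothetical): one exact term per catalog number. */
final class CatalogNumberIndex {
    private final Set<String> terms = new HashSet<>();

    /** The normalization applied at index time: remove all whitespace. */
    public static String normalize(String raw) {
        return raw.replaceAll("\\s+", "");
    }

    /** Indexing: "cad 6" is stored as the single term "cad6". */
    public void add(String rawCatalogNumber) {
        terms.add(normalize(rawCatalogNumber));
    }

    /** Query side WITHOUT the index-time logic: the text is used as one untouched term. */
    public boolean matchesUnnormalized(String queryText) {
        return terms.contains(queryText);
    }

    /** Query side WITH the same logic as indexing. */
    public boolean matchesNormalized(String queryText) {
        return terms.contains(normalize(queryText));
    }

    public static void main(String[] args) {
        CatalogNumberIndex index = new CatalogNumberIndex();
        index.add("cad 6");
        System.out.println(index.matchesUnnormalized("cad 6")); // false: looks up "cad 6", index holds "cad6"
        System.out.println(index.matchesNormalized("cad 6"));   // true
    }
}
```

The point is only that whatever transformation runs at indexing time
has to run at query time as well.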

Doron

Re: When does Query Parser do its analysis ?

Posted by Paul Taylor <pa...@fastmail.fm>.

On 01/02/2012 22:03, Robert Muir wrote:
> On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor<pa...@fastmail.fm>  wrote:
>> So it seems like it just broke the text up at spaces, and does text analysis
>> within getFieldQuery(), but how can it make the assumption that text should
>> only be broken at whitespace ?
> you are right, see this bug report:
> https://issues.apache.org/jira/browse/LUCENE-2605
>
I've voted on it, although after reading Hoss's reply I understand the
issue.

In my particular case I add the album catalog number to my index as a
keyword field, but of course if the catalog number contains a space, as
they often do (e.g. cad 6), there is a mismatch. I've now changed my
indexing to index the value as 'cad6', removing spaces. Now if the query
sent to the query parser is just

cad 6

there is the issue that it breaks it up into two separate terms,
but I thought that if the query sent to the parser was

"cad 6"

then the complete string would be passed through the analyzer. But it
doesn't seem to quite work: it creates a TermQuery instead of a
PhraseQuery, yet the explain shows the query to have the value

catno:cad 6

rather than

catno:cad6

and I don't get a match. What does that mean?

Paul


Re: When does Query Parser do its analysis ?

Posted by Robert Muir <rc...@gmail.com>.

On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor <pa...@fastmail.fm> wrote:
>
> So it seems like it just broke the text up at spaces, and does text analysis
> within getFieldQuery(), but how can it make the assumption that text should
> only be broken at whitespace ?

you are right, see this bug report:
https://issues.apache.org/jira/browse/LUCENE-2605

-- 
lucidimagination.com
