Posted to java-user@lucene.apache.org by Paul Taylor <pa...@fastmail.fm> on 2012/02/01 22:32:44 UTC
When does Query Parser do its analysis ?
So I subclass QueryParser and give it the query
dug up
Debugging then shows that it calls getFieldQuery(String field, String
queryText, boolean quoted) twice,
once with
queryText=dug
and once with
queryText=up
But when I run it with the query dúg up, the first call is
queryText=dúg
even though the analyzer I use removes accents.
So it seems that it just breaks the text up at spaces and does the text
analysis within getFieldQuery(), but how can it assume that
text should only be broken at whitespace?
This seems to be confirmed when I pass it the query 'dug/up': it just
passes it as one string, but then it seems to get converted to 'dug up'
within getFieldQuery().
Sorry, I don't get it.
Paul
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: When does Query Parser do its analysis ?
Posted by Chris Hostetter <ho...@fucit.org>.
: So it seems like it just broke the text up at spaces, and does text analysis
: within getFieldQuery(), but how can it make the assumption that text should
: only be broken at whitespace ?
whitespace is a significant metacharacter to the QueryParser - it is used
to distinguish multiple clauses of a BooleanQuery.
if you want whitespace to be treated as a literal part of the query, you
need to either escape it, or quote it...
dug\ up
"dug up"
: This seemed to be confirmed that when i pass it query 'dug/up' it just passes
: it as one string, but then its seems to get converted to 'dug up' within the
: getFieldQuery()
getFieldQuery is responsible for calling the analyzer - so in your
'dug/up' example the analyzer you are using in your QueryParser instance
is evidently tokenizing on "/"
-Hoss
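The two stages described above can be illustrated with a small plain-Java sketch. It does not use Lucene: splitClauses() and analyze() are hypothetical stand-ins for the parser grammar and for an accent-removing analyzer that tokenizes on punctuation, so treat it as a rough model of the behaviour seen in the debugger rather than the actual QueryParser code.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class TwoStageParseSketch {

    // Stage 1 (hypothetical stand-in for the parser grammar): split the raw
    // query into clauses at whitespace that is neither escaped nor quoted.
    static List<String> splitClauses(String query) {
        List<String> clauses = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                current.append(query.charAt(++i)); // keep the escaped character literally
            } else if (c == '"') {
                inQuotes = !inQuotes;              // quotes group text into one clause
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (current.length() > 0) {
                    clauses.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            clauses.add(current.toString());
        }
        return clauses;
    }

    // Stage 2 (hypothetical stand-in for an analyzer): strip accents,
    // lowercase, and tokenize on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        String unaccented = decomposed.replaceAll("\\p{M}", "");
        List<String> tokens = new ArrayList<>();
        for (String t : unaccented.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "d\u00fag" is "dug" with an acute accent on the u.
        String[] queries = {"d\u00fag up", "dug/up", "dug\\ up", "\"dug up\""};
        for (String q : queries) {
            for (String clause : splitClauses(q)) {
                System.out.println(q + " -> clause [" + clause + "] -> tokens " + analyze(clause));
            }
        }
    }
}
```

In this model the accented text reaches stage 2 untouched, which matches seeing queryText=dúg in the debugger, and 'dug/up' arrives as one string that only the analyzer stage breaks apart.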
Re: When does Query Parser do its analysis ?
Posted by Paul Taylor <pa...@fastmail.fm>.
On 02/02/2012 07:27, Doron Cohen wrote:
> Seems like at query time a KeywordAnalyzer was applied, while at
> indexing time additional logic of removing spaces was (first) applied,
> therefore the different results at indexing and search.
>
> Doron
Hi, sort of. I had an error in the reusableTokenStream() method of my
analyzer, so it wasn't doing the full analysis at query time. It's working now.
Thanks, Paul
Re: When does Query Parser do its analysis ?
Posted by Doron Cohen <cd...@gmail.com>.
>
> In my particular case I add album catalog numbers to my index as a keyword
> field, but of course if the catalog number contains a space, as they often
> do (e.g. cad 6), there is a mismatch. I've now changed my indexing to index
> the value as 'cad6', removing spaces. Now if the query sent to the query
> parser is just
>
> cad 6
>
> there is the issue that it breaks it up into two separate terms, but
> I thought that if the query sent to the parser was
>
> "cad 6"
>
> then the complete string would be passed through the analyzer, but it
> doesn't seem to quite work: it creates a TermQuery instead of a PhraseQuery,
> yet the explain output shows the query to have the value
>
> catno:cad 6
>
> rather than
>
> catno:cad6
>
> and I don't get a match. What does that mean?
Seems like at query time a KeywordAnalyzer was applied, while at indexing
time additional logic of removing spaces was (first) applied, therefore the
different results at indexing and search.
Doron
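As a rough illustration of that mismatch, here is a minimal plain-Java sketch. It does not use Lucene; normalizeCatNo() and the in-memory set are hypothetical stand-ins for the space-removing index-time logic and for the index itself. The only point is that the same normalization has to run on both sides for the term to match.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class CatNoNormalizationSketch {

    // Hypothetical normalization: drop all whitespace and lowercase, so that
    // "CAD 6", "cad 6" and "cad6" all become the single term "cad6".
    static String normalizeCatNo(String raw) {
        return raw.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Stand-in for the index: catalog numbers stored after normalization.
        Set<String> indexedTerms = new HashSet<>();
        indexedTerms.add(normalizeCatNo("CAD 6"));

        // What a quoted query "cad 6" hands over as a single string.
        String queryText = "cad 6";

        // Keyword-style lookup: the query text is used untouched, so it misses.
        System.out.println("raw lookup:        " + indexedTerms.contains(queryText));
        // Same normalization at query time: the lookup hits.
        System.out.println("normalized lookup: " + indexedTerms.contains(normalizeCatNo(queryText)));
    }
}
```

In Lucene terms this corresponds to using one analyzer that does the space removal for both indexing and query parsing, instead of pre-processing the value only on the indexing side.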
Re: When does Query Parser do its analysis ?
Posted by Paul Taylor <pa...@fastmail.fm>.
On 01/02/2012 22:03, Robert Muir wrote:
> On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor<pa...@fastmail.fm> wrote:
>> So it seems like it just broke the text up at spaces, and does text analysis
>> within getFieldQuery(), but how can it make the assumption that text should
>> only be broken at whitespace ?
> you are right, see this bug report:
> https://issues.apache.org/jira/browse/LUCENE-2605
>
I've voted on it, although having read Hoss Man's reply I now understand the
issue.
In my particular case I add album catalog numbers to my index as a keyword
field, but of course if the catalog number contains a space, as they
often do (e.g. cad 6), there is a mismatch. I've now changed my indexing
to index the value as 'cad6', removing spaces. Now if the query sent to
the query parser is just
cad 6
there is the issue that it breaks it up into two separate terms,
but I thought that if the query sent to the parser was
"cad 6"
then the complete string would be passed through the analyzer, but it
doesn't seem to quite work: it creates a TermQuery instead of a
PhraseQuery, yet the explain output shows the query to have the value
catno:cad 6
rather than
catno:cad6
and I don't get a match. What does that mean?
Paul
Re: When does Query Parser do its analysis ?
Posted by Robert Muir <rc...@gmail.com>.
On Wed, Feb 1, 2012 at 4:32 PM, Paul Taylor <pa...@fastmail.fm> wrote:
>
> So it seems like it just broke the text up at spaces, and does text analysis
> within getFieldQuery(), but how can it make the assumption that text should
> only be broken at whitespace ?
you are right, see this bug report:
https://issues.apache.org/jira/browse/LUCENE-2605
--
lucidimagination.com