Posted to dev@lucenenet.apache.org by Björn Kremer <bk...@patorg.de> on 2012/04/17 11:59:22 UTC

Wildcard queries are not analyzed

Hello,


Maybe I have found a small Lucene problem: wildcard queries are not 
analyzed correctly. I'm using the German analyzer with the 
'GermanDIN2Stemmer'.

In the Lucene index my name ('Björn') is stored as 'bjorn'. If I perform 
a wildcard query like 'björ*', the function 'GetPrefixQuery' does not 
analyze the search term. So the resulting query is 'björ*' instead of 
'bjor*'. (björ* = no match, bjor* = match)
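
The mismatch described above can be reproduced without Lucene. The following is a minimal Python sketch, not Lucene code: `analyze` is a hypothetical stand-in for the German analyzer (it only lowercases and strips diacritics), and `prefix_search` is a hypothetical helper that mimics a prefix lookup over the indexed terms.

```python
import unicodedata


def analyze(text):
    """Toy stand-in for the German analyzer: lowercase and strip diacritics."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prefix_search(prefix, terms):
    """Return the indexed terms that start with the given prefix, sorted."""
    return sorted(t for t in terms if t.startswith(prefix))


# Index time: every term passes through the analyzer before it is stored.
index_terms = {analyze(word) for word in ["Björn", "Kremer"]}

raw_prefix = "björ"                    # what an unanalyzed 'björ*' looks up
analyzed_prefix = analyze(raw_prefix)  # 'bjor'

print(prefix_search(raw_prefix, index_terms))       # []
print(prefix_search(analyzed_prefix, index_terms))  # ['bjorn']
```

The unanalyzed prefix can never match, because the index only contains terms that went through the folding step.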


Thank You
Björn

Re: Wildcard queries are not analyzed

Posted by Björn Kremer <bk...@patorg.de>.

Hello,

I have created the issue: 
https://issues.apache.org/jira/browse/LUCENENET-486
A test project is attached.

Björn

On 17.04.2012 20:40, Christopher Currens wrote:
> I should also add that reading the token stream directly will produce
> "bjor" (no wildcard) from "björ*".
>
> Björn,
>
> It would be great to see some example code that you're using to reproduce
> this behavior, just to make sure we're testing it in the same way.  Also,
> could I persuade you to create an issue for this here:
> https://issues.apache.org/jira/browse/LUCENENET, so that we can keep track
> of the progress on it?
>
> Thanks,
> Christopher

Re: Wildcard queries are not analyzed

Posted by Christopher Currens <cu...@gmail.com>.

I should also add that reading the token stream directly will produce
"bjor" (no wildcard) from "björ*".
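
To illustrate why the wildcard disappears, here is a minimal Python sketch under stated assumptions. It is not the Lucene token stream API: `token_stream` is a hypothetical toy tokenizer that folds diacritics and keeps only word characters, and `analyze_prefix_query` is one hypothetical way to analyze just the part before a trailing '*' and then put the wildcard back.

```python
import re
import unicodedata


def fold(text):
    """Lowercase and strip diacritics (toy replacement for the real filters)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def token_stream(text):
    """Toy analysis chain: fold the text, then keep runs of word characters."""
    return re.findall(r"\w+", fold(text))


def analyze_prefix_query(query):
    """Analyze only the text before a trailing '*' and re-append the wildcard."""
    if not query.endswith("*"):
        raise ValueError("expected a query ending in '*'")
    tokens = token_stream(query[:-1])
    if len(tokens) != 1:
        # Zero or several tokens: no single analyzed prefix exists.
        raise ValueError("prefix did not analyze to exactly one token")
    return tokens[0] + "*"


print(token_stream("björ*"))          # ['bjor']  (the '*' is dropped)
print(analyze_prefix_query("björ*"))  # bjor*
```

The sketch raises an error when the prefix analyzes to zero or several tokens, since it is then unclear which term the wildcard should attach to.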

Björn,

It would be great to see some example code that you're using to reproduce
this behavior, just to make sure we're testing it in the same way.  Also,
could I persuade you to create an issue for this here:
https://issues.apache.org/jira/browse/LUCENENET, so that we can keep track
of the progress on it?

Thanks,
Christopher

On Tue, Apr 17, 2012 at 11:34 AM, Christopher Currens <
currens.chris@gmail.com> wrote:

> Thanks Björn.
>
> So I've compared the code with the Java equivalent, and the result from
> Java, running the analyzer through the QueryParser, is:
>
> Field:björ*
>
> So it seems to have the same behavior in Java as well.  I want to see if
> this is a known issue or expected behavior in Java, and go from there.  If
> it is, can anyone think of any unexpected side effects of fixing this, so
> that "björ*" becomes "bjor*"?
>
>
> Thanks,
> Christopher

Re: Wildcard queries are not analyzed

Posted by Christopher Currens <cu...@gmail.com>.

Thanks Björn.

So I've compared the code with the Java equivalent, and the result from
Java, running the analyzer through the QueryParser, is:

Field:björ*

So it seems to have the same behavior in Java as well.  I want to see if
this is a known issue or expected behavior in Java, and go from there.  If
it is, can anyone think of any unexpected side effects of fixing this, so
that "björ*" becomes "bjor*"?


Thanks,
Christopher



Re: Wildcard queries are not analyzed

Posted by Björn Kremer <bk...@patorg.de>.

Hello,

Of course this is a problem. But the current solution doesn't find words 
that are in the Lucene index. Here are some samples: 
https://issues.apache.org/jira/browse/LUCENENET-486
And this is a real problem with a 'default' Lucene analyzer, not with a 
hypothetical one ;)

Thank you
Björn

On 17.04.2012 21:26, Digy wrote:
> GetPrefixQuery doesn't use analyzers, and this is a well-known issue in
> Lucene.
>
> Suppose a hypothetical analyzer (with stemming) that stems 'went' as 'go',
> and you want to search for 'wentworth miller'.
> A search like 'went*' would be converted to 'go*', which I guess wouldn't be
> what you want.
>
> DIGY

RE: Wildcard queries are not analyzed

Posted by Digy <di...@gmail.com>.

GetPrefixQuery doesn't use analyzers, and this is a well-known issue in
Lucene.

Suppose a hypothetical analyzer (with stemming) that stems 'went' as 'go',
and you want to search for 'wentworth miller'.
A search like 'went*' would be converted to 'go*', which I guess wouldn't be
what you want.
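
The side effect can be sketched in a few lines of Python. This is an illustration only: the `STEMS` table is the hypothetical stemmer from the example above (no real analyzer is claimed to behave this way), and the indexed words other than 'wentworth miller' are made up for the demonstration.

```python
# Hypothetical stemmer table from the example above; not a real analyzer.
STEMS = {"went": "go", "goes": "go", "going": "go"}


def stem(token):
    """Lowercase the token and apply the toy stemming table."""
    lowered = token.lower()
    return STEMS.get(lowered, lowered)


def prefix_search(prefix, terms):
    """Return the indexed terms that start with the given prefix, sorted."""
    return sorted(t for t in terms if t.startswith(prefix))


# Index time: 'Wentworth' is not in the table, so it is stored as 'wentworth'.
index_terms = {stem(w) for w in ["Wentworth", "Miller", "gold", "going"]}

# Unanalyzed prefix: finds the intended term.
print(prefix_search("went", index_terms))        # ['wentworth']

# Analyzed prefix: 'went' is stemmed to 'go' and matches unrelated terms.
print(prefix_search(stem("went"), index_terms))  # ['go', 'gold']
```

Here the analyzed prefix loses 'wentworth' and returns terms the user never asked for, which is the trade-off against the diacritic case reported at the start of the thread.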

DIGY


-----Original Message-----
From: Björn Kremer [mailto:bkr@patorg.de] 
Sent: Tuesday, April 17, 2012 12:59 PM
To: lucene-net-dev@lucene.apache.org
Subject: Wildcard queries are not analyzed

Hello,


Maybe I have found a small Lucene problem: wildcard queries are not
analyzed correctly. I'm using the German analyzer with the
'GermanDIN2Stemmer'.

In the Lucene index my name ('Björn') is stored as 'bjorn'. If I perform a
wildcard query like 'björ*', the function 'GetPrefixQuery' does not analyze
the search term. So the resulting query is 'björ*' instead of 'bjor*'.
(björ* = no match, bjor* = match)


Thank You
Björn
