Posted to solr-user@lucene.apache.org by "Ribeaud, Christian (Ext)" <ch...@novartis.com> on 2016/08/11 13:42:36 UTC

Wildcard search not working

Hi,

What could cause wildcard searches with the Lucene query parser NOT to work?

We are using Solr 5.4.1. From the admin console, I run searches for the term 'roche' in a specific core and everything is fine: I get, for instance, two matches. I would expect at least the same number of matches for 'r?che'. However, this does NOT happen: I get zero matches. The same problem occurs with 'r*che'. 'roch?' does not work either, but 'roch*' does.

Switching on debug mode produces the following output:

"debug": {
    "rawquerystring": "roch?",
    "querystring": "roch?",
    "parsedquery": "text:roch?",
    "parsedquery_toString": "text:roch?",
    "explain": {},
    "QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian


RE: Wildcard search not working

Posted by "Ribeaud, Christian (Ext)" <ch...@novartis.com>.
Hi Ahmet, Hi Upayavira,

OK, it seems that I have to dive a bit deeper into the Solr filters and tokenizers. I've just realized that my command of them is too limited.
Thanks a lot for your help so far, guys. Cheers and have a nice day,

christian

-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com] 
Sent: Friday, 12 August 2016 07:41
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

Please use the following filter before/above the stemmer.
<filter class="solr.KeywordRepeatFilterFactory"/>

Plus, you may want to add:

<analyzer type="multiterm">
  <tokenizer class="solr.KeywordTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
</analyzer>

Ahmet



On Thursday, August 11, 2016 9:31 PM, "Ribeaud, Christian (Ext)" <ch...@novartis.com> wrote:
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand it.
Let's be a bit more concrete. Here is the schema snippet for the corresponding field:

...
<field name="title" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />

<!-- German -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
    </analyzer>
</fieldType>
...

What is wrong with this schema? Or rather, what should I change so that wildcard searches work correctly?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com] 
Sent: Thursday, 11 August 2016 16:00
To: solr-user@lucene.apache.org; Ribeaud, Christian (Ext)
Subject: Re: Wildcard search not working

Hi Christian,

Depending on your analysis chain, the query r?che may not return at least as many matches as roche.
The difference is that roche is analyzed but r?che is not. Wildcard queries are executed against the indexed (analyzed) terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet



On Thursday, August 11, 2016 4:42 PM, "Ribeaud, Christian (Ext)" <ch...@novartis.com> wrote:
Hi,

What could cause wildcard searches with the Lucene query parser NOT to work?

We are using Solr 5.4.1. From the admin console, I run searches for the term 'roche' in a specific core and everything is fine: I get, for instance, two matches. I would expect at least the same number of matches for 'r?che'. However, this does NOT happen: I get zero matches. The same problem occurs with 'r*che'. 'roch?' does not work either, but 'roch*' does.

Switching on debug mode produces the following output:

"debug": {
    "rawquerystring": "roch?",
    "querystring": "roch?",
    "parsedquery": "text:roch?",
    "parsedquery_toString": "text:roch?",
    "explain": {},
    "QParser": "LuceneQParser",
...

Any idea? Thanks and cheers,

christian

Re: Wildcard search not working

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Christian,

Please use the following filter before/above the stemmer.
<filter class="solr.KeywordRepeatFilterFactory"/>

Plus, you may want to add:

<analyzer type="multiterm">
  <tokenizer class="solr.KeywordTokenizerFactory" />
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.GermanNormalizationFilterFactory"/>
</analyzer>

Ahmet
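
For readers who want to see why repeating the token ahead of the stemmer helps, here is a minimal toy model in Python (3.9+). It is not Solr code and makes several assumptions: `toy_stem` is a hypothetical stand-in for the German stemmer that only strips a trailing 'e', `index_terms` and `wildcard_hits` are made-up helper names, and Python's `fnmatch` stands in for Lucene's wildcard matching. The one behavior it tries to mirror is that a keyword-repeat step emits each token twice, with one copy protected from stemming, so the unstemmed form ends up in the index next to the stem.

```python
from fnmatch import fnmatchcase


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a German stemmer: strip one trailing 'e'."""
    return token[:-1] if token.endswith("e") else token


def index_terms(text: str, keyword_repeat: bool) -> set[str]:
    """Return the set of terms a toy index would hold for `text`."""
    terms = set()
    for token in text.lower().split():
        if keyword_repeat:
            # The protected copy bypasses the stemmer, so the surface form survives.
            terms.add(token)
        terms.add(toy_stem(token))
    return terms


def wildcard_hits(pattern: str, terms: set[str]) -> list[str]:
    """Match a lowercased wildcard pattern against indexed terms; the pattern is never stemmed."""
    return sorted(t for t in terms if fnmatchcase(t, pattern.lower()))


without_repeat = index_terms("Roche", keyword_repeat=False)
with_repeat = index_terms("Roche", keyword_repeat=True)

for pattern in ("r?che", "roch*"):
    print(pattern, wildcard_hits(pattern, without_repeat), wildcard_hits(pattern, with_repeat))
```

In this toy index, 'r?che' finds nothing when only the stem is stored and finds 'roche' once the unstemmed copy is stored as well. A real schema would probably also want to drop duplicate tokens after the stemmer for words the stemmer leaves unchanged.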




Re: Wildcard search not working

Posted by Upayavira <uv...@odoko.co.uk>.
You have a stemming filter in your analysis chain. Go to the analysis
tab, select the 'text' field, and put "Roche" into both boxes. Click
analyse. I bet you will see Roch, not Roche, because of the
stemming filter shown below.

That's what Ahmet shrewdly identified above.

Upayavira
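
The stage-by-stage view that the analysis tab gives can be imitated with a small Python (3.9+) sketch. This is only an illustration under stated assumptions: the two stages are simplified stand-ins (the real chain also has a tokenizer, a stop filter, and German normalization), and `toy_stem` is a hypothetical rule that drops one trailing 'e', not the actual stemmer.

```python
from typing import Callable


def lowercase(token: str) -> str:
    return token.lower()


def toy_stem(token: str) -> str:
    """Hypothetical stemming rule: drop one trailing 'e'."""
    return token[:-1] if token.endswith("e") else token


# Ordered like the filters in a field type; the names are labels only.
CHAIN: list[tuple[str, Callable[[str], str]]] = [
    ("lowercase", lowercase),
    ("toy_stem", toy_stem),
]


def trace(token: str) -> list[tuple[str, str]]:
    """Return (stage name, token after that stage) for every stage in CHAIN."""
    steps = []
    for name, stage in CHAIN:
        token = stage(token)
        steps.append((name, token))
    return steps


for name, value in trace("Roche"):
    print(f"{name:>10}: {value}")
```

The last row of the trace is what would be stored in the index, and it is the form a wildcard pattern has to match.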


RE: Wildcard search not working

Posted by "Ribeaud, Christian (Ext)" <ch...@novartis.com>.
Hi Ahmet,

Many thanks for your reply. I had a look at the URL you pointed out but, honestly, I have to admit that I did not fully understand it.
Let's be a bit more concrete. Here is the schema snippet for the corresponding field:

...
<field name="title" type="text_de" indexed="true" stored="true" required="false" multiValued="false" />

<!-- German -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
    <analyzer> 
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" />
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.GermanLightStemFilterFactory"/>
        <!-- less aggressive: <filter class="solr.GermanMinimalStemFilterFactory"/> -->
        <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="German2"/> -->
    </analyzer>
</fieldType>
...

What is wrong with this schema? Or rather, what should I change so that wildcard searches work correctly?

Many thanks for your time. Cheers,

christian
--
Christian Ribeaud
Software Engineer (External)
NIBR / WSJ-310.5.17
Novartis Campus
CH-4056 Basel



Re: Wildcard search not working

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Christian,

Depending on your analysis chain, the query r?che may not return at least as many matches as roche.
The difference is that roche is analyzed but r?che is not. Wildcard queries are executed against the indexed (analyzed) terms.
For example, if roche is indexed/analyzed as roch, the query r?che won't match it.

Please see : https://wiki.apache.org/solr/MultitermQueryAnalysis

Ahmet
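
The explanation above can be checked against the four patterns from the original question with a toy model in Python. Everything here is an assumption made for illustration: `analyze` is a hypothetical chain (lowercase, then strip one trailing 'e') chosen so that roche is indexed as roch, `term_query` and `wildcard_query` are made-up names, and `fnmatch` stands in for Lucene's wildcard matching.

```python
from fnmatch import fnmatchcase


def analyze(token: str) -> str:
    """Toy analysis chain: lowercase, then strip one trailing 'e' (hypothetical stemmer)."""
    token = token.lower()
    return token[:-1] if token.endswith("e") else token


# Terms as they would sit in the toy index after analysis.
INDEXED_TERMS = {analyze("Roche")}


def term_query(text: str) -> bool:
    """A plain term goes through the same analysis as the indexed text."""
    return analyze(text) in INDEXED_TERMS


def wildcard_query(pattern: str) -> bool:
    """A wildcard pattern is compared with the indexed terms as typed, unstemmed."""
    return any(fnmatchcase(term, pattern) for term in INDEXED_TERMS)


print("roche", term_query("roche"))
for pattern in ("r?che", "r*che", "roch?", "roch*"):
    print(pattern, wildcard_query(pattern))
```

If the indexed term really is roch, this reproduces the reported behavior: the plain term matches, 'roch*' matches because '*' may match zero characters, and 'r?che', 'r*che', and 'roch?' all fail because each needs at least one character that the stem no longer has.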


