Posted to solr-user@lucene.apache.org by Vadim Kisselmann <v....@googlemail.com> on 2011/10/13 16:26:03 UTC

Strange result behavior with wildcard-queries

Hello folks,
my wildcard search shows strange behavior.
Sometimes I get results, sometimes not.

I use the latest nightly build (Solr 4.0, Build #1643).

I use these tokenizers and filters at "index" time:
WhitespaceTokenizer
WordDelimiterFilter
LowerCaseFilter
RemoveDuplicatesTokenFilter
ReversedWildcardFilter

And these at "query" time:
WhitespaceTokenizer
WordDelimiterFilter
LowerCaseFilter
RemoveDuplicatesTokenFilter

This is the text I am searching:
http://www.redakteur.eu/?p=89025

And the words I am searching for:

Schnelle DNA-Analyse führt zu taktischen Vorsprüngen im Gefechtseinsatz
*DNA-Analyse AND Gefechtseinsatz*

Queries and their results:

DNA-Analyse AND Gefechtseinsatz (1 MATCH)
DNA-Analyse AND Gefechts* (1 MATCH)
DNA-Analyse AND ge* (1 MATCH)

DNA* (0 MATCHES)
dna* (6 MATCHES)

Gefechts* (0 MATCHES)
gefecht* (6 MATCHES)

*seinsatz (1 MATCH)
*NA-Analyse (0 MATCHES)
*na-Analyse (0 MATCHES)
*na-analyse (0 MATCHES)

Which filters can help me match this one result with both a standard
wildcard query and a reversed wildcard query?

Thanks and regards
Vadim

Re: Strange result behavior with wildcard-queries

Posted by Vadim Kisselmann <v....@googlemail.com>.
Hi Erick,

thanks for your quick response.
I analyzed it and had already suspected the same.

These are the JIRA issues:
https://issues.apache.org/jira/browse/SOLR-219
https://issues.apache.org/jira/browse/SOLR-2438
Both are still open.

I think I'll wait 1-2 months, then write a custom component :)

Regards
Vadim





Re: Strange result behavior with wildcard-queries

Posted by Erick Erickson <er...@gmail.com>.
Wildcard queries are NOT analyzed. So the fact that your
queries that are identical except for case produce different
result sets is expected behavior. I believe there's a JIRA to
allow limited analysis of wildcard queries, but I confess I
don't know what the status of it is.
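
The effect described above can be reproduced with a small simulation. This is only a rough sketch in Python, not Solr code: `index_terms` is a hypothetical stand-in for the index-time chain (whitespace split, split on delimiters, lowercase, de-duplicate), and `wildcard_hits` compares the wildcard pattern with the indexed terms exactly as typed. The real WordDelimiterFilter has more options (catenation, preserving the original token) that are ignored here.

```python
import fnmatch
import re

def index_terms(text):
    """Rough stand-in for the index-time chain: split on whitespace,
    split on non-word delimiters such as '-', lowercase, de-duplicate."""
    terms = set()
    for token in text.split():
        for part in re.split(r"[^\w]+", token):
            if part:
                terms.add(part.lower())
    return terms

def wildcard_hits(pattern, terms):
    """Match the pattern against the indexed terms as typed (no analysis)."""
    return sorted(t for t in terms if fnmatch.fnmatchcase(t, pattern))

terms = index_terms(
    "Schnelle DNA-Analyse führt zu taktischen Vorsprüngen im Gefechtseinsatz"
)

print(wildcard_hits("DNA*", terms))         # [] : indexed term is lowercase
print(wildcard_hits("dna*", terms))         # ['dna']
print(wildcard_hits("*seinsatz", terms))    # ['gefechtseinsatz']
print(wildcard_hits("*na-analyse", terms))  # [] : no indexed term contains '-'
```

Under these assumptions the last case fails regardless of case, because the hyphenated word was split at index time and no single indexed term contains the hyphen.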

You'll have to do whatever normalization you need to do at
the app level before you pass the query on to Solr or write a
custom component to deal with this case I think.
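
A minimal sketch of what such app-level normalization might look like, in Python. The function name `normalize_wildcards` is made up for illustration, and the sketch assumes a simple query syntax: whitespace-separated terms, an optional `field:` prefix, and no quoted phrases or escaped characters. It lowercases only the terms that contain a wildcard, since the other terms still go through the query analyzer.

```python
import re

def normalize_wildcards(query):
    """Lowercase every whitespace-separated term that contains * or ?.

    Terms without a wildcard (including AND/OR/NOT) are left untouched.
    A 'field:' prefix, if present, keeps its original case.
    """
    pieces = re.split(r"(\s+)", query)  # keep whitespace so we can rejoin
    result = []
    for piece in pieces:
        if "*" in piece or "?" in piece:
            field, sep, value = piece.rpartition(":")
            piece = field + sep + value.lower()
        result.append(piece)
    return "".join(result)

print(normalize_wildcards("DNA*"))                       # dna*
print(normalize_wildcards("DNA-Analyse AND Gefechts*"))  # DNA-Analyse AND gefechts*
print(normalize_wildcards("Title:DNA*"))                 # Title:dna*
```

The result would then be sent to Solr as the `q` parameter. Lowercasing alone does not help with a pattern like `*na-analyse` if the hyphenated word was split into separate terms at index time.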

Best
Erick
