Posted to solr-user@lucene.apache.org by "Mannott, Birgit" <B....@klopotek.de> on 2017/09/14 11:06:21 UTC

query with @ and *

Hi,

I have a problem when searching on email addresses.
@ seems to be handled as a special character, but I can't find anything about it in the documentation.

This is my test data:
test@one.com
test@two.com

Searching for test* returns both, ok.
Searching for test@one.com returns the correct one, ok.
Searching for test returns both, which I didn't expect, but it's ok.
Searching for test@one* returns nothing, and that's the problem.

Escaping the @ character doesn't change this.
It seems that every query containing both @ and * returns no results.

Does anyone have an idea how to change this?

Thanks,
Birgit






Re: query with @ and *

Posted by Erick Erickson <er...@gmail.com>.
See: https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

It discusses the general problem of which filters can and cannot
cope with wildcards. Generally, any filter that could potentially
produce more than one output token per input token is skipped when
wildcards are encountered.
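
As a rough illustration of that rule, here is a toy Python model. It is not
Solr code: the chain layout, the multiterm_aware flag, and all function names
are made up for this sketch. The idea is that a wildcard term only passes
through the steps that always turn one token into exactly one token.

```python
import re

def split_on_punctuation(tokens):
    # May turn one token into several, so it is NOT safe for wildcard terms.
    out = []
    for tok in tokens:
        out.extend(t for t in re.split(r"[^A-Za-z0-9]+", tok) if t)
    return out

def lowercase(tokens):
    # Always one token in, one token out, so it is safe for wildcard terms.
    return [tok.lower() for tok in tokens]

# Hypothetical chain: (name, function, multiterm_aware)
CHAIN = [
    ("split_on_punctuation", split_on_punctuation, False),
    ("lowercase", lowercase, True),
]

def analyze(text, wildcard=False):
    """Run text through the chain; wildcard terms skip steps that may split."""
    tokens = [text]
    for _name, step, multiterm_aware in CHAIN:
        if wildcard and not multiterm_aware:
            continue
        tokens = step(tokens)
    return tokens

full_terms = analyze("Test@One.com")                  # normal analysis
wildcard_prefix = analyze("Test@One", wildcard=True)  # prefix of "Test@One*"
print(full_terms)
print(wildcard_prefix)
```

In this model the wildcard prefix is lowercased but never split, so it keeps
its @ while the indexed terms do not contain one.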

Best,
Erick


Re: query with @ and *

Posted by Susheel Kumar <su...@gmail.com>.
You may want to use the UAX29URLEmailTokenizerFactory tokenizer in your
analysis chain.
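
To show why keeping the address as a single token would help, here is a small
Python sketch. It is only a crude stand-in for an email-aware tokenizer: the
regular expression and the function name tokenize_keep_emails are assumptions
made for this example and do not reproduce the real tokenizer's rules.

```python
import re
from fnmatch import fnmatchcase

# Crude assumption of what "looks like an email address" means.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
NON_WORD = re.compile(r"[^A-Za-z0-9]+")

def tokenize_keep_emails(text):
    """Keep email-like spans whole; split everything else on punctuation."""
    tokens = []
    pos = 0
    for m in EMAIL.finditer(text):
        tokens += [t for t in NON_WORD.split(text[pos:m.start()]) if t]
        tokens.append(m.group())
        pos = m.end()
    tokens += [t for t in NON_WORD.split(text[pos:]) if t]
    return [t.lower() for t in tokens]

DOCS = {1: "test@one.com", 2: "test@two.com"}
terms = {doc_id: tokenize_keep_emails(text) for doc_id, text in DOCS.items()}

# A wildcard pattern is compared against whole terms.
hits = sorted(
    doc_id
    for doc_id, doc_terms in terms.items()
    if any(fnmatchcase(t, "test@one*") for t in doc_terms)
)
print(terms)
print(hits)
```

The trade-off in this toy model is that a plain search for test no longer
matches either entry, because each address is now one single term.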

Thanks,
Susheel



Re: query with @ and *

Posted by Shawn Heisey <ap...@elyograg.org>.
On 9/14/2017 5:06 AM, Mannott, Birgit wrote:
> I have a problem when searching on email addresses.
> @ seems to be handled as a special character, but I can't find anything about it in the documentation.
>
> This is my test data:
> test@one.com
> test@two.com

Chances are that you have analysis defined on this field, and that the
analysis includes a tokenizer or tokenizer/filter combination that
splits on punctuation.  This means that for both entries, you have
three terms.  For the first one, those terms are test, one, and com.
For the second one, they are test, two, and com.  The rest of what I'm
writing assumes that this is the case.
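
A minimal Python sketch of that assumption, for illustration only: the
tokenize function below simply splits on anything that is not a letter or
digit, which is a guess at the field's behaviour rather than its actual
configuration. It builds a small term-to-document index from the two entries.

```python
import re
from collections import defaultdict

DOCS = {1: "test@one.com", 2: "test@two.com"}

def tokenize(text):
    # Assumed behaviour: split on anything that is not a letter or digit.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

index = build_index(DOCS)
for term in sorted(index):
    print(term, sorted(index[term]))
```

Under this assumption, test and com point at both entries, while one and two
each point at a single entry, and no term contains the @ character.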

> Searching for test* returns both, ok.

This matches the term "test" in both entries.

> Searching for test@one.com returns the correct one, ok.

Query analysis probably splits the same way index analysis does, so the 
actual search is for all three terms.

> Searching for test returns both, which I didn't expect, but it's ok.

In this case, it matches the simple term "test" that's in the index on
both documents.

> Searching for test@one* returns nothing, and that's the problem.

When you include wildcards in a query, most query analysis is skipped,
so it's looking for the literal text "test@one" followed by any
characters.  Because the index analysis removed the @ character and
split the text around it into separate terms, this will not match any
of the terms in the index.
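
The mismatch can be sketched in a few lines of Python. This is only a model:
the INDEX dictionary hard-codes the terms described above, and
wildcard_search is a made-up helper that compares the pattern with each
indexed term, not the way Solr executes wildcard queries internally.

```python
from fnmatch import fnmatchcase

# Term -> document ids, as described above for a field that splits on punctuation.
INDEX = {"test": {1, 2}, "one": {1}, "two": {2}, "com": {1, 2}}

def wildcard_search(pattern, index):
    """Return ids of documents that have at least one term matching the pattern."""
    hits = set()
    for term, doc_ids in index.items():
        if fnmatchcase(term, pattern):
            hits |= doc_ids
    return hits

both = wildcard_search("test*", INDEX)         # matches the term "test"
no_hits = wildcard_search("test@one*", INDEX)  # no indexed term contains "@"
print(sorted(both), sorted(no_hits))
```

The pattern is compared with whole terms, and since no term starts with
"test@one", the second search comes back empty.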

Wildcards, while they do work in many cases, are often not the correct
way to do queries.

Thanks,
Shawn


Re: query with @ and *

Posted by Atita Arora <at...@gmail.com>.
Hi,

Can you give us a little information about the query parser you are using in
your handler?

Thanks,
Ati

