
Posted to solr-user@lucene.apache.org by gateway0 <re...@yahoo.de> on 2009/07/07 10:40:05 UTC

Can't use wildcard "*" on alphanumeric values?

Hi,

I indexed my data and defined a default search field named "text" (<field
name="text" type="text" indexed="true" stored="false" multiValued="true"/>).

I copied all my other field values into that field. Now my problem:

Let's say I have two values indexed:
1. value "ABCD"
2. value "ABCD3456"

Now when I do a wildcard search over those two values, the following happens:
- query "q=AB*" => Both values are returned, "ABCD" and "ABCD3456" =>
the wildcard works!
- query "q=ABCD3*" => No results are returned! (expected: "ABCD3456") =>
the wildcard does not work!

Am I doing something wrong? Is there a way to use wildcards on alphanumeric
values?

(Off topic: how does Google, for example, deal with a problem like that? Are
they hiding the wildcards from the user?)

Kind regards, Sebastian
-- 
View this message in context: http://www.nabble.com/Can%C2%B4t-use-wildcard-%22*%22-on-alphanumeric-values--tp24369209p24369209.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Can't use wildcard "*" on alphanumeric values?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Jul 7, 2009 at 6:45 PM, gateway0 <re...@yahoo.de> wrote:

>
> Thank you, that was it.
>
> Why is the preserveOriginal="1" option not documented anywhere?
>
>
A simple case of oversight :)

I've added a note on preserveOriginal and splitOnNumerics (another omission)
to the wiki page http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

-- 
Regards,
Shalin Shekhar Mangar.

Re: Can't use wildcard "*" on alphanumeric values?

Posted by gateway0 <re...@yahoo.de>.

Thank you, that was it.

Why is the preserveOriginal="1" option not documented anywhere?




Shalin Shekhar Mangar wrote:
> 
> On Tue, Jul 7, 2009 at 2:10 PM, gateway0 <re...@yahoo.de> wrote:
> 
>>
>> I indexed my data and defined a default search field named "text" (<field
>> name="text" type="text" indexed="true" stored="false"
>> multiValued="true"/>).
>>
>> Let's say I have two values indexed:
>> 1. value "ABCD"
>> 2. value "ABCD3456"
>>
>> Now when I do a wildcard search over those two values, the following
>> happens:
>> - query "q=AB*" => Both values are returned, "ABCD" and "ABCD3456" =>
>> the wildcard works!
>> - query "q=ABCD3*" => No results are returned! (expected: "ABCD3456") =>
>> the wildcard does not work!
>>
>> Am I doing something wrong? Is there a way to use wildcards on
>> alphanumeric
>> values?
>>
> 
> I think the problem is that the WordDelimiterFilter applied to the 'text'
> type splits 'ABCD3456' into 'ABCD' and '3456' etc. Also, prefix queries are
> not analyzed, so they don't pass through the same filters.
> 
> I guess one simple solution to your problem is to add preserveOriginal="1"
> to the WordDelimiterFilterFactory definition inside the 'text' field type.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 


Re: Can't use wildcard "*" on alphanumeric values?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.

On Tue, Jul 7, 2009 at 2:10 PM, gateway0 <re...@yahoo.de> wrote:

>
> I indexed my data and defined a default search field named "text" (<field
> name="text" type="text" indexed="true" stored="false"
> multiValued="true"/>).
>
> Let's say I have two values indexed:
> 1. value "ABCD"
> 2. value "ABCD3456"
>
> Now when I do a wildcard search over those two values, the following happens:
> - query "q=AB*" => Both values are returned, "ABCD" and "ABCD3456" =>
> the wildcard works!
> - query "q=ABCD3*" => No results are returned! (expected: "ABCD3456") =>
> the wildcard does not work!
>
> Am I doing something wrong? Is there a way to use wildcards on alphanumeric
> values?
>

I think the problem is that the WordDelimiterFilter applied to the 'text' type
splits 'ABCD3456' into 'ABCD' and '3456', so the index probably contains no
single token that starts with 'ABCD3'. Also, prefix queries are not analyzed,
so they don't pass through the same filters.

I guess one simple solution to your problem is to add preserveOriginal="1"
to the WordDelimiterFilterFactory definition inside the 'text' field type.
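
As a rough sketch of that change, assuming your field type resembles the
'text' type from the Solr example schema.xml: the tokenizer, the lowercase
filter, the single shared analyzer, and every attribute value other than
preserveOriginal are assumptions here and may differ in your schema.

```xml
<!-- Hypothetical 'text' field type; only preserveOriginal="1" is the suggested change -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps the unsplit token (e.g. ABCD3456) in the
         index alongside the split parts (ABCD and 3456) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents indexed before the change would need to be reindexed before the
original tokens show up in the index.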

-- 
Regards,
Shalin Shekhar Mangar.