Posted to solr-user@lucene.apache.org by Thomas Fischer <fi...@aon.at> on 2011/06/08 00:23:24 UTC

wildcard search

Hello,

I am testing solr 3.2 and have problems with wildcards.
I am indexing values like "IA 300; IC 330; IA 317; IA 318" in a field "GOK", and can't find a way to search with wildcards.
I want to use a wild card search to match something like "IA 31?" but cannot find a way to do so.
GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
Is there a way to index and search that would meet my requirements?

Thomas



Re: wildcard search

Posted by Thomas Fischer <fi...@aon.at>.
Hi Ahmet,

>> so I created a fake license
>> ComplexPhrase-LICENSE-MIT.txt
>> for ComplexPhrase and tried again, which ran through
>> successfully, I hope this is OK.
> 
> I haven't used it with solr 3.2. I will look into it. 
> 
> So your GOK field already contains the list as multivalued. Then you can use the prefix query parser plugin for this. Just make sure that the field type of GOK is string, not text.  
> q={!prefix f=GOK}IA 3   should be equivalent to  {!complexphrase}GOK:"IA 3*"

I'll try that. But my search requests come from a pazpar2 system and are directed against different clients, which all get requests of the form 
GOK:"IA 32*", so in some sense this is better for me.

I found two problems:

– In the solr 1.4.2 version I'm testing the request "IA 32*" works, but GOK:"IA 32*" will not. Is this somehow related to the indexing of that field?

– The other is that "IA320" (on 1.4.2) and GOK:"IA320" (on 3.2) will throw an exception:
description
The server encountered an internal error
(Unknown query type "org.apache.lucene.search.PhraseQuery" found in phrase query string "IA620" java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.PhraseQuery" found in phrase query string "IA620" at org.apache.lucene.queryParser.ComplexPhraseQueryParser

Cheers
Thomas



wildcard search: Update

Posted by Thomas Fischer <fi...@aon.at>.
Hello,

I'm still struggling with wildcard search in solr.
I installed the ComplexPhraseQueryParser which essentially accomplishes what I'm looking for: I can search in my field "GOK" using phrases with wildcards, e.g. GOK:"POF 15?".
This works with either solr 1.4.2 or 3.3.
What irritates me is that this kind of a search throws an exception when there is *no* space, e.g. for GOK:"POF15?"  (useless) or DDC:"942.?" (meaningful). On the other hand, the search will work if the quotes are omitted: DDC:942.? yields the expected results.

An additional source of irritation is the error message:

The server encountered an internal error (Unknown query type "org.apache.lucene.search.WildcardQuery" found in phrase query string "POF15?" java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.WildcardQuery" found in phrase query string "POF1??" at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite

I don't understand why the query type "org.apache.lucene.search.WildcardQuery" is unknown (this is contained in lucene-core-2.9.3.jar), nor what it means that it is 'found in phrase query string "POF15?"'

Can anybody give me a hint how to handle this problem (apart from erasing the quotes if no whitespace is present)?

Cheers
Thomas



Re: wildcard search

Posted by Ahmet Arslan <io...@yahoo.com>.
> I tried to follow this recipe, adapting it to the solr 3.2
> I am testing right now.
> The first try gave me a message
> 
>      [java] !!!!!!! Couldn't get license file for /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar
>      [java] At least one file does not have a license, or it's license name is not in the proper format.  See the logs.
> 
> BUILD FAILED
> 
> so I created a fake license
> ComplexPhrase-LICENSE-MIT.txt
> for ComplexPhrase and tried again, which ran through
> successfully, I hope this is OK.

I haven't used it with solr 3.2. I will look into it. 

> > IA 300
> > IC 330
> > IA 317
> > IA 318
> 
> I didn't have to split them up, they are already separated
> as field with multiValued="true".
> But I need to be able to search for IA 310 - IA 319 with
> one call,
> {!complexphrase}GOK:"IA 31?"
> will do this now, or even for 
> {!complexphrase}GOK:"IA 3*"
> to catch all those in one go.

So your GOK field already contains the list as multivalued. Then you can use the prefix query parser plugin for this. Just make sure that the field type of GOK is string, not text.  
q={!prefix f=GOK}IA 3   should be equivalent to  {!complexphrase}GOK:"IA 3*"
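
For reference, a string-typed GOK field could be declared roughly as follows in schema.xml. This is only a sketch: the "string" fieldType line follows the Solr example schema, and the indexed/stored attributes on the field are assumptions, not taken from Thomas's configuration.

```xml
<!-- Unanalyzed type: each value is indexed verbatim as a single term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- Assumed attributes; multiValued matches Thomas's description of GOK. -->
<field name="GOK" type="string" indexed="true" stored="true" multiValued="true"/>
```

With such a field, a value like "IA 317" is stored as one term including the space, so {!prefix f=GOK}IA 3 compares the raw prefix against whole values. The match is case-sensitive because no analysis is applied.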


Re: wildcard search

Posted by Thomas Fischer <fi...@aon.at>.
Hi Ahmet,

>>> I don't use it myself  (but I will soon), so I
>>> may be wrong, but did you try
>>> to use the ComplexPhraseQueryParser :
>>> 
>>> ComplexPhraseQueryParser
>>>           QueryParser which
>>> permits complex phrase query syntax eg "(john
>>> jon jonathan~) peters*".
>>> 
>>> It seems that you could do such type of queries :
>>> 
>>> GOK:"IA 38*"
>> 
>> yes that sounds interesting.
>> But I don't know how to get and install it into solr. Can
>> you give me a hint?
> 
> https://issues.apache.org/jira/browse/SOLR-1604

I tried to follow this recipe, adapting it to the solr 3.2 I am testing right now.
The first try gave me a message

     [java] !!!!!!! Couldn't get license file for /Installer/solr/apache-solr-3.2.0/solr/lib/ComplexPhrase-1.0.jar
     [java] At least one file does not have a license, or it's license name is not in the proper format.  See the logs.

BUILD FAILED

so I created a fake license
ComplexPhrase-LICENSE-MIT.txt
for ComplexPhrase and tried again, which ran through successfully, I hope this is OK.

I registered the query parser not in solrhome/conf/solrconfig.xml (no such thing, I'm running multiple cores) but in
solrhome/cores/lit/conf/solrconfig.xml
and could search successfully for
{!complexphrase}GOK:"IC 62*"
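
A registration of this kind might look like the sketch below. The class name is an assumption based on the plugin attached to SOLR-1604; check it against the jar that is actually installed.

```xml
<!-- Placed inside the <config> element of solrhome/cores/lit/conf/solrconfig.xml. -->
<!-- Class name assumed from SOLR-1604; verify against ComplexPhrase-1.0.jar. -->
<queryParser name="complexphrase"
             class="org.apache.solr.search.ComplexPhraseQParserPlugin"/>
```

The name attribute is what the local-params prefix refers to, so {!complexphrase}GOK:"IC 62*" selects this parser for that one query.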

> But it seems that you can achieve what you want with vanilla solr.
> 
> I don't follow the multivalued part in your example but you can tokenize 
> "IA 300; IC 330; IA 317; IA 318" into these 4 tokens 
> 
> IA 300
> IC 330
> IA 317
> IA 318

I didn't have to split them up, they are already separated as field with multiValued="true".
But I need to be able to search for IA 310 - IA 319 with one call,
{!complexphrase}GOK:"IA 31?"
will do this now, or even for 
{!complexphrase}GOK:"IA 3*"
to catch all those in one go.

Thanks, this helped a lot
Thomas


Re: wildcard search

Posted by Ahmet Arslan <io...@yahoo.com>.
> > I don't use it myself  (but I will soon), so I may be wrong, but did you try
> > to use the ComplexPhraseQueryParser :
> > 
> > ComplexPhraseQueryParser
> >          QueryParser which permits complex phrase query syntax eg "(john
> > jon jonathan~) peters*".
> > 
> > It seems that you could do such type of queries :
> > 
> > GOK:"IA 38*"
> 
> yes that sounds interesting.
> But I don't know how to get and install it into solr. Can
> you give me a hint?

https://issues.apache.org/jira/browse/SOLR-1604

But it seems that you can achieve what you want with vanilla solr.

I don't follow the multivalued part in your example but you can tokenize 
"IA 300; IC 330; IA 317; IA 318" into these 4 tokens 

IA 300
IC 330
IA 317
IA 318

using PatternTokenizerFactory. You can then use PrefixQParserPlugin for searching.

http://lucene.apache.org/solr/api/org/apache/solr/search/PrefixQParserPlugin.html
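
A minimal sketch of such a field type, assuming the whole list arrives as one semicolon-separated string. The type name "gok_codes" and the field attributes are hypothetical.

```xml
<fieldType name="gok_codes" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on a semicolon plus any following whitespace, so that -->
    <!-- "IA 300; IC 330; IA 317; IA 318" yields four tokens. -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";\s*"/>
  </analyzer>
</fieldType>

<!-- Assumed attributes; not taken from the original schema. -->
<field name="GOK" type="gok_codes" indexed="true" stored="true"/>
```

Each token keeps its internal space and its case (for example "IA 317"), so a prefix query such as {!prefix f=GOK}IA 3 can match it. If the values are already separate entries of a multiValued field, this splitting step is not needed.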




Re: wildcard search

Posted by Thomas Fischer <fi...@aon.at>.
Hi Ludovic,


> I don't use it myself  (but I will soon), so I may be wrong, but did you try
> to use the ComplexPhraseQueryParser :
> 
> ComplexPhraseQueryParser
>          QueryParser which permits complex phrase query syntax eg "(john
> jon jonathan~) peters*".
> 
> It seems that you could do such type of queries :
> 
> GOK:"IA 38*"

yes that sounds interesting.
But I don't know how to get and install it into solr. Can you give me a hint?

Thanks
Thomas

Re: wildcard search

Posted by lboutros <bo...@gmail.com>.
Hi Thomas,

I don't use it myself  (but I will soon), so I may be wrong, but did you try
to use the ComplexPhraseQueryParser :

ComplexPhraseQueryParser
          QueryParser which permits complex phrase query syntax eg "(john
jon jonathan~) peters*".

It seems that you could do such type of queries :

GOK:"IA 38*"

Ludovic.


-----
Jouve
France.

Re: wildcard search

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, have you tried EdgeNGrams? This works for me (at the expense
of a somewhat larger index, of course)...

    <fieldType name="edge" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>


and a field of type "edge" named "thomasfield"....
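
The field declaration itself might look like the sketch below. The indexed, stored and multiValued attributes are assumptions, since they are not given in this message. The comment traces what the index-time analyzer of the "edge" type would emit for one of Thomas's values.

```xml
<!-- Hypothetical declaration; attribute values are assumptions. -->
<field name="thomasfield" type="edge" indexed="true" stored="true" multiValued="true"/>

<!--
  Index-time analysis of the value "IA 317" with the "edge" type:
    KeywordTokenizerFactory  keeps the whole value as one token: "IA 317"
    LowerCaseFilterFactory   lowercases it: "ia 317"
    EdgeNGramFilterFactory   emits front grams of length 4 to 15:
                             "ia 3", "ia 31", "ia 317"
  A query shorter than minGramSize (for example "IA") finds no indexed gram.
-->
```

Lowering minGramSize to 2 would let a two-character query such as "IA" match as well, at the cost of more terms in the index.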


Now searches like
thomasfield:"GOK IA 3"
(include quotes!) should work. The various parameters (min/max gram size)
I chose arbitrarily, you'll want to tweak them.

I include a LowerCaseFilter for safety's sake, in case people are actually
going to type things in...

It's probably instructive to look at the admin/analysis page to see how
this all plays out....

Best
Erick


On Wed, Jun 8, 2011 at 9:29 AM, Thomas Fischer <fi...@aon.at> wrote:
> Hi Erick,
>
> I have a multivalued field "GOK" (local classification scheme) with separate entries of the sort
>  IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits.
> I want to be able to perform a truncated search on that field:
> either just the string before the space, or a combination of that string with 1 or 2 digits, something like:
> GOK:IA
> or
> GOK:IA 3*
> or
> GOK:IA 31?
> My problem is the clash between the phrase (GOK:"IA 317" works) and the wildcards.
>
> As a start I tried as type
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
> from the solr 3.2 distribution schema
> (apache-solr-3.2.0/example/solr/conf/schema.xml),
> the field is just
> <field name="GOK" type="text" multiValued="true"/>
>
> BTW, I have another field "DDC" with entries of the form "t1:086643" with analogous requirements which yields similar problems due to the colon, also indexed as text.
> Here also
> DDC:T1\:086643
> works, but not
> DDC:T1\:08664?
>
> Thanks in advance
> Thomas
>

Re: wildcard search

Posted by Thomas Fischer <fi...@aon.at>.
Hi Erick,

I have a multivalued field "GOK" (local classification scheme) with separate entries of the sort
 IA 300; IC 330; IA 317; IA 318, i.e. 1 to 3 capital characters, space, 3 digits.
I want to be able to perform a truncated search on that field:
either just the string before the space, or a combination of that string with 1 or 2 digits, something like:
GOK:IA
or
GOK:IA 3*
or
GOK:IA 31?
My problem is the clash between the phrase (GOK:"IA 317" works) and the wildcards.

As a start I tried as type
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
from the solr 3.2 distribution schema
(apache-solr-3.2.0/example/solr/conf/schema.xml),
the field is just
<field name="GOK" type="text" multiValued="true"/>

BTW, I have another field "DDC" with entries of the form "t1:086643" with analogous requirements which yields similar problems due to the colon, also indexed as text.
Here also 
DDC:T1\:086643
works, but not 
DDC:T1\:08664?

Thanks in advance
Thomas

> Yes there is, but you haven't provided enough information to
> make a suggestion. What is the fieldType definition? What is
> the field definition?
> 
> Two resources that'll help you greatly are:
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
> 
> and the admin/analysis page...
> 
> Best
> Erick
> 

Kind regards
Thomas Fischer



Re: wildcard search

Posted by Erick Erickson <er...@gmail.com>.
Yes there is, but you haven't provided enough information to
make a suggestion. What is the fieldType definition? What is
the field definition?

Two resources that'll help you greatly are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

and the admin/analysis page...

Best
Erick
