Posted to solr-user@lucene.apache.org by PeterKerk <ve...@hotmail.com> on 2014/01/08 10:21:37 UTC

Searchquery on field that contains space

My query for a city name does not return the closest matching value;
instead it gives priority to the first word in the search query.

I believe it has something to do with the whitespace tokenization, but I
don't know which fields to change to what type.


Here's what happens when I search on "new york"

http://localhost:8983/solr/tt-cities/select/?indent=off&facet=false&fl=id,title&q=title_search:*new%20york*&defType=lucene&start=0&rows=10

<result name="response" numFound="810" start="0">
	<doc>
	<str name="title">New Golden Beach</str>
	</doc>
	<doc>
	<str name="title">New Auckland</str>
	</doc>
	<doc>
	<str name="title">New Waverly</str>
	</doc>
	<doc>
	<str name="title">New Market Village Mobile Home Park</str>
	</doc>
	<doc>
	<str name="title">New Centerville</str>
	</doc>
	<doc>
	<str name="title">New Meadows</str>
	</doc>
	<doc>
	<str name="title">New Plymouth</str>
	</doc>
	<doc>
	<str name="title">New Hope Mobile Home Park</str>
	</doc>
	<doc>
	<str name="title">New Light</str>
	</doc>
	<doc>
	<str name="title">New Vienna</str>
	</doc>
</result>


My schema.xml

    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>


	<field name="title" type="text_ws" indexed="true" stored="true"/>
	<field name="title_search" type="string" indexed="true" stored="true"/>
	
	<copyField source="title" dest="title_search"/>

I also tried:

	<field name="title_search" type="text" indexed="true" stored="true"/>	

And:
	<field name="title" type="string" indexed="true" stored="true"/>
	<field name="title_search" type="string" indexed="true" stored="true"/>

	
	
What to do?



--
View this message in context: http://lucene.472066.n3.nabble.com/Searchquery-on-field-that-contains-space-tp4110166.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Searchquery on field that contains space

Posted by PeterKerk <ve...@hotmail.com>.
@Ahmet:

Thanks, but I also need to be able to search with wildcards, and I just found
that a "-" in a value can produce unwanted results. E.g. when using this
query:

http://localhost:8983/solr/tt-cities/select/?indent=off&facet=false&fl=id,title,provincetitle_nl&q=title_search:nij*&defType=lucene&start=0&rows=15

I also get a result for "Halle-Nijman", so it seems the wildcard is not
working, as "Halle-Nijman" does not start with "nij" (or "Nij").
I also tried:
q=title_search:(nij*)
q=title_search:(nij)*

How can I fix this?


@Erick:

When I'm on the analysis page I get the error:

"This Functionality requires the /analysis/field Handler to be registered
and active!"

So I added this line to my solr config (based on this post:
http://stackoverflow.com/questions/12627734/configure-field-analysis-handler-solr-4)

<requestHandler name="/analysis/field"
class="solr.FieldAnalysisRequestHandler" />

But still the same error occurs.




Re: Searchquery on field that contains space

Posted by Erick Erickson <er...@gmail.com>.
You can also use phrase queries as title_search:"new york" if your
intent is to find the words "new" and "york" right next to each other.
There's also "slop", as "new york"~3 if you want to find the two words
within 3 (in this example) positions of each other.
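
A minimal sketch of how a client might build these two query strings, assuming the field is title_search as in Peter's schema. The helper name phrase_query is made up for illustration and is not part of Solr or any client library; it only assembles the Lucene query syntax described above.

```python
def phrase_query(field, phrase, slop=0):
    """Build a Lucene-syntax phrase query, optionally with a slop suffix.

    Hypothetical helper: it only formats the query string and does not
    talk to Solr.
    """
    # Escape backslashes and double quotes so the phrase stays one quoted unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    if slop > 0:
        # "~N" lets the words be up to N positions apart.
        query += f"~{slop}"
    return query


exact = phrase_query("title_search", "new york")
sloppy = phrase_query("title_search", "new york", slop=3)
```

The resulting string still has to be URL-encoded before it is placed in the q parameter.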

Take a look at the admin/analysis page for questions like this, it'll
_really_ help.

Best,
Erick


On Wed, Jan 8, 2014 at 4:38 AM, Ahmet Arslan <io...@yahoo.com> wrote:

> Hi Peter,
>
> q=title_search:new york is parsed as title_search:new
> default_search_field:york.
> If you use a tokenized type, use parentheses: q=title_search:(new york)
>
> If you use the string type, use the term query parser:
> q={!term f=title_search}new york
>
> Ahmet

Re: Searchquery on field that contains space

Posted by Erick Erickson <er...@gmail.com>.
What's the purpose of having two fields "title" and "title_search"?
They both are exactly the same so it seems you could get rid of
one....

Just a nit.
Erick

As far as the analysis page is concerned, I suspect you took out
this definition from your solrconfig.xml file:

 <requestHandler name="/analysis/field"
                  startup="lazy"
                  class="solr.FieldAnalysisRequestHandler" />

PUT IT BACK ;). Really, this page will save you again and again
and again.

At least when I commented out this definition and tried using the
analysis page I got the same error. You may have taken out other
things in your solrconfig.xml file that are needed for this to work, but
this is the place to start.
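
Once the handler is registered, it can also be called directly, which is a quick way to check that it is active. The following is a sketch under assumptions: the core URL comes from Peter's earlier requests, the helper name field_analysis_url is hypothetical, and the parameter names (analysis.fieldname, analysis.fieldvalue, analysis.query) are the ones the field analysis handler is documented to accept. The code only builds the URL; it does not send the request.

```python
from urllib.parse import urlencode


def field_analysis_url(core_url, field_name, index_value, query_value):
    """Build a request URL for the /analysis/field handler (hypothetical helper)."""
    params = {
        "analysis.fieldname": field_name,    # field whose analyzers to run
        "analysis.fieldvalue": index_value,  # text run through the index analyzer
        "analysis.query": query_value,       # text run through the query analyzer
        "wt": "json",
    }
    return core_url.rstrip("/") + "/analysis/field?" + urlencode(params)


url = field_analysis_url(
    "http://localhost:8983/solr/tt-cities", "title_search", "New York", "new y"
)
```

If opening that URL returns a 404, the handler is most likely still not registered in the solrconfig.xml that the core actually loads.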

Best
Erick

On Fri, Jan 10, 2014 at 4:31 AM, PeterKerk <ve...@hotmail.com> wrote:
> @iorixxx: thanks, your 2nd solution worked.
>
> The first one didn't (does not matter now), I got this:
>
> <field name="title" type="prefix_full" indexed="true" stored="true"/>
> <field name="title_search" type="prefix_full" indexed="true" stored="true"/>
>
> With the first solution all queries work as expected, however with this:
>
> q=title_search:"new%20yk"*
>
> still new york is returned.

Re: Searchquery on field that contains space

Posted by PeterKerk <ve...@hotmail.com>.
@iorixxx: thanks, your 2nd solution worked.

The first one didn't (does not matter now), I got this:

<field name="title" type="prefix_full" indexed="true" stored="true"/>
<field name="title_search" type="prefix_full" indexed="true" stored="true"/>

With the first solution all queries work as expected, however with this:

q=title_search:"new%20yk"*

still new york is returned.




Re: Searchquery on field that contains space

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Peter,

Here are two different ways to do it.

1) Use phrase query q=yourField:"new y" with the following type.

<fieldType name="prefix_full" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

2) Use prefix query q={!prefix f=yourField}new y with the following type:

https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-PrefixQueryParser


<fieldType name="text_lower" class="solr.TextField" positionIncrementGap="1">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
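
To see why both options match "new y" against "New York" but not "york" against it, here is a rough Python simulation of the two analysis chains. It is only a sketch: the function names are made up, and the real Solr filters may differ in details. One assumption worth stating: as far as I know the prefix query parser does not analyze its input, so with option 2 the client would have to lowercase the user's text itself.

```python
def edge_ngrams(token, min_size=1, max_size=20):
    """Prefixes of the token, mimicking minGramSize=1 / maxGramSize=20."""
    upper = min(len(token), max_size)
    return {token[:n] for n in range(min_size, upper + 1)}


def index_prefix_full(value):
    # Option 1, index side: whole value as one token, trimmed, lowercased,
    # then expanded into its leading n-grams.
    return edge_ngrams(value.strip().lower())


def matches_option_1(stored, user_input):
    # Option 1, query side: the whole input is one lowercased term that must
    # equal one of the indexed n-grams.
    return user_input.lower() in index_prefix_full(stored)


def matches_option_2(stored, user_input):
    # Option 2: the index holds one lowercased token per value; the prefix
    # query is a plain "starts with" on that token (input lowercased by the client).
    return stored.strip().lower().startswith(user_input.lower())


checks = [
    ("New York", "new y"),
    ("New York", "york"),
    ("Nijmegen", "Nij"),
    ("Halle-Nijman", "nij"),
]
results = [(matches_option_1(s, q), matches_option_2(s, q)) for s, q in checks]
```

In this simulation both options agree on every pair. The practical difference is that option 1 pays for the match at index time (more terms per value, and with maxGramSize="20" nothing longer than 20 characters can match), while option 2 keeps the index small and does the work at query time.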

By the way I don't post on StackOverflow.

Ahmet





Re: Searchquery on field that contains space

Posted by PeterKerk <ve...@hotmail.com>.
Hi Ahmet,

Thanks. Also for that link, although it's too advanced for my use case.

I see that by using KeywordTokenizerFactory it almost works now, but when I
search on:

"new y", no results are found,

but when I search on "new", I do get "New York".

So the space in the search query is still causing problems. What could
cause that?

Thanks again!

ps. are you guys (like you, Erick, Maurice etc.) also active on
StackOverflow? At least you'll get the credit for good support :)




Re: Searchquery on field that contains space

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Peter,

Use KeywordTokenizerFactory instead of WhitespaceTokenizerFactory.

You might also be interested in this: http://www.cominvent.com/2012/01/25/super-flexible-autocomplete-with-solr/

Ahmet





Re: Searchquery on field that contains space

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
On Thu, Jan 9, 2014 at 11:34 PM, PeterKerk <ve...@hotmail.com> wrote:

> Basically a user starts typing the first letters of a city and I want to
> return citynames that start with those letters, case-insensitive and not
> splitting the cityname on separate words (whether the separator is a
> whitespace or a "-").
> But although the search of a user is case-insensitive, I want to return the
> values including casing, search on "new york" would return "New York",
> where
> the latter is how it's stored in my MS-SQL DB.
>

Have you had a look at the Analyzing Suggester? It might be a better match for
your needs:
http://blog.mikemccandless.com/2012/09/lucenes-new-analyzing-suggester.html

Regards,
   Alex.


Re: Searchquery on field that contains space

Posted by PeterKerk <ve...@hotmail.com>.
Basically, a user starts typing the first letters of a city and I want to
return city names that start with those letters, case-insensitively and
without splitting the city name into separate words (whether the separator
is whitespace or a "-").
Although the user's search is case-insensitive, I want to return the values
with their original casing: a search on "new york" should return "New York",
which is how it's stored in my MS-SQL DB.

I've been testing my code via the admin/analysis page.

I believe I don't want the WhitespaceTokenizerFactory in my field definition,
since that splits the city names. I want the following behavior:

query on:

"new*" returns "New york" or "newbee", but does not return values like
"greater new hampshire"
"york*" does NOT return "new york"

"nij*" returns "Nijmegen", but not "Halle-Nijman"
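
The difference between the current and the wanted behavior can be sketched in a few lines of Python. This is an illustration only, not Solr code: the city list is sample data taken from the examples above, and word_tokens is a hypothetical stand-in for any analyzer that splits a value into words (here on whitespace and hyphens), as opposed to keeping the whole value as a single token.

```python
import re

# Sample values taken from the examples in this message.
CITIES = ["New York", "newbee", "Greater New Hampshire", "Nijmegen", "Halle-Nijman"]


def word_tokens(value):
    # Stand-in for a word-splitting analyzer: lowercase, split on spaces and "-".
    return [t for t in re.split(r"[\s-]+", value.lower()) if t]


def per_word_prefix(prefix):
    # A prefix search on a tokenized field matches if ANY word starts with it.
    p = prefix.lower()
    return [c for c in CITIES if any(t.startswith(p) for t in word_tokens(c))]


def whole_value_prefix(prefix):
    # A prefix search on a single-token field matches only from the start
    # of the complete value.
    p = prefix.lower()
    return [c for c in CITIES if c.lower().startswith(p)]
```

With this sample data, per_word_prefix("nij") returns both "Nijmegen" and "Halle-Nijman", while whole_value_prefix("nij") returns only "Nijmegen", which is the behavior asked for here.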

Here's what I have come up with so far:

    <field name="title" type="text_lower_exact" indexed="true" stored="true"/>
    <field name="title_search" type="text_lower_exact" indexed="true" stored="true"/>


    <fieldType name="text_lower_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

But when I leave out the WhitespaceTokenizerFactory I get:

    Plugin init failure for [schema.xml] fieldType "text_lower_exact": analyzer without
    class or tokenizer,trace=org.apache.solr.common.SolrException: SolrCore
    'tt-cities' is not available due to init failure




Re: Searchquery on field that contains space

Posted by Ahmet Arslan <io...@yahoo.com>.
Hi Peter,

q=title_search:new york is parsed as title_search:new default_search_field:york.
If you use a tokenized type, use parentheses: q=title_search:(new york)

If you use the string type, use the term query parser: q={!term f=title_search}new york
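
As a sketch of how these variants look as actual requests, the snippet below builds the select URLs with proper URL encoding. It assumes the core URL from Peter's original request and the field name title_search from his schema; select_url is a hypothetical helper, and nothing is sent to Solr.

```python
from urllib.parse import urlencode

# Core URL taken from the request in the original question.
BASE = "http://localhost:8983/solr/tt-cities/select"


def select_url(q, rows=10):
    """Build a select URL with the q parameter safely encoded (hypothetical helper)."""
    params = {"q": q, "fl": "id,title", "defType": "lucene", "start": 0, "rows": rows}
    return BASE + "?" + urlencode(params)


# Only "new" is tied to title_search; "york" goes to the default search field.
unscoped = select_url("title_search:new york")

# Both words are searched in title_search (tokenized field types).
grouped = select_url("title_search:(new york)")

# The whole input, space included, is one term (string field type).
term = select_url("{!term f=title_search}new york")
```

Encoding the query this way also avoids hand-writing %20 and similar escapes in the URL.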

Ahmet


