Posted to solr-user@lucene.apache.org by PeterKerk <ve...@hotmail.com> on 2010/12/11 22:20:58 UTC

Re: full text search in multiple fields

OK, I'm back ;)

There's one final thing that needs to be fixed.

I'm trying to apply the same logic as for cities, but now for the title of a
location.

There's a location with the title: hortus rodondendrus

This location is found using this query:
http://localhost:8983/solr/db/select/?indent=on&q=hortus&defType=dismax&qf=title_search^20.0
But not when using this query:
http://localhost:8983/solr/db/select/?indent=on&q=hort&defType=dismax&qf=title_search^20.0

So I believe my title value is not indexed the way I'd like it to be. I think
I'm currently indexing it as whole words rather than tokenizing it per
character, if that makes sense :)

The fieldtype of title is "text", defined below:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>

        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


What should I add for this to be indexed in such a way that word parts are
also found?

Thanks!
-- 
View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2070528.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
whoops :)
It was directed at iorixxx, in the first post before me

Re: full text search in multiple fields

Posted by Dennis Gearon <ge...@sbcglobal.net>.
For those of us who come late to a thread, having at least the last post that 
you're replying to would help. Me at least ;-)

 Dennis Gearon





----- Original Message ----
From: PeterKerk <ve...@hotmail.com>
To: solr-user@lucene.apache.org
Sent: Sun, December 12, 2010 1:47:35 PM
Subject: Re: full text search in multiple fields


I went for the * operator, and it works now! Thanks!


Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
I went for the * operator, and it works now! Thanks!

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Oops, sloppy of me, that was a copy-paste error.

I now have: 

WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay&defType=lucene&fl=id,title

NOT WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay*&defType=lucene&fl=id,title

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> 
> When I do:
> &q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title
> 
> nothing is found.
> 
> but if I do:
> &q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title
> 
> the location IS found.
> 
> I do need a wildcard though, since users may also search on
> parts of the
> title (as described earlier in this post). But this looks
> almost as if the
> location is not found if the wildcard is on the end and the
> searched string
> is no longer than the position of the wildcard(if that
> makes sense :)

Why are you using two q parameters in your search URL? &q=*:*&q=title_search:Pappegay*


      

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
@iorixxx: removing that line did solve the problem, thanks!

Re: full text search in multiple fields

Posted by Erick Erickson <er...@gmail.com>.
My first guess: you've got some sort of stemming going on at index time, so
tuinkamer is getting indexed as tuinkam or something similar. To find out, look
at your admin page, the "schema browser".

Another interesting page is admin/analysis, which can show you what happens
at each step of the indexing process (check the debug checkbox). Be a little
cautious with wildcards in the query though; the output may be a little
misleading.

You might try getting a copy of Luke to examine your index and see what's
actually in there. Problems like this are often the result of what you think
got indexed being different from what actually was indexed.

Finally, you can use &debugQuery=on to examine the query, although in
this particular case I don't think it would have helped.

Best
Erick

On Thu, Dec 23, 2010 at 2:20 PM, PeterKerk <ve...@hotmail.com> wrote:

>
> Sorry to bother you again, but it still doesnt seem to work all the time...
>
> This (what you solved earlier) works:
> &q=title_search:Pappegay&defType=lucene&fl=id,title
>
>
> But for another location, which value in DB is: "de tuinkamer"
>
> When I query the id of that location:
> &q=id:431&fl=id,title
> the location is found, so it IS indexed...
>
>
> But this query DOESNT work:
>
> &q=title_search:tuinkamer*&defType=lucene&fl=id,title
>
> And this one DOES:
> &q=title_search:tuin*&defType=lucene&fl=id,title
>
> for me this is unexpected...what can it be?

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> But for another location, which value in DB is: "de
> tuinkamer"
> 
> When I query the id of that location:
> &q=id:431&fl=id,title
> the location is found, so it IS indexed...
> 
> 
> But this query DOESNT work:
> 
> &q=title_search:tuinkamer*&defType=lucene&fl=id,title
> 
> And this one DOES:
> &q=title_search:tuin*&defType=lucene&fl=id,title
> 
> for me this is unexpected...what can it be?

As you can verify from /solr/admin/analysis.jsp, tuinkamer is reduced to tuinkam by EnglishPorterFilterFactory. So it is expected/normal that &q=title_search:tuinkamer* won't return that document. Remember that tuinkamer* is not analyzed; it is tested as-is against what is indexed, and no indexed term starts with tuinkamer. (tuin* does match, because tuinkam starts with tuin.) That said, if you plan to use wildcards, remove EnglishPorterFilterFactory from your analyzers.
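
A minimal sketch of what removing the stemmer could look like. It assumes a separate field type (the name text_nostem is made up for illustration) so that the stemmed "text" type stays available for other fields; the filter chain is the one posted in this thread with only the EnglishPorterFilterFactory lines dropped. The index has to be rebuilt after a change like this.

```xml
<!-- Hypothetical field type: same chain as "text", minus the stemmer,
     so that indexed terms keep their full form (tuinkamer stays tuinkamer). -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Point the wildcard-searched field at the unstemmed type. -->
<field name="title_search" type="text_nostem" indexed="true" stored="true"/>
```

With this in place, title_search:tuinkamer* should have an indexed term to match, as long as the query term is lowercase.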



      

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Sorry to bother you again, but it still doesn't seem to work all the time...

This (what you solved earlier) works:
&q=title_search:Pappegay&defType=lucene&fl=id,title


But for another location, whose value in the DB is: "de tuinkamer"

When I query the id of that location:
&q=id:431&fl=id,title
the location is found, so it IS indexed...


But this query DOESN'T work:

&q=title_search:tuinkamer*&defType=lucene&fl=id,title

And this one DOES:
&q=title_search:tuin*&defType=lucene&fl=id,title

For me this is unexpected... what can it be?

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Correct! Thanks again, it now works! :)

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> 
> When I do:
> &q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title
> 
> nothing is found.
> 

This is expected, since you have a lowercase filter in your index analyzer and wildcard searches are not analyzed: the indexed term is pappegay, and Pappegay* does not match it. So you need to lowercase your query on the client side: &q=title_search:pappegay*&defType=lucene&fl=id,title
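
For readers on newer Solr releases (3.6 onward, as far as I recall), wildcard terms can be run through a reduced "multiterm" analysis chain, which makes the client-side lowercasing unnecessary there. The sketch below assumes such a version and uses a made-up field type name (text_lc); on the Solr version discussed in this thread, lowercasing in the client remains the way to go.

```xml
<!-- Hypothetical field type for Solr 3.6+ (assumption): the "multiterm"
     analyzer is applied to wildcard/prefix terms, so Pappegay* is
     lowercased to pappegay* before it is matched against the index. -->
<fieldType name="text_lc" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <!-- Keep the wildcard term as a single token and only lowercase it. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```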


      

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Mmmm, this is strange:

When I do:
&q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found.

but if I do:
&q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title

the location IS found.

I do need a wildcard though, since users may also search on parts of the
title (as described earlier in this thread). But it looks almost as if the
location is not found when the wildcard is at the end and the searched string
is no longer than the position of the wildcard (if that makes sense :)

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> 
> The name of the location in the database is:
> "Museumrestaurant De Pappegay"

What was the wildcard query for this?




      

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
OK, I was trying to hide the actual name of the location, because I don't want
it to get indexed by search engines AND it's a bit of a weird name :p

The name of the location in the database is: "Museumrestaurant De Pappegay"

Anyway, here it is. I executed the queries you gave me, and this is the
result:

DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22pappegay%22&defType=lucene&fl=title,title_search
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22Pappegay%22&defType=lucene&fl=title,title_search

http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22Pappegay%22&defType=lucene&fl=title,title_search

NO DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22pappegay%22&defType=lucene&fl=title,title_search

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> Certainly did!
> Why, are you saying this code is correct as-is?

Yes, the query &q=title_search:hort*&defType=lucene should return documents having "Hortus supremus" in their title field with the configuration you sent us.

It should exist somewhere in the result set, if not in the top 10.

Try a few things to make sure your document is indexed:

&q=title_search:"Hortus supremus"&defType=lucene&fl=title,title_search
&q=title:"Hortus supremus"&defType=lucene&fl=title,title_search

Do they return that document? Alternatively, find that document's unique id and query for it.


      

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Certainly did!
Why, are you saying this code is correct as-is?

Re: full text search in multiple fields

Posted by Jonathan Rochkind <ro...@jhu.edu>.
Did you reindex after you changed your analyzers?

On 12/22/2010 12:57 PM, PeterKerk wrote:
> Hi guys,
>
> There's one more thing to get this code to work as I need I just found
> out...
>
> Im now using:&q=title_search:hort*&defType=lucene
> as iorixxx suggested.
>
> it works good BUT, this query doesnt find results if the title in DB is
> "Hortus supremus"
>
> I tried adding some tokenizers and filters to solve this, what I think is a
> casing issue, but no luck...
>
> below is my code...what am I missing here?
>
> Thanks again!
>
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>    <analyzer type="index">
> 	<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 	
> 	<!-- in this example, we will only use synonyms at query time
> 	<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
> ignoreCase="true" expand="false"/>
> 	-->
> 	<filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_dutch.txt"/>
> 	<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> 	<filter class="solr.LowerCaseFilterFactory"/>
> 	<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> 	<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>    </analyzer>
>    <analyzer type="query">
> 	<tokenizer class="solr.WhitespaceTokenizerFactory"/>
> 	<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> 	<filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords_dutch.txt"/>
> 	<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> 	<filter class="solr.LowerCaseFilterFactory"/>
> 	<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
> 	<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>    </analyzer>
> </fieldType>
>
>
> <field name="title" type="text_ws" indexed="true" stored="true"/>
> <field name="title_search" type="text" indexed="true" stored="true"/>
> <copyField source="title" dest="title_search"/>

Re: full text search in multiple fields

Posted by PeterKerk <ve...@hotmail.com>.
Hi guys,

I just found out there's one more thing needed to get this code to work the
way I need...

I'm now using: &q=title_search:hort*&defType=lucene
as iorixxx suggested.

It works well, BUT this query doesn't find results if the title in the DB is
"Hortus supremus"

I tried adding some tokenizers and filters to solve what I think is a
casing issue, but no luck...

Below is my code... what am I missing here?

Thanks again!


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="text" indexed="true" stored="true"/>
<copyField source="title" dest="title_search"/>

Re: full text search in multiple fields

Posted by Ahmet Arslan <io...@yahoo.com>.
> There's a location with title: hortus rodondendrus
> 
> This location is found using this query:
> http://localhost:8983/solr/db/select/?indent=on&q=hortus&defType=dismax&qf=title_search^20.0
> But not when using this query:
> http://localhost:8983/solr/db/select/?indent=on&q=hort&defType=dismax&qf=title_search^20.0
> 
> So, I believe my title value is not indexed the way I'd
> like it to be
> indexed. I think currently Im indexing it in full words,
> but am not
> tokenizing it per character...if that makes sense :)

> What should I add for this to be indexed in such a way that
> word parts are
> also found?

The question is: do you want to retrieve that document with the following queries too? h, ho, hor, hort, hortu.

Or is there a special relation between just hortus and hort?

For the former, you can use the * operator, e.g. &q=title_search:hort*&defType=lucene 
Please note that * is not supported by dismax.
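
If dismax has to stay, one alternative (a different technique from the * operator, named here only as an option) is to index the prefixes themselves with EdgeNGramFilterFactory, so that a plain term query for hort matches without any wildcard. A rough sketch, assuming a new field and field type whose names (title_prefix, text_prefix) are invented for illustration:

```xml
<!-- Hypothetical prefix-search type: n-grams are produced at index time only. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- For the token "hortus" this emits h, ho, hor, hort, hortu, hortus. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-grams here: the user's input is matched as typed (lowercased). -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="title" dest="title_prefix"/>
```

A dismax request would then list the new field in qf, for example qf=title_prefix. The trade-off is a larger index, since every prefix of every title word becomes a term.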

For the latter, you can use http://wiki.apache.org/solr/LanguageAnalysis#solr.StemmerOverrideFilterFactory to manually reduce hortus to hort.
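
A sketch of how that override might be wired in, under a few assumptions: a Solr version that ships StemmerOverrideFilterFactory, a dictionary file named stemdict.txt (the file name and the field type name text_override are made up), and a stemmer that leaves overridden tokens alone (SnowballPorterFilterFactory is used here as a stand-in for the stemmer in the thread's schema).

```xml
<!-- stemdict.txt (hypothetical) holds one tab-separated pair per line:
     the word, a TAB character, then the stem to use, e.g. hortus TAB hort -->
<fieldType name="text_override" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Must come before the stemmer so the override is applied first. -->
    <filter class="solr.StemmerOverrideFilterFactory" dictionary="stemdict.txt" ignoreCase="true"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

Because a single analyzer is declared, the same chain runs at index and query time, so both hortus and hort should end up as the term hort. The analysis page in the admin UI is the place to confirm that.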