Posted to solr-user@lucene.apache.org by Frederico Azeiteiro <Fr...@cision.com> on 2010/07/02 12:57:02 UTC

steps to improve search

Hi,

I'm using the default text field type in my schema.

Is there a quick way to make searches more exact, so that searching for
"paying for it" returns only docs containing the full expression "paying
for it", and not articles that merely contain the word "pay", as happens now?

 

Thanks,

Frederico


Re: steps to improve search

Posted by Erick Erickson <er...@gmail.com>.
Yes. When you change the index-time part of the analysis chain in the
schema, you need to reindex the data. You can change the query-time
part without reindexing.

Also, see this page:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
See the CommonGramsFilterFactory section, which contains
this tidbit:
<<<CommonGramsFilter is useful for issuing phrase queries (i.e. "the cat")
that contain stop words. Normally phrases containing stop words would not
match their intended target and instead, the query "the cat" would match all
documents containing "cat", which can be undesirable behavior.>>>

HTH
Erick
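
For reference, a minimal sketch of how the filter quoted above might be wired into a field type. The field type name text_commongrams, the tokenizer choice, and the word list file stopwords.txt are assumptions for illustration, not taken from this thread; adjust them to your own schema.

```xml
<!-- Hypothetical field type: name, tokenizer and word list file are assumptions -->
<fieldType name="text_commongrams" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Index time: keeps single words and adds word pairs that include a common word -->
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Query time: the query-side variant of the same filter -->
    <filter class="solr.CommonGramsQueryFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
```

Because this changes the index-time chain, the data would have to be reindexed before phrase queries benefit from it.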

On Fri, Jul 2, 2010 at 11:38 AM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Thanks Leonardo, I didn't know that tool, very good!
>
> So I see what is wrong:
>
> SnowballPorterFilterFactory and StopFilterFactory. (both used on index and
> query)
>
> I tried removing the snowball filter and changing the stop filter to
> "ignorecase=false" on QUERY, and restarted Solr.
>
> But now I get no results :(.
>
> On index analysis I get (result of filters):
> paying  for     it
> paying
> paying
> paying
> pay
>
> For Query analysis (result of filters):
> paying  for     it
> paying  for     it
> paying
> paying
> paying
>
> Does this mean that, in the end, the word indexed is "pay" and the word
> searched is "paying"?
>
> Is it necessary to reindex the data?
>
> Thanks

RE: steps to improve search

Posted by Frederico Azeiteiro <Fr...@cision.com>.
Thanks Leonardo, I didn't know that tool, very good!

So I see what is wrong:

SnowballPorterFilterFactory and StopFilterFactory. (both used on index and query)

I tried removing the snowball filter and changing the stop filter to "ignorecase=false" on QUERY, and restarted Solr.

But now I get no results :(.

On index analysis I get (result of filters):
paying	for	it
paying
paying
paying
pay

For Query analysis (result of filters):
paying	for	it
paying	for	it
paying
paying
paying

Does this mean that, in the end, the word indexed is "pay" and the word searched is "paying"?

Is it necessary to reindex the data?

Thanks
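
The mismatch described above can be pictured with a sketch like the following, assuming the stemmer was removed only from the query analyzer while the existing index was built with it. The filter classes are the ones named in this thread; the tokenizer, the filter order, and the file name stopwords.txt are assumptions.

```xml
<!-- Hypothetical reconstruction of the mismatch; not a recommended setup -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Stemmer still applied at index time: "paying" is stored as "pay" -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No stemmer at query time: the query term stays "paying" and does not match the indexed "pay" -->
  </analyzer>
</fieldType>
```

Under these assumptions, the two chains produce different terms for the same word, which would explain the empty result set until the index-time chain is changed to match and the data is reindexed.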

-----Original Message-----
From: Leonardo Menezes [mailto:leonardo.menezess@googlemail.com] 
Sent: Friday, 2 July 2010 12:58
To: solr-user@lucene.apache.org
Subject: Re: steps to improve search

Most likely you get those "fake" matches because of:
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
StopFilterFactory

Try going into the admin interface, to the analysis section. There you can
"simulate" the indexing/searching of a document and see
how it's actually searched/indexed. It will give you some clues...


Re: steps to improve search

Posted by Leonardo Menezes <le...@googlemail.com>.
Most likely you get those "fake" matches because of:
EnglishPorterFilterFactory
RemoveDuplicatesTokenFilterFactory
StopFilterFactory

Try going into the admin interface, to the analysis section. There you can
"simulate" the indexing/searching of a document and see
how it's actually searched/indexed. It will give you some clues...

On Fri, Jul 2, 2010 at 1:50 PM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> For the example given, I need the full expression "paying for it", so
> yes all the words.

RE: steps to improve search

Posted by Frederico Azeiteiro <Fr...@cision.com>.
For the example given, I need the full expression "paying for it", so
yes all the words.
-----Original Message-----
From: Ahmet Arslan [mailto:iorixxx@yahoo.com] 
Sent: Friday, 2 July 2010 12:30
To: solr-user@lucene.apache.org
Subject: RE: steps to improve search

> I need to know how to achieve more accurate queries (like
> the example below...) using these filters.

Do you want all of the terms you search for to appear in the returned
documents?

You can change the default operator of the QueryParser to AND, either in
schema.xml or by appending &q.op=AND to your search URL. I am assuming you
are not using dismax.


      

RE: steps to improve search

Posted by Ahmet Arslan <io...@yahoo.com>.
> My Query: Headline:("paying for it") on solr admin
> interface
> 
> Some results:
> ...l stop paying tax until council pays for dam...
> "Why paying extra doesn't always pay!"
> "...pay cut as M&S investor pressure pays off"
> "Can't pay or won't pay: the debt collector call"
> 
> What could be wrong here?

Maybe StopFilterFactory is eating the words 'for' and 'it'?
Remove it from your fieldType definition, restart Solr, and re-index.
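
A sketch of what such a field type could look like with the stop filter left out. The name text_exact and the tokenizer are hypothetical, and leaving out the stemmer as well is an assumption based on the rest of this thread rather than part of the suggestion above.

```xml
<!-- Hypothetical field type for exact phrase matching -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <!-- A single analyzer without a type attribute is used for both indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No StopFilterFactory: 'for' and 'it' stay in the index -->
    <!-- No stemming filter: "paying" is not reduced to "pay" -->
  </analyzer>
</fieldType>
```

With a chain like this, a phrase query such as Headline:("paying for it") would have all three words available to match against, once the documents have been reindexed.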


      

RE: steps to improve search

Posted by Frederico Azeiteiro <Fr...@cision.com>.
I am using " around the text.

My Query: Headline:("paying for it") on solr admin interface

Some results:
...l stop paying tax until council pays for dam...
"Why paying extra doesn't always pay!"
"...pay cut as M&S investor pressure pays off"
"Can't pay or won't pay: the debt collector call"

What could be wrong here?
Thanks.
-----Original Message-----
From: Leonardo Menezes [mailto:leonardo.menezess@googlemail.com] 
Sent: Friday, 2 July 2010 12:30
To: solr-user@lucene.apache.org
Subject: Re: steps to improve search

No, you explained it all right, but then didn't understand the answer. Searching
with " surrounding the text you are searching for has exactly the
effect you are looking for. Try it...


Re: steps to improve search

Posted by Leonardo Menezes <le...@googlemail.com>.
No, you explained it all right, but then didn't understand the answer. Searching
with " surrounding the text you are searching for has exactly the
effect you are looking for. Try it...

On Fri, Jul 2, 2010 at 1:23 PM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> I'm sorry, maybe I didn’t explain correctly.
>
> The issue is using the default text FIELD TYPE, not the default text FIELD.
> The "text" field type uses a lot of filters on indexing.
> I need to know how to achieve more accurate queries (like the example
> below...) using these filters.

RE: steps to improve search

Posted by Ahmet Arslan <io...@yahoo.com>.
> I need to know how to achieve more accurate queries (like
> the example below...) using these filters.

Do you want all of the terms you search for to appear in the returned documents?

You can change the default operator of the QueryParser to AND, either in schema.xml or by appending &q.op=AND to your search URL. I am assuming you are not using dismax.
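
For the schema.xml route, a sketch of the setting in question, assuming a Solr version whose schema.xml still supports this element:

```xml
<!-- In schema.xml: make AND the default operator between query terms -->
<solrQueryParser defaultOperator="AND"/>
```

The per-request alternative is the &q.op=AND parameter mentioned above, which leaves the schema untouched.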


      

RE: steps to improve search

Posted by Frederico Azeiteiro <Fr...@cision.com>.
I'm sorry, maybe I didn’t explain correctly. 

The issue is with the default text FIELD TYPE, not the default text FIELD.
The "text" field type uses a lot of filters on indexing.
I need to know how to achieve more accurate queries (like the example below...) using these filters.


-----Original Message-----
From: Leonardo Menezes [mailto:leonardo.menezess@googlemail.com] 
Sent: Friday, 2 July 2010 12:07
To: solr-user@lucene.apache.org
Subject: Re: steps to improve search

Try
field:"text to search"


Re: steps to improve search

Posted by Leonardo Menezes <le...@googlemail.com>.
Try
field:"text to search"

On Fri, Jul 2, 2010 at 12:57 PM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Hi,
>
> I'm using the default text field type on my schema.
>
>
>
> Is there a quick way to do more accurate searches like searching for
> "paying for it" only return docs with the full expression "paying for
> it",  and not return articles with word "pay" as it does now?
>
>
>
> Thanks,
>
> Frederico
>
>