Posted to solr-user@lucene.apache.org by Jonathan Rochkind <ro...@jhu.edu> on 2010/08/04 02:40:47 UTC

Re: wildcard and proximity searches

Frederico Azeiteiro wrote:
>>> But it is unusual to use both leading and trailing * operator. Why are
>>> you doing this?
>
> Yes I know, but I have a few queries that need this. I'll try the
> "ReversedWildcardFilterFactory".

ReversedWildcardFilter will help with a leading wildcard, but it will not
help with a query that uses BOTH a leading and a trailing wildcard. It'll
still be slow. Solr/Lucene isn't good at that; in fact, I didn't even know
Solr would do it at all.
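
To make the point above concrete, here is a rough Python sketch (not Solr
code) of why reversing tokens helps a leading wildcard but not a
leading-plus-trailing one. The function names are made up for illustration,
and the real filter also does extra bookkeeping that is ignored here.

```python
def reversed_prefix(pattern):
    """Return the prefix pattern to run against reversed tokens, or None.

    '*ing'  -> 'gni*'  (a cheap prefix search on the reversed tokens)
    '*ing*' -> None    (reversing gives '*gni*', still a leading wildcard)
    """
    leading = pattern.startswith("*")
    trailing = pattern.endswith("*")
    if leading and not trailing:
        return pattern[1:][::-1] + "*"
    return None


def match_reversed(tokens, pattern):
    """Toy lookup: match a leading-wildcard pattern via reversed tokens."""
    rewritten = reversed_prefix(pattern)
    if rewritten is None:
        raise ValueError("reversal does not turn this pattern into a prefix")
    prefix = rewritten[:-1]  # drop the trailing '*'
    reversed_tokens = {token[::-1]: token for token in tokens}
    return sorted(
        original
        for rev, original in reversed_tokens.items()
        if rev.startswith(prefix)
    )
```

With the tokens "searching", "index" and "string", the pattern `*ing` is
answered by a prefix lookup for `gni` on the reversed forms, while `*ing*`
gets no such shortcut.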

If you really needed to do that, the way to play to Solr/Lucene's way of
doing things would be to have a field where you actually index each
_character_ as a separate token. Then a leading-and-trailing wildcard
search is basically reduced to a "phrase search", but one where the words
are actually characters. But then you're going to get an index where
pretty much every token belongs to every document, which Solr isn't that
great at either, though you can apply "commongram" stuff on top to
help that out a lot too. I'm not quite sure what the end result will be;
I've never tried it. I'd only use that weird special "char as token"
field for queries that actually require leading and trailing wildcards.

Figuring out how to set up your analyzers, and what (if anything) you're
going to have to do client-app-side to transform the user's query into
something that'll end up searching like a "phrase search where each
'word' is a character"... is left as an exercise for the reader. :)
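
For what it's worth, the "char as token" idea can be simulated in a few
lines of plain Python. This is only a toy positional index to show the
mechanics, not a Solr analyzer setup; every name in it is hypothetical, and
it says nothing about how such a field would perform in a real index.

```python
from collections import defaultdict


def build_char_index(docs):
    """Map each character to {doc_id: [positions]}, one token per character."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text.lower()):
            index[ch][doc_id].append(pos)
    return index


def to_char_phrase(wildcard_query):
    """Client-side rewrite: '*wor*' becomes the phrase '"w o r"'."""
    core = wildcard_query.strip("*").lower()
    return '"' + " ".join(core) + '"'


def phrase_search(index, wildcard_query):
    """Return ids of docs where the characters sit at consecutive positions."""
    chars = wildcard_query.strip("*").lower()
    if not chars:
        return set()
    hits = set()
    for doc_id, starts in index.get(chars[0], {}).items():
        for start in starts:
            # Each following character must appear exactly one position later.
            if all(
                start + offset in index.get(ch, {}).get(doc_id, ())
                for offset, ch in enumerate(chars)
            ):
                hits.add(doc_id)
                break
    return hits
```

Given the documents "keyword search", "sword" and "solr", a query for
`*wor*` is rewritten to the phrase `"w o r"` and finds the first two. Note
that common characters end up with postings in nearly every document, which
is the index-size concern mentioned above.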

Jonathan

Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.


--- On Tue, 10/5/10, Mark N <ni...@gmail.com> wrote:

> From: Mark N <ni...@gmail.com>
> Subject: Re: wildcard and proximity searches
> To: solr-user@lucene.apache.org
> Date: Tuesday, October 5, 2010, 2:30 PM
> Thanks ahmet
> 
> Is it also possible to search the document having a field ENDING with
> "week*"
> 
> query should return documents with a field ending with week and its
> derivatives such as weekly,weeks
> 
> So above query should return
> 
> "this week"
> "Past three weeks"
> "Report weekly"

But if you can append an artificial token to the end of your field (client side, maybe), you can use this plug-in for this task.

Your modified documents/fields:

"this week END"
"Past three weeks END"
"Report weekly END"

Query "week* END" will return those three documents.
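
A minimal client-side sketch of that trick, in Python. The marker token,
the helper names and the toy matcher are all assumptions for illustration;
the matcher only imitates what the phrase query is expected to do and is
not the plug-in's actual logic.

```python
END_MARKER = "END"  # hypothetical sentinel; choose a token absent from real data


def with_end_marker(field_value, marker=END_MARKER):
    """Append the sentinel token before sending the field to be indexed."""
    return f"{field_value.strip()} {marker}"


def ends_with_query(prefix, marker=END_MARKER):
    """Build the phrase query for 'field ends with a word starting with prefix'."""
    return f'"{prefix}* {marker}"'


def would_match(indexed_value, prefix, marker=END_MARKER):
    """Toy check: the last real token starts with prefix and precedes the marker."""
    tokens = indexed_value.lower().split()
    return (
        len(tokens) >= 2
        and tokens[-1] == marker.lower()
        and tokens[-2].startswith(prefix.lower())
    )
```

Here `ends_with_query("week")` produces `"week* END"`, and a value such as
"weekly report END" is not matched because "weekly" is no longer adjacent
to the marker.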

Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.

> Is it also possible to search the document having a field ENDING with
> "week*"
> 
> query should return documents with a field ending with week and its
> derivatives such as weekly,weeks
> 
> So above query should return
> 
> "this week"
> "Past three weeks"
> "Report weekly"

No, this is not possible with this plugin. By the way, the notation "week*" does not mean that; it is equivalent to week* (without the quotation marks).

There is an ongoing discussion (title = "Begins with and ends with word") about endsWith-type searches.

Re: wildcard and proximity searches

Posted by Mark N <ni...@gmail.com>.

Thanks, Ahmet.

Is it also possible to search for documents having a field ENDING with
"week*"?

The query should return documents with a field ending with "week" and its
derivatives, such as "weekly" and "weeks".

So the above query should return:

"this week"
"Past three weeks"
"Report weekly"

thanks
chandan





-- 
Nipen Mark

RE: wildcard and proximity searches

Posted by Frederico Azeiteiro <Fr...@cision.com>.

Hi Mark,
unfortunately it's still on my to-do list... :(

I don't know if it allows "solr mail*"~10. I hope so, as I'll need that in the future too.
 
Frederico


Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.

> Also does this plugin allow us to use proximity with wild card
> "solr mail*"~10
> 

Yes, it supports "solr mail*"~10 kind of queries without any problem.

Currently it throws an exception with "mail*" kind of queries, but those are not valid phrase queries, because there is only one clause inside the quotation marks.
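
If the quoted single-clause case can reach the parser from user input, one
option is to unwrap it on the client before sending the query. The Python
sketch below is only an assumption about how a client might do that; the
function name is hypothetical and it handles nothing beyond a single
quoted phrase with an optional ~N suffix.

```python
import re

# One quoted phrase, optionally followed by a slop such as ~10.
PHRASE = re.compile(r'^"([^"]*)"(?:~(\d+))?$')


def normalize_phrase(query):
    """Unwrap a quoted phrase that contains only one clause.

    '"mail*"'         -> 'mail*'  (a single clause is not a phrase)
    '"solr mail*"~10' -> unchanged
    """
    match = PHRASE.match(query.strip())
    if match is None:
        return query
    clauses = match.group(1).split()
    if len(clauses) == 1:
        # Slop is meaningless for a single clause, so it is dropped too.
        return clauses[0]
    return query
```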

Re: wildcard and proximity searches

Posted by Mark N <ni...@gmail.com>.

Hi

Were you successful in trying SOLR-1604 to allow wildcard queries in
phrases?

Also, does this plugin allow us to use proximity with wildcards, e.g.

    "solr mail*"~10

Is this the right approach to support these functionalities?

thanks
Mark








-- 
Nipen Mark

RE: wildcard and proximity searches

Posted by Frederico Azeiteiro <Fr...@cision.com>.

Thanks for your idea.

At this point I'm logging each query time. My idea is to divide my
queries into "normal queries" and "heavy queries". I have some heavy
queries that take 1 or 2 minutes to return results, but they look like,
for instance, (*word1* AND *word2* AND word3*). I guess these will
always be slower (they could be a little faster with
"ReversedWildcardFilterFactory"), but they will never be ready in a few
seconds. For now, I just increased the timeout for those :) (using
solrnet).

My priority at the moment is phrase queries like "word1* word2*
word3". After this is working, I'll try to optimize the "heavy queries".
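
The normal/heavy split described above could be sketched roughly like this
in Python (the actual client here is solrnet, so this is only an
illustration). The timeout values and function names are made up, and
"heavy" is assumed to mean a term wrapped in wildcards on both sides.

```python
import re

# A term with a wildcard on both sides, e.g. *word1*
DOUBLE_WILDCARD = re.compile(r"(?<!\w)\*\w+\*")

NORMAL_TIMEOUT_S = 10   # hypothetical value
HEAVY_TIMEOUT_S = 180   # hypothetical value, sized for 1-2 minute queries


def is_heavy(query):
    """True if any term has both a leading and a trailing wildcard."""
    return DOUBLE_WILDCARD.search(query) is not None


def timeout_for(query):
    """Pick a request timeout based on the query class."""
    return HEAVY_TIMEOUT_S if is_heavy(query) else NORMAL_TIMEOUT_S
```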

Frederico
