Posted to solr-user@lucene.apache.org by Mark N <ni...@gmail.com> on 2010/10/05 12:29:38 UTC

Re: wildcard and proximity searches

Hi

Were you successful in trying SOLR-1604 to allow wildcard queries in
phrases?

Also, does this plugin allow us to use proximity with wildcards, for
example "solr mail*"~10?

Is this the right approach to support this functionality?

thanks
Mark





On Wed, Aug 4, 2010 at 2:24 PM, Frederico Azeiteiro <
Frederico.Azeiteiro@cision.com> wrote:

> Thanks for your idea.
>
> At this point I'm logging each query time. My idea is to divide my
> queries into "normal queries" and "heavy queries". I have some heavy
> queries that take 1 or 2 minutes to return results, but they look like,
> for instance, (*word1* AND *word2* AND word3*). I guess these will
> always be slow (they could be a little faster with
> "ReversedWildcardFilterFactory"), but they will never be ready in a few
> seconds. For now, I just increased the timeout for those :) (using
> solrnet).
>
> My priority at the moment is phrase queries like "word1* word2*
> word3". After this is working, I'll try to optimize the "heavy queries".
>
> Frederico
>
>
> -----Original Message-----
> From: Jonathan Rochkind [mailto:rochkind@jhu.edu]
> Sent: Wednesday, August 4, 2010 01:41
> To: solr-user@lucene.apache.org
> Subject: Re: wildcard and proximity searches
>
> Frederico Azeiteiro wrote:
> >
> >>> But it is unusual to use both leading and trailing * operator. Why
> >>> are you doing this?
> >
> > Yes I know, but I have a few queries that need this. I'll try the
> > "ReversedWildcardFilterFactory".
> >
> >
> >
>
> ReversedWildcardFilter will help with a leading wildcard, but will not
> help with a query that has BOTH leading and trailing wildcards. It'll
> still be slow. Solr/Lucene isn't good at that; I didn't even know Solr
> would do it at all, in fact.
>
> If you really needed to do that, the way to play to Solr/Lucene's way of
> doing things would be to have a field where you actually index each
> _character_ as a separate token. Then leading and trailing wildcard
> search is basically reduced to a "phrase search", but where the words
> are actually characters. But then you're going to get an index where
> pretty much every token belongs to every document, which Solr isn't that
> great at either, but then you can apply "commongram" stuff on top to
> help that out a lot too. Not quite sure what the end result will be;
> I've never tried it. I'd only use that weird special "char as token"
> field for queries that actually required leading and trailing wildcards.
>
> Figuring out how to set up your analyzers, and what (if anything) you're
> going to have to do client-app-side to transform the user's query into
> something that'll end up searching like a "phrase search where each
> 'word' is a character"... is left as an exercise for the reader. :)
>
> Jonathan
>
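
The "character as token" idea quoted above can be illustrated outside Solr. The following is only a rough sketch in plain Python, not an analyzer configuration; the function names are hypothetical and are not part of Solr or Lucene. It shows why a query like *ord* turns into a phrase match once every character is its own token.

```python
def char_tokens(text):
    """Split text into one lowercase token per character (whitespace kept),
    mimicking a hypothetical "char as token" field."""
    return list(text.lower())


def contains_phrase(tokens, phrase):
    """Return True if `phrase` occurs as a contiguous run inside `tokens`,
    which is what an exact phrase query checks."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def matches_infix(text, infix):
    """Emulate *infix* by phrase-matching the characters of `infix`
    against the character tokens of `text`."""
    return contains_phrase(char_tokens(text), char_tokens(infix))


hit = matches_infix("Keyword search", "ord")   # "ord" sits inside "Keyword"
miss = matches_infix("for do", "ord")          # the space token breaks the run
```

Keeping whitespace as a token is a design choice made here so that a match cannot silently span two words; a real analyzer chain might handle word boundaries differently.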



-- 
Nipen Mark

Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.

--- On Tue, 10/5/10, Mark N <ni...@gmail.com> wrote:

> From: Mark N <ni...@gmail.com>
> Subject: Re: wildcard and proximity searches
> To: solr-user@lucene.apache.org
> Date: Tuesday, October 5, 2010, 2:30 PM
> Thanks ahmet
> 
> Is it also possible to search the document having a 
> field ENDING with
> "week*"
> 
> query should return documents with a field ending
> with  week and its
> derivatives such as weekly,weeks
> 
> So above query should return
> 
> "this week"
> "Past three weeks"
> "Report weekly"

But if you can append some artificial token to the end of your field (client side, maybe), you can use this plug-in for this task.

Your modified documents/fields:

"this week END"
"Past three weeks END"
"Report weekly END"

Query "week* END" will return those three documents.
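
As a rough client-side sketch of the approach above: the helper names below are hypothetical, and "END" is just the example marker used in this message. It assumes the marker never occurs in real field values and survives whatever analysis the field uses.

```python
END_MARKER = "END"  # assumed sentinel; choose a token that never appears in real data


def with_end_marker(field_value, marker=END_MARKER):
    """Append the sentinel so the last real word is followed by a known token."""
    return f"{field_value.strip()} {marker}"


def ends_with_prefix_query(prefix, marker=END_MARKER):
    """Build a wildcard phrase query string such as "week* END"."""
    return f'"{prefix}* {marker}"'


docs = ["this week", "Past three weeks", "Report weekly"]
indexed = [with_end_marker(d) for d in docs]   # values to send at index time
query = ends_with_prefix_query("week")         # query string to send at search time
```

The marker is added before indexing and again inside the query, so the wildcard term is only accepted when it is the last real word of the field.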



      

Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.
> Is it also possible to search the document having a 
> field ENDING with
> "week*"
> 
> query should return documents with a field ending
> with  week and its
> derivatives such as weekly,weeks
> 
> So above query should return
> 
> "this week"
> "Past three weeks"
> "Report weekly"

No, this is not possible with this plugin. By the way, the notation "week*" does not mean that: a quoted single term is equivalent to week* (without the quotation marks).

There is an ongoing discussion (title = "Begins with and ends with word") about endsWith type of search.


      

Re: wildcard and proximity searches

Posted by Mark N <ni...@gmail.com>.
Thanks, Ahmet.

Is it also possible to search for documents having a field ENDING with
"week*"?

The query should return documents with a field ending with "week" and its
derivatives, such as "weekly" and "weeks".

So the above query should return:

"this week"
"Past three weeks"
"Report weekly"

thanks
chandan



On Tue, Oct 5, 2010 at 5:04 PM, Ahmet Arslan <io...@yahoo.com> wrote:

> > Also does this plugin allow us to use proximity with wild
> > card
> > *          "solr mail*"~10 *
> >
>
> Yes, it supports "solr mail*"~10 kind of queries without any problem.
>
> Currently it throws an exception with "mail*" kind of queries, but those
> are not valid phrase queries, because there is only one clause inside the
> quotation marks.
>
>
>
>


-- 
Nipen Mark

RE: wildcard and proximity searches

Posted by Frederico Azeiteiro <Fr...@cision.com>.
Hi Mark,
Unfortunately, it's still on my ToDo list... :(

I don't know if it allows "solr mail*"~10. I hope so, as I'll need that in the future as well.
 
Frederico

Re: wildcard and proximity searches

Posted by Ahmet Arslan <io...@yahoo.com>.
> Also does this plugin allow us to use proximity with wild
> card
> *          "solr mail*"~10 *
> 

Yes, it supports "solr mail*"~10 kind of queries without any problem.

Currently it throws an exception with "mail*" kind of queries, but those are not valid phrase queries, because there is only one clause inside the quotation marks.
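
A client can avoid the single-clause case by only quoting when there are at least two clauses. This is a minimal sketch under that assumption; `wildcard_phrase_query` is a hypothetical helper, not part of the plugin or of any Solr client library.

```python
def wildcard_phrase_query(terms, slop=None):
    """Build a query string from a list of terms (wildcards allowed).

    One term   -> returned unquoted, e.g. mail*
    Two or more -> quoted phrase, with ~slop appended when slop is given.
    """
    terms = [t for t in terms if t]
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) == 1:
        # A single clause in quotes is not a valid phrase query, so drop the quotes.
        return terms[0]
    phrase = '"' + " ".join(terms) + '"'
    return f"{phrase}~{slop}" if slop is not None else phrase


proximity = wildcard_phrase_query(["solr", "mail*"], slop=10)
single = wildcard_phrase_query(["mail*"])
```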