Posted to java-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2006/08/31 12:55:13 UTC

FuzzyQuery in SpanQuery

Anyone know of a way to get a fuzzy query into a spanquery?

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: FuzzyQuery in SpanQuery

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-08-31 at 17:33 -0400, Mark Miller wrote:
> Bad news for me. Any hope of a speedier fuzzy span?

I just thought of something. Bob Carpenter posted some optimized
fuzzy code on the dev list some time ago. According to my measurements
it was something like 15-25% faster. I don't know if it was committed.
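Bob's patch isn't quoted here, so the following is NOT his code. As a hedged illustration of one common way to speed up fuzzy matching, here is a minimal Java sketch of a two-row edit distance that gives up as soon as every cell in a row exceeds the allowed distance. The class name BoundedLevenshtein is hypothetical.

```java
// Hypothetical helper: not Lucene's code and not Bob Carpenter's patch.
final class BoundedLevenshtein {
    private BoundedLevenshtein() {}

    /**
     * Returns the edit distance between a and b, or maxDistance + 1 as soon
     * as the distance is known to exceed maxDistance.
     */
    static int distance(String a, String b, int maxDistance) {
        // The length difference is a lower bound on the edit distance.
        if (Math.abs(a.length() - b.length()) > maxDistance) {
            return maxDistance + 1;
        }
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            int rowMin = curr[0];
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
                rowMin = Math.min(rowMin, curr[j]);
            }
            // No cell in a later row can be smaller than this row's minimum.
            if (rowMin > maxDistance) {
                return maxDistance + 1;
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return Math.min(prev[b.length()], maxDistance + 1);
    }
}
```

With a small threshold (one or two edits), many non-matching terms can be rejected by the length check or after only a few rows, instead of filling the whole table.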



Re: FuzzyQuery in SpanQuery

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-08-31 at 17:33 -0400, Mark Miller wrote:
> 
> Bad news for me. Any hope of a speedier fuzzy span? 

Using a spell checker comes to mind.
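One way a spell checker can avoid scanning every term is to index character n-grams and only score words that share grams with the query. As far as I know, Lucene's contrib SpellChecker works along these lines, but the sketch below is a hypothetical, stdlib-only illustration (the class TrigramSuggester and the minShared threshold are made up), not its API.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch: candidate lookup via shared character trigrams.
final class TrigramSuggester {
    private final Map<String, Set<String>> gramToWords = new HashMap<>();

    TrigramSuggester(Collection<String> dictionary) {
        for (String word : dictionary) {
            for (String gram : trigrams(word)) {
                gramToWords.computeIfAbsent(gram, g -> new TreeSet<>()).add(word);
            }
        }
    }

    /** Pads the word with start/end markers, then splits it into 3-char windows. */
    static Set<String> trigrams(String word) {
        String padded = "^" + word + "$";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            grams.add(padded.substring(i, i + 3));
        }
        return grams;
    }

    /** Returns dictionary words sharing at least minShared trigrams with the query. */
    SortedSet<String> candidates(String query, int minShared) {
        Map<String, Integer> counts = new HashMap<>();
        for (String gram : trigrams(query)) {
            for (String word : gramToWords.getOrDefault(gram, Collections.emptySet())) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minShared) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

The surviving candidates would still need an edit-distance check; the gain is that the candidate set is usually far smaller than the whole term dictionary.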

A speedier index is another way to go. RAMDirectory is n times faster
than FSDirectory, and the LUCENE-550 index is 5x faster than
RAMDirectory if you only look at fuzzy queries.



Re: FuzzyQuery in SpanQuery

Posted by Mark Miller <ma...@gmail.com>.
karl wettin wrote:
> On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote:
>   
>> I want to use it for my query parser so you can do a fuzzy search
>> inside of a proximity search. Is it any slower than a standard fuzzy
>> query? 
>>     
>
> I find it to be extremely slow. All terms in the index need to be
> enumerated (or a subset if a prefix length is provided). But try it out.
> You are more than welcome to report the speed here or in the JIRA issue.
Bad news for me. Any hope of a speedier fuzzy span?

- Mark


Re: FuzzyQuery in SpanQuery

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-08-31 at 17:17 -0400, Mark Miller wrote:
> 
> I want to use it for my query parser so you can do a fuzzy search
> inside of a proximity search. Is it any slower than a standard fuzzy
> query? 

I find it to be extremely slow. All terms in the index need to be
enumerated (or a subset if a prefix length is provided). But try it out.
You are more than welcome to report the speed here or in the JIRA issue.
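To make the enumeration cost concrete, here is a minimal, hypothetical sketch over a sorted in-memory term list. It assumes a simplified similarity of 1 - distance / min(length); Lucene's own fuzzy scoring differs in detail. With a prefix length, only the contiguous run of terms sharing that prefix is visited; without one, every term gets scored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

// Hypothetical sketch; the similarity formula is a simplification.
final class FuzzyExpansion {
    private FuzzyExpansion() {}

    /** Plain two-row Levenshtein distance. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Returns the terms whose simplified similarity to target is at least
     * minSimilarity, visiting only terms that share target's first
     * prefixLength characters.
     */
    static List<String> expand(SortedSet<String> terms, String target,
                               int prefixLength, float minSimilarity) {
        int n = Math.max(0, Math.min(prefixLength, target.length()));
        String prefix = target.substring(0, n);
        List<String> matches = new ArrayList<>();
        // In sorted order, terms starting with prefix form one contiguous
        // run that begins at tailSet(prefix).
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the run: no later term can share the prefix
            }
            int minLength = Math.min(term.length(), target.length());
            if (minLength == 0) {
                continue;
            }
            float similarity = 1f - (float) levenshtein(term, target) / minLength;
            if (similarity >= minSimilarity) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```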


Re: FuzzyQuery in SpanQuery

Posted by Mark Miller <ma...@gmail.com>.
karl wettin wrote:
> On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote:
>
>   
>> When is a query rewritten? I build my query and then before using it, I 
>> would like to print it out to double check it. Not possible? Does the 
>> rewrite happen inside search?
>>     
>
> Right, you can't call toString() before rewriting it. The problem is of
> course that the rewritten query contains lots of possible choices for
> the fuzzy term, extracted from the IndexReader passed when rewriting it.
>
> If you really want to inspect the rewritten query, create a new
> instance and pass in the IndexReader. This will be slow though. In fact,
> fuzzy span is quite slow by itself.
>
> By the way, what do you plan to use it for? I use it as a very crude
> text mining classifier, never exposed to the end user.
I want to use it for my query parser so you can do a fuzzy search inside 
of a proximity search. Is it any slower than a standard fuzzy query?
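Conceptually, a fuzzy term inside a proximity search means: expand the fuzzy term to a set of index terms, then require one of them near the other term. A hypothetical, stdlib-only sketch over a single tokenized document follows; the names and the slop convention (at most slop tokens in between) are assumptions, and Lucene's SpanNearQuery slop semantics may differ in edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch over one tokenized document; not Lucene's span code.
final class FuzzyNear {
    private FuzzyNear() {}

    /**
     * True if a token from fuzzyExpansion and an occurrence of anchorTerm
     * appear, in either order, with at most slop other tokens between them.
     */
    static boolean matches(List<String> tokens, Set<String> fuzzyExpansion,
                           String anchorTerm, int slop) {
        List<Integer> fuzzyPositions = new ArrayList<>();
        List<Integer> anchorPositions = new ArrayList<>();
        for (int pos = 0; pos < tokens.size(); pos++) {
            String token = tokens.get(pos);
            if (fuzzyExpansion.contains(token)) {
                fuzzyPositions.add(pos);
            }
            if (token.equals(anchorTerm)) {
                anchorPositions.add(pos);
            }
        }
        for (int f : fuzzyPositions) {
            for (int a : anchorPositions) {
                // Two distinct positions; |f - a| - 1 tokens lie between them.
                if (f != a && Math.abs(f - a) - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In Lucene terms, the expansion set plays the role of the SpanOrQuery built from the rewritten fuzzy query, and the position check is what the proximity (near) query adds on top.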

- Mark


Re: FuzzyQuery in SpanQuery

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-08-31 at 14:27 -0400, Mark Miller wrote:

> When is a query rewritten? I build my query and then before using it, I 
> would like to print it out to double check it. Not possible? Does the 
> rewrite happen inside search?

Right, you can't call toString() before rewriting it. The problem is of
course that the rewritten query contains lots of possible choices for
the fuzzy term, extracted from the IndexReader passed when rewriting it.

If you really want to inspect the rewritten query, create a new
instance and pass in the IndexReader. This will be slow though. In fact,
fuzzy span is quite slow by itself.
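The toString() failure is typical of a query whose expanded terms are only filled in by rewrite(). Below is a hypothetical sketch (not the LUCENE-522 code) of a toString() that tolerates the unrewritten state; the term collection and matching rule are stand-ins for an IndexReader and a fuzzy metric.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical class, not the LUCENE-522 implementation.
final class LazyFuzzySpan {
    private final String field;
    private final String text;
    private List<String> expandedTerms; // stays null until rewrite() runs

    LazyFuzzySpan(String field, String text) {
        this.field = field;
        this.text = text;
    }

    /**
     * Stand-in for rewrite(IndexReader): returns a new instance whose
     * expanded terms are the index terms accepted by the matching rule.
     */
    LazyFuzzySpan rewrite(Collection<String> indexTerms, Predicate<String> matches) {
        LazyFuzzySpan rewritten = new LazyFuzzySpan(field, text);
        rewritten.expandedTerms = new ArrayList<>();
        for (String term : indexTerms) {
            if (matches.test(term)) {
                rewritten.expandedTerms.add(term);
            }
        }
        return rewritten;
    }

    @Override
    public String toString() {
        // Not rewritten yet: print the unexpanded form instead of
        // dereferencing the null term list.
        if (expandedTerms == null) {
            return field + ":" + text + "~";
        }
        return field + ":" + expandedTerms;
    }
}
```

Printing the unexpanded form before rewriting is enough for double-checking what the parser built; seeing the actual expansion still requires the index.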

By the way, what do you plan to use it for? I use it as a very crude
text mining classifier, never exposed to the end user.


Re: FuzzyQuery in SpanQuery

Posted by Mark Miller <ma...@gmail.com>.
karl wettin wrote:
> On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
>   
>> Anyone know of a way to get a fuzzy query into a spanquery?
>>     
>
> http://issues.apache.org/jira/browse/LUCENE-522
I found this:
Also, it throws a NullPointerException if you call toString() before
rewriting the query. Perhaps that's the way it is? I didn't check it out;
just reporting it before I forget about it.

When is a query rewritten? I build my query and then before using it, I 
would like to print it out to double check it. Not possible? Does the 
rewrite happen inside search?

- Mark


Re: FuzzyQuery in SpanQuery

Posted by Mark Miller <ma...@gmail.com>.
karl wettin wrote:
> On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
>   
>> Anyone know of a way to get a fuzzy query into a spanquery?
>>     
>
> http://issues.apache.org/jira/browse/LUCENE-522
Great. Very sweet, Karl.

Question:
Have you run into problems with it crapping out on toString() with a
NullPointerException?

- Mark


Re: FuzzyQuery in SpanQuery

Posted by karl wettin <ka...@gmail.com>.
On Thu, 2006-08-31 at 06:55 -0400, Mark Miller wrote:
> Anyone know of a way to get a fuzzy query into a spanquery?

http://issues.apache.org/jira/browse/LUCENE-522


Re: FuzzyQurey in SpanQuery

Posted by mark harwood <ma...@yahoo.co.uk>.
Something like this?

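import java.util.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.spans.*;

// Assumes fuzzyQuery (a FuzzyQuery) and reader (an open IndexReader) exist.
// Expand the fuzzy query into the concrete terms it matches in this index.
Query expandedQuery = fuzzyQuery.rewrite(reader);
Set termsSet = new HashSet();
expandedQuery.extractTerms(termsSet);
// Wrap each matching term in a SpanTermQuery.
List termsList = new ArrayList();
for (Iterator iter = termsSet.iterator(); iter.hasNext();) {
    Term term = (Term) iter.next();
    termsList.add(new SpanTermQuery(term));
}
// OR the span clauses together (per-term fuzzy boosts are not carried over).
SpanOrQuery soq = new SpanOrQuery(
        (SpanQuery[]) termsList.toArray(new SpanQuery[termsList.size()]));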

I imagine this general approach should work for other multi-term queries, e.g. wildcards, too.
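The same expand-then-wrap idea for wildcards can be sketched without Lucene: turn a '*'/'?' pattern into a regular expression and collect the matching terms from a term list. Each result could then be wrapped in a SpanTermQuery as in the snippet above. The class and method names are hypothetical; this is not how Lucene's WildcardQuery is implemented internally.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not Lucene's WildcardQuery implementation.
final class WildcardExpansion {
    private WildcardExpansion() {}

    /** '*' matches any run of characters, '?' exactly one; everything else is literal. */
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // DOTALL so '.' also matches line terminators inside a term.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    /** Returns the terms matched by the wildcard, in the collection's iteration order. */
    static List<String> expand(Collection<String> terms, String wildcard) {
        Pattern pattern = toRegex(wildcard);
        List<String> matches = new ArrayList<>();
        for (String term : terms) {
            if (pattern.matcher(term).matches()) {
                matches.add(term);
            }
        }
        return matches;
    }
}
```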


----- Original Message ----
From: Mark Miller <ma...@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, 31 August, 2006 11:55:13 AM
Subject: FuzzyQuery in SpanQuery

Anyone know of a way to get a fuzzy query into a spanquery?

- Mark
