Posted to java-user@lucene.apache.org by ya...@bloglines.com on 2005/06/25 02:00:18 UTC

Span query performance issue

Hi,

I'm comparing SpanNearQuery to PhraseQuery results and noticing about
an 8x difference on Linux.  Is a SpanNearQuery doing 8x as much work?

I'm considering diving into the code if the results sound unusual to people.
But if it's really doing that much more work, I won't spend time optimizing
something that can't get much faster.

Thanks.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Span query performance issue

Posted by Paul Elschot <pa...@xs4all.nl>.
On Saturday 25 June 2005 04:26, jian chen wrote:
> Hi,
> 
> I think a span query in general has to do more work than a simple phrase
> query. A phrase query, in its simplest form, only has to find terms that
> are adjacent to each other. A span query, by contrast, also matches terms
> that are not necessarily adjacent but have other words in between.
> 
> Therefore, I think it is reasonable for a span query to be slower than a
> phrase query. That said, a span query is far more powerful than a phrase query.
> 
> Jian
> 
> On 25 Jun 2005 00:00:18 -0000, yahootintin.11533894@bloglines.com
> <ya...@bloglines.com> wrote:
> > Hi,
> > 
> > I'm comparing SpanNearQuery to PhraseQuery results and noticing about
> > an 8x difference on Linux.  Is a SpanNearQuery doing 8x as much work?
> > 
> > 
> > I'm considering diving into the code if the results sound unusual to people.
> > But if it's really doing that much more work, I won't spend time optimizing
> > something that can't get much faster.

The main difference is in the extra generality of Spans over positions.
Spans have a begin position and an end position.
Matching two Spans for the terms of a phrase requires testing both
their begin positions and their end positions, even though, for a single
term, the two always differ by the same constant.
Spans also carry around their current document number, and this may
involve some more redundancy when finding the matches
within a single document.
Also, for exact matches (zero slop) PhraseQuery uses a separate scorer
that takes full advantage of the special case.
So, when the generality of the Spans is not needed, one should always
try to use a PhraseQuery.
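
To make the begin/end point concrete, here is a rough, self-contained sketch.
It is a toy model and not Lucene's code: the class SpanVsPositionSketch, the
Span type, and both counting methods are names made up for illustration. It
assumes one document, two terms, and sorted position lists. The exact-phrase
method compares a single int per step, while the span method reads both a
begin and an end and accepts an arbitrary slop.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Lucene's implementation. All names are hypothetical.
class SpanVsPositionSketch {

    // A span covers positions [begin, end); a single term has end == begin + 1.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
    }

    // Exact two-term phrase on sorted positions: one int is compared per step.
    static int countExactPhrase(int[] first, int[] second) {
        int i = 0, j = 0, matches = 0;
        while (i < first.length && j < second.length) {
            int wanted = first[i] + 1;      // second term must directly follow
            if (second[j] < wanted) {
                j++;
            } else {
                if (second[j] == wanted) matches++;
                i++;
            }
        }
        return matches;
    }

    // Wraps each term position in a Span of length one.
    static List<Span> termSpans(int[] positions) {
        List<Span> spans = new ArrayList<>();
        for (int p : positions) spans.add(new Span(p, p + 1));
        return spans;
    }

    // Ordered "near" match: counts the first-term spans whose closest following
    // second-term span begins at most `slop` positions after they end.
    // Assumes both lists are sorted by begin and that end is non-decreasing too,
    // which holds for the single-term spans built by termSpans.
    static int countOrderedNear(List<Span> first, List<Span> second, int slop) {
        int j = 0, matches = 0;
        for (Span a : first) {
            // Skip second-term spans that begin before this span ends.
            while (j < second.size() && second.get(j).begin < a.end) j++;
            if (j == second.size()) break;
            if (second.get(j).begin - a.end <= slop) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        int[] first = {0, 5, 9};    // positions of the first term
        int[] second = {1, 7, 10};  // positions of the second term
        System.out.println(countExactPhrase(first, second));                           // 2
        System.out.println(countOrderedNear(termSpans(first), termSpans(second), 0));  // 2
        System.out.println(countOrderedNear(termSpans(first), termSpans(second), 1));  // 3
    }
}
```

With zero slop both methods find the same two matches, but the span version
allocates an object per occurrence and touches two fields per comparison.
That is the kind of overhead described above, here only in miniature.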

I'm not surprised that SpanNearQuery is slower than PhraseQuery,
and I'd expect a factor of 3-4 between them. A factor of 8 might indicate that
there is some room for improvement in the span package.
(I'd expect the CellQueue in NearSpans to be the bottleneck.)

Regards,
Paul Elschot


Re: Span query performance issue

Posted by jian chen <ch...@gmail.com>.
Hi,

I think a span query in general has to do more work than a simple phrase
query. A phrase query, in its simplest form, only has to find terms that
are adjacent to each other. A span query, by contrast, also matches terms
that are not necessarily adjacent but have other words in between.

Therefore, I think it is reasonable for a span query to be slower than a
phrase query. That said, a span query is far more powerful than a phrase query.

Jian

On 25 Jun 2005 00:00:18 -0000, yahootintin.11533894@bloglines.com
<ya...@bloglines.com> wrote:
> Hi,
> 
> I'm comparing SpanNearQuery to PhraseQuery results and noticing about
> an 8x difference on Linux.  Is a SpanNearQuery doing 8x as much work?
> 
> 
> I'm considering diving into the code if the results sound unusual to people.
> But if it's really doing that much more work, I won't spend time optimizing
> something that can't get much faster.
> 
> Thanks.