Posted to dev@lucene.apache.org by hao yan <hy...@gmail.com> on 2014/03/05 18:47:19 UTC

Under what circumstances is TermsEnum's next(), seekExact(), or seekCeil() more efficient?

Hey, guys

Two questions:
1. From our experiments, we find that next() is very costly. TermsEnum's
javadoc says that a TermsEnum is unpositioned when you first obtain it and
that you must successfully call next() or one of the seek* methods first.
From my experiments, I find that next() is costly.

My question is:

Can we use only seekExact(), and never seekCeil() or next(), in order to
improve performance?

thanks!

hao

Re: Under what circumstances is TermsEnum's next(), seekExact(), or seekCeil() more efficient?

Posted by hao yan <hy...@gmail.com>.
Hey, Michael

thanks!

That is close to what I thought.

hao


On Fri, Mar 7, 2014 at 9:55 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> On Wed, Mar 5, 2014 at 4:34 PM, hao yan <hy...@gmail.com> wrote:
> > Hi, Michael
> >
> > 1. We find that both are actually costly. I am not sure what the
> > difference is between "first next() only once + seekExact() from then
> > on" and "always seekExact()". I mean, are the first call of next() and
> > the first call of seekExact() different? If next() loads a block of data
> > and positions to the beginning of the block, while seekExact() loads a
> > block and positions to the target, then next() should be more efficient,
> > right?
>
> The first next() call is not that different from seekExact: it must
> load the block containing the first term and read bytes from it.
>
> After that, next() should be cheaper than seekExact.
>
> > 2. Are MultiFields/MultiTerms/MultiTermsEnum efficient? We always have a
> > fixed number (three) of segments, and we want to search all three for
> > each query, so we borrowed most of the code from the Multi* classes. Is
> > there any way to optimize this?
>
> They are relatively efficient?  I mean, they must merge-sort the
> terms, and manage N segments that might have a term under the hood,
> but it's the best we can do (unless you can forceMerge).
>
> But it's better to operate per-segment if you care about performance.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

Re: Under what circumstances is TermsEnum's next(), seekExact(), or seekCeil() more efficient?

Posted by Michael McCandless <lu...@mikemccandless.com>.
On Wed, Mar 5, 2014 at 4:34 PM, hao yan <hy...@gmail.com> wrote:
> Hi, Michael
>
> 1. We find that both are actually costly. I am not sure what the difference
> is between "first next() only once + seekExact() from then on" and "always
> seekExact()". I mean, are the first call of next() and the first call of
> seekExact() different? If next() loads a block of data and positions to
> the beginning of the block, while seekExact() loads a block and positions
> to the target, then next() should be more efficient, right?

The first next() call is not that different from seekExact: it must
load the block containing the first term and read bytes from it.

After that, next() should be cheaper than seekExact.

> 2. Are MultiFields/MultiTerms/MultiTermsEnum efficient? We always have a
> fixed number (three) of segments, and we want to search all three for
> each query, so we borrowed most of the code from the Multi* classes. Is
> there any way to optimize this?

They are relatively efficient?  I mean, they must merge-sort the
terms, and manage N segments that might have a term under the hood,
but it's the best we can do (unless you can forceMerge).

But it's better to operate per-segment if you care about performance.
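
To make that trade-off concrete, here is a toy sketch in plain Java. It is
not Lucene code: the class and method names are hypothetical, and sorted
String arrays stand in for each segment's term dictionary. mergedTerms shows
the kind of k-way merge a merged multi-segment view has to perform, while
segmentsContaining looks a term up in each segment independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model only: each String[] is one segment's sorted, duplicate-free term list. */
final class ToySegmentTerms {

    /** Merged view: k-way merge of all segments' terms, collapsing equal terms. */
    static List<String> mergedTerms(String[][] segments) {
        // Queue entries are {segmentIndex, offsetInSegment}, ordered by the term they point at.
        PriorityQueue<int[]> queue = new PriorityQueue<>(
                (a, b) -> segments[a[0]][a[1]].compareTo(segments[b[0]][b[1]]));
        for (int s = 0; s < segments.length; s++) {
            if (segments[s].length > 0) {
                queue.add(new int[] {s, 0});
            }
        }
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            int[] top = queue.poll();
            String term = segments[top[0]][top[1]];
            // Equal terms from different segments come out adjacent, so compare with the last one.
            if (out.isEmpty() || !out.get(out.size() - 1).equals(term)) {
                out.add(term);
            }
            if (top[1] + 1 < segments[top[0]].length) {
                queue.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    /** Per-segment view: probe every segment on its own, with no merge step. */
    static List<Integer> segmentsContaining(String[][] segments, String target) {
        List<Integer> hits = new ArrayList<>();
        for (int s = 0; s < segments.length; s++) {
            if (Arrays.binarySearch(segments[s], target) >= 0) {
                hits.add(s);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String[][] segments = {{"apple", "cherry"}, {"banana", "cherry"}, {"date"}};
        System.out.println(mergedTerms(segments));
        System.out.println(segmentsContaining(segments, "cherry"));
    }
}
```

With a fixed, small number of segments, the per-segment loop avoids the
priority-queue bookkeeping entirely, which is the general idea behind
operating per-segment.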

Mike McCandless

http://blog.mikemccandless.com



Re: Under what circumstances is TermsEnum's next(), seekExact(), or seekCeil() more efficient?

Posted by hao yan <hy...@gmail.com>.
Hi, Michael

1. We find that both are actually costly. I am not sure what the difference
is between "first next() only once + seekExact() from then on" and "always
seekExact()". I mean, are the first call of next() and the first call of
seekExact() different? If next() loads a block of data and positions to
the beginning of the block, while seekExact() loads a block and positions
to the target, then next() should be more efficient, right?



2. Are MultiFields/MultiTerms/MultiTermsEnum efficient? We always have a
fixed number (three) of segments, and we want to search all three for
each query, so we borrowed most of the code from the Multi* classes. Is
there any way to optimize this?

Please advise.
thanks!

hao


On Wed, Mar 5, 2014 at 10:21 AM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> Hmm I see only one question?
>
> Sure, after pulling a new TermsEnum, you can use .seekExact to
> position it, instead of next, but you must have a term in mind to
> "target" when you call seekExact.
>
> seekExact ought to be faster than seekCeil, since it's not required to
> position the enum if the term was not found.
>
> But next() ought to be faster than seekExact; you're seeing the opposite?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Mar 5, 2014 at 12:47 PM, hao yan <hy...@gmail.com> wrote:
> > Hey, guys
> >
> > Two questions:
> > 1. From our experiments, we find that next() is very costly. TermsEnum's
> > javadoc says that a TermsEnum is unpositioned when you first obtain it and
> > that you must successfully call next() or one of the seek* methods first.
> > From my experiments, I find that next() is costly.
> >
> > My question is:
> >
> > Can we use only seekExact(), and never seekCeil() or next(), in order to
> > improve performance?
> >
> > thanks!
> >
> > hao
> >
> >
>

Re: Under what circumstances is TermsEnum's next(), seekExact(), or seekCeil() more efficient?

Posted by Michael McCandless <lu...@mikemccandless.com>.
Hmm I see only one question?

Sure, after pulling a new TermsEnum, you can use .seekExact to
position it, instead of next, but you must have a term in mind to
"target" when you call seekExact.

seekExact ought to be faster than seekCeil, since it's not required to
position the enum if the term was not found.

But next() ought to be faster than seekExact; you're seeing the opposite?
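
As a rough illustration of the positioning contract described above, here is
a toy cursor over a sorted term array in plain Java. It is only a model with
hypothetical names, not Lucene's TermsEnum: it says nothing about real cost,
and it simplifies seekCeil to return the term it lands on, whereas Lucene's
seekCeil returns a SeekStatus. The point is that seekExact may leave the
cursor unpositioned on a miss, while seekCeil must always land on the next
term at or after the target.

```java
import java.util.Arrays;

/** Toy stand-in for a terms enumerator over one sorted term list (not Lucene's TermsEnum). */
final class ToyTermsCursor {
    private final String[] sortedTerms; // assumed sorted ascending, no duplicates
    private int pos = -1;               // -1 means "unpositioned"

    ToyTermsCursor(String... sortedTerms) {
        this.sortedTerms = sortedTerms.clone();
    }

    /** Advances to the following term and returns it, or null when exhausted. */
    String next() {
        if (pos + 1 >= sortedTerms.length) {
            pos = sortedTerms.length; // stay exhausted
            return null;
        }
        return sortedTerms[++pos];
    }

    /** Returns true and positions on target if present; on a miss the cursor is unpositioned. */
    boolean seekExact(String target) {
        int i = Arrays.binarySearch(sortedTerms, target);
        pos = (i >= 0) ? i : -1; // no extra work to find a neighbour on a miss
        return i >= 0;
    }

    /** Positions on the smallest term >= target and returns it, or null if there is none. */
    String seekCeil(String target) {
        int i = Arrays.binarySearch(sortedTerms, target);
        pos = (i >= 0) ? i : -(i + 1); // on a miss, binarySearch encodes the insertion point
        return pos < sortedTerms.length ? sortedTerms[pos] : null;
    }

    /** Returns the current term, or null if unpositioned or exhausted. */
    String term() {
        return (pos >= 0 && pos < sortedTerms.length) ? sortedTerms[pos] : null;
    }

    public static void main(String[] args) {
        ToyTermsCursor cursor = new ToyTermsCursor("apple", "banana", "cherry");
        System.out.println(cursor.seekExact("banana"));    // hit: positioned on "banana"
        System.out.println(cursor.next());                 // sequential step to the following term
        System.out.println(cursor.seekExact("blueberry")); // miss: cursor left unpositioned
        System.out.println(cursor.seekCeil("blueberry"));  // lands on the next term after the target
    }
}
```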

Mike McCandless

http://blog.mikemccandless.com


On Wed, Mar 5, 2014 at 12:47 PM, hao yan <hy...@gmail.com> wrote:
> Hey, guys
>
> Two questions:
> 1. From our experiments, we find that next() is very costly. TermsEnum's
> javadoc says that a TermsEnum is unpositioned when you first obtain it and
> that you must successfully call next() or one of the seek* methods first.
> From my experiments, I find that next() is costly.
>
> My question is:
>
> Can we use only seekExact(), and never seekCeil() or next(), in order to
> improve performance?
>
> thanks!
>
> hao
>
>
