You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2007/07/20 16:04:35 UTC

Re: TermEnum - previous() method ?

There's nothing that I know of built in to Lucene that allows you
to do anything like termenum.previous(), I'm pretty sure you
have to roll your own.

Some possibilities:

1> Depending upon how many terms in your index, you could
just read all the authors into a Java collection of your choice
at startup time and reference that list.

2> You could just iterate from the beginning of the list every
time, storing the N elements somewhere and stop when
you get to the ones you need. Do you have any evidence
that this wouldn't work for you? I was surprised how quickly this
worked.

3> A variant of <2> would be to choose some arbitrary term that's
less than your target, skip to that one, and iterate forward. For instance,
if you were looking for the terms previous to "Erickson", you could
skip to "Eg", and iterate forward. You'd have to do some heuristics
in case there was, say, only one author between "Eg" and "Erickson".

4> You could assemble a list of, say, every 100th term, either at
startup time or as a permanent part of your index. When you had
to do a previous, skip to the appropriate term and iterate forward.
Remember that not all documents need to have the same fields,
so you can have a "super special document" that contains some
list like this.

I'd recommend that you take some simple timings of iterating
through the entire list to see if you need to get fancy or not. The
critical piece of information you've omitted is any indication of
how many terms you expect to have to iterate over. If it's
10,000 terms, the simple solution will almost certainly work for
you. If it's 10,000,000,000 then you have to do some fancier
dancing.

Best
Erick

On 7/20/07, muraalee <mu...@gmail.com> wrote:
>
>
> Hi Mark,
> Thanks for your inputs.
>
> >> I do wonder why you want a previous though? It sounds like you might be
> >> better off heading down a different path...
>
> In our content, we have indexed Author as separate field. We want to
> expose
> a feature to  'browse' the Author list. They can type any author name and
> do
> a locate from there onwards.. While doing that they need to navigate
> 'previous' & 'next' author list.
>
> e.g AU:Rowling
>
> sample code snippet:
> TermEnum browseTermEnum = indexReader.terms(new Term("AU", "rowling"));
> while(browseTermEnum.next()){
>   System.out.println( browseTermEnum.term().text());
> }
>
> Navigating the next 10 authors is straight foward becuase we do that by
> calling browseTermEnum.next(), but we couldn't do previous. I did consider
> the approach of all terms while navigation. But the problem is, when we do
> a
> lookup of term like 'rowling', we didn't iterate from the start of the
> list,
> hence we may not have the previous terms..
>
> Any thoughts ?
>
>
> markrmiller wrote:
> >
> > I am not very familiar with the Lucene file formats, but I think that
> > there is a lot of "this number tells you how far ahead to read" when
> > enumerating terms. As you might guess, I think this lends toward reading
> > the terms file forward. Not that an index couldn't point you into the
> > terms index somehow (a meta index?). It would make just as much sense
> (or
> > more) to buffer all of the terms to allow a previous though. Depending
> on
> > your RAM and index size, this could be an option.
> >
> > I do wonder why you want a previous though? It sounds like you might be
> > better off heading down a different path...
> >
> > - Mark
> >
> >
> >
> > muraalee wrote:
> >>
> >> Hi All,
> >> I searched in this forum for anybody looking for need for previous()
> >> method in TermEnum. I found only this link
> >>
> http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html#a189225
> >>
> >> Would it be possible to implement previous() method ? I know i am
> asking
> >> for quick solution here ;) Just i want to ensure if it not implemented,
> >> there might be a reason. So i can consider alternates approaches to
> >> implement similar feature..
> >>
> >> appreciate your thoughts...
> >>
> >> Thanks
> >> Murali V
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/TermEnum----previous%28%29-method---tf4107296.html#a11707160
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Re: TermEnum - previous() method ?

Posted by Erick Erickson <er...@gmail.com>.

You know, every time I see a solution like this, I have to smack
myself upside the head. As an old C programmer, I have a real
problem "wasting", say,  a few tens of megabytes just to make
my life easier. Of disk space. That costs virtually nothing. And
(as you say, assuming some constraints) doesn't change my
search response time. And won't cost the company time tracing
down bugs. Which will let me work on something *else* that's
might generate revenue.

Siiigggghhhh. Sometime I'll get into the modern world and actually
think of this kind of solution myself.

Erick

On 7/20/07, Will Johnson <wj...@getconnected.com> wrote:
>
> One other possibility I've used in the past:
>
> Store 2 fields, one with the normal characters, one with each character
> inversed, A->Z, Y->B, and so on.  As long as you have the function to go
> from the normal->reversed and reversed->normal you can iterate over both
> fields in a forward only mode and just reverse out the strings on the
> display side.
>
> This method makes a number of assumptions about index size constraints,
> character sets; ie ymmv.
>
> - will
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Friday, July 20, 2007 10:05 AM
> To: java-user@lucene.apache.org
> Subject: Re: TermEnum - previous() method ?
>
> There's nothing that I know of built in to Lucene that allows you
> to do anything like termenum.previous(), I'm pretty sure you
> have to roll your own.
>
> Some possibilities:
>
> 1> Depending upon how many terms in your index, you could
> just read all the authors into a Java collection of your choice
> at startup time and reference that list.
>
> 2> You could just iterate from the beginning of the list every
> time, storing the N elements somewhere and stop when
> you get to the ones you need. Do you have any evidence
> that this wouldn't work for you? I was surprised how quickly this
> worked.
>
> 3> A variant of <2> would be to choose some arbitrary term that's
> less than your target, skip to that one, and iterate forward. For
> instance,
> if you were looking for the terms previous to "Erickson", you could
> skip to "Eg", and iterate forward. You'd have to do some heuristics
> in case there was, say, only one author between "Eg" and "Erickson".
>
> 4> You could assemble a list of, say, every 100th term, either at
> startup time or as a permanent part of your index. When you had
> to do a previous, skip to the appropriate term and iterate forward.
> Remember that not all documents need to have the same fields,
> so you can have a "super special document" that contains some
> list like this.
>
> I'd recommend that you take some simple timings of iterating
> through the entire list to see if you need to get fancy or not. The
> critical piece of information you've omitted is any indication of
> how many terms you expect to have to iterate over. If it's
> 10,000 terms, the simple solution will almost certainly work for
> you. If it's 10,000,000,000 then you have to do some fancier
> dancing.
>
> Best
> Erick
>
> On 7/20/07, muraalee <mu...@gmail.com> wrote:
> >
> >
> > Hi Mark,
> > Thanks for your inputs.
> >
> > >> I do wonder why you want a previous though? It sounds like you
> might be
> > >> better off heading down a different path...
> >
> > In our content, we have indexed Author as separate field. We want to
> > expose
> > a feature to  'browse' the Author list. They can type any author name
> and
> > do
> > a locate from there onwards.. While doing that they need to navigate
> > 'previous' & 'next' author list.
> >
> > e.g AU:Rowling
> >
> > sample code snippet:
> > TermEnum browseTermEnum = indexReader.terms(new Term("AU",
> "rowling"));
> > while(browseTermEnum.next()){
> >   System.out.println( browseTermEnum.term().text());
> > }
> >
> > Navigating the next 10 authors is straight foward becuase we do that
> by
> > calling browseTermEnum.next(), but we couldn't do previous. I did
> consider
> > the approach of all terms while navigation. But the problem is, when
> we do
> > a
> > lookup of term like 'rowling', we didn't iterate from the start of the
> > list,
> > hence we may not have the previous terms..
> >
> > Any thoughts ?
> >
> >
> > markrmiller wrote:
> > >
> > > I am not very familiar with the Lucene file formats, but I think
> that
> > > there is a lot of "this number tells you how far ahead to read" when
> > > enumerating terms. As you might guess, I think this lends toward
> reading
> > > the terms file forward. Not that an index couldn't point you into
> the
> > > terms index somehow (a meta index?). It would make just as much
> sense
> > (or
> > > more) to buffer all of the terms to allow a previous though.
> Depending
> > on
> > > your RAM and index size, this could be an option.
> > >
> > > I do wonder why you want a previous though? It sounds like you might
> be
> > > better off heading down a different path...
> > >
> > > - Mark
> > >
> > >
> > >
> > > muraalee wrote:
> > >>
> > >> Hi All,
> > >> I searched in this forum for anybody looking for need for
> previous()
> > >> method in TermEnum. I found only this link
> > >>
> >
> http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html
> #a189225
> > >>
> > >> Would it be possible to implement previous() method ? I know i am
> > asking
> > >> for quick solution here ;) Just i want to ensure if it not
> implemented,
> > >> there might be a reason. So i can consider alternates approaches to
> > >> implement similar feature..
> > >>
> > >> appreciate your thoughts...
> > >>
> > >> Thanks
> > >> Murali V
> > >>
> > >
> > >
> >
> > --
> > View this message in context:
> >
> http://www.nabble.com/TermEnum----previous%28%29-method---tf4107296.html
> #a11707160
> > Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

RE: TermEnum - previous() method ?

Posted by Will Johnson <wj...@GETCONNECTED.COM>.

One other possibility I've used in the past:

Store 2 fields, one with the normal characters, one with each character
inversed, A->Z, Y->B, and so on.  As long as you have the function to go
from the normal->reversed and reversed->normal you can iterate over both
fields in a forward only mode and just reverse out the strings on the
display side.  

This method makes a number of assumptions about index size constraints,
character sets; ie ymmv.

- will  

-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, July 20, 2007 10:05 AM
To: java-user@lucene.apache.org
Subject: Re: TermEnum - previous() method ?

There's nothing that I know of built in to Lucene that allows you
to do anything like termenum.previous(), I'm pretty sure you
have to roll your own.

Some possibilities:

1> Depending upon how many terms in your index, you could
just read all the authors into a Java collection of your choice
at startup time and reference that list.

2> You could just iterate from the beginning of the list every
time, storing the N elements somewhere and stop when
you get to the ones you need. Do you have any evidence
that this wouldn't work for you? I was surprised how quickly this
worked.

3> A variant of <2> would be to choose some arbitrary term that's
less than your target, skip to that one, and iterate forward. For
instance,
if you were looking for the terms previous to "Erickson", you could
skip to "Eg", and iterate forward. You'd have to do some heuristics
in case there was, say, only one author between "Eg" and "Erickson".

4> You could assemble a list of, say, every 100th term, either at
startup time or as a permanent part of your index. When you had
to do a previous, skip to the appropriate term and iterate forward.
Remember that not all documents need to have the same fields,
so you can have a "super special document" that contains some
list like this.

I'd recommend that you take some simple timings of iterating
through the entire list to see if you need to get fancy or not. The
critical piece of information you've omitted is any indication of
how many terms you expect to have to iterate over. If it's
10,000 terms, the simple solution will almost certainly work for
you. If it's 10,000,000,000 then you have to do some fancier
dancing.

Best
Erick

On 7/20/07, muraalee <mu...@gmail.com> wrote:
>
>
> Hi Mark,
> Thanks for your inputs.
>
> >> I do wonder why you want a previous though? It sounds like you
might be
> >> better off heading down a different path...
>
> In our content, we have indexed Author as separate field. We want to
> expose
> a feature to  'browse' the Author list. They can type any author name
and
> do
> a locate from there onwards.. While doing that they need to navigate
> 'previous' & 'next' author list.
>
> e.g AU:Rowling
>
> sample code snippet:
> TermEnum browseTermEnum = indexReader.terms(new Term("AU",
"rowling"));
> while(browseTermEnum.next()){
>   System.out.println( browseTermEnum.term().text());
> }
>
> Navigating the next 10 authors is straight foward becuase we do that
by
> calling browseTermEnum.next(), but we couldn't do previous. I did
consider
> the approach of all terms while navigation. But the problem is, when
we do
> a
> lookup of term like 'rowling', we didn't iterate from the start of the
> list,
> hence we may not have the previous terms..
>
> Any thoughts ?
>
>
> markrmiller wrote:
> >
> > I am not very familiar with the Lucene file formats, but I think
that
> > there is a lot of "this number tells you how far ahead to read" when
> > enumerating terms. As you might guess, I think this lends toward
reading
> > the terms file forward. Not that an index couldn't point you into
the
> > terms index somehow (a meta index?). It would make just as much
sense
> (or
> > more) to buffer all of the terms to allow a previous though.
Depending
> on
> > your RAM and index size, this could be an option.
> >
> > I do wonder why you want a previous though? It sounds like you might
be
> > better off heading down a different path...
> >
> > - Mark
> >
> >
> >
> > muraalee wrote:
> >>
> >> Hi All,
> >> I searched in this forum for anybody looking for need for
previous()
> >> method in TermEnum. I found only this link
> >>
>
http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html
#a189225
> >>
> >> Would it be possible to implement previous() method ? I know i am
> asking
> >> for quick solution here ;) Just i want to ensure if it not
implemented,
> >> there might be a reason. So i can consider alternates approaches to
> >> implement similar feature..
> >>
> >> appreciate your thoughts...
> >>
> >> Thanks
> >> Murali V
> >>
> >
> >
>
> --
> View this message in context:
>
http://www.nabble.com/TermEnum----previous%28%29-method---tf4107296.html
#a11707160
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org