Posted to java-user@lucene.apache.org by Jason Calabrese <ma...@jasoncalabrese.com> on 2006/07/06 02:06:29 UTC

Inserting a document into an index at a specified position

All,

For performance reasons we keep our index of over a million documents ordered 
alphabetically.  This way, for an alpha sort we can just use the index order.  
This works very well, but I'm now looking for a way to insert a single 
document into the index at the correct position.  

Is there any standard way to do this?

--Jason

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Inserting a document into an index at a specified position

Posted by Jason Calabrese <ma...@jasoncalabrese.com>.
We only display 10 hits at a time, so we don't need to iterate through all 
the hits.

It feels like there should be a way to pull a document out of one index and 
stick it into another, bringing all the unstored fields along with it.

On Friday 07 July 2006 12:52, Erick Erickson wrote:
>  Did you use a Hits object to assemble your results? And is that what
> you're measuring when you say it's slow? In other words, were you measuring
> the time it took to execute the statement
>
> Hits hits = searcher.search(query, new Sort("fullname"));
>
> or the time it took to iterate over the Hits object and do something? If
> the latter, your problem may really be the fact that the Hits object
> re-issues the search every 100 retrievals or so (this has been discussed in
> the mail archive...) and you'd get satisfactory performance by using a
> lower-level interface HitCollector(?) TopDocs(?).
>
> Otherwise, I haven't a clue, but you probably already realized that...
>
> Best
> Erick



Re: Inserting a document into an index at a specified position

Posted by Erick Erickson <er...@gmail.com>.
 Did you use a Hits object to assemble your results? And is that what you're
measuring when you say it's slow? In other words, were you measuring the
time it took to execute the statement

Hits hits = searcher.search(query, new Sort("fullname"));

or the time it took to iterate over the Hits object and do something? If the
latter, your problem may really be that the Hits object re-issues the search
every 100 retrievals or so (this has been discussed in the mail archive...),
and you'd get satisfactory performance by using a lower-level interface such
as HitCollector(?) or TopDocs(?).

Otherwise, I haven't a clue, but you probably already realized that...

Best
Erick
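
To make the "lower-level interface" idea concrete: when only one page of
results is shown, a collector needs to keep just the best n hits rather than
materialize and sort everything. The following is a minimal sketch of that
idea in plain Java. It is a toy model, not Lucene code; the class name
TopNPage, the method firstPage, and the sortKeys array (standing in for a
cached per-document sort field) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model, not Lucene code: keeps only the first n hits in sort order. */
class TopNPage {

    /**
     * Returns the n doc IDs whose sort keys come first alphabetically.
     * matchingDocs stands in for the hits of a query; sortKeys[doc] is the
     * (hypothetical) cached value of the sort field for each document.
     */
    static List<Integer> firstPage(int[] matchingDocs, String[] sortKeys, int n) {
        // Order by key, breaking ties by doc ID so the result is deterministic.
        Comparator<Integer> byKey = (a, b) -> {
            int c = sortKeys[a].compareTo(sortKeys[b]);
            return c != 0 ? c : Integer.compare(a, b);
        };
        // Max-heap: the head is the worst hit currently kept.
        PriorityQueue<Integer> heap = new PriorityQueue<>(byKey.reversed());
        for (int doc : matchingDocs) {
            heap.offer(doc);
            if (heap.size() > n) {
                heap.poll(); // drop the hit that sorts last
            }
        }
        List<Integer> page = new ArrayList<>(heap);
        page.sort(byKey);
        return page;
    }

    public static void main(String[] args) {
        String[] names = {"Mills", "Adams", "Young", "Baker", "Clark"};
        int[] hits = {0, 1, 2, 3, 4};
        System.out.println(firstPage(hits, names, 3)); // prints [1, 3, 4]
    }
}
```

The heap never holds more than n + 1 entries, so memory stays bounded by the
page size. Every hit is still compared at least once, though, so this does
not by itself remove the cost of comparing string keys.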

Re: Inserting a document into an index at a specified position

Posted by Jason Calabrese <ma...@jasoncalabrese.com>.
> When you say you keep your documents ordered alphabetically, it's confusing
> to me. Are you saying that you pre-sort all your documents then insert them
> one after another so that automatically-generated internal Lucene ID maps
> exactly to the alphabetical ordering? That is, for any document IDs D1 and
> D2 and any documents C1 and C2 (where C1 and C2 are the alphabetical
> representations of the documents, whatever that means) if D1 < D2 then C1 <
> C2?

Yes, this is a pre-sort. For our application we have some fairly large result 
sets, and using the standard sort on a name field was too slow.  By 
pre-sorting before we index, we can make sure that all the docs are inserted 
in alpha order, and then sort them by index order just as fast as or faster 
than the standard relevance sort.

This:
Hits hits = searcher.search(query, Sort.INDEXORDER);

is much faster than:
Hits hits = searcher.search(query, new Sort("fullname"));

> The short answer is that you can't insert a document into a Lucene index
> and have any control whatsoever about the assigned document ID. The
> assigned document ID is always greater than the maximum document ID already
> in your index.

I know that there is no direct way to insert a doc at a specified position 
with a single IndexWriter method, but it seems that there should be a better 
way than reindexing everything.



Re: Inserting a document into an index at a specified position

Posted by Erick Erickson <er...@gmail.com>.
When you say you keep your documents ordered alphabetically, it's confusing
to me. Are you saying that you pre-sort all your documents then insert them
one after another so that automatically-generated internal Lucene ID maps
exactly to the alphabetical ordering? That is, for any document IDs D1 and
D2 and any documents C1 and C2 (where C1 and C2 are the alphabetical
representations of the documents, whatever that means) if D1 < D2 then C1 <
C2?

The short answer is that you can't insert a document into a Lucene index and
have any control whatsoever about the assigned document ID. The assigned
document ID is always greater than the maximum document ID already in your
index.

But it doesn't make sense to try. You have documents A, B, D that you index.
They get IDs 1, 2, 3. Now you want to index document C. What sort of
document ID would you expect? 2.5? Or do I completely misunderstand your
problem?
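
The A, B, D example can be played through with a small model. The sketch
below is plain Java, not Lucene code: AppendOnlyIndex is a hypothetical class
that only mimics the one property described above, namely that each new
document receives the next free ID. The model numbers documents from 0 rather
than 1, which is a choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of append-only document numbering; not Lucene code. */
class AppendOnlyIndex {
    private final List<String> docs = new ArrayList<>();

    /** Adds a document and returns its ID, which is always the next free number. */
    int add(String name) {
        docs.add(name);
        return docs.size() - 1;
    }

    /** True if ascending ID order is also ascending alphabetical order. */
    boolean idOrderIsAlphabetical() {
        for (int id = 1; id < docs.size(); id++) {
            if (docs.get(id - 1).compareTo(docs.get(id)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AppendOnlyIndex index = new AppendOnlyIndex();
        for (String name : new String[] {"A", "B", "D"}) {
            index.add(name); // pre-sorted input gets IDs 0, 1, 2
        }
        System.out.println(index.idOrderIsAlphabetical()); // prints true
        int idOfC = index.add("C");
        System.out.println(idOfC);                         // prints 3
        System.out.println(index.idOrderIsAlphabetical()); // prints false
    }
}
```

In this model, C lands after D, so ID order stops matching alphabetical order
as soon as a single out-of-order document is appended.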

Would it work to just index a field for each document that contained the
alphabetical representation and use that for retrieval ordering? I *think*
you can use a FilteredTermEnum with a new Term("field", "") to enumerate all
the terms in an index (they are guaranteed to be in lexical order...).
Then you let Lucene do your sorting... I'm a little fuzzy on how to go from
there to a document, but I suspect there's a way.
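
One way to picture the step from ordered terms back to documents is a sorted
term dictionary in which each term carries the IDs of the documents that
contain it. The sketch below models that with a TreeMap in plain Java. It is
not Lucene code and makes no claim about how Lucene stores or exposes this;
the class SortedTermWalk and its methods are hypothetical names used only for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a sorted term dictionary with postings; not Lucene code. */
class SortedTermWalk {
    // term -> IDs of documents containing it; TreeMap iterates keys in lexical order
    private final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    /** Records that the document with this ID has this value in the sort field. */
    void add(int docId, String term) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    /** Walks terms in lexical order and returns the first `limit` matching doc IDs. */
    List<Integer> firstMatches(Set<Integer> matching, int limit) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                if (result.size() == limit) {
                    return result; // page is full, stop walking
                }
                if (matching.contains(docId)) {
                    result.add(docId);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SortedTermWalk walk = new SortedTermWalk();
        walk.add(0, "Mills"); // documents arrive in arbitrary order
        walk.add(1, "Adams");
        walk.add(2, "Clark");
        walk.add(3, "Baker");
        Set<Integer> hits = new HashSet<>(Arrays.asList(0, 2, 3));
        System.out.println(walk.firstMatches(hits, 2)); // prints [3, 2]
    }
}
```

Documents can be added in any order here, because the ordering lives in the
term dictionary rather than in the document IDs. The trade-off is that the
walk may visit many terms before it finds enough matching documents when the
result set is small.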

Hope this helps
Erick

Re: Inserting a document into an index at a specified position

Posted by Jason Calabrese <ma...@jasoncalabrese.com>.
All,

I sent this the other day, but didn't get any responses.  I'm hoping that it 
was just missed, so I'm trying again.

There has to be a better way to insert a document into an index than 
reindexing everything.

--Jason

On Wednesday 05 July 2006 5:06 pm, Jason Calabrese wrote:
> All,
>
> For performance reasons we keep our index of over a million documents
> ordered alphabetically.  This way, for an alpha sort we can just use the
> index order. This works very well, but I'm now looking for a way to insert
> a single document into the index at the correct position.
>
> Is there any standard way to do this?
>
> --Jason
