Posted to java-user@lucene.apache.org by Trejkaz <tr...@trypticon.org> on 2013/01/31 14:00:20 UTC

Migrating from using doc IDs to using application IDs from the FieldCache

Hi all.

We have an application which has been around for so long that it's
still using doc IDs to key to an external database.

Obviously this won't work forever (even in Lucene 3.x we had to use a
custom merge policy to keep it working) so we want to introduce
application IDs eventually. We have two potential paths here:

  1. Use our existing GUID field as the new unique ID and migrate to
that (with the huge drawback that indexing GUID columns on the
database is much, much slower than the current int column.)
  2. Introduce a new application ID (either int or long) and somehow
insert that into every document.

The time it would take to pull every document out, add a field for the
ID and index it again is too long to be desirable, so I'm wondering:
is there some way we could add this field to every document without
reindexing? Keeping Lucene 3.x as the limitation, that is... our
2.x-created indexes presumably won't open at all in 4.x without
significant work.
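
To illustrate why raw doc IDs are fragile as external keys, here is a
minimal toy sketch in plain Java. It is not Lucene code: the class name
AppIdSketch and both methods are hypothetical, and the "index" is just an
array of application IDs addressed by doc ID. That is the same shape as
the array that FieldCache.DEFAULT.getLongs(reader, field) hands back in
Lucene 3.x, which is how option 2 would be read at search time, assuming
the field could be populated in the first place.

```java
import java.util.Arrays;

/** Toy model, not Lucene code: an index viewed as application IDs addressed by doc ID. */
class AppIdSketch {

    /** Simulates a merge that drops deleted docs and renumbers the survivors from 0. */
    static long[] mergeAwayDeleted(long[] appIdByDocId, boolean[] deleted) {
        long[] merged = new long[appIdByDocId.length];
        int next = 0;
        for (int docId = 0; docId < appIdByDocId.length; docId++) {
            if (!deleted[docId]) {
                merged[next++] = appIdByDocId[docId];
            }
        }
        return Arrays.copyOf(merged, next);
    }

    /** Linear scan for the doc ID currently holding the given application ID, or -1. */
    static int docIdFor(long[] appIdByDocId, long appId) {
        for (int docId = 0; docId < appIdByDocId.length; docId++) {
            if (appIdByDocId[docId] == appId) {
                return docId;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] before = {100L, 101L, 102L, 103L};
        boolean[] deleted = {false, true, false, false};
        long[] after = mergeAwayDeleted(before, deleted);

        // The application ID 102 stays the same, but its doc ID shifts.
        System.out.println(docIdFor(before, 102L)); // prints 2
        System.out.println(docIdFor(after, 102L));  // prints 1
    }
}
```

A database row keyed on doc ID 2 would point at a different document
after the merge, while a row keyed on application ID 102 would not.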

TX

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Migrating from using doc IDs to using application IDs from the FieldCache

Posted by Michael McCandless <lu...@mikemccandless.com>.
Unfortunately, it's not possible/easy to just add one new field to all
existing docs ... there are several issues open to do this, e.g. see
https://issues.apache.org/jira/browse/LUCENE-4258 and LUCENE-3837 and
LUCENE-4272.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jan 31, 2013 at 8:00 AM, Trejkaz <tr...@trypticon.org> wrote:
> Hi all.
>
> We have an application which has been around for so long that it's
> still using doc IDs to key to an external database.
>
> Obviously this won't work forever (even in Lucene 3.x we had to use a
> custom merge policy to keep it working) so we want to introduce
> application IDs eventually. We have two potential paths here:
>
>   1. Use our existing GUID field as the new unique ID and migrate to
> that (with the huge drawback that indexing GUID columns on the
> database is much, much slower than the current int column.)
>   2. Introduce a new application ID (either int or long) and somehow
> insert that into every document.
>
> The time it would take to pull every document out, add a field for the
> ID and index it again is too long to be desirable, so I'm wondering:
> is there some way we could add this field to every document without
> reindexing? Keeping Lucene 3.x as the limitation, that is... our
> 2.x-created indexes presumably won't open at all in 4.x without
> significant work.
>
> TX
