You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Gauri Shankar <gs...@gmail.com> on 2008/02/09 13:38:14 UTC

how to get the programmatic control over index's document id

Hi,

I would like to get the control over the docId field from my code. Can
anyone suggest some way for doing the same?


-- 
Warm Regards,
Gauri Shankar

Re: how to get the programmatic control over index's document id

Posted by Patrick Turcotte <pa...@gmail.com>.
Add a field to your document.

document.add(new Field("id", idString));

Or something like that. (Don't have the doc handy right now).

Hope this helps.

Patrick

On Feb 9, 2008 7:38 AM, Gauri Shankar <gs...@gmail.com> wrote:
> Hi,
>
> I would like to get the control over the docId field from my code. Can
> anyone suggest some way for doing the same?
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: how to get the programmatic control over index's document id

Posted by Erick Erickson <er...@gmail.com>.
I'm a little confused about what's happening here

Are you inserting a pre-existing primary key from the DB into your
Lucene documents?

Or are you using the Lucene doc ID as your primary key in the database?
If this latter, you're going to have many, many problems since the Lucene
ID can, and will, change. Sometimes surprisingly. So in this case speeding
up your app is the least of your problems <G>....

Assuming you're doing the former, you don't have to call getDocument to get
the DB primary key from your Lucene document. Look at TermEnum/TermDocs
which can give you the Lucene doc ID *from the index* rather than from the
document.
Of course, you'll have to index the DB PK in your Lucene document, but that
doesn't take much space....

And TermDocs/TermEnum is MUCH faster than getDocument....

But what problem are you really trying to solve? It sounds like you're
searching the
Lucene index, then trying to use the DB primary keys that you get back from
the search to do something in the DB. If this isn't the case, perhaps you
could explain the problem you're trying to solve a little more.

Best
Erick


On Thu, Feb 14, 2008 at 6:59 AM, Gauri Shankar <gs...@gmail.com>
wrote:

> Thanks a lot for both of you.
>
> yes, I am talking about internally assigned document id.
>
> Erick : I am already using the unique id into the index mapped to one of
> our
> DB's primary key to uniquely identify the docs from index.  Now to get the
> value of this unique field i need to call getDocumet(). But when the
> resultset is too large than this step is very slow as I am calling
> getDocument for each hit. Now I am using the FIELDCACHE and that has
> improved a lot. But I am thinking If I can find a way so that docId can be
> my unique ids and that will super optimize our search.
> Any suggestions??
>
> Thanks,
> Gauri Shankar
>
> On Sat, Feb 9, 2008 at 9:10 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > If you're referring to the internally-assigned document id, I don't
> think
> > there is a way. Assuming you're trying to assign one yourself or some
> > such.
> >
> > From all the discussions I've seen, I don't think there's even a faint
> > possibility that controlling this will be added to Lucene. Note that
> > existing IDs change as your index changes.
> >
> > Why do you care? What problem are you trying to solve? One common
> > suggestion is to create your own field (as Patrick suggests) that
> contains
> > your own unique ID. Using TermEnum/TermDocs will give you efficient
> > ways of going from your unique ID to a docID...
> >
> > Best
> > Erick
> >
> > On Feb 9, 2008 7:38 AM, Gauri Shankar <gs...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to get the control over the docId field from my code. Can
> > > anyone suggest some way for doing the same?
> > >
> > >
> > > --
> > > Warm Regards,
> > > Gauri Shankar
> > >
> >
>
>
>
> --
> Warm Regards,
> Gauri Shankar
>

Re: how to get the programmatic control over index's document id

Posted by John Wang <jo...@gmail.com>.
There was a thread on this exact issue.If you are using 2.3, the payload api
would help with that.

-John

On Thu, Feb 14, 2008 at 3:59 AM, Gauri Shankar <gs...@gmail.com>
wrote:

> Thanks a lot for both of you.
>
> yes, I am talking about internally assigned document id.
>
> Erick : I am already using the unique id into the index mapped to one of
> our
> DB's primary key to uniquely identify the docs from index.  Now to get the
> value of this unique field i need to call getDocumet(). But when the
> resultset is too large than this step is very slow as I am calling
> getDocument for each hit. Now I am using the FIELDCACHE and that has
> improved a lot. But I am thinking If I can find a way so that docId can be
> my unique ids and that will super optimize our search.
> Any suggestions??
>
> Thanks,
> Gauri Shankar
>
> On Sat, Feb 9, 2008 at 9:10 PM, Erick Erickson <er...@gmail.com>
> wrote:
>
> > If you're referring to the internally-assigned document id, I don't
> think
> > there is a way. Assuming you're trying to assign one yourself or some
> > such.
> >
> > From all the discussions I've seen, I don't think there's even a faint
> > possibility that controlling this will be added to Lucene. Note that
> > existing IDs change as your index changes.
> >
> > Why do you care? What problem are you trying to solve? One common
> > suggestion is to create your own field (as Patrick suggests) that
> contains
> > your own unique ID. Using TermEnum/TermDocs will give you efficient
> > ways of going from your unique ID to a docID...
> >
> > Best
> > Erick
> >
> > On Feb 9, 2008 7:38 AM, Gauri Shankar <gs...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I would like to get the control over the docId field from my code. Can
> > > anyone suggest some way for doing the same?
> > >
> > >
> > > --
> > > Warm Regards,
> > > Gauri Shankar
> > >
> >
>
>
>
> --
> Warm Regards,
> Gauri Shankar
>

Re: how to get the programmatic control over index's document id

Posted by Gauri Shankar <gs...@gmail.com>.
Thanks a lot for both of you.

yes, I am talking about internally assigned document id.

Erick : I am already using the unique id into the index mapped to one of our
DB's primary key to uniquely identify the docs from index.  Now to get the
value of this unique field i need to call getDocumet(). But when the
resultset is too large than this step is very slow as I am calling
getDocument for each hit. Now I am using the FIELDCACHE and that has
improved a lot. But I am thinking If I can find a way so that docId can be
my unique ids and that will super optimize our search.
Any suggestions??

Thanks,
Gauri Shankar

On Sat, Feb 9, 2008 at 9:10 PM, Erick Erickson <er...@gmail.com>
wrote:

> If you're referring to the internally-assigned document id, I don't think
> there is a way. Assuming you're trying to assign one yourself or some
> such.
>
> From all the discussions I've seen, I don't think there's even a faint
> possibility that controlling this will be added to Lucene. Note that
> existing IDs change as your index changes.
>
> Why do you care? What problem are you trying to solve? One common
> suggestion is to create your own field (as Patrick suggests) that contains
> your own unique ID. Using TermEnum/TermDocs will give you efficient
> ways of going from your unique ID to a docID...
>
> Best
> Erick
>
> On Feb 9, 2008 7:38 AM, Gauri Shankar <gs...@gmail.com> wrote:
>
> > Hi,
> >
> > I would like to get the control over the docId field from my code. Can
> > anyone suggest some way for doing the same?
> >
> >
> > --
> > Warm Regards,
> > Gauri Shankar
> >
>



-- 
Warm Regards,
Gauri Shankar

Re: how to get the programmatic control over index's document id

Posted by Erick Erickson <er...@gmail.com>.
If you're referring to the internally-assigned document id, I don't think
there is a way. Assuming you're trying to assign one yourself or some
such.

>From all the discussions I've seen, I don't think there's even a faint
possibility that controlling this will be added to Lucene. Note that
existing IDs change as your index changes.

Why do you care? What problem are you trying to solve? One common
suggestion is to create your own field (as Patrick suggests) that contains
your own unique ID. Using TermEnum/TermDocs will give you efficient
ways of going from your unique ID to a docID...

Best
Erick

On Feb 9, 2008 7:38 AM, Gauri Shankar <gs...@gmail.com> wrote:

> Hi,
>
> I would like to get the control over the docId field from my code. Can
> anyone suggest some way for doing the same?
>
>
> --
> Warm Regards,
> Gauri Shankar
>