Posted to java-user@lucene.apache.org by Waheed Mohammed <wa...@fiz-technik.de> on 2006/12/11 16:15:45 UTC

Lucene id generation

Hello,

Is there a way to influence Lucene's generation of IDs while indexing?

My requirement is this: I want to have several indexes where no index has
IDs that were already assigned to an earlier index.
For instance:
IDX1: {0.........100}
IDX2: {101.......200}
IDX3: {201.......300}
but not:
IDX1: {0.........100}
IDX2: {0.........100}
IDX3: {0.........100}

Any help is greatly appreciated.
-- 
A W Mohammed
Software Developer
FIZ-technik e.V
Frankfurt am Main


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Lucene id generation

Posted by karl wettin <ka...@gmail.com>.
On 11 Dec 2006, at 16:15, Waheed Mohammed wrote:

>
> Is there a way to influence lucene's generation of ids while indexing.

If you mean the Lucene "document number", then no. Are you also
aware that document numbers can change at any time (for example,
during optimization), without any notification of which number
changed to which?

>
> my requirement is. I want to have different indexes where no index  
> should have
> ids that have been assigned to an index earlier.

You'll have to manage those identities yourself and add them in a
stored field.
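
One way to manage such identities is sketched below. It is a minimal illustration, not anything from Lucene: the class DisjointIdAllocator, its methods, and the block size of 101 are all hypothetical names and values chosen for the example. Each index reserves its own block of IDs, and the value returned by nextId would be written into a stored field of the document being added.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: gives every index its own block of application-level IDs. */
class DisjointIdAllocator {
    private final long rangeSize;
    private long nextRangeStart = 0;
    // Per index: [next ID to hand out, exclusive end of the block].
    private final Map<String, long[]> blocks = new HashMap<>();

    DisjointIdAllocator(long rangeSize) {
        if (rangeSize <= 0) {
            throw new IllegalArgumentException("rangeSize must be positive");
        }
        this.rangeSize = rangeSize;
    }

    /** Reserves the next unused block for a new index. */
    synchronized void registerIndex(String indexName) {
        if (blocks.containsKey(indexName)) {
            throw new IllegalArgumentException("already registered: " + indexName);
        }
        blocks.put(indexName, new long[] {nextRangeStart, nextRangeStart + rangeSize});
        nextRangeStart += rangeSize;
    }

    /** Returns the next ID for the given index; the caller stores it in a document field. */
    synchronized long nextId(String indexName) {
        long[] block = blocks.get(indexName);
        if (block == null) {
            throw new IllegalArgumentException("unknown index: " + indexName);
        }
        if (block[0] >= block[1]) {
            throw new IllegalStateException("ID block exhausted for " + indexName);
        }
        return block[0]++;
    }
}
```

With a block size of 101, the first index draws from 0..100, the second from 101..201, and so on. A real application would also need to persist the allocator state between runs, which this sketch leaves out.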



Re: Lucene id generation

Posted by Erick Erickson <er...@gmail.com>.
I don't believe that this is possible. Or desirable. Lucene IDs are mutable,
even within an index. That is, if you index docs that get, say, IDs 1, 2, 3,
4, 5 and delete doc 2 and optimize, Docs 4 and 5 get reassigned IDs 3 and 4
(or something similar).

You're far better off controlling this yourself. That is, forget Lucene IDs
and make up your own unique IDs that you can guarantee form disjoint sets
across your multiple indexes, then work with those.
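
The renumbering described above can be imitated with a toy model. This is only a sketch in plain Java and does not use Lucene: ToyIndex and its methods are hypothetical, with a list position standing in for the Lucene doc number and the list value standing in for an ID you assign yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Toy stand-in for an index: list position = doc number, value = self-assigned ID. */
class ToyIndex {
    private final List<String> slots = new ArrayList<>();

    /** Adds a document and returns the doc number it received. */
    int add(String myId) {
        slots.add(myId);
        return slots.size() - 1;
    }

    /** Marks a document as deleted; its slot stays occupied until optimize(). */
    void delete(int docNumber) {
        slots.set(docNumber, null);
    }

    /** Drops deleted slots, which shifts every later document to a lower doc number. */
    void optimize() {
        slots.removeIf(Objects::isNull);
    }

    /** Returns the current doc number of the document with the given ID, or -1. */
    int docNumberOf(String myId) {
        return slots.indexOf(myId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (String id : new String[] {"A", "B", "C", "D", "E"}) {
            index.add(id);
        }
        System.out.println("D before: " + index.docNumberOf("D"));
        index.delete(index.docNumberOf("B"));
        index.optimize();
        System.out.println("D after:  " + index.docNumberOf("D"));
    }
}
```

The document with ID "D" keeps its own ID throughout, while its position changes once the deleted slot is removed. That is the reason for working with your own IDs rather than the positions.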

Best
Erick

On 12/11/06, Waheed Mohammed <wa...@fiz-technik.de> wrote:
>
> Hello,
>
> Is there a way to influence lucene's generation of ids while indexing.
>
> my requirement is. I want to have different indexes where no index should
> have
> ids that have been assigned to an index earlier.
> for instance
> IDX1 : {0.........100}
> IDX2: {101.......200}
> IDX3: {201.......300}
> but not
> IDX1 : {0.........100}
> IDX2 : {0.........100}
> IDX3 : {0.........100}
>
> any help is greatly appriciated.
> --
> A W Mohammed
> Software Entwickler
> FIZ-technik e.V
> Frankfut am Main
>
>
>
>

Re: Lucene id generation

Posted by Waheed Mohammed <wa...@fiz-technik.de>.
Thanks for the instant reply.
I see that what Rajesh advises is something like what MultiReader does.
That would be my last resort because of the complexity it would introduce
in developing my business case.
Any pointer to another approach would be appreciated.


On Monday 11 December 2006 17:10, Ramana Jelda wrote:
> I really lack this feature from lucene too.
> Whatever the requirements from Mohammed, There surely I see some
> improvements in search performance.
>
> My argument here is, why not lucene provides a mechanism to be able to
> provide custom document ids?
>
> > -----Original Message-----
> > From: Find Me [mailto:findmath@gmail.com]
> > Sent: Monday, December 11, 2006 4:34 PM
> > To: java-user@lucene.apache.org
> > Subject: Re: Lucene id generation
> >
> > On 12/11/06, Waheed Mohammed <wa...@fiz-technik.de> wrote:
> > > Hello,
> > >
> > > Is there a way to influence lucene's generation of ids
> >
> > while indexing.
> >
> > > my requirement is. I want to have different indexes where no index
> > > should have ids that have been assigned to an index earlier.
> > > for instance
> > > IDX1 : {0.........100}
> > > IDX2: {101.......200}
> > > IDX3: {201.......300}
> > > but not
> > > IDX1 : {0.........100}
> > > IDX2 : {0.........100}
> > > IDX3 : {0.........100}
> >
> > I dont think you should be doing that. If you want to have
> > the same effect,
> > during searching you can package hits from different indices with a
> > predetermined offset for each index. For ex: IDX1 will have
> > an offset 0,
> > IDX2 will have 101...and so on.
> >
> > --Rajesh Munavalli
>

-- 
A W Mohammed
Software Developer
FIZ-technik e.V
Frankfurt am Main




RE: Lucene id generation

Posted by Chris Hostetter <ho...@fucit.org>.
: Exactly in this scenario, I would love to use my custom generated document
: id.
: The array reference number is MyId & its value is some-interested-value
: matched to MyID.
:
: So,how can I generate custom document id.?

You can't. You can index your custom "MyID" value and use the FieldCache
to look it up using the Lucene docid.

That's what Karl is referring to: he uses the Lucene docid as the *index*
into the array, and his MyID values are *stored* in the array.
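
The arrangement described above (docid as the array index, MyID as the array value) can be sketched in plain Java. This is an illustration under assumptions, not Lucene code: MyIdCache and its methods are hypothetical, and a plain list of MyID values in document order stands in for whatever the index or the FieldCache would supply.

```java
import java.util.List;

/** Hypothetical cache: array index = Lucene doc number, array value = application MyID. */
class MyIdCache {
    private int[] myIdByDocNumber = new int[0];

    /**
     * Rebuilds the array from the MyID values in current document order.
     * Meant to be called again after any operation that may change doc numbers.
     */
    void repopulate(List<Integer> myIdsInDocOrder) {
        int[] fresh = new int[myIdsInDocOrder.size()];
        for (int doc = 0; doc < fresh.length; doc++) {
            fresh[doc] = myIdsInDocOrder.get(doc);
        }
        myIdByDocNumber = fresh;
    }

    /** Looks up the MyID for one doc number. */
    int myIdFor(int docNumber) {
        return myIdByDocNumber[docNumber];
    }

    /** Maps a batch of hit doc numbers to MyID values without loading any documents. */
    int[] myIdsFor(int[] hitDocNumbers) {
        int[] result = new int[hitDocNumbers.length];
        for (int i = 0; i < hitDocNumbers.length; i++) {
            result[i] = myIdByDocNumber[hitDocNumbers[i]];
        }
        return result;
    }
}
```

The doc number carries no meaning of its own here. It is only the position used to reach the MyID, so the array has to be rebuilt whenever positions may have shifted.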


: > -----Original Message-----
: > From: Chris Hostetter [mailto:hossman_lucene@fucit.org]
: > Sent: Friday, December 15, 2006 6:35 AM
: > To: java-user@lucene.apache.org
: > Subject: Re: Lucene id generation
: >
: >
: > Karl: it sounds like you are just refering to using the
: > lucene docid as an array index for the FieldCache of your
: > "MyID" field ... that's a perfectly valid use of the docid,
: > the key being that you aren't expecting the id to contain any
: > meaningful data itself -- it's just a refrence number.
: >
: > : > if you are trying to think of Lucene's docid as a meaningful
: > : > number, you
: > : > are doing something wrong.
: > :
: > : There is this one place where I use it. The index is add only, and
: > : the only data that interests me is the stored field MyID, also kept
: > : track in an int[docid]. In case of index operation that
: > change docid,
: > : I simply repopulate the int[].
: > :
: > : I throw this MyID value around quite a bit, starting in the hit
: > : collector. It save me time from deserializing all hits.
: > :
: > : Is this reasonable?
: >
: >
: >
: > -Hoss
: >
: >
: >
:
:
:



-Hoss




RE: Lucene id generation

Posted by Ramana Jelda <ra...@ciao-group.com>.
Hi Hoss,
This is exactly the scenario where I would love to use my own
custom-generated document ID.
The array index would be MyID, and its value would be some value of
interest associated with that MyID.

So, how can I generate a custom document ID?

Jelda

> -----Original Message-----
> From: Chris Hostetter [mailto:hossman_lucene@fucit.org] 
> Sent: Friday, December 15, 2006 6:35 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene id generation
> 
> 
> Karl: it sounds like you are just refering to using the 
> lucene docid as an array index for the FieldCache of your 
> "MyID" field ... that's a perfectly valid use of the docid, 
> the key being that you aren't expecting the id to contain any 
> meaningful data itself -- it's just a refrence number.
> 
> : > if you are trying to think of Lucene's docid as a meaningful
> : > number, you
> : > are doing something wrong.
> :
> : There is this one place where I use it. The index is add only, and
> : the only data that interests me is the stored field MyID, also kept
> : track in an int[docid]. In case of index operation that 
> change docid,
> : I simply repopulate the int[].
> :
> : I throw this MyID value around quite a bit, starting in the hit
> : collector. It save me time from deserializing all hits.
> :
> : Is this reasonable?
> 
> 
> 
> -Hoss
> 
> 
> 




Re: Lucene id generation

Posted by Chris Hostetter <ho...@fucit.org>.
Karl: it sounds like you are just referring to using the Lucene docid as an
array index for the FieldCache of your "MyID" field ... that's a perfectly
valid use of the docid, the key being that you aren't expecting the id to
contain any meaningful data itself -- it's just a reference number.

: > if you are trying to think of Lucene's docid as a meaningful
: > number, you
: > are doing something wrong.
:
: There is this one place where I use it. The index is add only, and
: the only data that interests me is the stored field MyID, also kept
: track in an int[docid]. In case of index operation that change docid,
: I simply repopulate the int[].
:
: I throw this MyID value around quite a bit, starting in the hit
: collector. It save me time from deserializing all hits.
:
: Is this reasonable?



-Hoss




Re: Lucene id generation

Posted by karl wettin <ka...@gmail.com>.
On 11 Dec 2006, at 20:04, Chris Hostetter wrote:

> if you are trying to think of Lucene's docid as a meaningful  
> number, you
> are doing something wrong.

There is one place where I use it. The index is add-only, and
the only data that interests me is the stored field MyID, which I also
keep track of in an int[docid]. When an index operation changes docids,
I simply repopulate the int[].

I pass this MyID value around quite a bit, starting in the hit
collector. It saves me the time of deserializing all hits.

Is this reasonable?






RE: Lucene id generation

Posted by Chris Hostetter <ho...@fucit.org>.
If you are trying to think of Lucene's docid as a meaningful number, you
are doing something wrong.

A lot of people want to view Lucene docids the same way they look at
auto-incremented unique keys in a database -- don't do that.  Instead,
think of them as memory addresses in C or C++ ... they are a handy
numeric value that tells Lucene at what offset in various segment files
it can find data about that document -- as your index changes and data
gets moved around, docids change.

The best analogy that can be made to a database is not auto-generated
unique keys, it's row numbers ... the physical position of a row in the
sequence of rows in your table -- a number most databases never give you
access to unless you are dealing with the low-level internals of the
table, because as you add or delete lots of data, or drop and load new
indexes, those numbers can change.

If you want control of a unique ID for each of the documents in your index
-- add one as a field just like any other.



: Date: Mon, 11 Dec 2006 17:10:18 +0100
: From: Ramana Jelda <ra...@ciao-group.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: RE: Lucene id generation
:
: I really lack this feature from lucene too.
: Whatever the requirements from Mohammed, There surely I see some
: improvements in search performance.
:
: My argument here is, why not lucene provides a mechanism to be able to
: provide custom document ids?
:
:
: > -----Original Message-----
: > From: Find Me [mailto:findmath@gmail.com]
: > Sent: Monday, December 11, 2006 4:34 PM
: > To: java-user@lucene.apache.org
: > Subject: Re: Lucene id generation
: >
: > On 12/11/06, Waheed Mohammed <wa...@fiz-technik.de> wrote:
: > >
: > > Hello,
: > >
: > > Is there a way to influence lucene's generation of ids
: > while indexing.
: > >
: > > my requirement is. I want to have different indexes where no index
: > > should have ids that have been assigned to an index earlier.
: > > for instance
: > > IDX1 : {0.........100}
: > > IDX2: {101.......200}
: > > IDX3: {201.......300}
: > > but not
: > > IDX1 : {0.........100}
: > > IDX2 : {0.........100}
: > > IDX3 : {0.........100}
: >
: >
: > I dont think you should be doing that. If you want to have
: > the same effect,
: > during searching you can package hits from different indices with a
: > predetermined offset for each index. For ex: IDX1 will have
: > an offset 0,
: > IDX2 will have 101...and so on.
: >
: > --Rajesh Munavalli
: >
:
:
:



-Hoss




RE: Lucene id generation

Posted by Ramana Jelda <ra...@ciao-group.com>.
I really miss this feature in Lucene too.
Whatever Mohammed's requirements are, I certainly see some potential
improvements in search performance.

My question is: why doesn't Lucene provide a mechanism for supplying
custom document IDs?


> -----Original Message-----
> From: Find Me [mailto:findmath@gmail.com] 
> Sent: Monday, December 11, 2006 4:34 PM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene id generation
> 
> On 12/11/06, Waheed Mohammed <wa...@fiz-technik.de> wrote:
> >
> > Hello,
> >
> > Is there a way to influence lucene's generation of ids 
> while indexing.
> >
> > my requirement is. I want to have different indexes where no index 
> > should have ids that have been assigned to an index earlier.
> > for instance
> > IDX1 : {0.........100}
> > IDX2: {101.......200}
> > IDX3: {201.......300}
> > but not
> > IDX1 : {0.........100}
> > IDX2 : {0.........100}
> > IDX3 : {0.........100}
> 
> 
> I dont think you should be doing that. If you want to have 
> the same effect,
> during searching you can package hits from different indices with a
> predetermined offset for each index. For ex: IDX1 will have 
> an offset 0,
> IDX2 will have 101...and so on.
> 
> --Rajesh Munavalli
> 




Re: Lucene id generation

Posted by Find Me <fi...@gmail.com>.
On 12/11/06, Waheed Mohammed <wa...@fiz-technik.de> wrote:
>
> Hello,
>
> Is there a way to influence lucene's generation of ids while indexing.
>
> my requirement is. I want to have different indexes where no index should
> have
> ids that have been assigned to an index earlier.
> for instance
> IDX1 : {0.........100}
> IDX2: {101.......200}
> IDX3: {201.......300}
> but not
> IDX1 : {0.........100}
> IDX2 : {0.........100}
> IDX3 : {0.........100}


I don't think you should be doing that. If you want the same effect, you
can package hits from the different indexes at search time with a
predetermined offset for each index. For example, IDX1 will have an offset
of 0, IDX2 an offset of 101, and so on.

--Rajesh Munavalli
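
The offset idea suggested above can be sketched as follows. This is a plain-Java illustration under assumptions and does not use Lucene: OffsetMapper and its methods are hypothetical names, and the document counts (101, 100, 100) are taken from the ranges in the original question.

```java
/** Hypothetical mapper between per-index doc numbers and one combined numbering. */
class OffsetMapper {
    // starts[i] is the offset of index i; the last entry is the total document count.
    private final int[] starts;

    OffsetMapper(int[] docCounts) {
        starts = new int[docCounts.length + 1];
        for (int i = 0; i < docCounts.length; i++) {
            starts[i + 1] = starts[i] + docCounts[i];
        }
    }

    /** Converts a doc number local to one index into the combined numbering. */
    int toGlobal(int indexOrdinal, int localDoc) {
        int size = starts[indexOrdinal + 1] - starts[indexOrdinal];
        if (localDoc < 0 || localDoc >= size) {
            throw new IllegalArgumentException("doc out of range: " + localDoc);
        }
        return starts[indexOrdinal] + localDoc;
    }

    /** Finds which index a combined doc number belongs to (linear scan, few indexes). */
    int indexOf(int globalDoc) {
        if (globalDoc >= 0) {
            for (int i = 0; i < starts.length - 1; i++) {
                if (globalDoc < starts[i + 1]) {
                    return i;
                }
            }
        }
        throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }

    /** Converts a combined doc number back to the doc number inside its own index. */
    int toLocal(int globalDoc) {
        return globalDoc - starts[indexOf(globalDoc)];
    }
}
```

With counts of 101, 100 and 100, the three indexes map to 0..100, 101..200 and 201..300 in the combined numbering. The offsets depend on the current size of each index, so they would need to be recomputed whenever an index grows or is optimized.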