Posted to java-user@lucene.apache.org by Dwaipayan Roy <dw...@gmail.com> on 2018/03/09 10:06:32 UTC

getting Lucene Docid from inside score()

While searching, I want to get the Lucene-assigned docid (which runs from
0 to the number of documents minus 1) of a document containing a particular
query term.

From inside score(), printing 'doc' or calling docId() returns a
docid which, I think, is the internal docid within the segment in which the
document is indexed. However, I want the Lucene-assigned docid. How
do I do that?

Dwaipayan..

Re: getting Lucene Docid from inside score()

Posted by Erick Erickson <er...@gmail.com>.
I was thinking this was a Solr question rather than a Lucene one, so
the [docid] bit doesn't apply if you're in the Lucene code. If you
_are_ really going through Solr, just put [docid] in your Solr "fl" list.
Look in the Solr ref guide for an explanation:
https://lucene.apache.org/solr/guide/6_6/transforming-result-documents.html

If you _are_ doing this in the Lucene code, isn't what you want just
the "doc" member variable of a ScoreDoc?

Best,
Erick


On Sat, Mar 10, 2018 at 4:43 AM, dwaipayan.roy@gmail.com
<dw...@gmail.com> wrote:
> Hi Erick,
>
> Many thanks for your reply and explanation.
>
> I really want this to work. The good news for me is, the index is static, there is no chance of any modification of the index.
>
>> Luke and the like are using a point-in-time snapshot of the index.
>
> I want to get that lucene-assigned docid, the same id that is returned, after performing a search(), in the form of topDocs.scoreDocs.
>         ScoreDoc[] hits;
>         indexSearcher.search(luceneQuery, collector);
>         topDocs = collector.topDocs();
>         hits = topDocs.scoreDocs;
>         System.out.println(hits[0].doc);               // I want this docid inside score()
>
>> If you still want to get the internal ID, just specify the
>> pseudo-field [docid], as: "fl=id,[docid]"
>
> I didn't get your suggestion properly. Can you please explain a little? I will be waiting for your reply.
>
> With regards,
>
> Dwaipayan..

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: getting Lucene Docid from inside score()

Posted by Dwaipayan Roy <dw...@gmail.com>.
Hi Erick,

Many thanks for your reply and explanation.

I really want this to work. The good news for me is that the index is static; there is no chance of it being modified.

> Luke and the like are using a point-in-time snapshot of the index.

I want to get that Lucene-assigned docid, the same id that search() returns in topDocs.scoreDocs:
        ScoreDoc[] hits;
        indexSearcher.search(luceneQuery, collector);
        topDocs = collector.topDocs();
        hits = topDocs.scoreDocs;
        System.out.println(hits[0].doc);               // I want this docid inside score()

> If you still want to get the internal ID, just specify the
> pseudo-field [docid], as: "fl=id,[docid]"

I didn't get your suggestion properly. Can you please explain a little? I will be waiting for your reply.

With regards,

Dwaipayan..

On 2018/03/09 20:04:59, Erick Erickson <er...@gmail.com> wrote: 
> You almost certainly do _not_ want this unless you are absolutely and
> totally sure that your index does not change between the time you ask
> for the internal Lucene doc ID and the time you use it. No docs
> may be added. No forceMerges may be done. In fact, I'd go so far as to
> say you shouldn't open any new searchers.
> 
> Here's the reason. Say I have a single-segment index with internal doc
> IDs 1, 2, 3, 4, 5. Say I delete docs 2 and 3. Now say I optimize; the
> new segment has IDs 1, 2, 3. This is a simplification to illustrate that
> _whenever_ a segment gets rewritten for any reason, internal Lucene
> doc IDs may change. All this goes on in the background and you have no
> control over when.
> 
> Docs may even get renumbered relative to each other. Let's claim that
> your Solr ID is doc1 and its associated internal ID is 1. doc100 has
> internal id 100. Segment merging could assign doc1 an id of 200 and
> doc100 an id of 150. You just don't know.
> 
> Luke and the like are using a point-in-time snapshot of the index.
> 
> If you still want to get the internal ID, just specify the
> pseudo-field [docid], as: "fl=id,[docid]"
> 
> Best,
> Erick


Re: getting Lucene Docid from inside score()

Posted by Erick Erickson <er...@gmail.com>.
You almost certainly do _not_ want this unless you are absolutely and
totally sure that your index does not change between the time you ask
for the internal Lucene doc ID and the time you use it. No docs
may be added. No forceMerges may be done. In fact, I'd go so far as to
say you shouldn't open any new searchers.

Here's the reason. Say I have a single-segment index with internal doc
IDs 1, 2, 3, 4, 5. Say I delete docs 2 and 3. Now say I optimize; the
new segment has IDs 1, 2, 3. This is a simplification to illustrate that
_whenever_ a segment gets rewritten for any reason, internal Lucene
doc IDs may change. All this goes on in the background and you have no
control over when.

Docs may even get renumbered relative to each other. Let's claim that
your Solr ID is doc1 and its associated internal ID is 1. doc100 has
internal id 100. Segment merging could assign doc1 an id of 200 and
doc100 an id of 150. You just don't know.
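
A minimal sketch of the renumbering described above, written as a toy model in plain Java rather than real Lucene code. The class and method names (RenumberSketch, internalIds, rewrite) and the keys "a" to "e" are hypothetical. The model treats a segment as an ordered list of application-level keys and the internal doc ID as the position in that list, numbered from 0. It only shows compaction after deletes; it does not model the reordering between documents that a merge may also cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: a "segment" is a list of keys, a doc ID is a list position. */
final class RenumberSketch {

    /** Maps each application-level key to its current position (the toy internal doc ID). */
    static Map<String, Integer> internalIds(List<String> segment) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (int i = 0; i < segment.size(); i++) {
            ids.put(segment.get(i), i);
        }
        return ids;
    }

    /** Rewrites a segment: deleted docs are dropped and the survivors are packed together. */
    static List<String> rewrite(List<String> segment, Set<String> deleted) {
        List<String> rewritten = new ArrayList<>();
        for (String key : segment) {
            if (!deleted.contains(key)) {
                rewritten.add(key);
            }
        }
        return rewritten;
    }

    public static void main(String[] args) {
        List<String> segment = Arrays.asList("a", "b", "c", "d", "e");
        Set<String> deleted = new HashSet<>(Arrays.asList("b", "c"));

        // Before the rewrite, "d" sits at position 3.
        System.out.println(internalIds(segment));

        // After the rewrite, "d" has moved to position 1, although "d" itself never changed.
        System.out.println(internalIds(rewrite(segment, deleted)));
    }
}
```

The point of the sketch is that any ID remembered from before the rewrite (3 for "d") would point at nothing, or at the wrong document, afterwards. A stable application-level key avoids that.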

Luke and the like are using a point-in-time snapshot of the index.

If you still want to get the internal ID, just specify the
pseudo-field [docid], as: "fl=id,[docid]"

Best,
Erick

On Fri, Mar 9, 2018 at 3:50 AM, dwaipayan.roy@gmail.com
<dw...@gmail.com> wrote:
> Thank you very much for your reply. Yes, I really want this (for
> implementing a retrieval function that extends the LMDir function).
> Precisely, I want the document numbering same as that we see in
> Lucene-Index-Viewers like Luke.
>
> I am not sure what you meant by "segment offset, held by a leaf reader"..
> Can you please explain a little, exactly when and what I need to do?
>
> Many thanks.


Re: getting Lucene Docid from inside score()

Posted by Dwaipayan Roy <dw...@gmail.com>.
Thank you very much for your reply. Yes, I really want this (for
implementing a retrieval function that extends the LMDir function).
Precisely, I want the same document numbering that we see in
Lucene index viewers such as Luke.

I am not sure what you meant by "segment offset, held by a leaf reader".
Can you please explain a little more: what exactly do I need to do, and when?

Many thanks.

On 2018/03/09 11:25:44, Michael Sokolov <ms...@gmail.com> wrote: 
> Are you sure you want this? Lucene docids aren't generally useful outside a
> narrow internal context. They can change over time for example.
> 
> But if you do, it sounds like maybe what you are seeing is the per segment
> docid. To get a global one you have to add the segment offset, held by a
> leaf reader.


Re: getting Lucene Docid from inside score()

Posted by Michael Sokolov <ms...@gmail.com>.
Are you sure you want this? Lucene docids aren't generally useful outside a
narrow internal context. They can change over time, for example.

But if you do, it sounds like maybe what you are seeing is the per-segment
docid. To get a global one you have to add the segment offset, held by a
leaf reader.
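
The arithmetic behind "add the segment offset" can be sketched as follows. This is a toy model in plain Java, not Lucene code: the class DocBaseSketch, its methods, and the segment sizes 4, 3 and 5 are all hypothetical. In Lucene itself the offset is, as far as I know, exposed as the docBase field of the LeafReaderContext for each segment, so the index-wide id would be context.docBase plus the per-segment docid. The sketch assumes every segment holds at least one document.

```java
import java.util.Arrays;

/** Toy model of per-segment vs. index-wide doc IDs; it does not use Lucene. */
final class DocBaseSketch {

    /** The offset (doc base) of a segment is the number of docs in all earlier segments. */
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int running = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = running;
            running += segmentSizes[i];
        }
        return bases;
    }

    /** Index-wide id = segment offset + per-segment id. */
    static int toGlobal(int[] bases, int segment, int localDoc) {
        return bases[segment] + localDoc;
    }

    /** Reverse lookup: the segment holding globalDoc is the last one whose base is <= globalDoc. */
    static int segmentOf(int[] bases, int globalDoc) {
        int idx = Arrays.binarySearch(bases, globalDoc);
        // A miss returns -(insertionPoint) - 1; the owning segment is insertionPoint - 1.
        return idx >= 0 ? idx : -idx - 2;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {4, 3, 5}); // three segments: offsets 0, 4, 7
        int global = toGlobal(bases, 1, 2);          // local doc 2 in the second segment
        int segment = segmentOf(bases, global);
        System.out.println("global=" + global + " segment=" + segment
                + " local=" + (global - bases[segment]));
    }
}
```

With these made-up sizes, local docid 2 in the second segment maps to index-wide docid 6, and the reverse lookup recovers segment 1 and local docid 2.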

On Mar 9, 2018 5:06 AM, "Dwaipayan Roy" <dw...@gmail.com> wrote:

> While searching, I want to get the lucene assigned docid (that starts from
> 0 to the number of documents -1) of a document having a particular query
> term.
>
> From inside the score(), printing 'doc' or calling docId() is returning a
> docid which, I think, is the internal docid of a segment in which the
> document is indexed. However, I want to have the lucene assigned docid. How
> to do that?
>
> Dwaipayan..
>