Posted to java-user@lucene.apache.org by saisantoshi <sa...@gmail.com> on 2013/01/24 00:19:10 UTC

TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Our current search implementation (based on 2.4.0) uses a collector that
extends the TopDocCollector class:

public class MyHitCollector extends TopDocCollector {

    private IndexReader indexReader;
    private CustomFilter customFilter;

    public MyHitCollector(IndexReader indexReader, int numberOfHits,
                          CustomFilter filter) {
        super(numberOfHits);  // constructor in question
        this.indexReader = indexReader;
        this.customFilter = filter;
    }

    public void collect(int doc, float score) {  // method in question
        try {
            if (score > 0.0f) {
                // do something
                super.collect(doc, score);
            }
        } catch (Exception e) {
            // exception is ignored
        }
    }
}

// Using the collector
MyHitCollector collector;
IndexSearcher searcher = new IndexSearcher(reader);
collector = new MyHitCollector(reader, maximumHits, filter);
searcher.search(query, null, collector);

TopDocs docs = collector.topDocs();


Now in 4.0, TopDocCollector has been removed and the suggested replacement
is TopScoreDocCollector (for faster performance). I don't see the following
signatures in the newer class, which breaks backward compatibility:

public void collect(int doc, float score)   // I think this is no longer there.
super(numberOfHits)   // This constructor has also been removed in 4.0. It used to be in 2.4.

It looks to me like backward compatibility is broken, and there is no proper
documentation either.

Could someone suggest an alternative here? Is there any collector that we can
use to stay backward compatible?

Thanks and appreciate your help.

Thanks,
Sai.



--
View this message in context: http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-tp4035806.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

I am sorry, but I am confused by the change logs and the enhancements made,
since we are jumping from 2.4 to 4.0. Could you please point me to any
example code that extends one of the new collectors? That would help a lot.
It would also be great if you could give some pointers on how we can modify
our existing collector.

Thanks in advance; I really appreciate your help here. Any example code
is also fine.




Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

I am not looking for negative scores; I want to skip them.

Thanks,
Sai




Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Simon Willnauer <si...@gmail.com>.

On Fri, Jan 25, 2013 at 3:29 PM, saisantoshi <sa...@gmail.com> wrote:
> Thanks a lot. If we want to wrap TopScoreDocCollector into
> PositiveScoresOnlyCollector. Can we do that?
> I need only positive scores and I dont think topscore collector can handle
> by itself right?
>

I guess so! But how do you get neg. scores?
>
> Thanks,
> Sai

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Thanks a lot. Can we wrap TopScoreDocCollector in a
PositiveScoresOnlyCollector?
I need only positive scores, and I don't think TopScoreDocCollector can
handle that by itself, right?


Thanks,
Sai
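
The wrapping asked about here can be pictured with a small plain-Java model. This is only a sketch and does not use Lucene classes: ScoredSink, PositiveScoresOnly, and ListSink are hypothetical names standing in for a collector, a filtering wrapper, and a collector that records hits. As a simplification, the score is passed directly to collect(), whereas a Lucene 4.x Collector obtains it from the Scorer. The wrapper forwards a hit to the wrapped sink only when its score is greater than zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a collector that receives (doc, score) pairs (not Lucene code). */
interface ScoredSink {
    void collect(int doc, float score);
}

/** Forwards only hits with a strictly positive score to the wrapped sink. */
final class PositiveScoresOnly implements ScoredSink {
    private final ScoredSink delegate;

    PositiveScoresOnly(ScoredSink delegate) {
        this.delegate = delegate;
    }

    @Override
    public void collect(int doc, float score) {
        if (score > 0.0f) {
            delegate.collect(doc, score);
        }
    }
}

/** Records the ID of every hit it is given. */
final class ListSink implements ScoredSink {
    final List<Integer> docs = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        docs.add(doc);
    }
}
```

In this model the search would be run against the wrapper, while the results are read from the wrapped sink afterwards.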




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Here is how I am using it:

public class MyCollector extends PositiveScoresOnlyCollector {

    private IndexReader indexReader;

    public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) {
        super(topScore);
        this.indexReader = indexReader;
    }

    @Override
    public void collect(int doc) {
        try {
            Document document = indexReader.document(doc);
            // Custom logic
            super.collect(doc);
        } catch (Exception e) {
            // exception is ignored
        }
    }
}

// Using the collector
MyCollector mycollector;
TopScoreDocCollector topScore = TopScoreDocCollector.create(100, true);
IndexSearcher indexSearcher = new IndexSearcher(indexReader);
mycollector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
indexSearcher.search(queryString, (Filter) null, mycollector);
TopDocs hitDocs = topScore.topDocs();


I am not sure what I am doing wrong here. How do I get a context to the
AtomicReader in my custom collector?

Thanks,
Sai




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Could someone please comment on the above?

Thanks,
Sai




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Thanks for the response; I really appreciate your help. I had read the
documentation but could not follow it on the first read, as I am new to Lucene.
I have changed it to AtomicReader and it seems to be working fine.

One last clarification: do we also need to use AtomicReader for the
following?

IndexReader indexReader = DirectoryReader.open(directory);   // Current

Should it be changed to:

AtomicReader indexReader = DirectoryReader.open(directory); 

Thanks,
Sai




Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Michael Sokolov <ms...@safaribooksonline.com>.

On 03/01/2013 07:56 AM, Uwe Schindler wrote:
> The slowdown happens not on making the doc ids absolute (it is just an addition), the slowdown appears when you retrieve the stored fields on the top-level reader (because the composite top-level reader has to do a binary search in the reader tree to find the correct reader). This answer was related to the code pasted by the user asking this question.
Thanks - that's added a new nugget to my understanding :)

-Mike


RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

The slowdown does not come from making the doc IDs absolute (that is just an addition); it appears when you retrieve the stored fields from the top-level reader (because the composite top-level reader has to do a binary search in the reader tree to find the correct reader). This answer was related to the code pasted by the user asking this question.

If you need top-level doc IDs because you present the global doc IDs to the user (e.g. this is how TopScoreDocCollector works), you can of course add the doc base. But inside the collector it makes absolutely no sense to transform the local, relative doc IDs into absolute ones just to call a method on a top-level reader that needs to do the opposite with a binary search. In that case, use the AtomicReader directly. If you also access FieldCache, working with absolute doc IDs additionally wastes megabytes of memory and causes FieldCache insanity.

Uwe
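
To illustrate the cost difference described above, here is a minimal plain-Java sketch. It does not use Lucene classes: SegmentLayout, toGlobal, segmentFor, and toLocal are hypothetical names chosen for this illustration, and it assumes every segment holds at least one document. Going from a segment-relative ID to a global ID is a single addition, while going back from a global ID to its segment needs a binary search over the segment start offsets.

```java
import java.util.Arrays;

/** Toy model of a composite reader over several segments (not Lucene code). */
final class SegmentLayout {
    // docStarts[i] is the global ID of the first document in segment i (its "doc base").
    private final int[] docStarts;

    /** Assumes every entry of segmentSizes is at least 1. */
    SegmentLayout(int[] segmentSizes) {
        docStarts = new int[segmentSizes.length];
        int base = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            docStarts[i] = base;
            base += segmentSizes[i];
        }
    }

    /** Local to global: a single addition. */
    int toGlobal(int segment, int localDoc) {
        return docStarts[segment] + localDoc;
    }

    /** Global to segment index: binary search over the segment start offsets. */
    int segmentFor(int globalDoc) {
        int pos = Arrays.binarySearch(docStarts, globalDoc);
        // Exact hit: globalDoc is the first document of segment pos.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the owning
        // segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Global to local ID within the owning segment. */
    int toLocal(int globalDoc) {
        return globalDoc - docStarts[segmentFor(globalDoc)];
    }
}
```

A collector that adds the doc base and then asks a top-level reader for the document pays for toGlobal and then for the equivalent of segmentFor on every hit, whereas a per-segment lookup needs neither.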

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Michael Sokolov [mailto:msokolov@safaribooksonline.com]
> Sent: Friday, March 01, 2013 1:41 PM
> To: java-user@lucene.apache.org
> Cc: Uwe Schindler
> Subject: Re: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward comptabile)
> 
> On 2/28/2013 5:05 PM, Uwe Schindler wrote:
> > ...  Collector instead of HitCollector (like your ancient Lucene from 2.4), you
> have to respect the new semantics that are *different* to old HitCollector.
> Collector works with low-level atomic readers (also in Lucene 3.x), the calls to
> the "collect(int)" method are *not* using global document IDs, so using a
> IndexReader from outside does not work and will never work - PERIOD: The
> document IDs are only *relative* to the atomic reader that was passed to
> the collector by setNextReader() before a sequence of collect() calls. To
> make global docIds out of it, you may use readerContext.docBase, but this is
> slower than using the low-level atomic reader.
> >
> Uwe, thanks for this lucid explanation!  I wonder if you wouldn't mind
> elaborating a bit on the slowdown you refer to from using docBase to
> absolutize docIDs.  I have a use case where I need to pass control to my
> caller, allowing them to *pull* results - so I don't know how many I will need.
> In the case where documents are returned in(docID) order, the code is
> actually pretty straightforward: I iterate over the atomic readers and pull
> results from each in turn.  Are you saying that is slower because it prevents
> multi-threading, or is there some other reason?
> 
> -Mike

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Michael Sokolov <ms...@safaribooksonline.com>.

On 2/28/2013 5:05 PM, Uwe Schindler wrote:
> ...  Collector instead of HitCollector (like your ancient Lucene from 2.4), you have to respect the new semantics that are *different* to old HitCollector. Collector works with low-level atomic readers (also in Lucene 3.x), the calls to the "collect(int)" method are *not* using global document IDs, so using a IndexReader from outside does not work and will never work - PERIOD: The document IDs are only *relative* to the atomic reader that was passed to the collector by setNextReader() before a sequence of collect() calls. To make global docIds out of it, you may use readerContext.docBase, but this is slower than using the low-level atomic reader.
>
Uwe, thanks for this lucid explanation!  I wonder if you wouldn't mind 
elaborating a bit on the slowdown you refer to from using docBase to 
absolutize docIDs.  I have a use case where I need to pass control to my 
caller, allowing them to *pull* results - so I don't know how many I 
will need.  In the case where documents are returned in (docID) order, 
the code is actually pretty straightforward: I iterate over the atomic 
readers and pull results from each in turn.  Are you saying that is 
slower because it prevents multi-threading, or is there some other reason?

-Mike


Re: Solr Or Lucene Paging

Posted by Ian Lea <ia...@gmail.com>.

You're probably better off asking Solr questions on the solr list.
But if you really need the 20 hits starting at 1000000 i.e. page
number 50000 you'd better rethink your requirements and your indexing
strategy.


--
Ian.
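
To make the cost of deep paging concrete, here is a minimal plain-Java sketch. It is not Solr or Lucene code: DeepPaging and page are hypothetical names, and scores stand in for hits. It assumes the common top-N approach in which a page starting at offset start with rows entries requires keeping the best start + rows hits in memory before the first start of them are thrown away.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of top-N paging over scores (not Solr or Lucene code). */
final class DeepPaging {
    /**
     * Returns the scores on the requested page, best first.
     * Memory use grows with start + rows, not with rows alone.
     */
    static List<Double> page(double[] scores, int start, int rows) {
        if (start < 0 || rows <= 0) {
            return new ArrayList<>();
        }
        int needed = start + rows;
        // Min-heap: the weakest hit kept so far sits on top and is evicted first.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double s : scores) {
            if (heap.size() < needed) {
                heap.add(s);
            } else if (s > heap.peek()) {
                heap.poll();
                heap.add(s);
            }
        }
        List<Double> best = new ArrayList<>(heap);
        best.sort(Collections.reverseOrder());
        // Skip the first 'start' hits; everything before the page was kept only to be discarded.
        int from = Math.min(start, best.size());
        int to = Math.min(needed, best.size());
        return new ArrayList<>(best.subList(from, to));
    }
}
```

With start = 1000000 and rows = 20, this model keeps 1000020 entries alive to return 20 of them, which is the kind of overhead the advice above is warning about.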


On Fri, Mar 1, 2013 at 6:48 AM, dizh <di...@neusoft.com> wrote:
>
> Hi，All：
>
>  I want to ask a question, How does Solr implements Paging, such as start = 1000000 and row = 20
>  I roughly saw Solr source, hare is the code:
>  getDocListNC(QueryResult qr,QueryCommand cmd) ;
>  often it uses :
>  topCollector = TopFieldCollector.create(weightSort(cmd.getSort()), len, false, needScores, needScores, true);
>  super.search(query, luceneFilter, collector);
>  See it , it only fetch all docs and then do paging, So my question is Is it too slow? if I set len = 10000000 Often cause OOM
>  Is Solr shard can perfectly solve it?
>  Thank you!

Solr Or Lucene Paging

Posted by dizh <di...@neusoft.com>.

Hi all,

I want to ask a question: how does Solr implement paging, for example with start = 1000000 and rows = 20?
I looked roughly through the Solr source; here is the code:
getDocListNC(QueryResult qr, QueryCommand cmd);
It often uses:
topCollector = TopFieldCollector.create(weightSort(cmd.getSort()), len, false, needScores, needScores, true);
super.search(query, luceneFilter, collector);
As far as I can see, it fetches all docs and only then does the paging. So my question is: is this too slow? If I set len = 10000000, it often causes an OOM.
Can Solr sharding solve this?
Thank you!

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi,

This is not a bug in Lucene 4.0. This behavior is unchanged since Lucene 2.9/3.0, you just don't read javadocs and you just don't seem to understand the changes since Lucene 2.9.

I just repeat one final time: Collector is a low-level search component in Lucene and was introduced in Lucene 2.9 to replace the old "HitCollector". So if you upgrade your code to use Collector instead of HitCollector (as in your ancient Lucene 2.4), you have to respect the new semantics, which are *different* from the old HitCollector. Collector works with low-level atomic readers (also in Lucene 3.x); the calls to the "collect(int)" method are *not* using global document IDs, so using an IndexReader from outside does not work and will never work - PERIOD: The document IDs are only *relative* to the atomic reader that was passed to the collector by setNextReader() before a sequence of collect() calls. To make global docIds out of them, you may use readerContext.docBase, but this is slower than using the low-level atomic reader.

Uwe
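
The semantics described above can be modeled in plain Java. This is only a sketch and does not use Lucene classes: ToyCollector, ToySearcher, and the two collectors are invented for illustration, with setNextSegment playing the role of setNextReader(). It shows why a segment-relative ID passed to a top-level lookup returns the wrong document, and why resolving the ID against the segment announced before the collect() calls returns the right one.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy collector contract mirroring the per-segment call pattern (not Lucene code). */
interface ToyCollector {
    void setNextSegment(List<String> segmentDocs, int docBase);
    void collect(int localDoc);
}

final class ToySearcher {
    /** Visits every document of every segment, reporting segment-relative IDs. */
    static void searchAll(List<List<String>> segments, ToyCollector collector) {
        int docBase = 0;
        for (List<String> segment : segments) {
            collector.setNextSegment(segment, docBase);
            for (int local = 0; local < segment.size(); local++) {
                collector.collect(local);
            }
            docBase += segment.size();
        }
    }
}

/** Wrong: treats the segment-relative ID as an index into the whole index. */
final class TopLevelLookupCollector implements ToyCollector {
    private final List<String> allDocs;
    final List<String> seen = new ArrayList<>();

    TopLevelLookupCollector(List<String> allDocs) {
        this.allDocs = allDocs;
    }

    @Override
    public void setNextSegment(List<String> segmentDocs, int docBase) {
        // Segment changes are ignored, which is the bug being illustrated.
    }

    @Override
    public void collect(int localDoc) {
        seen.add(allDocs.get(localDoc));
    }
}

/** Right: resolves the ID against the segment announced by setNextSegment. */
final class PerSegmentLookupCollector implements ToyCollector {
    private List<String> current;
    final List<String> seen = new ArrayList<>();

    @Override
    public void setNextSegment(List<String> segmentDocs, int docBase) {
        current = segmentDocs;
    }

    @Override
    public void collect(int localDoc) {
        seen.add(current.get(localDoc));
    }
}
```

For two segments [a, b] and [c, d, e], the first collector in this model sees a, b, a, b, c, while the second sees a, b, c, d, e.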

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: saisantoshi [mailto:saisantoshi76@gmail.com]
> Sent: Thursday, February 28, 2013 10:55 PM
> To: java-user@lucene.apache.org
> Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward comptabile)
> 
> Thanks a lot. Really appreciate your help here.
> 
> I have read through the document and understand that the IndexReader
> uses sub readers (to look into the index files) and AtomicReader does not.
> But how does this affect from the search stand point of view. I think search
> results should be consistent for both the readers.
> 
> It happened to be my case that the search was behaving weird ( returning
> incorrect Documents) until I am using the IndexReader and started to work
> fine when I changed it back to "AtomicReader". Not sure if this has solved
> the problem by changing it to AtomicReader? This seems to be a bug in the
> IndexReader in 4.0
> 
> // indexReader.document(doc) is giving incorrect result in 4.0
> 
> // atomicReader.document(doc) is giving the correct result.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Thanks a lot. Really appreciate your help here.

I have read through the document and understand that the IndexReader uses
sub-readers (to look into the index files) and AtomicReader does not. But
how does this affect things from the search standpoint? I think search
results should be consistent for both readers.

In my case the search was behaving weirdly (returning incorrect Documents)
as long as I was using the IndexReader, and it started to work fine when I
changed it to "AtomicReader". I am not sure whether changing to AtomicReader
has really solved the problem. This seems to be a bug in the IndexReader
in 4.0.

// indexReader.document(doc) is giving incorrect result in 4.0

// atomicReader.document(doc) is giving the correct result.




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi,

I answered your question in parallel to your second mail. This is not a new change in Lucene 4; it has been like that since Lucene 2.9/3.0.
You may also read: http://blog.thetaphi.de/2012/02/is-your-indexreader-atomic-major.html

A comment on your code: Don't pass an IndexReader to the ctor of your collector; instead, *implement* setNextReader in your collector and use the passed AtomicReaderContext to get an AtomicReader. The document IDs passed to collect(int) are not global; they are only valid for the current atomic reader (as Lucene search works segment-wise and not globally).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: saisantoshi [mailto:saisantoshi76@gmail.com]
> Sent: Thursday, February 28, 2013 7:26 PM
> To: java-user@lucene.apache.org
> Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward comptabile)
> 
> Could someone please comment on the above code snippet ?
> 
> Also, one observation is that our search results are not consistent if we are
> using* IndexReader vs AtomicReader?* Could this be a problem?
> 
> Thanks,
> Sai.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Could someone please comment on the above code snippet?

Also, one observation is that our search results are not consistent between
using IndexReader and AtomicReader. Could this be a problem?

Thanks,
Sai.




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

You have to implement setNextReader in your collector. In setNextReader() save the AtomicReader from context.reader() in a field and use it from the collect method.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: saisantoshi [mailto:saisantoshi76@gmail.com]
> Sent: Wednesday, February 27, 2013 11:51 PM
> To: java-user@lucene.apache.org
> Subject: RE: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward comptabile)
> 
> Thanks. Is there any issue the way we are calling the
> indexReader.getDocument(doc)?
> 
> Not sure how do I get an AtomicReaderConext in the following below
> method?
> Any pointers on how do I get that instance is appreciated?
> 
> public void collect(int doc) throws IOException {
>     // ADD YOUR CUSTOM LOGIC HERE
> 
>  *  How do I get an AtomicReader context here? *
> 
>      delegate.collect(doc);
>    }
> 
> Thanks and appreciate your help here.

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Thanks. Is there any issue with the way we are calling
indexReader.document(doc)?

I am not sure how to get an AtomicReaderContext in the method below.
Any pointers on how to get that instance are appreciated.

public void collect(int doc) throws IOException {
    // ADD YOUR CUSTOM LOGIC HERE

    // How do I get an AtomicReader context here?

    delegate.collect(doc);
}

Thanks and appreciate your help here.




RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

You have to use the IndexReader that you get via Collector.setNextReader(AtomicReaderContext ctx). The context will provide you with the correct atomic reader and the correct document base for collecting documents with collect (all ids are relative to the context).

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: saisantoshi [mailto:saisantoshi76@gmail.com]
> Sent: Wednesday, February 27, 2013 10:39 PM
> To: java-user@lucene.apache.org
> Subject: Re: TopDocCollector vs TopScoreDocCollector (semantics changed in
> 4.0, not backward comptabile)
> 
> I want to get the Document in the following below code and thats why I need
> an indexReader
> 
> public void collect(int doc) throws IOException {
>     // ADD YOUR CUSTOM LOGIC HERE
> 
> *    Document doc = indexReader.document(doc)*
>     delegate.collect(doc);
>   }
> 
> 
> But this seems to be the problem as the indexReader is fetching an incorrect
> document. Do you think that there are any concurrency issues here?
> 
> Thanks,
> Sai.

Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

I want to get the Document in the code below, and that's why I need
an indexReader:

public void collect(int doc) throws IOException {
    // ADD YOUR CUSTOM LOGIC HERE

    Document document = indexReader.document(doc);  // line in question
    delegate.collect(doc);
}


But this seems to be the problem, as the indexReader is fetching an incorrect
document. Do you think that there are any concurrency issues here?

Thanks,
Sai.




Re: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Simon Willnauer <si...@gmail.com>.

Hey,

you don't need to set the IndexReader in the constructor. An
AtomicReader is passed in for each segment via
Collector#setNextReader(AtomicReaderContext).
If you want to use a given collector and extend it with some custom
code in collect, I would write a delegating Collector like this:

public class DelegatingCollector extends Collector {

  private final Collector delegate;

  public DelegatingCollector(Collector delegate) {
    this.delegate = delegate;
  }

  public Collector getDelegate() {
    return delegate;
  }

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    delegate.setScorer(scorer);
  }

  @Override
  public void collect(int doc) throws IOException {
    // ADD YOUR CUSTOM LOGIC HERE
    delegate.collect(doc);
  }

  @Override
  public void setNextReader(AtomicReaderContext context) throws IOException {
    // THIS IS WHERE YOU GET THE READER --> context.reader()
    delegate.setNextReader(context);
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return delegate.acceptsDocsOutOfOrder();
  }
}

Maybe this is easier for you? Then you can simply call
TopScoreDocCollector.create(int numHits, boolean docsScoredInOrder)
and pass the resulting specialized collector to the delegate.
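
One consequence of the per-segment reader is worth spelling out: in Lucene
4.x the doc passed to collect(int) is relative to the segment announced by
the most recent setNextReader call, so it has to be resolved against
context.reader(), or converted to an index-wide ID with context.docBase + doc.
The sketch below is a rough, Lucene-free model of that arithmetic only. The
names SegmentDocIdSketch, Segment and LookupCollector are hypothetical
stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lucene-free model of per-segment collection; every name here is hypothetical. */
class SegmentDocIdSketch {

    /** Stand-in for one index segment: its documents plus its offset in the whole index. */
    static final class Segment {
        final int docBase;        // index-wide ID of this segment's first document
        final List<String> docs;  // documents, addressed by segment-relative ID

        Segment(int docBase, String... docs) {
            this.docBase = docBase;
            this.docs = Arrays.asList(docs);
        }
    }

    /** Stand-in for a collector: told which segment is current, then fed relative IDs. */
    static final class LookupCollector {
        private Segment current;
        final List<String> viaSegment = new ArrayList<>();
        final List<Integer> globalIds = new ArrayList<>();

        void setNextReader(Segment segment) {
            this.current = segment;
        }

        void collect(int doc) {
            viaSegment.add(current.docs.get(doc));  // resolve against the current segment
            globalIds.add(current.docBase + doc);   // index-wide ID, if one is needed
        }
    }

    /** Collects relative ID 1 from two segments and returns what the collector saw. */
    static LookupCollector run() {
        Segment first = new Segment(0, "a", "b", "c");
        Segment second = new Segment(3, "d", "e");
        LookupCollector collector = new LookupCollector();
        collector.setNextReader(first);
        collector.collect(1);
        collector.setNextReader(second);
        collector.collect(1);  // same relative ID, different document
        return collector;
    }

    public static void main(String[] args) {
        LookupCollector c = run();
        System.out.println(c.viaSegment + " " + c.globalIds);  // prints "[b, e] [1, 4]"
    }
}
```

The same relative ID 1 names "b" in the first segment and "e" in the second.
Looking it up in a single top-level reader without adding the doc base would
return "b" both times.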

simon

On Thu, Jan 24, 2013 at 11:37 PM, saisantoshi <sa...@gmail.com> wrote:
> Can someone please help us here to validate the above?
>
> Thanks,
> Sai.


RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Can someone please help us here to validate the above?

Thanks,
Sai.



--
View this message in context: http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-tp4035806p4036093.html

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by saisantoshi <sa...@gmail.com>.

Here is how I implemented a collector class. I would appreciate it if you
could let me know of any issues.

public class MyCollector extends PositiveScoresOnlyCollector {

    private IndexReader indexReader;

    public MyCollector(IndexReader indexReader, PositiveScoresOnlyCollector topScore) {
        super(topScore);
        this.indexReader = indexReader;
    }

    @Override
    public void collect(int doc) {
        try {
            // Custom logic
            super.collect(doc);
        } catch (Exception e) {
            // exception is silently ignored
        }
    }
}


// Usage:

MyCollector collector;
TopScoreDocCollector topScore = TopScoreDocCollector.create(hits, true);
IndexSearcher searcher = new IndexSearcher(reader);
try {
    collector = new MyCollector(indexReader, new PositiveScoresOnlyCollector(topScore));
    searcher.search(query, (Filter) null, collector);
} finally {

}
    



--
View this message in context: http://lucene.472066.n3.nabble.com/TopDocCollector-vs-TopScoreDocCollector-semantics-changed-in-4-0-not-backward-comptabile-tp4035806p4035870.html

RE: TopDocCollector vs TopScoreDocCollector (semantics changed in 4.0, not backward comptabile)

Posted by Uwe Schindler <uw...@thetaphi.de>.

This was changed in Lucene 2.9; it's nothing new in Lucene 4.0. Read the change logs of Lucene 2.9/3.0, which explain what you need to do.
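
As a rough illustration of that change: since 2.9, collect(int doc) no longer
receives the score. The collector is handed a Scorer through setScorer and
asks it for the score of the current document. The sketch below models only
that shape in plain Java so the old "score > 0.0f" check can be seen in its
new position. MiniScorer, MiniCollector, RecordingCollector and PositiveOnly
are hypothetical stand-ins, not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Lucene-free model of the post-2.9 collector shape; every name here is hypothetical. */
class ScoreViaScorerSketch {

    /** Stand-in for a scorer: reports the score of the document being collected. */
    interface MiniScorer {
        float score();
    }

    /** Stand-in for the collector contract: collect(int) carries no score argument. */
    interface MiniCollector {
        void setScorer(MiniScorer scorer);
        void collect(int doc);
    }

    /** Records every document it is given. */
    static final class RecordingCollector implements MiniCollector {
        final List<Integer> docs = new ArrayList<>();
        public void setScorer(MiniScorer scorer) { }
        public void collect(int doc) { docs.add(doc); }
    }

    /** Forwards only documents whose score is positive. */
    static final class PositiveOnly implements MiniCollector {
        private final MiniCollector delegate;
        private MiniScorer scorer;

        PositiveOnly(MiniCollector delegate) { this.delegate = delegate; }

        public void setScorer(MiniScorer scorer) {
            this.scorer = scorer;
            delegate.setScorer(scorer);
        }

        public void collect(int doc) {
            if (scorer.score() > 0.0f) {  // the score comes from the scorer, not a parameter
                delegate.collect(doc);
            }
        }
    }

    /** Drives the collector over documents 0..n-1 with the given scores. */
    static List<Integer> run(float[] scores) {
        RecordingCollector sink = new RecordingCollector();
        PositiveOnly collector = new PositiveOnly(sink);
        final int[] currentDoc = {0};
        collector.setScorer(() -> scores[currentDoc[0]]);
        for (int doc = 0; doc < scores.length; doc++) {
            currentDoc[0] = doc;
            collector.collect(doc);
        }
        return sink.docs;
    }

    public static void main(String[] args) {
        System.out.println(run(new float[] {0.5f, 0.0f, 1.2f}));  // prints "[0, 2]"
    }
}
```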

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



