Posted to java-user@lucene.apache.org by Patrick Diviacco <pa...@gmail.com> on 2011/03/22 09:22:50 UTC

how to get all documents in the results ?

I'm using the following code because I want to see the entire collection in
my query results:

//adding wildcards-term to see all results
rest = new TermQuery(new Term("*","*"));
booleanQuery.add(rest, BooleanClause.Occur.SHOULD);

But it doesn't work: I only see the relevant docs, not all the other ones.
How can I get all documents, ordered by relevance, instead?

PS: MatchAllDocsQuery is not a solution because I need to specify my own
custom query.

Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

Sorry for the spam. I've actually added the following line:

booleanQuery.add(new MatchAllDocsQuery(), BooleanClause.Occur.SHOULD);

and now I get the whole collection back, so it seems to work.
I'm a bit confused about the score of the non-relevant docs, though.

For example, the last result is:
298736393 107419924 All-text score:0.0018638512

I was expecting the score to be 0 instead.

Thanks


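A rough way to picture why the MatchAllDocsQuery clause brings back the whole
collection while the TermQuery on "*" did not: a term clause is an exact lookup
of one term in one field, with no wildcard expansion, whereas a match-all
clause covers every document. The sketch below is a toy model in plain Java,
not Lucene code; the class name, the tiny index and the method names are all
made up for illustration.

```java
import java.util.*;

// Toy model only, not Lucene code: each "clause" is reduced to the sorted
// set of document IDs it matches. All names and data here are hypothetical.
class ClauseUnionSketch {

    // field -> term -> IDs of the documents containing that exact term
    static final Map<String, Map<String, SortedSet<Integer>>> INDEX = new HashMap<>();
    static final int NUM_DOCS = 5; // hypothetical index with doc IDs 0..4

    static {
        addPosting("title", "lucene", 0, 2);
        addPosting("body", "query", 2, 3);
    }

    static void addPosting(String field, String term, Integer... docs) {
        INDEX.computeIfAbsent(field, f -> new HashMap<>())
             .put(term, new TreeSet<>(Arrays.asList(docs)));
    }

    // Term clause: exact lookup of one term in one field, no wildcard expansion.
    static SortedSet<Integer> termClause(String field, String term) {
        return INDEX.getOrDefault(field, Collections.emptyMap())
                    .getOrDefault(term, Collections.emptySortedSet());
    }

    // Match-all clause: every document in the index, whatever its fields contain.
    static SortedSet<Integer> matchAllClause() {
        SortedSet<Integer> all = new TreeSet<>();
        for (int doc = 0; doc < NUM_DOCS; doc++) {
            all.add(doc);
        }
        return all;
    }

    // Optional (SHOULD) clauses: a document is a hit if any clause matches it.
    @SafeVarargs
    static SortedSet<Integer> should(SortedSet<Integer>... clauses) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (SortedSet<Integer> clause : clauses) {
            hits.addAll(clause);
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<Integer> custom =
            should(termClause("title", "lucene"), termClause("body", "query"));
        System.out.println(custom);                               // [0, 2, 3]
        System.out.println(should(custom, termClause("*", "*"))); // [0, 2, 3]
        System.out.println(should(custom, matchAllClause()));     // [0, 1, 2, 3, 4]
    }
}
```

Since the toy index has no field and term literally named "*", that clause
contributes no documents, so the result only grows once the match-all clause
is added.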


On 23 March 2011 08:44, Patrick Diviacco <pa...@gmail.com> wrote:

> The issue with
>
>
> My confusion about MatchAllDocsQuery is that I cannot specify which terms
> in which fields to search with it. I'm probably wrong.
>
> I currently have a BooleanQuery that I use to build the query with several
> fields and several terms.
>
> Can I just pass a MatchAllDocsQuery to the BooleanQuery.add method in order to
> add all the remaining docs?
>
> thanks
>
>
>
>
> On 22 March 2011 12:42, Anshum <an...@gmail.com> wrote:
>
>> MatchAllDocsQuery is not limited to a single field: it matches every document
>> regardless of field, i.e. it is equivalent to a *:* query.
>>
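>> 1. TopDocs approach:
>>
>> // "is" is presumably an IndexSearcher and "ir" the IndexReader behind it
>> Query query = new MatchAllDocsQuery();
>> // ask for as many hits as there are documents in the index
>> TopDocs td = is.search(query, ir.numDocs());
>> ScoreDoc[] scoreDocs = td.scoreDocs;
>> for (ScoreDoc scoreDoc : scoreDocs) {
>>     // ... your code ...
>> }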
>>
>> 2. Collector approach:
>>
>> Query query = new MatchAllDocsQuery();
>> int hitsPerPage = ir.numDocs();
>> // second argument: whether documents are scored in increasing doc ID order
>> TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
>> is.search(query, collector);
>> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
>> for (ScoreDoc scoreDoc : scoreDocs) {
>>     // ... your code ...
>> }
>>
>> The advantage of the collector-based approach is that you can do a lot in
>> the collect step: everything you would do while iterating over the results
>> after the search can also be done during collection.
>>
>> Also, could you tell me exactly what you are trying to achieve? There may be
>> a completely different option that you haven't come across, which someone
>> could suggest if they knew the exact intent.
>>
>> Hope this helps.
>>
>> --
>> Anshum Gupta
>> http://ai-cafe.blogspot.com
>>
>>
>> On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
>> patrick.diviacco@gmail.com> wrote:
>>
>> > 1. "all" docs
>> >
>> > 2. because MatchAllDocsQuery only considers one field at a time. I'm
>> > searching over multiple fields instead.
>> >
>> > 3. could you tell me more about this? It might be a solution!
>> >
>> >
>> >
>> > On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:
>> >
>> > > A few things:
>> > > 1. Are you looking to get 'all' documents or only docs matching your query?
>> > > 2. If it's about fetching all docs, why not use MatchAllDocsQuery?
>> > > 3. Did you try using a collector instead of TopDocs?
>> > >
>> > > --
>> > > Anshum Gupta
>> > > http://ai-cafe.blogspot.com
>> > >
>> > >
>> > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
>> > > patrick.diviacco@gmail.com> wrote:
>> > >
>> > > > I don't think the link you suggested can help, but maybe I'm wrong.
>> > > >
>> > > > Also, the MAX_HITS parameter is not useful: it just limits the results,
>> > > > it doesn't add the non-relevant docs.
>> > > >
>> > > >
>> > > >
>> > > > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
>> > > >
>> > > > > Hi Patrick,
>> > > > > You may have a look at this; perhaps it will help. Let me know if
>> > > > > you're still stuck.
>> > > > >
>> > > > > http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Anshum Gupta
>> > > > > http://ai-cafe.blogspot.com
>> > > > >
>> > > > >
>> > > > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
>> > > > >
>> > > > > > Not sure what your use case actually is, but it sounds like you may be
>> > > > > > unclear on how Lucene works.
>> > > > > >
>> > > > > > Each query clause you have will produce an iterator that walks over the
>> > > > > > documents that match that clause. All the documents from the entire
>> > > > > > root query get scored. The scoring evaluation per document also depends
>> > > > > > on the form of your query expression hierarchy.
>> > > > > >
>> > > > > > So, MatchAllDocsQuery is exactly what you want if you want a document
>> > > > > > iterator that includes all documents in the index. You can change how
>> > > > > > this is scored by extending MatchAllDocsQuery and writing a custom scorer.
>> > > > > > Karl
>> > > > > >

Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

OK, yes, I've solved it. Thanks for the help.

On 23 March 2011 09:15, Anshum <an...@gmail.com> wrote:

> So, functionally, I assume you've achieved what you were aiming for.
> About the scores: MatchAllDocsQuery does score docs, based on norm factors
> etc., so the score wouldn't be 0.
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com

Re: how to get all documents in the results ?

Posted by Anshum <an...@gmail.com>.

So, functionally, I assume you've achieved what you were aiming for.
About the scores: MatchAllDocsQuery does score docs, based on norm factors
etc., so the score wouldn't be 0.
--
Anshum Gupta
http://ai-cafe.blogspot.com

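To illustrate why those extra docs end up with a small constant score rather
than 0, here is a simplified model in plain Java of how the classic scoring of
that Lucene generation combines optional clauses, as far as it is documented:
the scores of the matching clauses are summed and multiplied by a coord
factor, the fraction of clauses that matched. The class name, the numbers and
the constant used for the match-all clause are assumptions for illustration;
norms and query normalization are left out.

```java
// Simplified model only, not Lucene code. All numbers are made up.
class ShouldScoreSketch {

    // clauseScores[i] is the score clause i gives the document,
    // or 0 if that clause does not match the document.
    static float score(float[] clauseScores) {
        int matched = 0;
        float sum = 0f;
        for (float s : clauseScores) {
            if (s > 0f) {
                matched++;
                sum += s;
            }
        }
        // coord factor: fraction of the clauses that matched this document
        float coord = (float) matched / clauseScores.length;
        return coord * sum;
    }

    public static void main(String[] args) {
        float matchAll = 0.01f; // hypothetical constant from the match-all clause

        // Document matching both term clauses plus the match-all clause
        System.out.println(score(new float[] {1.2f, 0.8f, matchAll}));

        // Document matching only the match-all clause: small, but greater than 0,
        // and identical for every such document
        System.out.println(score(new float[] {0f, 0f, matchAll}));
    }
}
```

On a real index, IndexSearcher.explain(query, doc) returns a breakdown of how
a given document's score was computed, which should show what each clause
actually contributes.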


Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

Yeah, it is clear. However, I don't just want all documents: I still want to
perform my specific query and, below the relevant docs, list all the
non-relevant docs as well. (I need this for later steps.)

However, it now seems to work. I've added a MatchAllDocsQuery to my
BooleanQuery, and I get the original results plus all the remaining docs at
the bottom.

The only thing I don't understand is the fixed score of the non-relevant docs
(the ones that only appear now, with MatchAllDocsQuery).

This is the list (please ignore the Time and Harvesine scores).

298736393 298736393 All-text score:9.775101 Time:1.0 Harvesine:1.0 true
298736393 298736445 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
298736393 298736527 All-text score:3.1505027 Time:1.0 Harvesine:1.0 true
298736393 298736583 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736638 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736709 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 298736781 All-text score:3.1505027 Time:0.9999981 Harvesine:-5593.9307 true
298736393 298736848 All-text score:3.1505027 Time:0.9999981 Harvesine:1.0 true
298736393 406412394 All-text score:0.62491775 Time:0.9983447 Harvesine:-5593.9307 false
298736393 361767007 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767030 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767061 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767089 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767103 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767123 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361767146 All-text score:0.44931924 Time:0.9994102 Harvesine:-5593.9307 false
298736393 361492738 All-text score:0.13399276 Time:0.99982494 Harvesine:-5593.9307 false
298736393 361492799 All-text score:0.13399276 Time:0.99982494 Harvesine:-5593.9307 false
298736393 85899696 All-text score:0.061811533 Time:0.9680632 Harvesine:-2355.1875 false
298736393 103368776 All-text score:0.032769617 Time:0.855137 Harvesine:-5593.9307 false
298736393 101106927 All-text score:0.029139375 Time:0.8688775 Harvesine:-5593.9307 false
298736393 80793153 All-text score:0.0018638512 Time:0.99774164 Harvesine:-2201.7686 false
298736393 82661153 All-text score:0.0018638512 Time:0.98817354 Harvesine:-5593.9307 false
298736393 84079583 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false
298736393 84079707 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false
298736393 84079911 All-text score:0.0018638512 Time:0.9796918 Harvesine:-5593.9307 false


On 23 March 2011 09:01, Anshum <an...@gmail.com> wrote:

> Hi Patrick,
> You *don't* need to add a MatchAllDocsQuery to anything. If you just want
> all docs, just pass it to the searcher.search function and you'd get all
> results.
> MatchAllDocsQuery is a Query just like BooleanQuery, except that it matches
> all docs in the index. You wouldn't need to specify anything there.
> The below would work and get you all the docs in the index as the result
> (provided the limit you specify for the number of docs to return is high
> enough):
>
> Query query = new MatchAllDocsQuery();
> searcher.search(query.....);
>
> Hope this clarifies your doubt.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco <
> patrick.diviacco@gmail.com> wrote:
>
> > The issue with
> >
> >
> > My confusion about MatchAllDocsQuery is that I cannot specify which terms
> > in
> > which fields to search with it. I'm probably wrong.
> >
> > I currently have a BooleanQuery, that I use to build the query with
> several
> > fields and several terms.
> >
> > Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order to
> > add
> > all remaining docs ?
> >
> > thanks
> >
> >
> >
> >
> > On 22 March 2011 12:42, Anshum <an...@gmail.com> wrote:
> >
> > > MatchAllDocs does not consider only a single field but all fields i.e.
> it
> > > takes a *:* query.
> > >
> > > *1. *
> > > *****Snip****
> > > Query query = new MatchAllDocsQuery();
> > > TopDocs  td = is.search(query, ir.numDocs());
> > > ScoreDoc[ ] scoreDocs = td.scoreDocs;
> > > for(ScoreDoc scoreDoc:scoreDocs){
> > >
> > > ... Your code...
> > >
> > > }
> > > ****/Snip***
> > >
> > > *2. Collector approach:*
> > >
> > > Query query = new MatchAllDocsQuery();
> > > int hitsPerPage = ir.numDocs();
> > > TopScoreDocCollector collector =
> TopScoreDocCollector.create(hitsPerPage,
> > > true);
> > > is.search(query, collector);
> > > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > > for(ScoreDoc scoreDoc:scoreDocs){
> > >
> > > .. Your code..
> > >
> > > }
> > >
> > > The better part about using a collector based approach is that you
> could
> > do
> > > a lot in the collect step i.e. everything that you'd do in the
> iteration
> > > post search could also be done while collecting here.
> > >
> > > Also, could you also tell me the exact case as to what is it that you
> are
> > > trying to achieve. You may have a completely different option that you
> > > haven't read which someone could advice if they know the exact intent.
> > >
> > > Hope this helps.
> > >
> > > --
> > > Anshum Gupta
> > > http://ai-cafe.blogspot.com
> > >
> > >
> > > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> > > patrick.diviacco@gmail.com> wrote:
> > >
> > > > 1. "all" docs
> > > >
> > > > 2. because matchalldocs only consider one field at once. I'm
> searching
> > > over
> > > > multiple fields instead.
> > > >
> > > > 3. could you tell me more about this ? It might be a solution!
> > > >
> > > >
> > > >
> > > > On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:
> > > >
> > > > > so a few things
> > > > > 1. are you looking to get 'all' documents or only docs matching
> your
> > > > query?
> > > > > 2. if its about fetching all docs, why not use the matchalldocs
> > query?
> > > > > 3. did you try using a collector instead of topdocs?
> > > > >
> > > > > --
> > > > > Anshum Gupta
> > > > > http://ai-cafe.blogspot.com
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > > > patrick.diviacco@gmail.com> wrote:
> > > > >
> > > > > > I don't think the link you suggested can help, but maybe I'm
> wrong.
> > > > > >
> > > > > > Also, the parameter MAX_HITS is not useful, it just limit the
> > > results,
> > > > it
> > > > > > doesn't add the not relevant docs.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Patrick,
> > > > > > > You may have a look at this, perhaps this will help you with
> it.
> > > Let
> > > > me
> > > > > > > know
> > > > > > > if you're still stuck up.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Anshum Gupta
> > > > > > > http://ai-cafe.blogspot.com
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com>
> wrote:
> > > > > > >
> > > > > > > > Not sure what your use case actually is, but it sounds like
> you
> > > may
> > > > > be
> > > > > > > > unclear how Lucene works.
> > > > > > > >
> > > > > > > > Each query clause you have will produce an iterator that
> walks
> > > over
> > > > > the
> > > > > > > > documents that match that clause.  All the documents from the
> > > > entire,
> > > > > > > root
> > > > > > > > query get scored.  The scoring evaluation per document is
> also
> > > > > related
> > > > > > to
> > > > > > > > the form of your query expression hierarchy.
> > > > > > > >
> > > > > > > > So, MatchAllDocsQuery is exactly what you want if you want a
> > > > document
> > > > > > > > iterator that includes all documents in the index.  You can
> > > change
> > > > > how
> > > > > > > this
> > > > > > > > is scored by extending MatchAllDocsQuery and writing a custom
> > > > scorer.
> > > > > > > >
> > > > > > > > Karl
> > > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: ext Patrick Diviacco [mailto:
> patrick.diviacco@gmail.com]
> > > > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > > > To: java-user@lucene.apache.org
> > > > > > > > Subject: how to get all documents in the results ?
> > > > > > > >
> > > > > > > > I'm using the following code because I want to see the entire
> > > > > > collection
> > > > > > > in
> > > > > > > > my query results:
> > > > > > > >
> > > > > > > > //adding wildcards-term to see all results
> > > > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > > > >
> > > > > > > > But it doesn't work, I only see the relevant docs and not all
> > the
> > > > > other
> > > > > > > > ones.
> > > > > > > > How can I get all documents ordered by relevance instead ?
> > > > > > > >
> > > > > > > > ps. MatchAllDocsQuery is not a solution because I need to
> > specify
> > > > my
> > > > > > own
> > > > > > > > custom query.
> > > > > > > >
> > > > > > > >
> > > > ---------------------------------------------------------------------
> > > > > > > > To unsubscribe, e-mail:
> > java-user-unsubscribe@lucene.apache.org
> > > > > > > > For additional commands, e-mail:
> > > java-user-help@lucene.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Anshum <an...@gmail.com>.

Hi Patrick,
You *don't* need to add a MatchAllDocsQuery to anything. If you just want
all docs, pass it directly to searcher.search and you'll get all the
results.
MatchAllDocsQuery is a Query just like BooleanQuery, except that it matches
all docs in the index. You don't need to specify anything else there.
The code below would get you all the docs in the index as the result
(provided the hit limit you pass to search is high enough).

Query query = new MatchAllDocsQuery();
searcher.search(query.....);

Hope this clarifies your doubt.

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Wed, Mar 23, 2011 at 1:14 PM, Patrick Diviacco <
patrick.diviacco@gmail.com> wrote:

> The issue with
>
>
> My confusion about MatchAllDocsQuery is that I cannot specify which terms
> in
> which fields to search with it. I'm probably wrong.
>
> I currently have a BooleanQuery, that I use to build the query with several
> fields and several terms.
>
> Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order to
> add
> all remaining docs ?
>
> thanks
>
>
>
>
> On 22 March 2011 12:42, Anshum <an...@gmail.com> wrote:
>
> > MatchAllDocs does not consider only a single field but all fields i.e. it
> > takes a *:* query.
> >
> > *1. *
> > *****Snip****
> > Query query = new MatchAllDocsQuery();
> > TopDocs  td = is.search(query, ir.numDocs());
> > ScoreDoc[ ] scoreDocs = td.scoreDocs;
> > for(ScoreDoc scoreDoc:scoreDocs){
> >
> > ... Your code...
> >
> > }
> > ****/Snip***
> >
> > *2. Collector approach:*
> >
> > Query query = new MatchAllDocsQuery();
> > int hitsPerPage = ir.numDocs();
> > TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage,
> > true);
> > is.search(query, collector);
> > ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> > for(ScoreDoc scoreDoc:scoreDocs){
> >
> > .. Your code..
> >
> > }
> >
> > The better part about using a collector based approach is that you could
> do
> > a lot in the collect step i.e. everything that you'd do in the iteration
> > post search could also be done while collecting here.
> >
> > Also, could you also tell me the exact case as to what is it that you are
> > trying to achieve. You may have a completely different option that you
> > haven't read which someone could advice if they know the exact intent.
> >
> > Hope this helps.
> >
> > --
> > Anshum Gupta
> > http://ai-cafe.blogspot.com
> >
> >
> > On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> > patrick.diviacco@gmail.com> wrote:
> >
> > > 1. "all" docs
> > >
> > > 2. because matchalldocs only consider one field at once. I'm searching
> > over
> > > multiple fields instead.
> > >
> > > 3. could you tell me more about this ? It might be a solution!
> > >
> > >
> > >
> > > On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:
> > >
> > > > so a few things
> > > > 1. are you looking to get 'all' documents or only docs matching your
> > > query?
> > > > 2. if its about fetching all docs, why not use the matchalldocs
> query?
> > > > 3. did you try using a collector instead of topdocs?
> > > >
> > > > --
> > > > Anshum Gupta
> > > > http://ai-cafe.blogspot.com
> > > >
> > > >
> > > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > > patrick.diviacco@gmail.com> wrote:
> > > >
> > > > > I don't think the link you suggested can help, but maybe I'm wrong.
> > > > >
> > > > > Also, the parameter MAX_HITS is not useful, it just limit the
> > results,
> > > it
> > > > > doesn't add the not relevant docs.
> > > > >
> > > > >
> > > > >
> > > > > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
> > > > >
> > > > > > Hi Patrick,
> > > > > > You may have a look at this, perhaps this will help you with it.
> > Let
> > > me
> > > > > > know
> > > > > > if you're still stuck up.
> > > > > >
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Anshum Gupta
> > > > > > http://ai-cafe.blogspot.com
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
> > > > > >
> > > > > > > Not sure what your use case actually is, but it sounds like you
> > may
> > > > be
> > > > > > > unclear how Lucene works.
> > > > > > >
> > > > > > > Each query clause you have will produce an iterator that walks
> > over
> > > > the
> > > > > > > documents that match that clause.  All the documents from the
> > > entire,
> > > > > > root
> > > > > > > query get scored.  The scoring evaluation per document is also
> > > > related
> > > > > to
> > > > > > > the form of your query expression hierarchy.
> > > > > > >
> > > > > > > So, MatchAllDocsQuery is exactly what you want if you want a
> > > document
> > > > > > > iterator that includes all documents in the index.  You can
> > change
> > > > how
> > > > > > this
> > > > > > > is scored by extending MatchAllDocsQuery and writing a custom
> > > scorer.
> > > > > > >
> > > > > > > Karl
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > > To: java-user@lucene.apache.org
> > > > > > > Subject: how to get all documents in the results ?
> > > > > > >
> > > > > > > I'm using the following code because I want to see the entire
> > > > > collection
> > > > > > in
> > > > > > > my query results:
> > > > > > >
> > > > > > > //adding wildcards-term to see all results
> > > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > > >
> > > > > > > But it doesn't work, I only see the relevant docs and not all
> the
> > > > other
> > > > > > > ones.
> > > > > > > How can I get all documents ordered by relevance instead ?
> > > > > > >
> > > > > > > ps. MatchAllDocsQuery is not a solution because I need to
> specify
> > > my
> > > > > own
> > > > > > > custom query.
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > To unsubscribe, e-mail:
> java-user-unsubscribe@lucene.apache.org
> > > > > > > For additional commands, e-mail:
> > java-user-help@lucene.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

The issue with


My confusion about MatchAllDocsQuery is that I cannot specify which terms in
which fields to search with it. I'm probably wrong.

I currently have a BooleanQuery, that I use to build the query with several
fields and several terms.

Can I just pass MatchAllDocsQuery to BooleanQuery.add method in order to add
all remaining docs ?

thanks




On 22 March 2011 12:42, Anshum <an...@gmail.com> wrote:

> MatchAllDocs does not consider only a single field but all fields i.e. it
> takes a *:* query.
>
> *1. *
> *****Snip****
> Query query = new MatchAllDocsQuery();
> TopDocs  td = is.search(query, ir.numDocs());
> ScoreDoc[ ] scoreDocs = td.scoreDocs;
> for(ScoreDoc scoreDoc:scoreDocs){
>
> ... Your code...
>
> }
> ****/Snip***
>
> *2. Collector approach:*
>
> Query query = new MatchAllDocsQuery();
> int hitsPerPage = ir.numDocs();
> TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage,
> true);
> is.search(query, collector);
> ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
> for(ScoreDoc scoreDoc:scoreDocs){
>
> .. Your code..
>
> }
>
> The better part about using a collector based approach is that you could do
> a lot in the collect step i.e. everything that you'd do in the iteration
> post search could also be done while collecting here.
>
> Also, could you also tell me the exact case as to what is it that you are
> trying to achieve. You may have a completely different option that you
> haven't read which someone could advice if they know the exact intent.
>
> Hope this helps.
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
> patrick.diviacco@gmail.com> wrote:
>
> > 1. "all" docs
> >
> > 2. because matchalldocs only consider one field at once. I'm searching
> over
> > multiple fields instead.
> >
> > 3. could you tell me more about this ? It might be a solution!
> >
> >
> >
> > On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:
> >
> > > so a few things
> > > 1. are you looking to get 'all' documents or only docs matching your
> > query?
> > > 2. if its about fetching all docs, why not use the matchalldocs query?
> > > 3. did you try using a collector instead of topdocs?
> > >
> > > --
> > > Anshum Gupta
> > > http://ai-cafe.blogspot.com
> > >
> > >
> > > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > > patrick.diviacco@gmail.com> wrote:
> > >
> > > > I don't think the link you suggested can help, but maybe I'm wrong.
> > > >
> > > > Also, the parameter MAX_HITS is not useful, it just limit the
> results,
> > it
> > > > doesn't add the not relevant docs.
> > > >
> > > >
> > > >
> > > > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
> > > >
> > > > > Hi Patrick,
> > > > > You may have a look at this, perhaps this will help you with it.
> Let
> > me
> > > > > know
> > > > > if you're still stuck up.
> > > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > > >
> > > > >
> > > > > --
> > > > > Anshum Gupta
> > > > > http://ai-cafe.blogspot.com
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
> > > > >
> > > > > > Not sure what your use case actually is, but it sounds like you
> may
> > > be
> > > > > > unclear how Lucene works.
> > > > > >
> > > > > > Each query clause you have will produce an iterator that walks
> over
> > > the
> > > > > > documents that match that clause.  All the documents from the
> > entire,
> > > > > root
> > > > > > query get scored.  The scoring evaluation per document is also
> > > related
> > > > to
> > > > > > the form of your query expression hierarchy.
> > > > > >
> > > > > > So, MatchAllDocsQuery is exactly what you want if you want a
> > document
> > > > > > iterator that includes all documents in the index.  You can
> change
> > > how
> > > > > this
> > > > > > is scored by extending MatchAllDocsQuery and writing a custom
> > scorer.
> > > > > >
> > > > > > Karl
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > > To: java-user@lucene.apache.org
> > > > > > Subject: how to get all documents in the results ?
> > > > > >
> > > > > > I'm using the following code because I want to see the entire
> > > > collection
> > > > > in
> > > > > > my query results:
> > > > > >
> > > > > > //adding wildcards-term to see all results
> > > > > > rest = new TermQuery(new Term("*","*"));
> > > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > > >
> > > > > > But it doesn't work, I only see the relevant docs and not all the
> > > other
> > > > > > ones.
> > > > > > How can I get all documents ordered by relevance instead ?
> > > > > >
> > > > > > ps. MatchAllDocsQuery is not a solution because I need to specify
> > my
> > > > own
> > > > > > custom query.
> > > > > >
> > > > > >
> > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Anshum <an...@gmail.com>.

MatchAllDocsQuery is not restricted to a single field: it matches every
document regardless of field, i.e. it behaves like a *:* query.

*1. *
*****Snip****
Query query = new MatchAllDocsQuery();
TopDocs td = is.search(query, ir.numDocs());
ScoreDoc[] scoreDocs = td.scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {

... Your code...

}
****/Snip***

*2. Collector approach:*

Query query = new MatchAllDocsQuery();
int hitsPerPage = ir.numDocs();
TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
is.search(query, collector);
ScoreDoc[] scoreDocs = collector.topDocs().scoreDocs;
for (ScoreDoc scoreDoc : scoreDocs) {

.. Your code..

}

The advantage of the collector-based approach is that you can do a lot in
the collect step, i.e. everything that you'd do in the iteration after the
search can also be done while collecting.
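
To illustrate the point about doing the work in the collect step, here is a
minimal sketch in plain Java. It is a toy model, not Lucene code: DocCallback,
searchAll and docsAbove are hypothetical names made up for this example, and
the scores are invented. It only shows the shape of the idea, namely that the
per-document logic runs while the search visits each hit rather than in a
separate loop afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Lucene classes.
class CollectStepSketch {

    /** Hypothetical callback invoked once per matching document during the search. */
    interface DocCallback {
        void collect(int doc, float score);
    }

    /** Pretend match-all search: visits every doc id with its (made-up) score. */
    static void searchAll(float[] scores, DocCallback callback) {
        for (int doc = 0; doc < scores.length; doc++) {
            callback.collect(doc, scores[doc]);
        }
    }

    /** Does the per-document work inside collect instead of in a loop after the search. */
    static List<Integer> docsAbove(float[] scores, float threshold) {
        List<Integer> kept = new ArrayList<>();
        searchAll(scores, (doc, score) -> {
            if (score > threshold) {
                kept.add(doc);
            }
        });
        return kept;
    }
}
```

In real Lucene code the equivalent logic would live in a Collector
implementation; the exact methods to override depend on the Lucene version,
so check the javadoc for the release you are using.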

Also, could you tell me exactly what you are trying to achieve? There may
be a completely different option that you haven't come across, which
someone could advise on if they know the exact intent.

Hope this helps.

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Mar 22, 2011 at 4:59 PM, Patrick Diviacco <
patrick.diviacco@gmail.com> wrote:

> 1. "all" docs
>
> 2. because matchalldocs only consider one field at once. I'm searching over
> multiple fields instead.
>
> 3. could you tell me more about this ? It might be a solution!
>
>
>
> On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:
>
> > so a few things
> > 1. are you looking to get 'all' documents or only docs matching your
> query?
> > 2. if its about fetching all docs, why not use the matchalldocs query?
> > 3. did you try using a collector instead of topdocs?
> >
> > --
> > Anshum Gupta
> > http://ai-cafe.blogspot.com
> >
> >
> > On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> > patrick.diviacco@gmail.com> wrote:
> >
> > > I don't think the link you suggested can help, but maybe I'm wrong.
> > >
> > > Also, the parameter MAX_HITS is not useful, it just limit the results,
> it
> > > doesn't add the not relevant docs.
> > >
> > >
> > >
> > > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
> > >
> > > > Hi Patrick,
> > > > You may have a look at this, perhaps this will help you with it. Let
> me
> > > > know
> > > > if you're still stuck up.
> > > >
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > > >
> > > >
> > > > --
> > > > Anshum Gupta
> > > > http://ai-cafe.blogspot.com
> > > >
> > > >
> > > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
> > > >
> > > > > Not sure what your use case actually is, but it sounds like you may
> > be
> > > > > unclear how Lucene works.
> > > > >
> > > > > Each query clause you have will produce an iterator that walks over
> > the
> > > > > documents that match that clause.  All the documents from the
> entire,
> > > > root
> > > > > query get scored.  The scoring evaluation per document is also
> > related
> > > to
> > > > > the form of your query expression hierarchy.
> > > > >
> > > > > So, MatchAllDocsQuery is exactly what you want if you want a
> document
> > > > > iterator that includes all documents in the index.  You can change
> > how
> > > > this
> > > > > is scored by extending MatchAllDocsQuery and writing a custom
> scorer.
> > > > >
> > > > > Karl
> > > > >
> > > > > -----Original Message-----
> > > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > > To: java-user@lucene.apache.org
> > > > > Subject: how to get all documents in the results ?
> > > > >
> > > > > I'm using the following code because I want to see the entire
> > > collection
> > > > in
> > > > > my query results:
> > > > >
> > > > > //adding wildcards-term to see all results
> > > > > rest = new TermQuery(new Term("*","*"));
> > > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > > >
> > > > > But it doesn't work, I only see the relevant docs and not all the
> > other
> > > > > ones.
> > > > > How can I get all documents ordered by relevance instead ?
> > > > >
> > > > > ps. MatchAllDocsQuery is not a solution because I need to specify
> my
> > > own
> > > > > custom query.
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

1. "all" docs

2. because MatchAllDocsQuery only considers one field at a time. I'm
searching over multiple fields instead.

3. could you tell me more about this ? It might be a solution!



On 22 March 2011 12:18, Anshum <an...@gmail.com> wrote:

> so a few things
> 1. are you looking to get 'all' documents or only docs matching your query?
> 2. if its about fetching all docs, why not use the matchalldocs query?
> 3. did you try using a collector instead of topdocs?
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
> patrick.diviacco@gmail.com> wrote:
>
> > I don't think the link you suggested can help, but maybe I'm wrong.
> >
> > Also, the parameter MAX_HITS is not useful, it just limit the results, it
> > doesn't add the not relevant docs.
> >
> >
> >
> > On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
> >
> > > Hi Patrick,
> > > You may have a look at this, perhaps this will help you with it. Let me
> > > know
> > > if you're still stuck up.
> > >
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> > >
> > >
> > > --
> > > Anshum Gupta
> > > http://ai-cafe.blogspot.com
> > >
> > >
> > > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
> > >
> > > > Not sure what your use case actually is, but it sounds like you may
> be
> > > > unclear how Lucene works.
> > > >
> > > > Each query clause you have will produce an iterator that walks over
> the
> > > > documents that match that clause.  All the documents from the entire,
> > > root
> > > > query get scored.  The scoring evaluation per document is also
> related
> > to
> > > > the form of your query expression hierarchy.
> > > >
> > > > So, MatchAllDocsQuery is exactly what you want if you want a document
> > > > iterator that includes all documents in the index.  You can change
> how
> > > this
> > > > is scored by extending MatchAllDocsQuery and writing a custom scorer.
> > > >
> > > > Karl
> > > >
> > > > -----Original Message-----
> > > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: how to get all documents in the results ?
> > > >
> > > > I'm using the following code because I want to see the entire
> > collection
> > > in
> > > > my query results:
> > > >
> > > > //adding wildcards-term to see all results
> > > > rest = new TermQuery(new Term("*","*"));
> > > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > > >
> > > > But it doesn't work, I only see the relevant docs and not all the
> other
> > > > ones.
> > > > How can I get all documents ordered by relevance instead ?
> > > >
> > > > ps. MatchAllDocsQuery is not a solution because I need to specify my
> > own
> > > > custom query.
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Anshum <an...@gmail.com>.

A few things:
1. Are you looking to get 'all' documents or only docs matching your query?
2. If it's about fetching all docs, why not use MatchAllDocsQuery?
3. Did you try using a collector instead of TopDocs?

--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Mar 22, 2011 at 4:46 PM, Patrick Diviacco <
patrick.diviacco@gmail.com> wrote:

> I don't think the link you suggested can help, but maybe I'm wrong.
>
> Also, the parameter MAX_HITS is not useful, it just limit the results, it
> doesn't add the not relevant docs.
>
>
>
> On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:
>
> > Hi Patrick,
> > You may have a look at this, perhaps this will help you with it. Let me
> > know
> > if you're still stuck up.
> >
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
> >
> >
> > --
> > Anshum Gupta
> > http://ai-cafe.blogspot.com
> >
> >
> > On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
> >
> > > Not sure what your use case actually is, but it sounds like you may be
> > > unclear how Lucene works.
> > >
> > > Each query clause you have will produce an iterator that walks over the
> > > documents that match that clause.  All the documents from the entire,
> > root
> > > query get scored.  The scoring evaluation per document is also related
> to
> > > the form of your query expression hierarchy.
> > >
> > > So, MatchAllDocsQuery is exactly what you want if you want a document
> > > iterator that includes all documents in the index.  You can change how
> > this
> > > is scored by extending MatchAllDocsQuery and writing a custom scorer.
> > >
> > > Karl
> > >
> > > -----Original Message-----
> > > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > Sent: Tuesday, March 22, 2011 4:23 AM
> > > To: java-user@lucene.apache.org
> > > Subject: how to get all documents in the results ?
> > >
> > > I'm using the following code because I want to see the entire
> collection
> > in
> > > my query results:
> > >
> > > //adding wildcards-term to see all results
> > > rest = new TermQuery(new Term("*","*"));
> > > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> > >
> > > But it doesn't work, I only see the relevant docs and not all the other
> > > ones.
> > > How can I get all documents ordered by relevance instead ?
> > >
> > > ps. MatchAllDocsQuery is not a solution because I need to specify my
> own
> > > custom query.
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>

Re: how to get all documents in the results ?

Posted by Patrick Diviacco <pa...@gmail.com>.

I don't think the link you suggested can help, but maybe I'm wrong.

Also, the parameter MAX_HITS is not useful: it just limits the results, it
doesn't add the non-relevant docs.



On 22 March 2011 12:10, Anshum <an...@gmail.com> wrote:

> Hi Patrick,
> You may have a look at this, perhaps this will help you with it. Let me
> know
> if you're still stuck up.
> http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits
>
>
> --
> Anshum Gupta
> http://ai-cafe.blogspot.com
>
>
> On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:
>
> > Not sure what your use case actually is, but it sounds like you may be
> > unclear how Lucene works.
> >
> > Each query clause you have will produce an iterator that walks over the
> > documents that match that clause.  All the documents from the entire,
> root
> > query get scored.  The scoring evaluation per document is also related to
> > the form of your query expression hierarchy.
> >
> > So, MatchAllDocsQuery is exactly what you want if you want a document
> > iterator that includes all documents in the index.  You can change how
> this
> > is scored by extending MatchAllDocsQuery and writing a custom scorer.
> >
> > Karl
> >
> > -----Original Message-----
> > From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > Sent: Tuesday, March 22, 2011 4:23 AM
> > To: java-user@lucene.apache.org
> > Subject: how to get all documents in the results ?
> >
> > I'm using the following code because I want to see the entire collection
> in
> > my query results:
> >
> > //adding wildcards-term to see all results
> > rest = new TermQuery(new Term("*","*"));
> > booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
> >
> > But it doesn't work, I only see the relevant docs and not all the other
> > ones.
> > How can I get all documents ordered by relevance instead ?
> >
> > ps. MatchAllDocsQuery is not a solution because I need to specify my own
> > custom query.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
>

Re: how to get all documents in the results ?

Posted by Anshum <an...@gmail.com>.

Hi Patrick,
Have a look at this; perhaps it will help. Let me know if you're still
stuck.
http://stackoverflow.com/questions/3300265/lucene-3-iterating-over-all-hits


--
Anshum Gupta
http://ai-cafe.blogspot.com


On Tue, Mar 22, 2011 at 4:10 PM, <ka...@nokia.com> wrote:

> Not sure what your use case actually is, but it sounds like you may be
> unclear how Lucene works.
>
> Each query clause you have will produce an iterator that walks over the
> documents that match that clause.  All the documents from the entire, root
> query get scored.  The scoring evaluation per document is also related to
> the form of your query expression hierarchy.
>
> So, MatchAllDocsQuery is exactly what you want if you want a document
> iterator that includes all documents in the index.  You can change how this
> is scored by extending MatchAllDocsQuery and writing a custom scorer.
>
> Karl
>
> -----Original Message-----
> From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> Sent: Tuesday, March 22, 2011 4:23 AM
> To: java-user@lucene.apache.org
> Subject: how to get all documents in the results ?
>
> I'm using the following code because I want to see the entire collection in
> my query results:
>
> //adding wildcards-term to see all results
> rest = new TermQuery(new Term("*","*"));
> booleanQuery.add(rest, BooleanClause.Occur.SHOULD);
>
> But it doesn't work, I only see the relevant docs and not all the other
> ones.
> How can I get all documents ordered by relevance instead ?
>
> ps. MatchAllDocsQuery is not a solution because I need to specify my own
> custom query.

RE: how to get all documents in the results ?

Posted by ka...@nokia.com.

Not sure what your use case actually is, but it sounds like you may be unclear on how Lucene works.

Each query clause you have will produce an iterator that walks over the documents that match that clause.  Every document matched by the root query as a whole gets scored, and each document's score also depends on the structure of your query expression hierarchy.

So, MatchAllDocsQuery is exactly what you want if you want a document iterator that includes all documents in the index.  You can change how this is scored by extending MatchAllDocsQuery and writing a custom scorer.
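
To make the iterator idea concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene's API and not Lucene's scoring formula: the class ShouldClauseSketch, the Clause interface, and the contribution values 1.0 and 0.1 are all made up for illustration. The assumption is only that optional (SHOULD) clauses are combined as a union of their matches, with the per-clause contributions summed for each document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Lucene classes.
public class ShouldClauseSketch {

    /** A clause maps every document it matches to a score contribution. */
    interface Clause {
        Map<Integer, Double> matches(Map<Integer, Set<String>> index);
    }

    /** Toy term clause: matches documents containing the term, contributing 1.0 each. */
    static Clause term(String text) {
        return index -> {
            Map<Integer, Double> hits = new HashMap<>();
            for (Map.Entry<Integer, Set<String>> doc : index.entrySet()) {
                if (doc.getValue().contains(text)) {
                    hits.put(doc.getKey(), 1.0);
                }
            }
            return hits;
        };
    }

    /** Toy match-all clause: matches every document with the same constant contribution. */
    static Clause matchAll(double constantScore) {
        return index -> {
            Map<Integer, Double> hits = new HashMap<>();
            for (Integer docId : index.keySet()) {
                hits.put(docId, constantScore);
            }
            return hits;
        };
    }

    /** Combines optional (SHOULD) clauses: union of matches, contributions summed per document. */
    static Map<Integer, Double> score(Map<Integer, Set<String>> index, List<Clause> shouldClauses) {
        Map<Integer, Double> totals = new HashMap<>();
        for (Clause clause : shouldClauses) {
            clause.matches(index).forEach((docId, s) -> totals.merge(docId, s, Double::sum));
        }
        return totals;
    }

    /** Document ids ordered by descending score. */
    static List<Integer> ranked(Map<Integer, Double> scores) {
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> index = new HashMap<>();
        index.put(0, Set.of("lucene", "search"));
        index.put(1, Set.of("cooking"));
        index.put(2, Set.of("lucene"));

        // Two term clauses plus a match-all clause, all optional.
        Map<Integer, Double> scores =
            score(index, List.of(term("lucene"), term("search"), matchAll(0.1)));
        for (Integer docId : ranked(scores)) {
            System.out.println(docId + " score:" + scores.get(docId));
        }
    }
}
```

In this model, a document that matches none of the term clauses still shows up in the results once the match-all clause is added, and it carries that clause's constant contribution rather than a score of 0. Without the match-all clause it would not be returned at all.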

Karl

-----Original Message-----
From: ext Patrick Diviacco [mailto:patrick.diviacco@gmail.com] 
Sent: Tuesday, March 22, 2011 4:23 AM
To: java-user@lucene.apache.org
Subject: how to get all documents in the results ?

I'm using the following code because I want to see the entire collection in
my query results:

//adding wildcards-term to see all results
rest = new TermQuery(new Term("*","*"));
booleanQuery.add(rest, BooleanClause.Occur.SHOULD);

But it doesn't work, I only see the relevant docs and not all the other
ones.
How can I get all documents ordered by relevance instead ?

ps. MatchAllDocsQuery is not a solution because I need to specify my own
custom query.
