Posted to user@lucenenet.apache.org by Marco Dissel <ma...@gmail.com> on 2011/05/27 09:02:07 UTC

[Lucene.Net] search within a list of unique ID's

Our index contains documents with a unique ID (long number) corresponding
to a record in a SQL database. I want to run a Lucene search within a list
of IDs returned from a SQL result set.

IDataReader reader = cmd.Execute("select ID from ...");
long[] ids = GetIds(reader);
string search = Join(ids); // results in (ID:1 OR ID:2 OR ID:3)
search += " AND (" + <searchstring entered by user> + ")";
IndexSearcher searcher = new IndexSearcher(_Reader);
// QueryParser constructor arguments elided as in the original post
Query query = new Lucene.Net.QueryParsers.QueryParser(,,,).Parse(search);
TopDocs docs = searcher.Search(query, 1000000);

Is this the way to go?

Thanks

RE: [Lucene.Net] search within a list of unique ID's

Posted by "Granroth, Neal V." <ne...@thermofisher.com>.
Yes, I think that the relative values of the scores could change significantly depending on the number of matches to documents which are then filtered out.

In the past I found CachingWrapperFilter useful for this type of selection, where I needed to limit the set of documents that would be searched before the query is run.
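
The caching idea behind such a filter can be sketched in plain .NET, without any Lucene types. This is only an illustration under assumptions: as far as I understand it, CachingWrapperFilter keeps the wrapped filter's result per index reader, and the sketch below mimics that with a hypothetical `PerReaderCache` class that stores one `BitArray` per reader object. The class name, the `compute` callback, and the use of `BitArray` are all inventions for this example.

```csharp
using System;
using System.Collections;
using System.Runtime.CompilerServices;

// Hypothetical helper: caches one computed bitset per reader instance, so the
// expensive computation normally runs once per opened (or reopened) reader.
public sealed class PerReaderCache<TReader> where TReader : class
{
    // Weak keys: an entry disappears once its reader is garbage collected.
    private readonly ConditionalWeakTable<TReader, BitArray> _cache =
        new ConditionalWeakTable<TReader, BitArray>();
    private readonly Func<TReader, BitArray> _compute;

    public PerReaderCache(Func<TReader, BitArray> compute)
    {
        _compute = compute ?? throw new ArgumentNullException(nameof(compute));
    }

    // Returns the cached bitset for this reader, computing it on first use.
    public BitArray Get(TReader reader)
    {
        return _cache.GetValue(reader, r => _compute(r));
    }
}
```

A new reader instance gets a fresh entry, which is the behaviour you want after a reopen, since document numbers may have changed.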


- Neal


RE: [Lucene.Net] search within a list of unique ID's

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
It might be different as an actual number, but the relative ranking
should not change - since each ID matches only one document, only the
user's part of the search can affect the ranking.

Someone else would need to speak as to whether the relative value of the
scores will also change, but I can't see this being a problem.

The custom collector route also means that paging is handled as normal
and doesn't present a problem.

If you follow this route, and if the user's valid list of IDs remains
constant per user, you will certainly want to cache and index the list
of valid IDs. 

Yours,
Moray
-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com


---------------------------------------------------------
Disclaimer 

This message and any attachments are confidential and/or privileged. If this has been sent to you in error, please do not use, retain or disclose them, and contact the sender as soon as possible.

Oxford Analytica Ltd
Registered in England: No. 1196703
5 Alfred Street, Oxford
United Kingdom, OX1 4EH
---------------------------------------------------------


Re: [Lucene.Net] search within a list of unique ID's

Posted by Marco Dissel <ma...@gmail.com>.
But won't the document score be different if I filter the list afterwards
with my list of unique IDs?



RE: [Lucene.Net] search within a list of unique ID's

Posted by Moray McConnachie <mm...@oxford-analytica.com>.
If the list of IDs could be very long, then the search string could
become horrendously long, and you might also have to look at the maximum
number of clauses Lucene permits in a query (BooleanQuery defaults to
1024 and throws TooManyClauses beyond that).

If the user's search is reasonably restrictive, you might do better running
the user's search and then filtering the returned results according
to your list of IDs. There are numerous ways of achieving this; I
personally like the custom collector because you can add more
complicated logic later if you want.
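
The post-filtering step described above can be sketched without any Lucene types. This is a minimal sketch, assuming the hits arrive in ranked order as (ID, score) pairs; `Hit`, `PostFilter`, and `FilterPage` are hypothetical names, not part of Lucene.Net. In a real custom collector the same membership test would presumably run as each document is collected.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one search hit: the stored ID field plus its score.
public sealed class Hit
{
    public long Id { get; }
    public float Score { get; }
    public Hit(long id, float score) { Id = id; Score = score; }
}

public static class PostFilter
{
    // Keeps only hits whose ID is in allowedIds, preserving the ranked order,
    // then returns one page (zero-based pageIndex) of the surviving hits.
    public static List<Hit> FilterPage(
        IEnumerable<Hit> rankedHits, ISet<long> allowedIds, int pageIndex, int pageSize)
    {
        if (pageIndex < 0) throw new ArgumentOutOfRangeException(nameof(pageIndex));
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return rankedHits
            .Where(h => allowedIds.Contains(h.Id)) // cheap lookup when allowedIds is a HashSet
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Because filtering happens before paging, each page is full as long as enough allowed hits remain, and the scores themselves are untouched.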

Yours,
Moray


-------------------------------------
Moray McConnachie
Director of IT    +44 1865 261 600
Oxford Analytica  http://www.oxan.com



Re: [Lucene.Net] search within a list of unique ID's

Posted by Marco Dissel <ma...@gmail.com>.
I think you're answering the wrong thread... This is a new question ;-)

On Fri, May 27, 2011 at 9:18 AM, digy digy <di...@gmail.com> wrote:

> Creation of SimpleFacetedSearch is slow. Therefore it should only be
> created when a new reader is opened (or reopened).
> DIGY

Re: [Lucene.Net] search within a list of unique ID's

Posted by digy digy <di...@gmail.com>.
As the number of terms in a query increases, search time increases too.
Therefore, if possible, use precomputed BitSets and "AND" them with the
user's query. Another costly operation is fetching data from the index,
so limiting the result count to a smaller number may help too.
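
To illustrate the bitset idea, here is a minimal sketch that uses .NET's `System.Collections.BitArray` as a stand-in for a Lucene bitset. It assumes one bit per document number in the index; `DocBitSets`, `FromDocNumbers`, and `Intersect` are hypothetical names made up for this example. The allowed set would be computed once and reused, while the query's bits change with every search.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class DocBitSets
{
    // Builds a bitset with one bit per document number; set bits mark members.
    public static BitArray FromDocNumbers(int maxDoc, IEnumerable<int> docNumbers)
    {
        var bits = new BitArray(maxDoc); // all bits start as false
        foreach (int doc in docNumbers)
        {
            bits[doc] = true; // throws if doc is outside [0, maxDoc)
        }
        return bits;
    }

    // Returns the document numbers present in both sets, in ascending order.
    // Both bitsets must have the same length.
    public static List<int> Intersect(BitArray allowed, BitArray queryMatches)
    {
        // BitArray.And mutates its receiver, so work on a copy to keep the
        // precomputed "allowed" set reusable across searches.
        var both = new BitArray(allowed).And(queryMatches);
        var result = new List<int>();
        for (int doc = 0; doc < both.Length; doc++)
        {
            if (both[doc]) result.Add(doc);
        }
        return result;
    }
}
```

The intersection costs one pass over the bits regardless of how many IDs are allowed, which is why it scales better than a query with thousands of OR clauses.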

DIGY


Re: [Lucene.Net] search within a list of unique ID's

Posted by digy digy <di...@gmail.com>.
Creation of SimpleFacetedSearch is slow. Therefore it should only be created
when a new reader is opened (or reopened).
DIGY
