Posted to java-user@lucene.apache.org by Ravi Patel <rp...@live.com> on 2010/04/22 21:58:53 UTC
Indexing and Searching fields that have unique values
Using Lucene.Net
I've built an index of documents.
The documents also have a unique identifier (my identifier, not the lucene index's id).
The unique identifiers also define a sort order of newness (higher id values are newer).
string my_id = "1234";
doc.Add(new Field("id", my_id, Field.Store.YES, Field.Index.UN_TOKENIZED));
Searching for a particular id, or running a range search, is incredibly slow:
TermQuery query = new TermQuery(new Term("id", "1234"));
searcher.Search(query);
Any tips on how to speed up such a search?
I'm also doing range searches on lower / upper ids, and those are slow too.
Re: Indexing and Searching fields that have unique values
Posted by Ivan Liu <ja...@gmail.com>.
I think Anshum is right.
Also, maybe your range is too big and the results are being sorted.
--
冲浪板
my blog:冲浪板 <http://chonglangban.appspot.com/>
my site:Keji Technology <http://kejiblog.appspot.com/>
Re: Indexing and Searching fields that have unique values
Posted by Anshum <an...@gmail.com>.
Hi Ravi,
Adding to what Erick said, you could index the numbers as numeric fields
instead of strings. This should improve things for you by a considerable
amount.
P.S.: I'm speaking from my knowledge of Java Lucene.
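To illustrate why ids indexed as plain strings are awkward for range searches, here is a minimal Java sketch that does not use Lucene at all. It assumes only that string terms are compared lexicographically, so "1234" sorts before "200". The class PaddedIds, the helpers pad and range, and the sample ids are hypothetical and exist only for this illustration. Zero-padding is shown as one long-standing workaround for string terms. In Java Lucene 2.9 and later, the "numeric fields" advice presumably refers to NumericField and NumericRangeQuery, and whether your Lucene.Net version offers equivalents is something to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical helper, not part of Lucene: shows how string terms sort.
final class PaddedIds {
    // Width 10 covers every non-negative int (2147483647 has 10 digits).
    static final int WIDTH = 10;

    // Left-pads a non-negative id with zeros so string order equals numeric order.
    static String pad(int id) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", id);
    }

    // Inclusive range over sorted string terms, compared as strings.
    static List<String> range(TreeSet<String> terms, String lower, String upper) {
        return new ArrayList<>(terms.subSet(lower, true, upper, true));
    }

    public static void main(String[] args) {
        int[] ids = {9, 10, 100, 1234, 2000};
        TreeSet<String> raw = new TreeSet<>();
        TreeSet<String> padded = new TreeSet<>();
        for (int id : ids) {
            raw.add(Integer.toString(id));
            padded.add(pad(id));
        }
        // Raw strings: "1234" falls between "10" and "200", which is wrong numerically.
        System.out.println(range(raw, "10", "200"));
        // Padded strings: only 10 and 100 fall in the range 10..200.
        System.out.println(range(padded, pad(10), pad(200)));
    }
}
```

The padded form fixes correctness of string ranges, but it does not by itself reduce the number of terms a wide range has to visit, which is the part a numeric field type is meant to help with.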
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com
The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............
Re: Indexing and Searching fields that have unique values
Posted by Erick Erickson <er...@gmail.com>.
You'll have to provide more info, especially the search code you're
using. How many documents are in your index? What are you measuring?
Anything else you can think of that might help people diagnose
your issue would be useful too.
Also, consider asking on the .Net users' list.
Known things to look for (in Java):
1> Are you re-opening an index reader each time? Don't.
2> Are you sorting? If so, the first query (or queries) will fill internal
caches, which takes time. Time subsequent searches instead.
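The two points above can be sketched as a small timing harness in Java. This is only an illustration under stated assumptions: SearchTimer and averageNanosAfterWarmup are hypothetical names, and the Runnable stands in for a search call made against a searcher that was opened once and is reused. It is not Lucene API.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical timing harness, not part of Lucene.
final class SearchTimer {
    // Runs the search once untimed as a warm-up, then returns the
    // average wall-clock nanoseconds over `runs` timed calls.
    static long averageNanosAfterWarmup(Runnable search, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        search.run(); // warm-up: lets caches fill, result is not timed
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            search.run();
        }
        return (System.nanoTime() - start) / runs;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Stand-in for a search on a searcher created once, outside the loop.
        Runnable fakeSearch = calls::incrementAndGet;
        long avg = averageNanosAfterWarmup(fakeSearch, 100);
        System.out.println("calls=" + calls.get() + ", avg ns per search=" + avg);
    }
}
```

In a real measurement the Runnable would wrap the actual search call, with the reader and searcher created before the harness is invoked so that the cost of opening the index is not counted in every search.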
HTH
Erick