You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Ravi Patel <rp...@live.com> on 2010/04/22 21:58:53 UTC

Indexing and Searching fields that have unique values

 

Using Lucene.Net

 

I've built an index of documents.

 

The documents also have a unique identifier (my identifier, not the lucene index's id).

The unique identifers are also a sort order of new-ness (higher id values are newer)

 

string my_id ="1234"

doc.Add(new Field("id", my_id, Field.Store.YES, Field.Index.UN_TOKENIZED));

 

Searching for a particular id, or range searches are incredibly slow

 

 

TermQuery query = new TermQuery(new Term("id", "1234"));

searcher.Search(query)

 

 

Any tips on how to speed up such an search?

 

I'm also doing RangeSearches on lower / upper ids, and those are slow too
 		 	   		  
_________________________________________________________________
The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail.
http://www.windowslive.com/campaign/thenewbusy?tile=multiaccount&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4

Re: Indexing and Searching fields that have unique values

Posted by Ivan Liu <ja...@gmail.com>.
I think Anshum is right。
And may your range is too big and is sorting


2010/4/23 Anshum <an...@gmail.com>

> Hi Ravi,
>
> Adding to what Erick said, you could do index the numbers as numeric fields
> instead of strings. This should improve things for you by a considerable
> amount.
> P.S: I'm talking with my knowledge on Java Lucene.
> --
> Anshum Gupta
> Naukri Labs!
> http://ai-cafe.blogspot.com
>
> The facts expressed here belong to everybody, the opinions to me. The
> distinction is yours to draw............
>
>
> On Fri, Apr 23, 2010 at 1:43 AM, Erick Erickson <erickerickson@gmail.com
> >wrote:
>
> > You have to provide more info, especially the search code you're
> > using. How many documents in your index? What are you measuring?
> >
> > Anything else you can think of that might help people diagnose
> > your issue.
> >
> > Also, consider asking on the .Net user's list.
> >
> > Known things to look for (in Java).
> > 1> Are you re-opening an index reader each time? Don't
> > 2> Are you sorting? If so, the first querie(s) will fill internal
> >   caches, this takes time. Time subsequent searches.
> >
> > HTH
> > Erick
> >
> > On Thu, Apr 22, 2010 at 3:58 PM, Ravi Patel <rp...@live.com> wrote:
> >
> > >
> > >
> > >
> > > Using Lucene.Net
> > >
> > >
> > >
> > > I've built an index of documents.
> > >
> > >
> > >
> > > The documents also have a unique identifier (my identifier, not the
> > lucene
> > > index's id).
> > >
> > > The unique identifers are also a sort order of new-ness (higher id
> values
> > > are newer)
> > >
> > >
> > >
> > > string my_id ="1234"
> > >
> > > doc.Add(new Field("id", my_id, Field.Store.YES,
> > Field.Index.UN_TOKENIZED));
> > >
> > >
> > >
> > > Searching for a particular id, or range searches are incredibly slow
> > >
> > >
> > >
> > >
> > >
> > > TermQuery query = new TermQuery(new Term("id", "1234"));
> > >
> > > searcher.Search(query)
> > >
> > >
> > >
> > >
> > >
> > > Any tips on how to speed up such an search?
> > >
> > >
> > >
> > > I'm also doing RangeSearches on lower / upper ids, and those are slow
> too
> > >
> > > _________________________________________________________________
> > > The New Busy is not the too busy. Combine all your e-mail accounts with
> > > Hotmail.
> > >
> > >
> >
> http://www.windowslive.com/campaign/thenewbusy?tile=multiaccount&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4
> >
>



-- 
冲浪板
my blog:冲浪板 <http://chonglangban.appspot.com/>
my site:Keji Technology <http://kejiblog.appspot.com/>

Re: Indexing and Searching fields that have unique values

Posted by Anshum <an...@gmail.com>.
Hi Ravi,

Adding to what Erick said, you could do index the numbers as numeric fields
instead of strings. This should improve things for you by a considerable
amount.
P.S: I'm talking with my knowledge on Java Lucene.
--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Fri, Apr 23, 2010 at 1:43 AM, Erick Erickson <er...@gmail.com>wrote:

> You have to provide more info, especially the search code you're
> using. How many documents in your index? What are you measuring?
>
> Anything else you can think of that might help people diagnose
> your issue.
>
> Also, consider asking on the .Net user's list.
>
> Known things to look for (in Java).
> 1> Are you re-opening an index reader each time? Don't
> 2> Are you sorting? If so, the first querie(s) will fill internal
>   caches, this takes time. Time subsequent searches.
>
> HTH
> Erick
>
> On Thu, Apr 22, 2010 at 3:58 PM, Ravi Patel <rp...@live.com> wrote:
>
> >
> >
> >
> > Using Lucene.Net
> >
> >
> >
> > I've built an index of documents.
> >
> >
> >
> > The documents also have a unique identifier (my identifier, not the
> lucene
> > index's id).
> >
> > The unique identifers are also a sort order of new-ness (higher id values
> > are newer)
> >
> >
> >
> > string my_id ="1234"
> >
> > doc.Add(new Field("id", my_id, Field.Store.YES,
> Field.Index.UN_TOKENIZED));
> >
> >
> >
> > Searching for a particular id, or range searches are incredibly slow
> >
> >
> >
> >
> >
> > TermQuery query = new TermQuery(new Term("id", "1234"));
> >
> > searcher.Search(query)
> >
> >
> >
> >
> >
> > Any tips on how to speed up such an search?
> >
> >
> >
> > I'm also doing RangeSearches on lower / upper ids, and those are slow too
> >
> > _________________________________________________________________
> > The New Busy is not the too busy. Combine all your e-mail accounts with
> > Hotmail.
> >
> >
> http://www.windowslive.com/campaign/thenewbusy?tile=multiaccount&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4
>

Re: Indexing and Searching fields that have unique values

Posted by Erick Erickson <er...@gmail.com>.
You have to provide more info, especially the search code you're
using. How many documents in your index? What are you measuring?

Anything else you can think of that might help people diagnose
your issue.

Also, consider asking on the .Net user's list.

Known things to look for (in Java).
1> Are you re-opening an index reader each time? Don't
2> Are you sorting? If so, the first querie(s) will fill internal
   caches, this takes time. Time subsequent searches.

HTH
Erick

On Thu, Apr 22, 2010 at 3:58 PM, Ravi Patel <rp...@live.com> wrote:

>
>
>
> Using Lucene.Net
>
>
>
> I've built an index of documents.
>
>
>
> The documents also have a unique identifier (my identifier, not the lucene
> index's id).
>
> The unique identifers are also a sort order of new-ness (higher id values
> are newer)
>
>
>
> string my_id ="1234"
>
> doc.Add(new Field("id", my_id, Field.Store.YES, Field.Index.UN_TOKENIZED));
>
>
>
> Searching for a particular id, or range searches are incredibly slow
>
>
>
>
>
> TermQuery query = new TermQuery(new Term("id", "1234"));
>
> searcher.Search(query)
>
>
>
>
>
> Any tips on how to speed up such an search?
>
>
>
> I'm also doing RangeSearches on lower / upper ids, and those are slow too
>
> _________________________________________________________________
> The New Busy is not the too busy. Combine all your e-mail accounts with
> Hotmail.
>
> http://www.windowslive.com/campaign/thenewbusy?tile=multiaccount&ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_4