Posted to java-user@lucene.apache.org by sam xia <ho...@yahoo.com> on 2004/02/25 22:01:49 UTC

segments question

Hi,

My pages can be sorted into about 10,000 subcategories.
Each category could have up to 1 million HTML pages.
(Of course, I do not have this yet. I am at an early
stage of thinking.) The index will be stored on hard
disk.

A user may be interested in 10 of the 10,000
subcategories, depending on the query string. I would
like to restrict the search to those 10 subcategories
and not waste time searching the other 9,990.

One approach is to build each category into its own
segment. Then there will be 10,000 segments, and the
query will run against only 10 of them. But putting
10,000 subfolders on a hard drive could slow things
down, since hard disk seeks are slow.

Or should I build the whole thing into one big segment
and use a filter to do this? There is a DateFilter.
Is there a way to implement a category filter?

What is the best way to accomplish this?


Thanks very much




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: segments question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 25, 2004, at 7:58 PM, sam xia wrote:
>
>> I'd recommend a pool of filters for each category.
>> Regenerate them
>> when the index changes, otherwise leave the
>> instances alive and reuse
>> them for queries - this will speed things up pretty
>> dramatically I'd
>> guess.  There is a QueryFilter you could use, or
>> write a custom one
>> that could be faster.
>>
>> 	Erik
>>
> thanks for your quick response, Erik. BTW, I checked
> your web site, the logo is cool.

haha.... I'll take that as a hint that I should put more there than a 
"splash" page :)

> Is there a way to store the filter cache on the hard
> drive? Then I could just read it back from disk. Since
> I have lots of categories, it might be impossible to
> cache every query filter's bitset in memory.

BitSets are tiny (one bit per document), so you'd have to have a *lot* 
of categories to fill up a decent amount of RAM.
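
For a sense of scale, here is a rough sketch of the arithmetic, assuming a
filter holds one bit per document in the index and ignoring object
overhead. The class and method names and the 1,000,000-document index size
are hypothetical, chosen only for illustration; the 10,000 categories come
from the original question.

```java
// Rough memory estimate for cached filter bit sets.
// Assumption: one bit per document, no per-object overhead counted.
final class FilterMemoryEstimate {

    /** Bytes needed for one bit per document, rounded up to whole bytes. */
    static long bytesPerFilter(long numDocs) {
        return (numDocs + 7) / 8;
    }

    /** Total bytes when one bit set is cached per category. */
    static long bytesForAllFilters(long numDocs, long numCategories) {
        return bytesPerFilter(numDocs) * numCategories;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;     // hypothetical index size
        long numCategories = 10_000L;  // category count from the question
        System.out.println("per filter: " + bytesPerFilter(numDocs) + " bytes");
        System.out.println("all filters: "
                + bytesForAllFilters(numDocs, numCategories) + " bytes");
    }
}
```

Under these assumptions each filter costs about 125,000 bytes and 10,000
cached filters about 1.25 GB, so whether everything fits depends on the
index size and the RAM available.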

But Filter is Serializable, so you could (in theory) persist them 
somewhere.
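
As a minimal sketch of what persisting could look like, the code below
writes a java.util.BitSet (which is itself Serializable) to a file with
standard Java serialization and reads it back. It does not use any Lucene
classes; BitSetStore, save, load and roundTrip are hypothetical names for
illustration, and a real setup would also need to discard the stored files
whenever the index changes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

// Hypothetical helper: stores one bit set per file using Java serialization.
final class BitSetStore {

    /** Writes the bit set to the given file, replacing any existing content. */
    static void save(BitSet bits, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(bits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back a bit set previously written by save. */
    static BitSet load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (BitSet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Saves to a temporary file, loads it again, and cleans up. */
    static BitSet roundTrip(BitSet bits) {
        try {
            Path tmp = Files.createTempFile("category-filter", ".bits");
            try {
                save(bits, tmp);
                return load(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```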

> How about I keep every category to one segment?

I'm not sure how you'd ensure that.  Never optimize?

You could have a separate index per category, but I think you already 
mentioned that, and it would take up file handles.

>  Then I
> have 10000 segments to work with. Is it going to be
> faster than the filter solution?

Filters are fast.

	Erik




Re: segments question

Posted by sam xia <ho...@yahoo.com>.
> I'd recommend a pool of filters for each category. 
> Regenerate them 
> when the index changes, otherwise leave the
> instances alive and reuse 
> them for queries - this will speed things up pretty
> dramatically I'd 
> guess.  There is a QueryFilter you could use, or
> write a custom one 
> that could be faster.
> 
> 	Erik
> 
Thanks for your quick response, Erik. BTW, I checked
your web site, the logo is cool. 

Is there a way to store the filter cache on the hard
drive? Then I could just read it back from disk. Since
I have lots of categories, it might be impossible to
cache every query filter's bitset in memory.

How about I keep every category in one segment? Then I
have 10000 segments to work with. Is it going to be
faster than the filter solution?

------
Plus:
I am looking forward to your Lucene book. I found one
on Amazon.com. Not sure if it is any good.

Professional Portal Development with Apache Tools :
Jetspeed, Lucene, James, Slide (Wrox Press)
by W. Clay Richardson (Author), Donald Avondolio
(Author), Joe Vitale (Author), Peter Len (Author),
Kevin T. Smith (Author) 

-- thank you 





Re: segments question

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 25, 2004, at 4:01 PM, sam xia wrote:
> Or should I build the whole thing into one big segment
> and use the filter to do this. There is a DateFilter.
> Is there a way to implement a category filter?
>
> What is the best way to accomplish this?

I'd recommend a pool of filters for each category.  Regenerate them 
when the index changes, otherwise leave the instances alive and reuse 
them for queries - this will speed things up pretty dramatically I'd 
guess.  There is a QueryFilter you could use, or write a custom one 
that could be faster.
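
The pooling idea can be sketched roughly as follows. This is not Lucene
code: the bit sets are plain java.util.BitSet instances, the builder
function is a stand-in for whatever computes a category's matching
documents (for example a QueryFilter), and CategoryFilterPool, bitsFor,
unionOf and the caller-supplied indexVersion stamp are all hypothetical
names used only for illustration.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical pool: one cached bit set per category, rebuilt lazily
// whenever the caller reports a different index version.
final class CategoryFilterPool {
    private final Function<String, BitSet> builder; // stand-in for the real filter
    private final Map<String, BitSet> cache = new HashMap<>();
    private long cachedVersion = -1;
    int buildCount = 0; // exposed only to show how often bit sets are rebuilt

    CategoryFilterPool(Function<String, BitSet> builder) {
        this.builder = builder;
    }

    /** Returns the cached bits for a category, building them if needed. */
    synchronized BitSet bitsFor(String category, long indexVersion) {
        if (indexVersion != cachedVersion) {
            cache.clear();               // index changed: drop stale bit sets
            cachedVersion = indexVersion;
        }
        BitSet bits = cache.get(category);
        if (bits == null) {
            bits = builder.apply(category);
            buildCount++;
            cache.put(category, bits);
        }
        return bits;
    }

    /** ORs together the bits of the categories a query is limited to. */
    synchronized BitSet unionOf(Iterable<String> categories, long indexVersion) {
        BitSet result = new BitSet();
        for (String category : categories) {
            result.or(bitsFor(category, indexVersion));
        }
        return result;
    }
}
```

A search limited to 10 categories would call unionOf with those 10 names
and use the resulting bits to restrict the hits. Repeated queries against
an unchanged index reuse the cached bit sets instead of recomputing them.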

	Erik

