Posted to java-user@lucene.apache.org by sam xia <ho...@yahoo.com> on 2004/02/25 22:01:49 UTC
segments question
Hi,
My pages can be sorted into about 10,000 subcategories.
Each category could have up to 1 million HTML pages.
(Of course, I do not have this many yet. I am at an
early stage of thinking...) The index will be
stored on hard disk.
A user may be interested in 10 of the 10,000
subcategories, depending on the query string. I would like
to search only within those 10 subcategories. I do
not want to waste time searching the other 9,990 categories.
One approach is to build each category into its own segment.
Then there will be 10,000 segments, and the query will
be run within only 10 of them. But putting 10,000
subfolders on a hard drive could slow things down, since
hard disk seeks are slow.
Or should I build the whole thing into one big segment
and use a filter to do this? There is a DateFilter.
Is there a way to implement a category filter?
What is the best way to accomplish this?
Thanks very much
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: segments question
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 25, 2004, at 7:58 PM, sam xia wrote:
>
>> I'd recommend a pool of filters for each category.
>> Regenerate them
>> when the index changes, otherwise leave the
>> instances alive and reuse
>> them for queries - this will speed things up pretty
>> dramatically I'd
>> guess. There is a QueryFilter you could use, or
>> write a custom one
>> that could be faster.
>>
>> Erik
>>
> Thanks for your quick response, Erik. BTW, I checked
> your web site; the logo is cool.
Ha ha... I'll take that as a hint that I should put more there than a
"splash" page :)
> Is there a way to store the filter cache on the hard
> drive? Then I can just read it from the hard drive. Since
> I have lots of categories, it might be impossible to
> cache every query filter's bitset in memory.
BitSets are tiny, so you'd have to have a *lot* of categories to fill
up a decent amount of RAM.
But Filter is Serializable, so you could (in theory) persist them
somewhere.
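
To illustrate the idea of persisting a filter's bits, here is a minimal sketch that uses only java.util.BitSet and standard Java serialization. It does not touch Lucene at all: the class name BitSetStore and its methods are hypothetical, the BitSet merely stands in for whatever a category filter would compute, and the code uses newer Java features than were available when this thread was written.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

/** Hypothetical helper: writes a category's bit set to disk and reads it back. */
public class BitSetStore {

    /** Serialize the bit set to the given file, replacing any existing content. */
    public static void save(BitSet bits, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeObject(bits);
        }
    }

    /** Read back a bit set previously written by save(). */
    public static BitSet load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            return (BitSet) in.readObject();
        }
    }

    /** Rough size in bytes of an uncompressed bit set with one bit per document. */
    public static long approxBytes(long documentCount) {
        return (documentCount + 7) / 8;
    }
}
```

As a rough guide, assuming one bit per document, an index of 1 million documents needs about 125,000 bytes per bit set, so 10,000 such bit sets would need roughly 1.25 GB. The real figure depends on the total number of documents in the index, which could be far larger in the scenario described above.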
> How about I keep each category in its own segment?
I'm not sure how you'd ensure that. Never optimize?
You could have a separate index per category, but I think you already
mentioned that, and it would take up file handles.
> Then I
> have 10,000 segments to work with. Is it going to be
> faster than the filter solution?
Filters are fast.
Erik
Re: segments question
Posted by sam xia <ho...@yahoo.com>.
> I'd recommend a pool of filters for each category.
> Regenerate them
> when the index changes, otherwise leave the
> instances alive and reuse
> them for queries - this will speed things up pretty
> dramatically I'd
> guess. There is a QueryFilter you could use, or
> write a custom one
> that could be faster.
>
> Erik
>
Thanks for your quick response, Erik. BTW, I checked
your web site; the logo is cool.
Is there a way to store the filter cache on the hard
drive? Then I can just read it from the hard drive. Since
I have lots of categories, it might be impossible to
cache every query filter's bitset in memory.
How about I keep each category in its own segment? Then I
have 10,000 segments to work with. Is it going to be
faster than the filter solution?
------
P.S.
I am looking forward to your Lucene book. I found one
on Amazon.com, but I am not sure if it is any good:
Professional Portal Development with Apache Tools :
Jetspeed, Lucene, James, Slide (Wrox Press)
by W. Clay Richardson (Author), Donald Avondolio
(Author), Joe Vitale (Author), Peter Len (Author),
Kevin T. Smith (Author)
-- thank you
Re: segments question
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Feb 25, 2004, at 4:01 PM, sam xia wrote:
> Or should I build the whole thing into one big segment
> and use a filter to do this? There is a DateFilter.
> Is there a way to implement a category filter?
>
> What is the best way to accomplish this?
I'd recommend a pool of filters, one for each category. Regenerate them
when the index changes; otherwise leave the instances alive and reuse
them for queries. This will speed things up pretty dramatically, I'd
guess. There is a QueryFilter you could use, or you could write a custom
one that could be faster.
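
The pooling idea above can be sketched without any Lucene classes. This is only an illustration under stated assumptions: the class CategoryBitSetPool, the builder function, and the indexVersion number are all hypothetical, and a plain java.util.BitSet stands in for the bits a real category filter would produce. The pool builds each category's bit set once, reuses it for later queries, and throws everything away when the caller reports a new index version.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical pool: one cached BitSet per category, rebuilt when the index changes. */
public class CategoryBitSetPool {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final Function<String, BitSet> builder; // computes the bits for one category
    private long indexVersion;                      // version the cached bits were built for

    public CategoryBitSetPool(Function<String, BitSet> builder, long indexVersion) {
        this.builder = builder;
        this.indexVersion = indexVersion;
    }

    /** Return the cached bits for a category, rebuilding if the index version changed. */
    public synchronized BitSet get(String category, long currentIndexVersion) {
        if (currentIndexVersion != indexVersion) {
            cache.clear(); // index changed: every cached bit set is stale
            indexVersion = currentIndexVersion;
        }
        return cache.computeIfAbsent(category, builder);
    }

    /** Combine several categories into a new bit set (logical OR of their bits). */
    public synchronized BitSet union(Iterable<String> categories, long currentIndexVersion) {
        BitSet result = new BitSet();
        for (String category : categories) {
            result.or(get(category, currentIndexVersion));
        }
        return result;
    }
}
```

A search restricted to 10 categories would ask the pool for the union of those 10 bit sets and only accept hits whose document number is set in the result. Callers should treat the bit sets returned by get() as read-only, since they are shared between queries.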
Erik