Posted to java-user@lucene.apache.org by Ray <ra...@rayweb.de> on 2008/03/06 16:23:11 UTC
MultiSearcher to overcome the Integer.MAX_VALUE limit
Hey guys,
just a quick question to confirm an assumption I have.
Is it correct that I can have around 100 indexes, each at its
Integer.MAX_VALUE limit of documents, and still happily
search them all with a MultiSearcher, as long as the combined
returned hits don't add up to more than Integer.MAX_VALUE?
Kind regards,
Ray.
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Erick Erickson <er...@gmail.com>.
Well, I really don't have a clue what'll happen with that many
documents. It's more a matter of unique terms from what I
understand.
I'll be *really* curious how it turns out.
Erick
On Thu, Mar 6, 2008 at 6:03 PM, Ray <ra...@rayweb.de> wrote:
>
> Thanks for your answer.
>
> Well I want to search around 6 billion documents.
> Most of them very small, but I am confident to be hitting
> that number in the long run.
>
> I am currently running a small random text indexer with 400 docs/second.
> It will reach 2 billion in around 45 days.
>
> I really hope you all who are saying 2 billion docs
> will bring lucene to its knees are wrong...
>
> Ray.
RE: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by sp...@gmx.eu.
> Right... but trust me, you really wouldn't want to. You need
> distributed search at that level anyway.
Hm, 2 billion small docs is not that much.
Why do I need distributed search, and what exactly do you mean by
distributed search? Multiple IndexSearchers? Multiple processes? Multiple
machines?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Yonik Seeley <yo...@apache.org>.
On Sat, Mar 8, 2008 at 2:06 PM, <sp...@gmx.eu> wrote:
> Does this mean that I cannot search indexes with more than 2 billion docs at
> all with a single IndexSearcher?
Right... but trust me, you really wouldn't want to. You need
distributed search at that level anyway.
-Yonik
RE: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by sp...@gmx.eu.
Does this mean that I cannot search indexes with more than 2 billion docs at
all with a single IndexSearcher?
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Mark Miller <ma...@gmail.com>.
Random text can often be pretty slow when done per word.
I think you will have to modify the MultiSearcher a bit. The
MultiSearcher takes a global id space and converts to and from an
individual Searcher id space. The MultiSearcher's id space is limited to
an int as well, but I think if you change it to a long, you
should be all set.
- Mark
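
The conversion between a global id space and the per-Searcher id spaces described above can be sketched in plain Java. This is only a minimal illustration under stated assumptions: LongDocIdMapper and its methods are hypothetical names, not part of Lucene, and the sketch assumes a long for the global id space because a 64-bit integer type represents every id exactly. Lucene's own search API hands out int document ids, so using something like this would need changes beyond the mapping itself.

```java
/**
 * Hypothetical helper, NOT part of Lucene: maps a 64-bit global document id
 * to a (sub-index number, 32-bit local id) pair and back again.
 */
final class LongDocIdMapper {
    // starts[i] is the global id of the first document in sub-index i;
    // starts[n] is the total number of documents across all n sub-indexes.
    private final long[] starts;

    LongDocIdMapper(int[] maxDocs) {
        starts = new long[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            if (maxDocs[i] < 0) {
                throw new IllegalArgumentException("negative maxDoc at " + i);
            }
            // Summed as long, so 100 indexes of Integer.MAX_VALUE docs do not overflow.
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    long totalDocs() {
        return starts[starts.length - 1];
    }

    /** Returns the number of the sub-index that holds the given global id. */
    int subIndex(long globalId) {
        if (globalId < 0 || globalId >= totalDocs()) {
            throw new IllegalArgumentException("global id out of range: " + globalId);
        }
        // Linear scan; fine for ~100 sub-indexes, a binary search would also work.
        int i = 0;
        while (globalId >= starts[i + 1]) {
            i++;
        }
        return i;
    }

    /** Returns the id of the document inside its own sub-index. */
    int localId(long globalId) {
        return (int) (globalId - starts[subIndex(globalId)]);
    }

    /** Converts a (sub-index, local id) pair back into a global id. */
    long globalId(int subIndex, int localId) {
        if (subIndex < 0 || subIndex >= starts.length - 1) {
            throw new IllegalArgumentException("no such sub-index: " + subIndex);
        }
        long size = starts[subIndex + 1] - starts[subIndex];
        if (localId < 0 || localId >= size) {
            throw new IllegalArgumentException("local id out of range: " + localId);
        }
        return starts[subIndex] + localId;
    }
}
```

With two sub-indexes of Integer.MAX_VALUE documents each, new LongDocIdMapper(new int[] {Integer.MAX_VALUE, Integer.MAX_VALUE}) reports a total that no longer fits in an int, and global id 2147483647 resolves to local id 0 of sub-index 1.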
Toke Eskildsen wrote:
> On Fri, 2008-03-07 at 00:03 +0100, Ray wrote:
>
>> I am currently running a small random text indexer with 400 docs/second.
>> It will reach 2 billion in around 45 days.
>>
>
> If you are just doing it to test large indexes (in terms of document
> count), then you need to look into your index-generation code. I tried
> making an ultra-simple index builder, where each document contains a
> unique id and one of nine fixed strings. The index-building speed on my
> desktop computer is 40,000 documents/second (tested with 100 million
> documents).
>
> I would suspect that your random text generator is where all the
> time-intensive processing occurs. Either that or you're flushing after
> each document addition (which lowers my execution speed to about 100
> documents/second).
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2008-03-07 at 00:03 +0100, Ray wrote:
> I am currently running a small random text indexer with 400 docs/second.
> It will reach 2 billion in around 45 days.
If you are just doing it to test large indexes (in terms of document
count), then you need to look into your index-generation code. I tried
making an ultra-simple index builder, where each document contains a
unique id and one of nine fixed strings. The index-building speed on my
desktop computer is 40,000 documents/second (tested with 100 million
documents).
I would suspect that your random text generator is where all the
time-intensive processing occurs. Either that or you're flushing after
each document addition (which lowers my execution speed to about 100
documents/second).
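
To put the indexing rates mentioned in this thread side by side, here is a small back-of-the-envelope sketch. IndexingEta and daysToReach are hypothetical names used only for illustration, and the calculation assumes a constant rate and an empty index at the start; the thread does not say how many documents were already indexed.

```java
/** Hypothetical helper for rough estimates; not taken from any library. */
final class IndexingEta {
    private static final double SECONDS_PER_DAY = 86_400.0;

    /** Days needed to index targetDocs documents at a constant rate, starting from zero. */
    static double daysToReach(long targetDocs, double docsPerSecond) {
        if (targetDocs < 0 || docsPerSecond <= 0) {
            throw new IllegalArgumentException("need targetDocs >= 0 and docsPerSecond > 0");
        }
        return targetDocs / docsPerSecond / SECONDS_PER_DAY;
    }
}
```

Under those assumptions, IndexingEta.daysToReach(Integer.MAX_VALUE, 400) comes out at roughly 62 days, while the same call with 40,000 documents/second gives well under one day.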
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Ray <ra...@rayweb.de>.
Thanks for your answer.
Well, I want to search around 6 billion documents.
Most of them are very small, but I am confident I will hit
that number in the long run.
I am currently running a small random text indexer with 400 docs/second.
It will reach 2 billion in around 45 days.
I really hope all of you who say 2 billion docs
will bring Lucene to its knees are wrong...
Ray.
----- Original Message -----
From: "Erick Erickson" <er...@gmail.com>
To: <ja...@lucene.apache.org>
Sent: Thursday, March 06, 2008 10:40 PM
Subject: Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
> Well, I'm not sure. But any index, even one split amongst many nodes
> is going to have some interesting performance characteristics if you
> have over 2 billion documents.... So I'm not sure it matters <G>...
>
> What problem are you really trying to solve? You'll probably get
> more meaningful answers if you tell us what that is.
>
> Best
> Erick
Re: MultiSearcher to overcome the Integer.MAX_VALUE limit
Posted by Erick Erickson <er...@gmail.com>.
Well, I'm not sure. But any index, even one split amongst many nodes
is going to have some interesting performance characteristics if you
have over 2 billion documents.... So I'm not sure it matters <G>...
What problem are you really trying to solve? You'll probably get
more meaningful answers if you tell us what that is.
Best
Erick
On Thu, Mar 6, 2008 at 10:23 AM, Ray <ra...@rayweb.de> wrote:
> Hey Guys,
>
> just a quick question to confirm an assumption I have.
>
> Is it correct that I can have around 100 Indexes each at its
> Integer.MAX_VALUE limit of documents, but can happily
> search them all with a MultiSearcher if all combined returned
> hits don't add up to the Integer.MAX_VALUE themselves ?
>
> Kind regards,
>
> Ray.