Posted to java-user@lucene.apache.org by Michael Wechner <mi...@wyona.com> on 2004/07/05 13:37:00 UTC
Re: incrementally indexing a million documents
Nader Henein wrote:
> How are your documents named? Is it alphabetical
alphabetically
> or numerical? Mine were numerical, so I created n directories like so:
> 11, 12, 13, 14, ... 19, 21, 22, 23, ... 99. You get the idea.
Right, but don't I lose all the performance on sorting, and could
instead rebuild the index from scratch ...?
Thanks
Michi
> and I stored the files in the directories they belonged to,
> depending on the last two digits of the file name. (You could use file
> size to shuffle the files around as well, i.e., use the two rightmost
> digits of the file size in bytes.) At this point you'll have
> shuffled your million docs into 100 directories, and then Lucene can
> spider through each set of directories, indexing let's say 5000 files
> at a time and then deleting them or moving them to another location.
> If you get 100 million files, simply up the precision on the directory
> to a 3-digit setup or a 4-digit setup (once you automate it, the sky's
> the limit).
> Hope this helps
>
> Nader Henein
>
>
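
The bucketing step Nader describes above could be sketched roughly as
follows in Java. This is only an illustration under assumptions, not code
from the thread: the class BucketShuffle and the methods bucketFor and
shuffle are hypothetical names, buckets are named 00 to 99 rather than
11 to 99, and names without enough trailing digits (for example purely
alphabetical names) fall back to a hash of the file name, which nobody in
the thread proposed. The Lucene indexing step itself is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class BucketShuffle {

    /**
     * Hypothetical helper: returns the bucket directory name for a file.
     * Uses the last `digits` digits of the name (extension stripped) when
     * the name ends in at least that many digits; otherwise falls back to
     * a zero-padded hash of the whole name (an assumption, see lead-in).
     */
    static String bucketFor(String fileName, int digits) {
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;

        // Count the run of ASCII digits at the end of the base name.
        int start = base.length();
        while (start > 0) {
            char c = base.charAt(start - 1);
            if (c < '0' || c > '9') break;
            start--;
        }
        if (base.length() - start >= digits) {
            return base.substring(base.length() - digits);
        }

        // Fallback for alphabetical names: hash into 10^digits buckets.
        int buckets = (int) Math.pow(10, digits);
        int bucket = Math.floorMod(fileName.hashCode(), buckets);
        return String.format("%0" + digits + "d", bucket);
    }

    /**
     * Moves every regular file in `source` into target/<bucket>/.
     * DirectoryStream iterates lazily, so the full listing of a
     * million files is never held in memory at once.
     * `target` should live outside `source`.
     */
    static void shuffle(Path source, Path target, int digits) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(source)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) continue;
                String name = file.getFileName().toString();
                Path dir = target.resolve(bucketFor(name, digits));
                Files.createDirectories(dir);
                Files.move(file, dir.resolve(name));
            }
        }
    }
}
```

Each bucket directory can then be indexed in batches of a few thousand
files. Going from 2 to 3 or 4 digits only changes the `digits` argument.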
> Michael Wechner wrote:
>
>> I am trying to index around a million documents. The problem is
>> that I run out of memory during sorting by uid when I go through
>> the directory recursively.
>>
>> Well, I could add more memory, but this wouldn't really solve my
>> problem, because at some point I will always run out of memory
>> (e.g. with 10 million documents).
>>
>> Is there another approach than sorting by uid?
>>
>> Thanks
>>
>> Michi
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
michael.wechner@wyona.com michi@apache.org