Posted to solr-user@lucene.apache.org by zqzuk <zi...@hotmail.com> on 2008/01/09 12:14:20 UTC

Using the post tool - too many files in a folder?

Hi, I am using the post.jar tool to post files to Solr. I'd like to post
everything in a folder, e.g., "myfolder". I typed the command:

java -jar post.jar c:/myfolder/*.xml

This works perfectly when I test on a sample of 100k XML files. But when I
work on the real dataset, there are over 1m files in the folder. When I
type in the same command and hit enter, the program hangs and there is no
response even after a long while.

Is it because there are too many files? What is the best practice? Should I
separate the 1 million files into 100 subfolders and post from each of
those folders separately?

Many thanks!
-- 
View this message in context: http://www.nabble.com/Using-the-post-tool---too-many-files-in-a-folder--tp14709773p14709773.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Using the post tool - too many files in a folder?

Posted by Chris Hostetter <ho...@fucit.org>.
see the other responses to address the solr/best practices aspects of your
situation, but as to the specifics of why you are seeing what you are
seeing:

your shell (regardless of whether you are on a unix box or a windows box)
has to resolve the "*.xml" part of that command line and convert it to the
"real" list of all files in the directory ... with more than a few thousand
files that will take a non-negligible amount of time anyway -- let alone
when you get to the millions.

this is why you might notice a considerable pause before the
SimplePostTool ever logs anything ... it hasn't even been run by your
shell yet (not to mention that once your shell finally does execute the
java process, the JVM has to initialize an array of those million
filenames before it ever calls the "main(String[])" function of the
SimplePostTool)
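
One way to sidestep the wildcard expansion described above is to walk the
directory lazily and hand post.jar a modest number of files per run. The
following is only a minimal sketch, not a tested recipe: the helper names
(iter_xml_files, batches, post_folder) and the batch size of 500 are
assumptions made for illustration, and it relies only on what the original
command already shows, namely that post.jar accepts a list of file names
as arguments.

```python
import os
import subprocess
from itertools import islice


def iter_xml_files(folder):
    """Yield paths of *.xml files one at a time, without building the full list."""
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".xml"):
                yield entry.path


def batches(items, size):
    """Group any iterable into lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def post_folder(folder, batch_size=500, jar="post.jar"):
    """Run post.jar once per batch instead of once with a million arguments.

    batch_size=500 is an arbitrary assumption; pick a value that keeps the
    command line under your operating system's length limit.
    """
    for chunk in batches(iter_xml_files(folder), batch_size):
        subprocess.run(["java", "-jar", jar, *chunk], check=True)
```

Calling post_folder("c:/myfolder") would then start posting after reading
only the first batch of directory entries, so progress is visible almost
immediately instead of after the whole listing has been resolved.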


-Hoss


Re: Using the post tool - too many files in a folder?

Posted by Yonik Seeley <yo...@apache.org>.
Best practice indexing doesn't create intermediate files at all, but
constructs the documents in memory and posts them to Solr via an indexing
program.  There are Java, Ruby, Python, etc. clients to help you talk to
Solr over HTTP.
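
As a rough illustration of that approach, the sketch below builds Solr's
XML <add> message in memory and posts it over HTTP in batches, using only
the Python standard library rather than any particular Solr client. The
update URL, the field names, and the batch size are assumptions for the
example; adjust them to your own installation and schema.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr

# Assumed location of the Solr update handler; change to match your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build an <add> message from an iterable of {field_name: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(
                "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_xml(xml, url=SOLR_UPDATE_URL):
    """POST one XML message to Solr and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_records(records, batch_size=1000, url=SOLR_UPDATE_URL):
    """Send records in batches (batch_size is an assumption), then commit once."""
    batch = []
    for rec in records:
        batch.append(rec)
        if len(batch) >= batch_size:
            post_xml(build_add_xml(batch), url)
            batch = []
    if batch:
        post_xml(build_add_xml(batch), url)
    post_xml("<commit/>", url)
```

Here records can be any iterable of dicts, for example a generator reading
rows from a database, so no intermediate files are written. The field names
in each dict (such as "id" or "title") are hypothetical and must match the
fields defined in your schema.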

If you don't want to do any programming and your data is in a
database, using a CSV dump may be the next best option.

-Yonik

On Jan 9, 2008 9:11 AM, Gunther, Andrew <Gu...@si.edu> wrote:
> Is there a practical reason behind trying to post 1m different files
> instead of several large files? If this is a unix setup, can you try
> post.sh instead?

RE: Using the post tool - too many files in a folder?

Posted by "Gunther, Andrew" <Gu...@si.edu>.
Is there a practical reason behind trying to post 1m different files
instead of several large files? If this is a unix setup, can you try
post.sh instead?
