Posted to dev@oodt.apache.org by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov> on 2011/07/08 20:06:31 UTC

Re: cas crawler multi-threaded

Hi Robert,

Thanks for your question. Answers below:

On Jul 7, 2011, at 11:24 AM, Ando, Robert R (388K) wrote:

> Chris,
> 
> not sure who to ask.
> 
> Is the cas-crawler multi-threaded?  If a speedup is needed,
> would it be hard to do so?  (There are many
> other ways to speed up archiving files.)

The cas-crawler is intentionally not multi-threaded by default. However,
the architecture of the system compensates for that by allowing multiple
crawlers to be run on a single directory area. You can keep them from
trampling over one another by using PreConditionComparators and Actions
to restrict which types of files each crawler picks up, or by using the
noRecur and crawlForDirs options to isolate them as well.

Another strategy is separating out the ingest/staging area by directory type 
and then instantiating multiple crawlers based on that organization.
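
As a rough illustration of the idea (not the OODT API), here is a minimal
Python sketch in which several single-threaded "crawlers" run side by side
over one staging directory, each restricted to its own file type so that no
two of them ever pick up the same file. The names crawl, run_crawlers and
demo, and the file suffixes used, are hypothetical and chosen only for this
example; the suffix check is a simple stand-in for a precondition filter.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def crawl(root, suffix, recurse=False):
    """Stand-in for one single-threaded crawler (hypothetical, not OODT code).

    Returns the names of the files under root whose suffix matches. The
    suffix test plays the role of a precondition that isolates the file
    types this crawler is responsible for.
    """
    root = Path(root)
    candidates = root.rglob("*") if recurse else root.iterdir()
    return sorted(p.name for p in candidates if p.is_file() and p.suffix == suffix)


def run_crawlers(root, suffixes):
    """Run one crawler per suffix concurrently over the same directory."""
    with ThreadPoolExecutor(max_workers=len(suffixes)) as pool:
        futures = {s: pool.submit(crawl, root, s) for s in suffixes}
        return {s: f.result() for s, f in futures.items()}


def demo():
    """Build a throwaway staging area and crawl it with two crawlers."""
    with tempfile.TemporaryDirectory() as staging:
        for name in ("a.h5", "b.h5", "c.nc", "notes.txt"):
            (Path(staging) / name).touch()
        return run_crawlers(staging, [".h5", ".nc"])


if __name__ == "__main__":
    for suffix, names in demo().items():
        print(suffix, names)
```

Giving each crawler a different root instead of a different suffix would
correspond to the directory-based split: one crawler per staging
subdirectory, again with no overlap between them.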

Does that help/make sense? We can chat more but I thought that 
would be a good start to the conversation.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++