Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2014/12/08 15:49:42 UTC

Re: Continues Job Crawling

Hi Babita,

How you use continuous crawling depends on what you are trying to
accomplish with it.  In the continuous crawling model, ManifoldCF requeues
documents after it crawls them, and checks them again after an interval
that is determined in part by how often they've changed in the past.  The
job therefore runs forever, or at least until there are no documents
whatsoever in the job queue.
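
To make the requeue behavior above concrete, here is a minimal sketch in
Python.  The doubling rule and the name next_recrawl_interval are
assumptions made purely for illustration; this is not ManifoldCF's actual
scheduling code.  It only shows the general idea of an interval that grows
while a document stays unchanged and is bounded by the recrawl interval and
the maximum recrawl interval.

```python
from typing import Optional


def next_recrawl_interval(
    current: float,
    changed: bool,
    minimum: float,
    maximum: Optional[float],
) -> float:
    """Return the next recheck interval (in minutes) for one document.

    Hypothetical rule, assumed for illustration only:
    - if the document changed, fall back to the minimum interval;
    - otherwise double the interval, capped at `maximum`
      (None stands for a blank field, i.e. infinity / no cap).
    """
    if changed:
        return minimum
    proposed = current * 2
    if maximum is not None:
        proposed = min(proposed, maximum)
    return proposed


# A document that never changes backs off until it reaches the cap.
interval = 60.0
for _ in range(3):
    interval = next_recrawl_interval(interval, changed=False, minimum=60.0, maximum=240.0)
print(interval)  # 240.0
```

A document that changes often stays near the minimum interval, while one
that rarely changes drifts toward the maximum.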

Continuous crawling also has no way of deleting documents that are no
longer reachable from seeds, EXCEPT when hop count is in play.  For this
reason, in many cases it is a good idea to have documents expire after a
time.

Since you are crawling SharePoint, I can say the following:

- There is no point in reseeding, because only one document is ever seeded
(the root document).  So set the reseed interval to infinity (blank).
- Refetching of documents is sufficient to determine if a document has been
deleted.  So set the expiration interval to infinity (blank).
- Recrawl interval and maximum recrawl interval are up to you.  You would
need to set these based on how often you want MCF to recheck any given
SharePoint document for changes.  Setting them too low means
that MCF will be refetching documents constantly, which would place a heavy
load on SharePoint.  Setting them too high means that changes might
go unnoticed for longer.
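
To get a feel for the load trade-off in the last point, a rough
back-of-the-envelope estimate can help.  The sketch below is in Python; the
document count is a made-up number and the function name is hypothetical.
It gives the refetch rate if every document were rechecked once per recrawl
interval, which is the high end; documents that rarely change would be
rechecked less often, so the real rate should be lower.

```python
def max_refetches_per_minute(document_count: int, recrawl_interval_minutes: float) -> float:
    """Refetch rate if every document is rechecked once per recrawl interval."""
    if recrawl_interval_minutes <= 0:
        raise ValueError("recrawl interval must be positive")
    return document_count / recrawl_interval_minutes


# Assumed example: 100,000 SharePoint documents (illustrative figure only).
hourly = max_refetches_per_minute(100_000, 60)     # recheck every hour
daily = max_refetches_per_minute(100_000, 1440)    # recheck every day
print(round(hourly, 1))  # 1666.7
print(round(daily, 1))   # 69.4
```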

Thanks,
Karl


On Mon, Dec 8, 2014 at 9:34 AM, Babita Bansal <ba...@gmail.com>
wrote:

> Hi Karl
>
> Hope you are doing good.
>
> We will be scheduling SharePoint jobs to run continuously. Could you please let me
> know the recommended values for these 4 parameters?
>
> Recrawl interval (if continuous): minutes (blank=infinity)
> Maximum recrawl interval (if continuous): minutes (blank=infinity)
> Expiration interval (if continuous): minutes (blank=infinity)
> Reseed interval (if continuous): minutes (blank=infinity)
>
>
> Thanks
> Babita Bansal
>
>
>