Posted to user@manifoldcf.apache.org by ritika jain <ri...@gmail.com> on 2021/11/18 07:48:50 UTC

Manifoldcf background process

Hi All,

I would like to understand how ManifoldCF processes Windows Shares jobs in
the background, and how it handles the paths listed in the job
configuration.

I am creating a job dynamically through the API, using PHP. The job picks
up approximately 70k documents, so it lists about 70k different paths,
each giving the folder/subfolder path, with the file name in the
filespec.

My question is: how does ManifoldCF work in the background to access all
these different folders at the same time? Almost every file is in a
different folder. Does fetching all the folder permissions and accessing
the folder/subfolder files put load on ManifoldCF? How does it fetch
permissions for one folder, say path 1, while simultaneously fetching
permissions/access for another folder, say path 2?
I ask because while this job is running ManifoldCF seems to be under
heavy load and becomes very slow, and we have to restart the Docker
container every 15-20 minutes.

How can a job be run efficiently?

Thanks
Ritika

Re: Manifoldcf background process

Posted by Karl Wright <da...@gmail.com>.
The degree of parallelism can be controlled in two ways.
The first way is to set the number of worker threads to something
reasonable.  Usually, this is no more than about 2x the number of
processors you have.
The second way is to limit the number of connections in your JCIFS
connector to something reasonable, e.g. 4 (because Windows SMB
is really not good at handling more than that anyway).
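
As a rough illustration of the two limits above, here is a minimal Python
sketch that derives starting values from the processor count. The function
name and the dictionary keys are hypothetical, the 2x factor and the value 4
are only the rules of thumb given above, and nothing here talks to
ManifoldCF itself.

```python
import os


def suggested_limits(processors=None, max_smb_connections=4):
    """Return starting values for the two parallelism controls.

    Hypothetical helper: the 2x-processors rule and the cap of 4
    connections are rules of thumb from this thread, not ManifoldCF defaults.
    """
    if processors is None:
        # os.cpu_count() can return None on unusual platforms; fall back to 1.
        processors = os.cpu_count() or 1
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return {
        "worker_threads": 2 * processors,
        "max_connections": max_smb_connections,
    }


if __name__ == "__main__":
    print(suggested_limits())
```

To my understanding, the worker thread count is set through the
org.apache.manifoldcf.crawler.threads property in properties.xml, and the
connection limit is the "Max connections" value on the repository
connection's Throttling tab; please check both against the documentation
for your ManifoldCF version.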

These two controls are independent of each other.  From your description,
it sounds like the parameter you want to set is not the number of worker
threads but rather the number of connections.  But setting both properly
will certainly help.  A high worker thread count is bad because each
active thread uses some amount of memory, so if you set the value too
high you need to give ManifoldCF far too much memory, and you won't be
able to compute how much in advance either.
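
To make the independence of the two controls concrete, here is a small
Python simulation. It is only a sketch under stated assumptions, not how
ManifoldCF is implemented: run_crawl, the example paths, and the sleep that
stands in for an SMB round trip are all hypothetical. It shows that the
number of simultaneously open connections stays at or below the connection
limit however many worker threads are running.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_crawl(paths, worker_threads, max_connections):
    """Process paths with a worker pool.

    Returns (peak simultaneously open simulated connections, paths processed).
    """
    connection_slots = threading.BoundedSemaphore(max_connections)
    lock = threading.Lock()
    state = {"open": 0, "peak": 0}

    def fetch(path):
        # A worker must hold a connection slot before touching the share.
        with connection_slots:
            with lock:
                state["open"] += 1
                state["peak"] = max(state["peak"], state["open"])
            time.sleep(0.001)  # stand-in for the SMB round trip
            with lock:
                state["open"] -= 1
        return path

    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        done = list(pool.map(fetch, paths))
    return state["peak"], len(done)


if __name__ == "__main__":
    # Hypothetical paths, one folder per file as in the question.
    paths = [f"//server/share/folder{i}/file.txt" for i in range(200)]
    peak, processed = run_crawl(paths, worker_threads=16, max_connections=4)
    print("peak connections:", peak, "processed:", processed)
```

Workers that cannot get a connection slot simply wait, so extra worker
threads beyond the connection limit add memory use without adding
throughput against the share.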


Karl

