Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2018/07/30 09:48:13 UTC

Re: Scheduling Problem

Hi Cheng,

Dynamic recrawl revisits documents based on how often they have changed in
the past, so it is hard to predict whether a given document will be
recrawled within a given time interval. You need recrawls of existing
directories in order to discover new documents in SharePoint; a directory
that is rarely revisited will not reveal newly added documents until its
next recrawl.
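
The general idea of change-frequency-based rescheduling can be illustrated with a small sketch. This is not ManifoldCF's actual algorithm. It assumes a simple hypothetical rule: the interval doubles each time a document (or directory listing) is found unchanged and resets to the base interval when it changes. The names `next_recrawl_interval` and `simulate`, the doubling factor and the optional cap are illustrative assumptions. The 1440-minute base matches the recrawl interval in the configuration quoted below, and the cap is only loosely analogous to the job's "Maximum recrawl interval" field, which is blank there.

```python
from typing import List, Optional

# Base interval, taken from the job's "Recrawl interval" (1440 minutes = 24 hours).
BASE_MINUTES = 1440.0


def next_recrawl_interval(
    current_minutes: float,
    changed: bool,
    base_minutes: float = BASE_MINUTES,
    max_minutes: Optional[float] = None,
) -> float:
    """Hypothetical adaptive rule: back off while unchanged, reset on change."""
    if changed:
        # The document changed since the last visit: revisit it soon again.
        return base_minutes
    # Unchanged: wait twice as long before the next visit.
    doubled = current_minutes * 2
    # Without a configured maximum, the interval can grow without bound.
    if max_minutes is None:
        return doubled
    return min(doubled, max_minutes)


def simulate(visits: int, max_minutes: Optional[float] = None) -> List[float]:
    """Intervals chosen for a directory that is found unchanged on every visit."""
    interval = BASE_MINUTES
    intervals: List[float] = []
    for _ in range(visits):
        interval = next_recrawl_interval(interval, changed=False, max_minutes=max_minutes)
        intervals.append(interval)
    return intervals


if __name__ == "__main__":
    print(simulate(5))                      # no cap
    print(simulate(5, max_minutes=4320.0))  # capped at 3 days
```

Under this toy rule, with no cap an unchanged directory is revisited after 2, 4, 8, 16 and then 32 days, so documents added to it in the meantime can go unnoticed for weeks. With a 4320-minute (3-day) cap, the interval stops growing at 3 days.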

If you want more predictable crawling, I'd suggest doing standard minimal
crawls on a fixed schedule.  That will pick up any new documents added.
Then do full crawls (not dynamic) periodically (once a week?) to clean up
any deleted documents.
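
The suggested schedule amounts to a simple policy: a minimal crawl every day and a full (non-dynamic) crawl once a week. The sketch below is hypothetical. In ManifoldCF this would be set up through the job's schedule records, not through code, and the function names and the choice of Sunday for the weekly full crawl are assumptions. The returned strings mirror the "Job invocation" setting: "complete" appears in the configuration quoted below, and "minimal" is assumed to be the alternative value.

```python
from datetime import date, timedelta
from typing import List

# Assumption: run the weekly full crawl on Sunday.
# date.weekday() numbers days Monday=0 ... Sunday=6.
FULL_CRAWL_WEEKDAY = 6


def crawl_mode_for(day: date, full_crawl_weekday: int = FULL_CRAWL_WEEKDAY) -> str:
    """Return the kind of run to start on `day` under the hypothetical policy."""
    if day.weekday() == full_crawl_weekday:
        # Weekly full (non-dynamic) crawl: also cleans up deleted documents.
        return "complete"
    # Every other day: a minimal crawl, which picks up newly added documents.
    return "minimal"


def week_plan(start: date) -> List[str]:
    """Crawl modes for the seven days beginning at `start`."""
    return [crawl_mode_for(start + timedelta(days=offset)) for offset in range(7)]


if __name__ == "__main__":
    # 30 July 2018 was a Monday, so the full crawl falls on the last day.
    print(week_plan(date(2018, 7, 30)))
```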

Thanks,
Karl


On Mon, Jul 30, 2018 at 4:35 AM Cheng Zeng <ze...@hotmail.co.uk> wrote:

> Hi Karl,
>
> I have a question about the schedule-related configuration of a job. I
> have a continuously running job which crawls the documents in SharePoint
> 2013, and the job is supposed to re-crawl about 26,000 documents every 24
> hours as configured. However, something seems to be wrong with my
> configuration: after the continuous job has run for a few weeks, the
> number of active documents increases by only 1 or 2 when about 20 new
> documents have been created in SharePoint. If I restart the job, more
> active documents are found, and the number of active documents matches the
> actual number of documents in the SharePoint lists. It seems that the job
> is not re-scanning all the documents, and I suspect there is something
> wrong with my scheduling configuration. Although I have read the section on
> setting up schedule-related configuration in the end-user documentation at
> https://manifoldcf.apache.org/release/release-2.10/en_US/end-user-documentation.html#jobs,
> I am still confused by the incorrect number of active documents after the
> continuous job has run for a few weeks. The version of MCF I am using is
> 2.6.
>
> My schedule configuration is as follows:
>
> Schedule type: Rescan Documents Dynamically
> Recrawl interval (if continuous): 1440 minutes
> Maximum recrawl interval (if continuous): blank
> Expiration interval (if continuous): blank
> Reseed interval (if continuous): blank
>
> Scheduled time:
>
> Schedule 1: Any day of week, 5am plus 0, every month of year on any day
> of month
>     Job invocation: complete
>     Maximum run time: 3000 minutes
>
> Schedule 2: Any day of week, 12pm plus 0, every month of year on any day
> of month
>     Job invocation: complete
>     Maximum run time: 3000 minutes
>
> A screenshot of the schedule configuration is attached.
>
> Could you please give me some advice about this problem?
>
> BTW: Does MCF support Domino? Are there any methods to extract documents
> from Domino?
>
> Best wishes,
>
> Cheng

Re: Scheduling Problem and the IBM Domino Connector

Posted by Karl Wright <da...@gmail.com>.
I am not aware of any existing Domino connector.

Karl


Re: Scheduling Problem and the IBM Domino Connector

Posted by Cheng Zeng <ze...@hotmail.co.uk>.
Thank you very much for your reply. Your advice is very helpful.

I am wondering whether MCF supports IBM Domino.

Does anyone know of any available libraries or APIs for extracting documents from a Domino server?

Best wishes,
Cheng
