Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2019/05/01 11:09:00 UTC
[jira] [Resolved] (CONNECTORS-1602) Continuous crawling doesn't recrawl everything
[ https://issues.apache.org/jira/browse/CONNECTORS-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1602.
-------------------------------------
Resolution: Not A Problem
> Continuous crawling doesn't recrawl everything
> ----------------------------------------------
>
> Key: CONNECTORS-1602
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1602
> Project: ManifoldCF
> Issue Type: Bug
> Components: Web connector
> Reporter: Donald Van den Driessche
> Priority: Major
>
> When crawling a website in continuous crawling mode, we saw that not all documents are recrawled.
> The site is quite extensive. We figured out that, after being crawled, a document/page gets a recrawl timestamp somewhere between the recrawl interval and the max recrawl interval.
> But if these timestamps fall within the first crawl, Manifold starts recrawling those documents and seems to ignore the rest of the website. Also, some documents get recrawled 5 times while others don't get recrawled at all, apparently due to the same issue.
>
> Is it possible to shed a bit more light on continuous crawling?
> Is it a good system to use for crawling an (extensive) website?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)