Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2015/01/28 19:02:35 UTC

[jira] [Resolved] (CONNECTORS-1153) Documents crawled using manifoldcf 1.6 or earlier are needlessly recrawled after upgrade to 1.7 or later

     [ https://issues.apache.org/jira/browse/CONNECTORS-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1153.
-------------------------------------
    Resolution: Fixed

> Documents crawled using manifoldcf 1.6 or earlier are needlessly recrawled after upgrade to 1.7 or later
> --------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1153
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1153
>             Project: ManifoldCF
>          Issue Type: Bug
>    Affects Versions: ManifoldCF 1.7, ManifoldCF 1.8
>            Reporter: Aeham Abushwashi
>            Assignee: Karl Wright
>             Fix For: ManifoldCF 1.8.1, ManifoldCF 2.0.1, ManifoldCF 1.9, ManifoldCF 2.1
>
>
> After upgrading to ManifoldCF 1.7 or later, pre-existing documents are recrawled and re-indexed even if they have not changed in any way since their last pre-upgrade crawl. The impact can be significant for large ManifoldCF deployments with millions of static documents.
> There appear to be three contributing factors:
> 1. In PipelineObjectWithVersions#buildAddPipeline and IncrementalIngester#checkFetchDocument, the empty transformation version of a legacy document is different from the initial value of "0+0!".
> 2. PipelineObjectWithVersions#buildAddPipeline compares output versions incorrectly: oldOutputVersion is compared to a VersionContext object instead of the version string, which can be obtained by calling VersionContext#getVersionString. If IPipelineSpecification#getStageDescriptionString continues to return a VersionContext object, renaming the method could be useful.
> 3. In PipelineObjectWithVersions#buildAddPipeline, a null value for newAuthorityNameString is not treated the same as an empty string (as it is in other methods).
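
The three factors above can be illustrated with a minimal Java sketch. This is not the ManifoldCF source or the committed fix: the class LegacyVersionCheckSketch, its helper methods, and the nested VersionContext stand-in are hypothetical names introduced here for illustration. Only the "0+0!" initial value and the getVersionString accessor are taken from the issue text.

```java
import java.util.Objects;

// Hypothetical illustration only; not the ManifoldCF implementation.
final class LegacyVersionCheckSketch {

    // Stand-in for ManifoldCF's VersionContext. Only the getVersionString()
    // accessor mentioned in the issue is modeled.
    static final class VersionContext {
        private final String versionString;

        VersionContext(String versionString) {
            this.versionString = versionString;
        }

        String getVersionString() {
            return versionString;
        }
    }

    // Initial transformation version quoted in the issue.
    static final String INITIAL_TRANSFORMATION_VERSION = "0+0!";

    // Factor 1 (assumed remedy): map a legacy empty or null transformation
    // version to the initial value so both forms compare as equal.
    static String normalizeTransformationVersion(String version) {
        return (version == null || version.isEmpty())
            ? INITIAL_TRANSFORMATION_VERSION
            : version;
    }

    // Factor 2: compare the stored string with the version string held by
    // the context, not with the context object itself.
    static boolean outputVersionUnchanged(String oldOutputVersion, VersionContext newOutput) {
        return Objects.equals(oldOutputVersion, newOutput.getVersionString());
    }

    // Factor 3: treat a null authority name the same as an empty string.
    static boolean authorityNameUnchanged(String oldName, String newName) {
        String oldNormalized = (oldName == null) ? "" : oldName;
        String newNormalized = (newName == null) ? "" : newName;
        return oldNormalized.equals(newNormalized);
    }

    public static void main(String[] args) {
        VersionContext current = new VersionContext("v1");
        System.out.println(normalizeTransformationVersion(""));
        System.out.println(outputVersionUnchanged("v1", current));
        System.out.println(authorityNameUnchanged(null, ""));
    }
}
```

With checks of this shape, an unchanged legacy document would compare as unchanged on all three counts, whereas comparing a String directly against a VersionContext object with equals always yields false and so forces a re-index.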



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)