Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2014/12/09 17:04:12 UTC

[jira] [Assigned] (CONNECTORS-1118) Documents processed by the shared drive connector incur an unnecessary synchronisation hit

     [ https://issues.apache.org/jira/browse/CONNECTORS-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright reassigned CONNECTORS-1118:
---------------------------------------

    Assignee: Karl Wright

> Documents processed by the shared drive connector incur an unnecessary synchronisation hit
> ------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1118
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1118
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Framework core
>    Affects Versions: ManifoldCF 1.7.2
>            Reporter: Aeham Abushwashi
>            Assignee: Karl Wright
>
> Each document processed by the shared drive connector is passed through SharedDriveConnector#checkInclude to verify whether the document is eligible for ingestion. The calls made there to WorkerThread$ProcessActivity#checkMimeTypeIndexable and WorkerThread$ProcessActivity#checkLengthIndexable are unnecessarily costly, because each of them creates a fresh instance of IncrementalIngester$PipelineConnections on every call. The constructor of IncrementalIngester$PipelineConnections can be very expensive because it loads output connection objects, which in turn requires some locking (via ZooKeeper in a distributed environment).
> The other area of inefficiency is in WorkerThread$ProcessActivity#processDocumentReferences. This method creates new instances of PriorityCalculator using the less efficient 3-arg constructor. This can be addressed using the same pattern that was implemented for CONNECTORS-1094.
> To highlight the impact of the above calls, I profiled an active worker thread for 40 minutes. During that window, it spent ~23 minutes in SharedDriveConnector#checkInclude and its callees, and a further 9 minutes creating instances of PriorityCalculator (together roughly 80% of the window).
> I've seen the above issues when using the shared drive connector, but I think other connectors could be affected too, depending on how they are implemented.
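
To illustrate the cost pattern described in the report, here is a minimal Java sketch that compares building an expensive object on every eligibility check with building it once and reusing it. It is not ManifoldCF code and not necessarily how the issue was resolved: ExpensiveConnections, Lazy, Activity and run are hypothetical stand-ins, and the length and MIME type rules are invented for the example. A real fix would also need to decide when a cached object becomes stale.

```java
import java.util.function.Supplier;

/** Hypothetical sketch; none of these types are ManifoldCF classes. */
public class CachedConnectionsSketch {

    /** Stand-in for an object that is costly to construct (for example, one that loads connections under a lock). */
    static final class ExpensiveConnections {
        static int constructions = 0; // counts how often the constructor runs
        final long maxLength;

        ExpensiveConnections(long maxLength) {
            constructions++;
            this.maxLength = maxLength;
        }

        boolean lengthOk(long length) { return length <= maxLength; }

        boolean mimeTypeOk(String mimeType) { return mimeType.startsWith("text/"); }
    }

    /** Builds the value on first use and reuses it afterwards. Not thread-safe: meant for a single worker thread. */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> factory;
        private T value;

        Lazy(Supplier<T> factory) { this.factory = factory; }

        @Override
        public T get() {
            if (value == null) {
                value = factory.get();
            }
            return value;
        }
    }

    /** Stand-in for an activity object that answers per-document eligibility checks. */
    static final class Activity {
        private final Supplier<ExpensiveConnections> connections;

        Activity(Supplier<ExpensiveConnections> connections) { this.connections = connections; }

        boolean checkLengthIndexable(long length) { return connections.get().lengthOk(length); }

        boolean checkMimeTypeIndexable(String mimeType) { return connections.get().mimeTypeOk(mimeType); }
    }

    /** Runs both checks for the given number of documents and returns how many constructions occurred. */
    static int run(Supplier<ExpensiveConnections> source, int documents) {
        ExpensiveConnections.constructions = 0;
        Activity activity = new Activity(source);
        for (int i = 0; i < documents; i++) {
            activity.checkMimeTypeIndexable("text/plain");
            activity.checkLengthIndexable(i);
        }
        return ExpensiveConnections.constructions;
    }

    public static void main(String[] args) {
        // A fresh object on every check: two constructions per document.
        int perCall = run(() -> new ExpensiveConnections(1024), 100);
        // One object shared by all checks made through the same activity.
        int cached = run(new Lazy<>(() -> new ExpensiveConnections(1024)), 100);
        System.out.println("per-call constructions: " + perCall); // prints 200
        System.out.println("cached constructions: " + cached);    // prints 1
    }
}
```

With 100 documents and two checks per document, the per-call variant runs the constructor 200 times and the cached variant runs it once. The trade-off is that a cached object no longer sees configuration changes made after it was built.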


