Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2017/01/06 12:59:58 UTC
[jira] [Assigned] (CONNECTORS-1364) Better bin naming in the Shared Drive Connector
[ https://issues.apache.org/jira/browse/CONNECTORS-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright reassigned CONNECTORS-1364:
---------------------------------------
Assignee: Karl Wright
> Better bin naming in the Shared Drive Connector
> -----------------------------------------------
>
> Key: CONNECTORS-1364
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1364
> Project: ManifoldCF
> Issue Type: Improvement
> Components: JCIFS connector
> Affects Versions: ManifoldCF 1.9
> Reporter: Aeham Abushwashi
> Assignee: Karl Wright
>
> Hello and happy new year!
> Bin naming in the Shared Drive Connector makes assumptions that are not always valid.
> As I understand it, Manifold uses bins to prevent overloading data sources. In the SDC, the server name is used as the bin name. All jobs created against a particular server are therefore treated as one unit when documents are prioritised, which can severely disadvantage some jobs (e.g. late starters).
> Moreover, this is incompatible with some common enterprise server topologies. In Windows DFS, which is widely used in large enterprises, what the SDC thinks of as a server name isn’t actually a physical resource; it’s a namespace that can span many servers and shares. In this case, it doesn’t make sense to throttle simply on the root ‘server’ name. In other environments, a powerful storage server can be more than capable of handling a high crawl load, and overzealous throttling can end up limiting Manifold’s performance there.
> I’m struggling to find a single solution that fits all cases, so I’m leaning towards adding a server topology flag or throttling depth flag to the repository connection config, as a hint that SharedDriveConnector#getBinNames can use to decide whether the bin name should be server, server+share or server+share+root_folder. Share and root_folder would also need to be passed explicitly in the repository config, or extracted from the documentIdentifier argument of getBinNames (assuming it's reliable).
> Thoughts?
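
For illustration only, the "throttling depth" idea in the description could be sketched roughly as below. This is not ManifoldCF code: the class BinNameSketch, the enum BinDepth and the method binName are hypothetical names, and the sketch assumes document identifiers of the form smb://server/share/folder/file, with directory identifiers ending in "/". Whether real identifiers are reliable enough for this is exactly the open question raised in the issue.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of ManifoldCF. */
final class BinNameSketch {

    /** Hypothetical "throttling depth" hint read from the repository connection config. */
    enum BinDepth { SERVER, SERVER_SHARE, SERVER_SHARE_ROOT_FOLDER }

    /**
     * Derives a bin name from a document identifier.
     * Assumption: identifiers look like smb://server/share/folder/file,
     * and identifiers that name a directory end with "/".
     */
    static String binName(String documentIdentifier, BinDepth depth) {
        String prefix = "smb://";
        String path = documentIdentifier.startsWith(prefix)
            ? documentIdentifier.substring(prefix.length())
            : documentIdentifier;

        // Collect the non-empty path segments: [server, share, folder, ..., file]
        List<String> segments = new ArrayList<>();
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                segments.add(segment);
            }
        }
        if (segments.isEmpty()) {
            return "";
        }

        // Below the share level, a final segment without a trailing "/" is
        // assumed to be a file, so it must not become part of the bin name.
        if (!documentIdentifier.endsWith("/") && segments.size() > 2) {
            segments.remove(segments.size() - 1);
        }

        int wanted;
        switch (depth) {
            case SERVER:       wanted = 1; break;
            case SERVER_SHARE: wanted = 2; break;
            default:           wanted = 3; break;
        }

        // Fall back to a shallower bin when the path is not deep enough.
        int count = Math.min(wanted, segments.size());
        return String.join("/", segments.subList(0, count));
    }
}
```

With these assumptions, smb://fs01/projects/alpha/report.docx maps to the bin fs01, fs01/projects or fs01/projects/alpha depending on the configured depth, while a file directly under the share (smb://fs01/projects/readme.txt) falls back to fs01/projects even at the deepest setting.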
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)