Posted to dev@manifoldcf.apache.org by "Karl Wright (Commented) (JIRA)" <ji...@apache.org> on 2011/12/07 15:56:39 UTC
[jira] [Commented] (CONNECTORS-309) On Canonicalization Tab , Allow
regex transforms to modify the URL's for a crawl
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164438#comment-13164438 ]
Karl Wright commented on CONNECTORS-309:
----------------------------------------
I'd view this as "custom canonicalization" functionality. The use case Mr. Kelleher relayed to me was to remove a parameter from the query string. A general regexp is not a good tool for that because of issues related to URL encoding (the same parameter name can appear in both literal and percent-encoded forms), so maybe we could structure it as parameter removal/addition instead.
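To illustrate the parameter-removal idea, here is a minimal sketch in Java. It is not ManifoldCF code: the class name QueryParamRemover and the method removeParams are hypothetical, and the approach (split the query on "&", decode each parameter name, drop the matching pairs) is only one assumed way of structuring it. It requires Java 10 or later for URLDecoder.decode(String, Charset).

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF.
final class QueryParamRemover {

    /**
     * Returns the URL with every query parameter whose decoded name is in
     * namesToRemove dropped. The remaining pairs are kept byte-for-byte,
     * so their original encoding is preserved.
     */
    static String removeParams(String url, Set<String> namesToRemove) {
        // Set the fragment aside so it is never mistaken for query text.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }

        int q = url.indexOf('?');
        if (q < 0) {
            return url + fragment; // no query string, nothing to remove
        }
        String base = url.substring(0, q);
        String query = url.substring(q + 1);

        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            // Compare decoded names so "session%69d" matches "sessionid".
            String name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            if (!namesToRemove.contains(name)) {
                kept.add(pair);
            }
        }

        String newQuery = kept.isEmpty() ? "" : "?" + String.join("&", kept);
        return base + newQuery + fragment;
    }
}
```

For example, removeParams("http://example.com/a?session%69d=abc&x=1", Set.of("sessionid")) drops the first parameter even though its name is percent-encoded, which a literal regexp on "sessionid=" would miss. A value that merely contains an encoded "&sessionid=" sequence is left alone, because only parameter names are compared.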
> On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
> --------------------------------------------------------------------------------
>
> Key: CONNECTORS-309
> URL: https://issues.apache.org/jira/browse/CONNECTORS-309
> Project: ManifoldCF
> Issue Type: Improvement
> Components: Web connector
> Affects Versions: ManifoldCF 0.4
> Reporter: Michael J. Kelleher
> Priority: Minor
>
> There was not a "Component" for a Job. Canonicalization is part of the Job definition.
> I would like the ability to use a regex to transform a URL (not necessarily including the hostname and port). Specifically what I would like to use this for is to remove certain URL request parameters from the URL.