Posted to dev@manifoldcf.apache.org by "Karl Wright (Commented) (JIRA)" <ji...@apache.org> on 2011/12/07 15:56:39 UTC

[jira] [Commented] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl

    [ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164438#comment-13164438 ] 

Karl Wright commented on CONNECTORS-309:
----------------------------------------

I'd view this as "custom canonicalization" functionality.  The use case Mr. Kelleher relayed to me was removing a parameter from the query string.  A general regexp is not a good tool for that because of URL-encoding issues: the same parameter name can appear in more than one percent-encoded form.  Maybe we could structure it as parameter removal/addition instead.
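
To make the encoding concern concrete, here is a rough sketch of what parameter removal could look like. It is only an illustration, not ManifoldCF code: the class name QueryParamRemover and the method removeParams are hypothetical, and it assumes Java 10 or later and UTF-8 encoded parameter names. The idea is to split the query string into pairs, decode each parameter name before comparing it, and leave the surviving pairs untouched.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF.
final class QueryParamRemover {

    // Returns the URL with every query parameter whose decoded name is in
    // namesToRemove dropped. Remaining pairs keep their original encoding.
    static String removeParams(String url, Set<String> namesToRemove) {
        // Set the fragment aside so it is not mistaken for part of the query.
        String fragment = "";
        int hash = url.indexOf('#');
        if (hash >= 0) {
            fragment = url.substring(hash);
            url = url.substring(0, hash);
        }

        int q = url.indexOf('?');
        if (q < 0) {
            return url + fragment; // no query string, nothing to remove
        }
        String base = url.substring(0, q);
        String query = url.substring(q + 1);

        List<String> kept = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
            String name;
            try {
                // Decode the name so "session%49d" and "sessionId" compare equal.
                name = URLDecoder.decode(rawName, StandardCharsets.UTF_8);
            } catch (IllegalArgumentException e) {
                name = rawName; // malformed escape: compare the raw text instead
            }
            if (!namesToRemove.contains(name)) {
                kept.add(pair);
            }
        }

        String newQuery = kept.isEmpty() ? "" : "?" + String.join("&", kept);
        return base + newQuery + fragment;
    }
}
```

For example, removeParams("http://example.com/a?x=1&session%49d=abc&y=2#top", Set.of("sessionId")) returns "http://example.com/a?x=1&y=2#top". A regexp such as sessionId=[^&]* would miss that parameter, because %49 is an encoded "I".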

                
> On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
> --------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-309
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-309
>             Project: ManifoldCF
>          Issue Type: Improvement
>          Components: Web connector
>    Affects Versions: ManifoldCF 0.4
>            Reporter: Michael J. Kelleher
>            Priority: Minor
>
> There was not a "Component" for a Job.  Canonicalization is part of the Job definition.
> I would like the ability to use a regex to transform a URL (not necessarily including the hostname and port).  Specifically what I would like to use this for is to remove certain URL request parameters from the URL.
