Posted to dev@manifoldcf.apache.org by "Michael J. Kelleher (Created) (JIRA)" <ji...@apache.org> on 2011/12/07 15:34:39 UTC
[jira] [Created] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
--------------------------------------------------------------------------------
Key: CONNECTORS-309
URL: https://issues.apache.org/jira/browse/CONNECTORS-309
Project: ManifoldCF
Issue Type: Improvement
Components: Web connector
Affects Versions: ManifoldCF 0.4
Reporter: Michael J. Kelleher
Priority: Minor
There was not a "Component" for a Job. Canonicalization is part of the Job definition.
I would like the ability to use a regex to transform a URL (not necessarily including the hostname and port). Specifically, I would like to use this to remove certain request parameters from the URL.
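A minimal sketch of the requested behavior, assuming Java (the language ManifoldCF is written in) and a hypothetical rule that drops a parameter named "sessionid". The regex runs only against the raw path and query, so the scheme, hostname and port pass through unchanged. The class name, method names and the "sessionid" parameter are illustrative and not part of ManifoldCF.

```java
import java.net.URI;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch; not ManifoldCF's API.
public class RegexUrlRewrite {

    // Applies a transform to the raw path and query only; the scheme and
    // authority (hostname and port) are copied through unchanged.
    static String rewriteTail(String url, UnaryOperator<String> transform) {
        URI uri = URI.create(url);
        String tail = uri.getRawPath()
                + (uri.getRawQuery() != null ? "?" + uri.getRawQuery() : "");
        String fragment = uri.getRawFragment() != null ? "#" + uri.getRawFragment() : "";
        return uri.getScheme() + "://" + uri.getRawAuthority()
                + transform.apply(tail) + fragment;
    }

    // Example rule: a "sessionid" parameter preceded by "?" or "&", together
    // with the "&" that follows it (or the end of the text).
    private static final Pattern SESSION_ID =
            Pattern.compile("(?<=[?&])sessionid=[^&]*(?:&|$)");

    // Removes a "?" or "&" left dangling when the removed parameter was last.
    private static final Pattern DANGLING = Pattern.compile("[?&]$");

    static String dropSessionId(String url) {
        return rewriteTail(url, tail -> DANGLING.matcher(
                SESSION_ID.matcher(tail).replaceAll("")).replaceAll(""));
    }

    public static void main(String[] args) {
        System.out.println(dropSessionId("http://example.com:8080/a?sessionid=x&page=2"));
        System.out.println(dropSessionId("http://example.com/a?page=2&sessionid=x"));
    }
}
```

Both calls in main should print the URL with only "page=2" left in the query. Note that the pattern matches the literal, still-encoded text, so an equivalent but differently encoded name such as "session%69d" slips through untouched.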
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
Posted by "Karl Wright (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-309:
-----------------------------------
Fix Version/s: ManifoldCF next
[jira] [Updated] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
Posted by "Karl Wright (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-309:
-----------------------------------
Fix Version/s: (was: ManifoldCF 0.5)
ManifoldCF next
[jira] [Commented] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
Posted by "Karl Wright (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164438#comment-13164438 ]
Karl Wright commented on CONNECTORS-309:
----------------------------------------
I'd view this as "custom canonicalization" functionality. The use case Mr. Kelleher relayed to me was to remove a parameter from the query string. A general regexp is not a good tool for that because of issues related to URL encoding, so maybe we could structure it as parameter removal/addition instead.
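The parameter removal/addition approach suggested here can be sketched in Java as follows. This is a hypothetical illustration, not ManifoldCF code: it splits the raw query on "&", decodes each parameter name before comparing it against a removal list, keeps surviving pairs in their original encoding, and appends added parameters through URLEncoder. It assumes form-style query encoding, where "+" stands for a space; the class and method names are made up for the example.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not ManifoldCF's API.
public class QueryParamCanonicalizer {

    // Decodes a parameter name, assuming form-style encoding ("+" means space).
    static String decodeName(String rawName) {
        try {
            return URLDecoder.decode(rawName, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return rawName; // malformed escape: fall back to comparing the raw text
        }
    }

    // Removes parameters whose decoded name is in "remove", then appends "add".
    static String canonicalize(String url, Set<String> remove, Map<String, String> add) {
        URI uri = URI.create(url);
        List<String> kept = new ArrayList<>();
        String rawQuery = uri.getRawQuery();
        if (rawQuery != null && !rawQuery.isEmpty()) {
            for (String pair : rawQuery.split("&")) {
                if (pair.isEmpty()) {
                    continue; // skip empty segments such as "a=1&&b=2"
                }
                int eq = pair.indexOf('=');
                String rawName = eq >= 0 ? pair.substring(0, eq) : pair;
                // Compare decoded names so "session%69d" matches "sessionid".
                if (!remove.contains(decodeName(rawName))) {
                    kept.add(pair); // keep the original encoding untouched
                }
            }
        }
        for (Map.Entry<String, String> e : add.entrySet()) {
            kept.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        StringBuilder sb = new StringBuilder();
        sb.append(uri.getScheme()).append("://").append(uri.getRawAuthority())
          .append(uri.getRawPath());
        if (!kept.isEmpty()) {
            sb.append('?').append(String.join("&", kept));
        }
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(canonicalize(
                "http://example.com/a?session%69d=abc&page=2",
                Set.of("sessionid"), Map.of()));
    }
}
```

Because names are compared after decoding, "session%69d" is removed just like "sessionid", which a literal regex would miss. Keeping the untouched pairs in their raw form avoids re-encoding values that were already valid.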
[jira] [Updated] (CONNECTORS-309) On Canonicalization Tab , Allow regex transforms to modify the URL's for a crawl
Posted by "Karl Wright (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CONNECTORS-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright updated CONNECTORS-309:
-----------------------------------
Fix Version/s: (was: ManifoldCF next)
ManifoldCF 0.5
Assignee: Karl Wright
As stated, this looks straightforward and will probably fit in the 0.5 timeframe.