Posted to dev@manifoldcf.apache.org by "Korneel Staelens (JIRA)" <ji...@apache.org> on 2019/01/24 20:55:00 UTC
[jira] [Created] (CONNECTORS-1573) Web Crawler exclude from index matches too much?
Korneel Staelens created CONNECTORS-1573:
--------------------------------------------
Summary: Web Crawler exclude from index matches too much?
Key: CONNECTORS-1573
URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
Project: ManifoldCF
Issue Type: Bug
Components: Web connector
Affects Versions: ManifoldCF 2.10
Reporter: Korneel Staelens
Hello,
I'm not sure whether this is a bug or my misinterpretation of the exclusion rules.
I want to set up a rule so that it does NOT index a parent page, but does index all child pages of that parent.
I'm setting up the following rule:
Inclusions:
.*
Exclusions:
[http://www.website.com/nl/]
(I've also tried: http://www.website.com/nl/(\s)* )
No dice. Looking at the logs, I see the pages are crawled but not indexed due to job restriction. Is my rule wrong, or is this a small bug?
Thanks for advice!
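
To make the question concrete, here is a minimal sketch of one possible explanation. It assumes (this message does not confirm it) that the exclusion pattern is applied as an unanchored regular-expression search, so a pattern consisting only of the parent URL is also found inside every child URL. Python's `re` module is used as a stand-in for the connector's matcher; the `is_excluded` helper and the child URL are hypothetical and not part of ManifoldCF.

```python
import re

def is_excluded(url, exclusion_patterns):
    """Return True if any pattern is found anywhere in the URL.

    Hypothetical helper: models an unanchored search, which is an
    assumption about how the exclusion rules are evaluated.
    """
    return any(re.search(p, url) is not None for p in exclusion_patterns)

parent = "http://www.website.com/nl/"
child = "http://www.website.com/nl/products/page.html"  # hypothetical child page

# The two rules from the report: the bare parent URL, and the same URL
# followed by (\s)*, which also matches zero whitespace characters.
as_reported = [r"http://www.website.com/nl/"]
with_whitespace = [r"http://www.website.com/nl/(\s)*"]

# A variant anchored at both ends, so only the exact parent URL matches.
anchored = [r"^http://www\.website\.com/nl/$"]

for name, rules in [("as reported", as_reported),
                    ("with (\\s)*", with_whitespace),
                    ("anchored", anchored)]:
    print(name, is_excluded(parent, rules), is_excluded(child, rules))
```

Under that assumption, both reported rules exclude the child page as well as the parent, while the anchored variant excludes only the parent. Whether the connector really evaluates exclusions this way would need to be confirmed.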
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)