Posted to dev@manifoldcf.apache.org by "Korneel Staelens (JIRA)" <ji...@apache.org> on 2019/01/24 20:55:00 UTC

[jira] [Created] (CONNECTORS-1573) Web Crawler exclude from index matches too much?

Korneel Staelens created CONNECTORS-1573:
--------------------------------------------

             Summary: Web Crawler exclude from index matches too much?
                 Key: CONNECTORS-1573
                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
             Project: ManifoldCF
          Issue Type: Bug
          Components: Web connector
    Affects Versions: ManifoldCF 2.10
            Reporter: Korneel Staelens


Hello, 

I'm not sure whether this is a bug or my misinterpretation of the exclusion rules.

I want to set up a rule so that it does NOT index a parent page, but does index all child pages of that parent.

I'm setting up a rule: 

Inclusions: 

.*

 

Exclusions:

[http://www.website.com/nl/]

(I've also tried: http://www.website.com/nl/(\s)* )

No dice. Looking at the logs, I see that the pages are crawled but not indexed due to job restriction. Is my rule wrong, or is this a small bug?
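
One possible explanation, offered as an assumption and not as confirmed connector behavior: if the exclusion patterns are applied as unanchored searches (a match anywhere in the URL counts), then the parent URL is a substring of every child URL, and the rule excludes the children too. The sketch below illustrates that difference in Python's `re` module. ManifoldCF itself is written in Java, so this only mirrors the general regex semantics. The child URL and the `is_excluded` helper are hypothetical.

```python
import re

# Hypothetical URLs, modeled on the example site in this report.
PARENT = "http://www.website.com/nl/"
CHILD = "http://www.website.com/nl/producten/item.html"


def is_excluded(url, patterns):
    """Return True if any pattern is found anywhere in the URL (unanchored search)."""
    return any(re.search(p, url) for p in patterns)


# Unanchored: the parent URL is a prefix of every child URL, so it matches both.
loose = [r"http://www\.website\.com/nl/"]
# Anchored: "^...$" restricts the match to the parent URL alone.
strict = [r"^http://www\.website\.com/nl/$"]

for label, patterns in (("loose", loose), ("strict", strict)):
    print(label, is_excluded(PARENT, patterns), is_excluded(CHILD, patterns))
```

Under that assumption, the `(\s)*` variant would behave the same way as the plain rule, since `*` also allows zero whitespace characters. Anchoring the exclusion with `^` and `$` would be the thing to try.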

 

Thanks for advice!


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)