Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2019/01/24 23:15:00 UTC

[jira] [Resolved] (CONNECTORS-1573) Web Crawler exclude from index matches too much?

     [ https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karl Wright resolved CONNECTORS-1573.
-------------------------------------
    Resolution: Not A Problem

> Web Crawler exclude from index matches too much?
> ------------------------------------------------
>
>                 Key: CONNECTORS-1573
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>    Affects Versions: ManifoldCF 2.10
>            Reporter: Korneel Staelens
>            Priority: Major
>
> Hello, 
> I'm not sure whether this is a bug or my misinterpretation of the exclusion rules.
> I want to set up a rule so that the crawler does NOT index a parent page, but does index all child pages of that parent.
> I'm setting up this rule:
> Inclusions: 
> .*
>  
> Exclusions:
> [http://www.website.com/nl/]
> (I've also tried: http://www.website.com/nl/(\s)* )
> No dice. Looking at the logs, I see the pages are crawled but not indexed due to job restriction. Is my rule wrong, or is this a small bug?
>  
> Thanks for advice!
>  
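
A note on the rule quoted above. The following is a minimal sketch, assuming the Web connector treats each exclusion as a regular expression that counts as a match when it is found anywhere in the URL (substring semantics). That behaviour is an assumption here; the message itself does not state it. Under that assumption, the unanchored exclusion also matches every child URL, which would explain why the child pages are crawled but not indexed, and anchoring the pattern with ^ and $ would limit it to the parent page. Python's re.search stands in for the connector's matching; the helper is_excluded and the child URL are hypothetical names used only for illustration.

```python
import re

def is_excluded(url, exclusions):
    """Return True if any exclusion pattern is found anywhere in the URL.

    Assumption: substring ("find") matching, modelled here with re.search.
    """
    return any(re.search(pattern, url) for pattern in exclusions)

parent = "http://www.website.com/nl/"
child = "http://www.website.com/nl/page.html"  # hypothetical child page

# The two patterns from the report, and an anchored alternative.
unanchored = [r"http://www.website.com/nl/"]
with_whitespace = [r"http://www.website.com/nl/(\s)*"]  # (\s)* can match nothing
anchored = [r"^http://www\.website\.com/nl/$"]

for name, rules in [("unanchored", unanchored),
                    ("with_whitespace", with_whitespace),
                    ("anchored", anchored)]:
    print(name, is_excluded(parent, rules), is_excluded(child, rules))
```

With substring matching, only the anchored pattern excludes the parent page while leaving the child page eligible for indexing. Whether the connector behaves this way should be checked against the ManifoldCF documentation for the version in use.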



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)