Posted to dev@manifoldcf.apache.org by "Karl Wright (JIRA)" <ji...@apache.org> on 2013/04/25 14:30:17 UTC

[jira] [Commented] (CONNECTORS-680) "Illegal seed URL" shows up in manifoldcf.log with reg exp entries in "Include in crawl" box

    [ https://issues.apache.org/jira/browse/CONNECTORS-680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641728#comment-13641728 ] 

Karl Wright commented on CONNECTORS-680:
----------------------------------------

Ok - I misunderstood.  The seed has to match the "include in crawl" reg exp.  It clearly doesn't, which is why you get the warning and the seed is excluded.  That's as designed.  It's easy to work around by adding a regexp that matches just the naked domain, if that's what the seeds contain.
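
As a rough illustration of the behaviour described above (not ManifoldCF's actual code), the sketch below checks a seed URL against a list of include patterns. The helper name seed_is_included and the anchored naked-domain pattern are hypothetical, and the sketch assumes a pattern counts as a match if it is found anywhere in the URL; the connector's exact matching semantics may differ.

```python
import re


def seed_is_included(seed_url, include_patterns):
    """Return True if the seed URL matches at least one include pattern.

    Hypothetical helper: assumes "found anywhere in the URL" semantics.
    """
    return any(re.search(pattern, seed_url) for pattern in include_patterns)


seed = "http://www.ibsen.uio.no/"

# The single pattern from the report: the seed has no "bokstav=G" in it,
# so it is not included.
original = [r".*bokstav=G.*"]

# Workaround: add a second pattern that matches only the naked domain.
# This particular pattern is an assumed example, not taken from the issue.
fixed = original + [r"^http://www\.ibsen\.uio\.no/$"]

print(seed_is_included(seed, original))
print(seed_is_included(seed, fixed))
```

With only the original pattern the seed is rejected; once the naked-domain pattern is added, the seed passes while deeper pages are still limited to URLs containing bokstav=G.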
                
> "Illegal seed URL" shows up in manifoldcf.log with reg exp entries in "Include in crawl" box
> --------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-680
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-680
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Web connector
>            Reporter: Erlend Garåsen
>            Assignee: Erlend Garåsen
>             Fix For: ManifoldCF 1.2
>
>
> The following error shows up in manifoldcf.log if there are regular expression entries in the "Include in crawl" text box for the web crawler job:
> {code}
> WARN 2013-04-25 14:15:07,431 (Startup thread) - WEB: Illegal seed URL 'http://www.ibsen.uio.no/'
> {code}
> This has nothing to do with whether a trailing slash is used; the error shows up even though the seed URLs are entered correctly.
> The entry in the "Include in crawl" box used to trigger this error was:
> {code}
> .*bokstav=G.*
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira